Grammatical Error Correction in Low Error Density Domains

Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses

Publication: Contribution to book/anthology/report › Article in proceedings › Research › Peer-reviewed

Standard

Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses. / Flachs, Simon Hellemann; Lacroix, Ophélie; Yannakoudakis, Helen; Rei, Marek; Søgaard, Anders.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2020. pp. 8467–8478.


Harvard

Flachs, SH, Lacroix, O, Yannakoudakis, H, Rei, M & Søgaard, A 2020, Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses. in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp. 8467–8478, The 2020 Conference on Empirical Methods in Natural Language Processing, 16/11/2020. https://doi.org/10.18653/v1/2020.emnlp-main.680

APA

Flachs, S. H., Lacroix, O., Yannakoudakis, H., Rei, M., & Søgaard, A. (2020). Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 8467–8478). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.680

Vancouver

Flachs SH, Lacroix O, Yannakoudakis H, Rei M, Søgaard A. Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. 2020. p. 8467–8478. https://doi.org/10.18653/v1/2020.emnlp-main.680

Author

Flachs, Simon Hellemann; Lacroix, Ophélie; Yannakoudakis, Helen; Rei, Marek; Søgaard, Anders. / Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2020. pp. 8467–8478

Bibtex

@inproceedings{2bfe73c1b53f4780a8fffa581faf9434,

title = "Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses",

abstract = "Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which however is only part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new benchmark for GEC consisting of website text generated by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays, which we show presents a challenge to state-of-the-art GEC systems. We demonstrate that a factor behind this is the inability of systems to rely on a strong internal language model in low error density domains. We hope this work shall facilitate the development of open-domain GEC models that generalize to different topics and genres.",

author = "Flachs, {Simon Hellemann} and Oph{\'e}lie Lacroix and Helen Yannakoudakis and Marek Rei and Anders S{\o}gaard",

year = "2020",

doi = "10.18653/v1/2020.emnlp-main.680",

language = "English",

pages = "8467--8478",

booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",

publisher = "Association for Computational Linguistics",

note = "The 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020; Conference date: 16-11-2020 through 20-11-2020",

url = "http://2020.emnlp.org",

}

RIS

TY  - GEN
T1  - Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses
T2  - The 2020 Conference on Empirical Methods in Natural Language Processing
AU  - Flachs, Simon Hellemann
AU  - Lacroix, Ophélie
AU  - Yannakoudakis, Helen
AU  - Rei, Marek
AU  - Søgaard, Anders
PY  - 2020
Y1  - 2020
N2  - Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which however is only part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new benchmark for GEC consisting of website text generated by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays, which we show presents a challenge to state-of-the-art GEC systems. We demonstrate that a factor behind this is the inability of systems to rely on a strong internal language model in low error density domains. We hope this work shall facilitate the development of open-domain GEC models that generalize to different topics and genres.
AB  - Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which however is only part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new benchmark for GEC consisting of website text generated by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays, which we show presents a challenge to state-of-the-art GEC systems. We demonstrate that a factor behind this is the inability of systems to rely on a strong internal language model in low error density domains. We hope this work shall facilitate the development of open-domain GEC models that generalize to different topics and genres.
U2  - 10.18653/v1/2020.emnlp-main.680
DO  - 10.18653/v1/2020.emnlp-main.680
M3  - Article in proceedings
SP  - 8467
EP  - 8478
BT  - Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
PB  - Association for Computational Linguistics
Y2  - 16 November 2020 through 20 November 2020
ER  -

ID: 258376622

Department of Computer Science (Datalogisk Institut)