Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses. / Flachs, Simon Hellemann; Lacroix, Ophélie; Yannakoudakis, Helen; Rei, Marek; Søgaard, Anders.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2020. p. 8467–8478.

Harvard

Flachs, SH, Lacroix, O, Yannakoudakis, H, Rei, M & Søgaard, A 2020, Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses. in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp. 8467–8478, The 2020 Conference on Empirical Methods in Natural Language Processing, 16/11/2020. https://doi.org/10.18653/v1/2020.emnlp-main.680

APA

Flachs, S. H., Lacroix, O., Yannakoudakis, H., Rei, M., & Søgaard, A. (2020). Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 8467–8478). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.680

Vancouver

Flachs SH, Lacroix O, Yannakoudakis H, Rei M, Søgaard A. Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. 2020. p. 8467–8478. https://doi.org/10.18653/v1/2020.emnlp-main.680

Author

Flachs, Simon Hellemann; Lacroix, Ophélie; Yannakoudakis, Helen; Rei, Marek; Søgaard, Anders. / Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2020. pp. 8467–8478

Bibtex

@inproceedings{2bfe73c1b53f4780a8fffa581faf9434,
  title = "Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses",
  abstract = "Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which however is only part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new benchmark for GEC consisting of website text generated by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays, which we show presents a challenge to state-of-the-art GEC systems. We demonstrate that a factor behind this is the inability of systems to rely on a strong internal language model in low error density domains. We hope this work shall facilitate the development of open-domain GEC models that generalize to different topics and genres.",
  author = "Flachs, {Simon Hellemann} and Oph{\'e}lie Lacroix and Helen Yannakoudakis and Marek Rei and Anders S{\o}gaard",
  year = "2020",
  doi = "10.18653/v1/2020.emnlp-main.680",
  language = "English",
  pages = "8467--8478",
  booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
  publisher = "Association for Computational Linguistics",
  note = "The 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 ; Conference date: 16-11-2020 Through 20-11-2020",
  url = "http://2020.emnlp.org",
}

RIS

TY  - GEN
T1  - Grammatical Error Correction in Low Error Density Domains
T2  - The 2020 Conference on Empirical Methods in Natural Language Processing
AU  - Flachs, Simon Hellemann
AU  - Lacroix, Ophélie
AU  - Yannakoudakis, Helen
AU  - Rei, Marek
AU  - Søgaard, Anders
PY  - 2020
Y1  - 2020
N2  - Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which however is only part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new benchmark for GEC consisting of website text generated by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays, which we show presents a challenge to state-of-the-art GEC systems. We demonstrate that a factor behind this is the inability of systems to rely on a strong internal language model in low error density domains. We hope this work shall facilitate the development of open-domain GEC models that generalize to different topics and genres.
AB  - Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which however is only part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new benchmark for GEC consisting of website text generated by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays, which we show presents a challenge to state-of-the-art GEC systems. We demonstrate that a factor behind this is the inability of systems to rely on a strong internal language model in low error density domains. We hope this work shall facilitate the development of open-domain GEC models that generalize to different topics and genres.
U2  - 10.18653/v1/2020.emnlp-main.680
DO  - 10.18653/v1/2020.emnlp-main.680
M3  - Article in proceedings
SP  - 8467
EP  - 8478
BT  - Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
PB  - Association for Computational Linguistics
Y2  - 16 November 2020 through 20 November 2020
ER  -

Department of Computer Science