Some Languages Seem Easier to Parse Because Their Treebanks Leak

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Some Languages Seem Easier to Parse Because Their Treebanks Leak. / Søgaard, Anders.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2020. p. 2765–2770.

Harvard

Søgaard, A 2020, Some Languages Seem Easier to Parse Because Their Treebanks Leak. in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp. 2765–2770, The 2020 Conference on Empirical Methods in Natural Language Processing, 16/11/2020. https://doi.org/10.18653/v1/2020.emnlp-main.220

APA

Søgaard, A. (2020). Some Languages Seem Easier to Parse Because Their Treebanks Leak. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 2765–2770). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.220

Vancouver

Søgaard A. Some Languages Seem Easier to Parse Because Their Treebanks Leak. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. 2020. p. 2765–2770 https://doi.org/10.18653/v1/2020.emnlp-main.220

Author

Søgaard, Anders. / Some Languages Seem Easier to Parse Because Their Treebanks Leak. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2020. pp. 2765–2770

Bibtex

@inproceedings{733210f70b624301bc2421b840eb21cf,

title = "Some Languages Seem Easier to Parse Because Their Treebanks Leak",

abstract = "Cross-language differences in (universal) dependency parsing performance are mostly attributed to treebank size, average sentence length, average dependency length, morphological complexity, and domain differences. We point at a factor not previously discussed: If we abstract away from words and dependency labels, how many graphs in the test data were seen in the training data? We compute graph isomorphisms, and show that, treebank size aside, overlap between training and test graphs explains more of the observed variation than standard explanations such as the above.",

author = "Anders S{\o}gaard",

year = "2020",

doi = "10.18653/v1/2020.emnlp-main.220",

language = "English",

pages = "2765--2770",

booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",

publisher = "Association for Computational Linguistics",

note = "The 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 ; Conference date: 16-11-2020 Through 20-11-2020",

url = "http://2020.emnlp.org",

}

RIS

TY - GEN

T1 - Some Languages Seem Easier to Parse Because Their Treebanks Leak

AU - Søgaard, Anders

PY - 2020

Y1 - 2020

N2 - Cross-language differences in (universal) dependency parsing performance are mostly attributed to treebank size, average sentence length, average dependency length, morphological complexity, and domain differences. We point at a factor not previously discussed: If we abstract away from words and dependency labels, how many graphs in the test data were seen in the training data? We compute graph isomorphisms, and show that, treebank size aside, overlap between training and test graphs explains more of the observed variation than standard explanations such as the above.

AB - Cross-language differences in (universal) dependency parsing performance are mostly attributed to treebank size, average sentence length, average dependency length, morphological complexity, and domain differences. We point at a factor not previously discussed: If we abstract away from words and dependency labels, how many graphs in the test data were seen in the training data? We compute graph isomorphisms, and show that, treebank size aside, overlap between training and test graphs explains more of the observed variation than standard explanations such as the above.

U2 - 10.18653/v1/2020.emnlp-main.220

DO - 10.18653/v1/2020.emnlp-main.220

M3 - Article in proceedings

SP - 2765

EP - 2770

BT - Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

PB - Association for Computational Linguistics

T2 - The 2020 Conference on Empirical Methods in Natural Language Processing

Y2 - 16 November 2020 through 20 November 2020

ER -

Department of Computer Science