Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages. / Kann, Katharina ; Lacroix, Ophélie ; Søgaard, Anders.

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2020): [AAAI-20 Technical Tracks 5]. AAAI Press, 2020. p. 8066-8073.

Harvard

Kann, K, Lacroix, O & Søgaard, A 2020, Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages. in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2020): [AAAI-20 Technical Tracks 5]. AAAI Press, pp. 8066-8073, Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, United States, 07/02/2020. https://doi.org/10.1609/aaai.v34i05.6317

APA

Kann, K., Lacroix, O., & Søgaard, A. (2020). Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2020): [AAAI-20 Technical Tracks 5] (pp. 8066-8073). AAAI Press. https://doi.org/10.1609/aaai.v34i05.6317

Vancouver

Kann K, Lacroix O, Søgaard A. Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2020): [AAAI-20 Technical Tracks 5]. AAAI Press. 2020. p. 8066-8073. https://doi.org/10.1609/aaai.v34i05.6317

Author

Kann, Katharina ; Lacroix, Ophélie ; Søgaard, Anders. / Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2020): [AAAI-20 Technical Tracks 5]. AAAI Press, 2020. pp. 8066-8073.

Bibtex

@inproceedings{f4534be8374b44ff8edeffffbecd617b,

title = "Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages",

abstract = "Part-of-speech (POS) taggers for low-resource languages which are exclusively based on various forms of weak supervision – e.g., cross-lingual transfer, type-level supervision, or a combination thereof – have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly only evaluated on languages that are very different from truly low-resource languages, and the taggers use sources of information, like high-coverage and almost error-free dictionaries, which are likely not available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers for a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model gets only less than half of the words right. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages.",

author = "Katharina Kann and Oph{\'e}lie Lacroix and Anders S{\o}gaard",

year = "2020",

doi = "10.1609/aaai.v34i05.6317",

language = "English",

pages = "8066--8073",

booktitle = "Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2020)",

publisher = "AAAI Press",

note = "Thirty-Fourth AAAI Conference on Artificial Intelligence : AAAI 2020 ; Conference date: 07-02-2020 Through 12-02-2020",

url = "https://aaai.org/Conferences/AAAI-20/",

}

RIS

TY - GEN

T1 - Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages

AU - Kann, Katharina

AU - Lacroix, Ophélie

AU - Søgaard, Anders

PY - 2020

Y1 - 2020

N2 - Part-of-speech (POS) taggers for low-resource languages which are exclusively based on various forms of weak supervision – e.g., cross-lingual transfer, type-level supervision, or a combination thereof – have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly only evaluated on languages that are very different from truly low-resource languages, and the taggers use sources of information, like high-coverage and almost error-free dictionaries, which are likely not available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers for a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model gets only less than half of the words right. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages.

AB - Part-of-speech (POS) taggers for low-resource languages which are exclusively based on various forms of weak supervision – e.g., cross-lingual transfer, type-level supervision, or a combination thereof – have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly only evaluated on languages that are very different from truly low-resource languages, and the taggers use sources of information, like high-coverage and almost error-free dictionaries, which are likely not available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers for a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model gets only less than half of the words right. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages.

U2 - 10.1609/aaai.v34i05.6317

DO - 10.1609/aaai.v34i05.6317

M3 - Article in proceedings

SP - 8066-8073

BT - Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2020)

PB - AAAI Press

T2 - Thirty-Fourth AAAI Conference on Artificial Intelligence

Y2 - 7 February 2020 through 12 February 2020

ER -

ID: 258334497

Department of Computer Science