Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Katharina Kann
Ophélie Lacroix
Anders Søgaard

Part-of-speech (POS) taggers for low-resource languages which are exclusively based on various forms of weak supervision – e.g., cross-lingual transfer, type-level supervision, or a combination thereof – have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly only evaluated on languages that are very different from truly low-resource languages, and the taggers use sources of information, like high-coverage and almost error-free dictionaries, which are likely not available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers for a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model gets only less than half of the words right. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages.

Original language	English
Title of host publication	Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2020) : [AAAI-20 Technical Tracks 5]
Publisher	AAAI Press
Publication date	2020
Pages	8066-8073
ISBN (Electronic)	978-1-57735-835-0
DOIs	https://doi.org/10.1609/aaai.v34i05.6317
Publication status	Published - 2020
Event	Thirty-Fourth AAAI Conference on Artificial Intelligence: AAAI 2020, New York, United States. Duration: 7 Feb 2020 → 12 Feb 2020. https://aaai.org/Conferences/AAAI-20/

Conference

Conference	Thirty-Fourth AAAI Conference on Artificial Intelligence
Country	United States
City	New York
Period	07/02/2020 → 12/02/2020
Web address	https://aaai.org/Conferences/AAAI-20/

ID: 258334497

Department of Computer Science