Are Pretrained Multilingual Models Equally Fair across Languages?

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Documents

  • Fulltext

    Final published version, 777 KB, PDF document

Pretrained multilingual language models can help bridge the digital language divide, enabling high-quality NLP models for lower-resourced languages. Studies of multilingual models have so far focused on performance, consistency, and cross-lingual generalisation. However, with their widespread application in the wild and downstream societal impact, it is important to put multilingual models under the same scrutiny as monolingual models. This work investigates the group fairness of multilingual models, asking whether these models are equally fair across languages. To this end, we create a new four-way multilingual dataset of parallel cloze test examples (MozArt), equipped with demographic information about the test participants (balanced with regard to gender and native tongue). We evaluate three multilingual models on MozArt (mBERT, XLM-R, and mT5) and show that across the four target languages, the three models exhibit different levels of group disparity, e.g., near-equal risk for Spanish but high levels of disparity for German.
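A minimal sketch of how such group disparity could be measured: per-group risk (error rate) on cloze predictions for each target language, with disparity taken as the gap between the best- and worst-served group. The record fields, demographic grouping, and gap measure below are illustrative assumptions, not the paper's exact protocol.

```python
# Hedged sketch: quantifying group disparity on a parallel cloze benchmark
# such as MozArt. Field names and the disparity measure are assumptions for
# illustration, not the evaluation protocol used in the paper.

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClozeResult:
    language: str   # target language of the cloze example, e.g. "de", "es"
    group: str      # demographic attribute of the participant, e.g. gender
    correct: bool   # whether the model filled the blank correctly


def group_risks(results: list[ClozeResult]) -> dict[str, dict[str, float]]:
    """Per-language, per-group error rate (risk = 1 - accuracy)."""
    counts: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for r in results:
        errors, total = counts[(r.language, r.group)]
        counts[(r.language, r.group)] = [errors + (not r.correct), total + 1]

    risks: dict[str, dict[str, float]] = defaultdict(dict)
    for (lang, grp), (errors, total) in counts.items():
        risks[lang][grp] = errors / total
    return risks


def disparity(risks_per_group: dict[str, float]) -> float:
    """Gap between the worst- and best-served group; 0 means equal risk."""
    return max(risks_per_group.values()) - min(risks_per_group.values())


if __name__ == "__main__":
    # Toy data only; a real evaluation would score model predictions on MozArt.
    toy = [
        ClozeResult("es", "female", True), ClozeResult("es", "male", True),
        ClozeResult("es", "female", False), ClozeResult("es", "male", False),
        ClozeResult("de", "female", True), ClozeResult("de", "male", False),
        ClozeResult("de", "female", True), ClozeResult("de", "male", False),
    ]
    for lang, per_group in group_risks(toy).items():
        print(lang, per_group, f"disparity={disparity(per_group):.2f}")
```

On the toy data this prints zero disparity for "es" and maximal disparity for "de", mirroring the kind of contrast the abstract reports; any real comparison would of course use model predictions on the actual test examples.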
Original language: English
Title of host publication: Proceedings of the 29th International Conference on Computational Linguistics
Publisher: International Committee on Computational Linguistics
Publication date: 2022
Pages: 3597–3605
Publication status: Published - 2022
Event: THE 29TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS, Hwabaek International Convention Center, GYEONGJU, Korea, Republic of
Duration: 12 Oct 2022 – 17 Oct 2022
Conference number: 29
https://coling2022.org/coling

Conference

Conference: THE 29TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS
Number: 29
Location: Hwabaek International Convention Center
Country: Korea, Republic of
City: GYEONGJU
Period: 12/10/2022 – 17/10/2022
Internet address: https://coling2022.org/coling


ID: 341498752