Are Pretrained Multilingual Models Equally Fair across Languages?

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

Standard

Are Pretrained Multilingual Models Equally Fair across Languages? / Cabello Piqueras, Laura; Søgaard, Anders.

Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics, 2022. p. 3597–3605.

Harvard

Cabello Piqueras, L & Søgaard, A 2022, 'Are Pretrained Multilingual Models Equally Fair across Languages?', in Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics, pp. 3597–3605, The 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12 October 2022. <https://aclanthology.org/2022.coling-1.318>

APA

Cabello Piqueras, L., & Søgaard, A. (2022). Are Pretrained Multilingual Models Equally Fair across Languages? In Proceedings of the 29th International Conference on Computational Linguistics (pp. 3597–3605). International Committee on Computational Linguistics. https://aclanthology.org/2022.coling-1.318

Vancouver

Cabello Piqueras L, Søgaard A. Are Pretrained Multilingual Models Equally Fair across Languages? In Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics. 2022. p. 3597–3605

Author

Cabello Piqueras, Laura; Søgaard, Anders. / Are Pretrained Multilingual Models Equally Fair across Languages? Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics, 2022. pp. 3597–3605

BibTeX

@inproceedings{19b2db0f8b0444259be56cb6628f20de,
title = "Are Pretrained Multilingual Models Equally Fair across Languages?",
abstract = "Pretrained multilingual language models can help bridge the digital language divide, enabling high-quality NLP models for lower-resourced languages. Studies of multilingual models have so far focused on performance, consistency, and cross-lingual generalisation. However, with their widespread application in the wild and downstream societal impact, it is important to put multilingual models under the same scrutiny as monolingual models. This work investigates the group fairness of multilingual models, asking whether these models are equally fair across languages. To this end, we create a new four-way multilingual dataset of parallel cloze test examples (MozArt), equipped with demographic information (balanced with regard to gender and native tongue) about the test participants. We evaluate three multilingual models on MozArt (mBERT, XLM-R, and mT5) and show that across the four target languages, the three models exhibit different levels of group disparity, e.g., exhibiting near-equal risk for Spanish, but high levels of disparity for German.",
author = "{Cabello Piqueras}, Laura and Anders S{\o}gaard",
year = "2022",
language = "English",
pages = "3597--3605",
booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
publisher = "International Committee on Computational Linguistics",
note = "The 29th International Conference on Computational Linguistics, COLING 2022; Conference date: 12-10-2022 through 17-10-2022",
url = "https://aclanthology.org/2022.coling-1.318",
}

RIS

TY  - GEN
T1  - Are Pretrained Multilingual Models Equally Fair across Languages?
AU  - Cabello Piqueras, Laura
AU  - Søgaard, Anders
N1  - Conference code: 29
PY  - 2022
Y1  - 2022
N2  - Pretrained multilingual language models can help bridge the digital language divide, enabling high-quality NLP models for lower-resourced languages. Studies of multilingual models have so far focused on performance, consistency, and cross-lingual generalisation. However, with their widespread application in the wild and downstream societal impact, it is important to put multilingual models under the same scrutiny as monolingual models. This work investigates the group fairness of multilingual models, asking whether these models are equally fair across languages. To this end, we create a new four-way multilingual dataset of parallel cloze test examples (MozArt), equipped with demographic information (balanced with regard to gender and native tongue) about the test participants. We evaluate three multilingual models on MozArt (mBERT, XLM-R, and mT5) and show that across the four target languages, the three models exhibit different levels of group disparity, e.g., exhibiting near-equal risk for Spanish, but high levels of disparity for German.
AB  - Pretrained multilingual language models can help bridge the digital language divide, enabling high-quality NLP models for lower-resourced languages. Studies of multilingual models have so far focused on performance, consistency, and cross-lingual generalisation. However, with their widespread application in the wild and downstream societal impact, it is important to put multilingual models under the same scrutiny as monolingual models. This work investigates the group fairness of multilingual models, asking whether these models are equally fair across languages. To this end, we create a new four-way multilingual dataset of parallel cloze test examples (MozArt), equipped with demographic information (balanced with regard to gender and native tongue) about the test participants. We evaluate three multilingual models on MozArt (mBERT, XLM-R, and mT5) and show that across the four target languages, the three models exhibit different levels of group disparity, e.g., exhibiting near-equal risk for Spanish, but high levels of disparity for German.
M3  - Article in proceedings
SP  - 3597
EP  - 3605
BT  - Proceedings of the 29th International Conference on Computational Linguistics
PB  - International Committee on Computational Linguistics
T2  - The 29th International Conference on Computational Linguistics
Y2  - 12 October 2022 through 17 October 2022
ER  -

ID: 341498752