Are Multilingual Sentiment Models Equally Right for the Right Reasons?

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Are Multilingual Sentiment Models Equally Right for the Right Reasons? / Jørgensen, Rasmus Kær; Caccavale, Fiammetta; Igel, Christian; Søgaard, Anders.

Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Association for Computational Linguistics (ACL), 2022. pp. 131–141.

Harvard

Jørgensen, RK, Caccavale, F, Igel, C & Søgaard, A 2022, 'Are Multilingual Sentiment Models Equally Right for the Right Reasons?', in Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics (ACL), pp. 131–141, Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Abu Dhabi, 08/12/2022. <https://aclanthology.org/2022.blackboxnlp-1.11/>

APA

Jørgensen, R. K., Caccavale, F., Igel, C., & Søgaard, A. (2022). Are Multilingual Sentiment Models Equally Right for the Right Reasons? In Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (pp. 131–141). Association for Computational Linguistics (ACL). https://aclanthology.org/2022.blackboxnlp-1.11/

Vancouver

Jørgensen RK, Caccavale F, Igel C, Søgaard A. Are Multilingual Sentiment Models Equally Right for the Right Reasons? In Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Association for Computational Linguistics (ACL). 2022. p. 131–141

Author

Jørgensen, Rasmus Kær ; Caccavale, Fiammetta ; Igel, Christian ; Søgaard, Anders. / Are Multilingual Sentiment Models Equally Right for the Right Reasons? Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Association for Computational Linguistics (ACL), 2022. pp. 131–141

BibTeX

@inproceedings{5e9996bbecfa4f3bb45938ae58cd83dd,
title = "Are Multilingual Sentiment Models Equally Right for the Right Reasons?",
abstract = "Multilingual NLP models provide potential solutions to the digital language divide, i.e., cross-language performance disparities. Early analyses of such models have indicated good performance across training languages and good generalization to unseen, related languages. This work examines whether, between related languages, multilingual models are equally right for the right reasons, i.e., if interpretability methods reveal that the models put emphasis on the same words as humans. To this end, we provide a new trilingual, parallel corpus of rationale annotations for English, Danish, and Italian sentiment analysis models and use it to benchmark models and interpretability methods. We propose rank-biased overlap as a better metric for comparing input token attributions to human rationale annotations. Our results show: (i) models generally perform well on the languages they are trained on, and align best with human rationales in these languages; (ii) performance is higher on English, even when not a source language, but this performance is not accompanied by higher alignment with human rationales, which suggests that language models favor English, but do not facilitate successful transfer of rationales.",
author = "J{\o}rgensen, {Rasmus K{\ae}r} and Fiammetta Caccavale and Christian Igel and Anders S{\o}gaard",
year = "2022",
language = "English",
pages = "131–141",
booktitle = "Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP",
publisher = "Association for Computational Linguistics (ACL)",
address = "United States",
note = "Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP ; Conference date: 08-12-2022 Through 08-12-2022",
url = "https://aclanthology.org/2022.blackboxnlp-1.11/",
}

RIS

TY - GEN

T1 - Are Multilingual Sentiment Models Equally Right for the Right Reasons?

AU - Jørgensen, Rasmus Kær

AU - Caccavale, Fiammetta

AU - Igel, Christian

AU - Søgaard, Anders

PY - 2022

Y1 - 2022

N2 - Multilingual NLP models provide potential solutions to the digital language divide, i.e., cross-language performance disparities. Early analyses of such models have indicated good performance across training languages and good generalization to unseen, related languages. This work examines whether, between related languages, multilingual models are equally right for the right reasons, i.e., if interpretability methods reveal that the models put emphasis on the same words as humans. To this end, we provide a new trilingual, parallel corpus of rationale annotations for English, Danish, and Italian sentiment analysis models and use it to benchmark models and interpretability methods. We propose rank-biased overlap as a better metric for comparing input token attributions to human rationale annotations. Our results show: (i) models generally perform well on the languages they are trained on, and align best with human rationales in these languages; (ii) performance is higher on English, even when not a source language, but this performance is not accompanied by higher alignment with human rationales, which suggests that language models favor English, but do not facilitate successful transfer of rationales.

AB - Multilingual NLP models provide potential solutions to the digital language divide, i.e., cross-language performance disparities. Early analyses of such models have indicated good performance across training languages and good generalization to unseen, related languages. This work examines whether, between related languages, multilingual models are equally right for the right reasons, i.e., if interpretability methods reveal that the models put emphasis on the same words as humans. To this end, we provide a new trilingual, parallel corpus of rationale annotations for English, Danish, and Italian sentiment analysis models and use it to benchmark models and interpretability methods. We propose rank-biased overlap as a better metric for comparing input token attributions to human rationale annotations. Our results show: (i) models generally perform well on the languages they are trained on, and align best with human rationales in these languages; (ii) performance is higher on English, even when not a source language, but this performance is not accompanied by higher alignment with human rationales, which suggests that language models favor English, but do not facilitate successful transfer of rationales.

UR - https://aclanthology.org/2022.blackboxnlp-1.11/

M3 - Article in proceedings

SP - 131

EP - 141

BT - Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

PB - Association for Computational Linguistics (ACL)

T2 - Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Y2 - 8 December 2022 through 8 December 2022

ER -

ID: 338603346
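
Rank-biased overlap (illustrative sketch)

The abstract proposes rank-biased overlap (RBO; Webber et al., 2010) as a metric for comparing a model's ranked token attributions to human rationale annotations. The Python sketch below shows a minimal, truncated form of RBO for illustration only; it is not the paper's implementation, and the function name, the default p = 0.9, and the example tokens are assumptions introduced here.

def rank_biased_overlap(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap (Webber et al., 2010).

    At each depth d, measure how much the two ranking prefixes
    overlap, and weight that agreement by p**(d - 1), so that
    agreement at the top ranks counts most. The truncated sum for
    finite lists stays below 1 (identical lists of length k score
    1 - p**k) unless the extrapolated variant of RBO is used.
    """
    depth = max(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    weighted_sum = 0.0
    for d in range(1, depth + 1):
        if d <= len(ranking_a):
            seen_a.add(ranking_a[d - 1])
        if d <= len(ranking_b):
            seen_b.add(ranking_b[d - 1])
        weighted_sum += (p ** (d - 1)) * (len(seen_a & seen_b) / d)
    return (1 - p) * weighted_sum

# Toy comparison: tokens ranked by an attribution method vs. a human
# rationale ranking (both rankings are invented for illustration).
model_ranking = ["terrible", "movie", "plot", "acting", "the"]
human_ranking = ["terrible", "acting", "movie", "the", "plot"]
print(rank_biased_overlap(model_ranking, human_ranking))

The top-weighting via p**(d - 1) is the property that motivates RBO in this setting: disagreement among the most strongly attributed tokens lowers the score more than disagreement in the tail, unlike a plain set-overlap measure.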