Evaluation of Summarization Systems across Gender, Age, and Race

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review

Standard

Evaluation of Summarization Systems across Gender, Age, and Race. / Jørgensen, Anna; Søgaard, Anders.

Proceedings of the Third Workshop on New Frontiers in Summarization. Association for Computational Linguistics, 2021. p. 51–56.

Harvard

Jørgensen, A & Søgaard, A 2021, Evaluation of Summarization Systems across Gender, Age, and Race. in Proceedings of the Third Workshop on New Frontiers in Summarization. Association for Computational Linguistics, pp. 51–56, 3rd Workshop on New Frontiers in Summarization, Online, 10/11/2021. https://doi.org/10.18653/v1/2021.newsum-1.6

APA

Jørgensen, A., & Søgaard, A. (2021). Evaluation of Summarization Systems across Gender, Age, and Race. In Proceedings of the Third Workshop on New Frontiers in Summarization (pp. 51–56). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.newsum-1.6

Vancouver

Jørgensen A, Søgaard A. Evaluation of Summarization Systems across Gender, Age, and Race. In Proceedings of the Third Workshop on New Frontiers in Summarization. Association for Computational Linguistics. 2021. p. 51–56. https://doi.org/10.18653/v1/2021.newsum-1.6

Author

Jørgensen, Anna ; Søgaard, Anders. / Evaluation of Summarization Systems across Gender, Age, and Race. Proceedings of the Third Workshop on New Frontiers in Summarization. Association for Computational Linguistics, 2021. pp. 51–56

Bibtex

@inproceedings{32bcd7d856354297ae827283a66460e7,
title = "Evaluation of Summarization Systems across Gender, Age, and Race",
abstract = "Summarization systems are ultimately evaluated by human annotators and raters. Usually, annotators and raters do not reflect the demographics of end users, but are recruited through student populations or crowdsourcing platforms with skewed demographics. For two different evaluation scenarios – evaluation against gold summaries and system output ratings – we show that summary evaluation is sensitive to protected attributes. This can severely bias system development and evaluation, leading us to build models that cater for some groups rather than others.",
author = "Anna J{\o}rgensen and Anders S{\o}gaard",
year = "2021",
doi = "10.18653/v1/2021.newsum-1.6",
language = "English",
pages = "51--56",
booktitle = "Proceedings of the Third Workshop on New Frontiers in Summarization",
publisher = "Association for Computational Linguistics",
note = "3rd Workshop on New Frontiers in Summarization ; Conference date: 10-11-2021 Through 10-11-2021",

}

RIS

TY - GEN

T1 - Evaluation of Summarization Systems across Gender, Age, and Race

AU - Jørgensen, Anna

AU - Søgaard, Anders

PY - 2021

Y1 - 2021

N2 - Summarization systems are ultimately evaluated by human annotators and raters. Usually, annotators and raters do not reflect the demographics of end users, but are recruited through student populations or crowdsourcing platforms with skewed demographics. For two different evaluation scenarios – evaluation against gold summaries and system output ratings – we show that summary evaluation is sensitive to protected attributes. This can severely bias system development and evaluation, leading us to build models that cater for some groups rather than others.

AB - Summarization systems are ultimately evaluated by human annotators and raters. Usually, annotators and raters do not reflect the demographics of end users, but are recruited through student populations or crowdsourcing platforms with skewed demographics. For two different evaluation scenarios – evaluation against gold summaries and system output ratings – we show that summary evaluation is sensitive to protected attributes. This can severely bias system development and evaluation, leading us to build models that cater for some groups rather than others.

U2 - 10.18653/v1/2021.newsum-1.6

DO - 10.18653/v1/2021.newsum-1.6

M3 - Article in proceedings

SP - 51

EP - 56

BT - Proceedings of the Third Workshop on New Frontiers in Summarization

PB - Association for Computational Linguistics

T2 - 3rd Workshop on New Frontiers in Summarization

Y2 - 10 November 2021 through 10 November 2021

ER -
