Evaluation of Summarization Systems across Gender, Age, and Race

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Summarization systems are ultimately evaluated by human annotators and raters. Usually, annotators and raters do not reflect the demographics of end users, but are recruited from student populations or crowdsourcing platforms with skewed demographics. For two different evaluation scenarios – evaluation against gold summaries and system output ratings – we show that summary evaluation is sensitive to protected attributes. This can severely bias system development and evaluation, leading us to build models that cater for some groups rather than others.
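
The abstract reports the finding but not the evaluation protocol itself. Purely as an illustration of the second scenario (system output ratings), the hypothetical Python sketch below groups ratings by rater demographic attribute and checks whether the per-group system ranking changes; all systems, groups, and scores are invented for the example and are not taken from the paper.

    # Hypothetical illustration only: does the system ranking change with the rater pool?
    from collections import defaultdict
    from statistics import mean

    # (system, rater demographic group, rating on a 1-5 scale) -- invented values.
    ratings = [
        ("system_A", "group_1", 4), ("system_A", "group_1", 5),
        ("system_A", "group_2", 3), ("system_A", "group_2", 2),
        ("system_B", "group_1", 3), ("system_B", "group_1", 3),
        ("system_B", "group_2", 4), ("system_B", "group_2", 5),
    ]

    # Mean rating per (system, rater group).
    scores = defaultdict(list)
    for system, group, rating in ratings:
        scores[(system, group)].append(rating)
    means = {key: mean(vals) for key, vals in scores.items()}

    # Rank systems within each rater group; diverging rankings mean the
    # evaluation outcome depends on who the raters are.
    for group in sorted({g for _, g in means}):
        ranking = sorted((s for s, g in means if g == group),
                         key=lambda s: means[(s, group)], reverse=True)
        print(group, ranking)
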
Original language: English
Title of host publication: Proceedings of the Third Workshop on New Frontiers in Summarization
Publisher: Association for Computational Linguistics
Publication date: 2021
Pages: 51–56
DOIs
Publication status: Published - 2021
Event: 3rd Workshop on New Frontiers in Summarization, Online
Duration: 10 Nov 2021 – 10 Nov 2021

Conference

Conference: 3rd Workshop on New Frontiers in Summarization
City: Online
Period: 10/11/2021 – 10/11/2021

ID: 300074299