Guideline Bias in Wizard-of-Oz Dialogues

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Victor Petrén Bach Hansen
Anders Søgaard

NLP models struggle with generalization due to sampling and annotator bias. This paper focuses on a different kind of bias that has received very little attention: guideline bias, i.e., the bias introduced by how our annotator guidelines are formulated. We examine two recently introduced dialogue datasets, CCPE-M and Taskmaster-1, both collected by trained assistants in a Wizard-of-Oz set-up. For CCPE-M, we show how a simple lexical bias for the word "like" in the guidelines biases the data collection. This bias, in effect, leads to poor performance on data without this bias: a preference elicitation architecture based on BERT suffers a 5.3% absolute drop in performance when "like" is replaced with a synonymous phrase, and a 13.2% drop in performance when evaluated on out-of-sample data. For Taskmaster-1, we show how the order in which instructions are presented biases the data collection.
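The lexical perturbation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `perturb` and `absolute_drop`, and the default replacement word "enjoy", are assumptions made for illustration, and the abstract does not say which synonymous phrases were actually used.

```python
import re

# Match "like" only as a whole word, so "dislike" and "likely" are untouched.
# Limitation: prepositional uses ("movies like Titanic") are also replaced.
LIKE_PATTERN = re.compile(r"\blike\b", flags=re.IGNORECASE)


def perturb(utterance: str, replacement: str = "enjoy") -> str:
    """Replace whole-word "like" with a replacement phrase.

    The default replacement is a hypothetical choice, not taken from the paper.
    """
    return LIKE_PATTERN.sub(replacement, utterance)


def absolute_drop(original_score: float, perturbed_score: float) -> float:
    """Absolute difference between the score on original and perturbed data."""
    return original_score - perturbed_score


if __name__ == "__main__":
    print(perturb("What kind of movies do you like?"))
```

A model would be scored once on the original utterances and once on the perturbed ones, and `absolute_drop` would report the difference between the two scores.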

Original language	English
Title of host publication	BPPF 2021 - 1st Workshop on Benchmarking: Past, Present and Future, Proceedings
Editors	Kenneth Church, Mark Liberman, Valia Kordoni
Publisher	Association for Computational Linguistics
Publication date	2021
Pages	8-14
ISBN (Electronic)	9781954085589
DOIs	https://doi.org/10.18653/v1/2021.bppf-1.2
Publication status	Published - 2021
Event	1st Workshop on Benchmarking: Past, Present and Future, BPPF 2021 - Virtual, Bangkok, Thailand. Duration: 5 Aug 2021 → 6 Aug 2021

Conference

Conference	1st Workshop on Benchmarking: Past, Present and Future, BPPF 2021
Country	Thailand
City	Virtual, Bangkok
Period	05/08/2021 → 06/08/2021

Bibliographical note

ID: 291812390

Department of Computer Science