Guideline Bias in Wizard-of-Oz Dialogues

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Guideline Bias in Wizard-of-Oz Dialogues. / Bach Hansen, Victor Petrén; Søgaard, Anders.

BPPF 2021 - 1st Workshop on Benchmarking: Past, Present and Future, Proceedings. ed. / Kenneth Church; Mark Liberman; Valia Kordoni. Association for Computational Linguistics, 2021. p. 8-14.

Harvard

Bach Hansen, VP & Søgaard, A 2021, Guideline Bias in Wizard-of-Oz Dialogues. in K Church, M Liberman & V Kordoni (eds), BPPF 2021 - 1st Workshop on Benchmarking: Past, Present and Future, Proceedings. Association for Computational Linguistics, pp. 8-14, 1st Workshop on Benchmarking: Past, Present and Future, BPPF 2021, Virtual, Bangkok, Thailand, 05/08/2021. https://doi.org/10.18653/v1/2021.bppf-1.2

APA

Bach Hansen, V. P., & Søgaard, A. (2021). Guideline Bias in Wizard-of-Oz Dialogues. In K. Church, M. Liberman, & V. Kordoni (Eds.), BPPF 2021 - 1st Workshop on Benchmarking: Past, Present and Future, Proceedings (pp. 8-14). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.bppf-1.2

Vancouver

Bach Hansen VP, Søgaard A. Guideline Bias in Wizard-of-Oz Dialogues. In Church K, Liberman M, Kordoni V, editors, BPPF 2021 - 1st Workshop on Benchmarking: Past, Present and Future, Proceedings. Association for Computational Linguistics. 2021. p. 8-14 https://doi.org/10.18653/v1/2021.bppf-1.2

Author

Bach Hansen, Victor Petrén ; Søgaard, Anders. / Guideline Bias in Wizard-of-Oz Dialogues. BPPF 2021 - 1st Workshop on Benchmarking: Past, Present and Future, Proceedings. editor / Kenneth Church ; Mark Liberman ; Valia Kordoni. Association for Computational Linguistics, 2021. pp. 8-14

Bibtex

@inproceedings{d284e7b74b504015b7d7dd3d2ebd992a,
title = "Guideline Bias in Wizard-of-Oz Dialogues",
abstract = "NLP models struggle with generalization due to sampling and annotator bias. This paper focuses on a different kind of bias that has received very little attention: guideline bias, i.e., the bias introduced by how our annotator guidelines are formulated. We examine two recently introduced dialogue datasets, CCPE-M and Taskmaster-1, both collected by trained assistants in a Wizard-of-Oz set-up. For CCPE-M, we show how a simple lexical bias for the word 'like' in the guidelines biases the data collection. This bias, in effect, leads to poor performance on data without this bias: a preference elicitation architecture based on BERT suffers a 5.3% absolute drop in performance when 'like' is replaced with a synonymous phrase, and a 13.2% drop in performance when evaluated on out-of-sample data. For Taskmaster-1, we show how the order in which instructions are presented biases the data collection.",
author = "{Bach Hansen}, {Victor Petr{\'e}n} and Anders S{\o}gaard",
note = "Publisher Copyright: {\textcopyright}2021 Association for Computational Linguistics; 1st Workshop on Benchmarking: Past, Present and Future, BPPF 2021 ; Conference date: 05-08-2021 Through 06-08-2021",
year = "2021",
doi = "10.18653/v1/2021.bppf-1.2",
language = "English",
pages = "8--14",
editor = "Kenneth Church and Mark Liberman and Valia Kordoni",
booktitle = "BPPF 2021 - 1st Workshop on Benchmarking: Past, Present and Future, Proceedings",
publisher = "Association for Computational Linguistics",
}

RIS

TY - GEN

T1 - Guideline Bias in Wizard-of-Oz Dialogues

AU - Bach Hansen, Victor Petrén

AU - Søgaard, Anders

N1 - Publisher Copyright: ©2021 Association for Computational Linguistics

PY - 2021

Y1 - 2021

N2 - NLP models struggle with generalization due to sampling and annotator bias. This paper focuses on a different kind of bias that has received very little attention: guideline bias, i.e., the bias introduced by how our annotator guidelines are formulated. We examine two recently introduced dialogue datasets, CCPE-M and Taskmaster-1, both collected by trained assistants in a Wizard-of-Oz set-up. For CCPE-M, we show how a simple lexical bias for the word 'like' in the guidelines biases the data collection. This bias, in effect, leads to poor performance on data without this bias: a preference elicitation architecture based on BERT suffers a 5.3% absolute drop in performance when 'like' is replaced with a synonymous phrase, and a 13.2% drop in performance when evaluated on out-of-sample data. For Taskmaster-1, we show how the order in which instructions are presented biases the data collection.

AB - NLP models struggle with generalization due to sampling and annotator bias. This paper focuses on a different kind of bias that has received very little attention: guideline bias, i.e., the bias introduced by how our annotator guidelines are formulated. We examine two recently introduced dialogue datasets, CCPE-M and Taskmaster-1, both collected by trained assistants in a Wizard-of-Oz set-up. For CCPE-M, we show how a simple lexical bias for the word 'like' in the guidelines biases the data collection. This bias, in effect, leads to poor performance on data without this bias: a preference elicitation architecture based on BERT suffers a 5.3% absolute drop in performance when 'like' is replaced with a synonymous phrase, and a 13.2% drop in performance when evaluated on out-of-sample data. For Taskmaster-1, we show how the order in which instructions are presented biases the data collection.

UR - http://www.scopus.com/inward/record.url?scp=85123954959&partnerID=8YFLogxK

U2 - 10.18653/v1/2021.bppf-1.2

DO - 10.18653/v1/2021.bppf-1.2

M3 - Article in proceedings

AN - SCOPUS:85123954959

SP - 8

EP - 14

BT - BPPF 2021 - 1st Workshop on Benchmarking: Past, Present and Future, Proceedings

A2 - Church, Kenneth

A2 - Liberman, Mark

A2 - Kordoni, Valia

PB - Association for Computational Linguistics

T2 - 1st Workshop on Benchmarking: Past, Present and Future, BPPF 2021

Y2 - 5 August 2021 through 6 August 2021

ER -

ID: 291812390