The Sensitivity of Language Models and Humans to Winograd Schema Perturbations

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Mostafa Abdou
Vinit Ravishankar
Maria Barrett
Yonatan Belinkov
Desmond Elliott
Anders Søgaard

Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of commonsense reasoning ability. We show, however, with a new diagnostic dataset, that these models are sensitive to linguistic perturbations of the Winograd examples that minimally affect human understanding. Our results highlight interesting differences between humans and language models: language models are more sensitive to number or gender alternations and synonym replacements than humans, and humans are more stable and consistent in their predictions, maintain a much higher absolute performance, and perform better on non-associative instances than associative ones.

Original language	English
Title of host publication	Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Publisher	Association for Computational Linguistics
Publication date	2020
Pages	7590-7604
DOIs	https://doi.org/10.18653/v1/2020.acl-main.679
Publication status	Published - 2020
Event	58th Annual Meeting of the Association for Computational Linguistics - Online Duration: 5 Jul 2020 → 10 Jul 2020

Conference

Conference	58th Annual Meeting of the Association for Computational Linguistics
City	Online
Period	05/07/2020 → 10/07/2020



Department of Computer Science