What does the Failure to Reason with “Respectively” in Zero/Few-Shot Settings Tell Us about Language Models?

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

What does the Failure to Reason with “Respectively” in Zero/Few-Shot Settings Tell Us about Language Models? / Cui, Ruixiang; Lee, Seolhwa; Hershcovich, Daniel; Søgaard, Anders.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Long Papers. Vol. 1 Association for Computational Linguistics (ACL), 2023. p. 8786-8800.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Harvard

Cui, R, Lee, S, Hershcovich, D & Søgaard, A 2023, What does the Failure to Reason with “Respectively” in Zero/Few-Shot Settings Tell Us about Language Models? in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Long Papers. vol. 1, Association for Computational Linguistics (ACL), pp. 8786-8800, 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, Toronto, Canada, 09/07/2023. https://doi.org/10.18653/v1/2023.acl-long.489

APA

Cui, R., Lee, S., Hershcovich, D., & Søgaard, A. (2023). What does the Failure to Reason with “Respectively” in Zero/Few-Shot Settings Tell Us about Language Models? In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Long Papers (Vol. 1, pp. 8786-8800). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.489

Vancouver

Cui R, Lee S, Hershcovich D, Søgaard A. What does the Failure to Reason with “Respectively” in Zero/Few-Shot Settings Tell Us about Language Models? In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Long Papers. Vol. 1. Association for Computational Linguistics (ACL). 2023. p. 8786-8800 https://doi.org/10.18653/v1/2023.acl-long.489

Author

Cui, Ruixiang ; Lee, Seolhwa ; Hershcovich, Daniel ; Søgaard, Anders. / What does the Failure to Reason with “Respectively” in Zero/Few-Shot Settings Tell Us about Language Models?. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Long Papers. Vol. 1 Association for Computational Linguistics (ACL), 2023. pp. 8786-8800

Bibtex

@inproceedings{f90e3147ea6c4468a476125be376e9ac,
title = "What does the Failure to Reason with “Respectively” in Zero/Few-Shot Settings Tell Us about Language Models?",
abstract = "Humans can effortlessly understand the coordinate structure of sentences such as “Niels Bohr and Kurt Cobain were born in Copenhagen and Seattle, respectively”. In the context of natural language inference (NLI), we examine how language models (LMs) reason with respective readings (Gawron and Kehler, 2004) from two perspectives: syntactic-semantic and commonsense-world knowledge. We propose a controlled synthetic dataset WikiResNLI and a naturally occurring dataset NatResNLI to encompass various explicit and implicit realizations of “respectively”. We show that fine-tuned NLI models struggle with understanding such readings without explicit supervision. While few-shot learning is easy in the presence of explicit cues, longer training is required when the reading is evoked implicitly, leaving models to rely on common sense inferences. Furthermore, our fine-grained analysis indicates models fail to generalize across different constructions. To conclude, we demonstrate that LMs still lag behind humans in generalizing to the long tail of linguistic constructions.",
author = "Ruixiang Cui and Seolhwa Lee and Daniel Hershcovich and Anders S{\o}gaard",
note = "Publisher Copyright: {\textcopyright} 2023 Association for Computational Linguistics.; 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 ; Conference date: 09-07-2023 Through 14-07-2023",
year = "2023",
doi = "10.18653/v1/2023.acl-long.489",
language = "English",
volume = "1",
pages = "8786--8800",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
publisher = "Association for Computational Linguistics (ACL)",
address = "United States",

}
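
The abstract above describes probing NLI models in a zero-shot setting with premises that use "respectively". As a rough illustration of what such a probe can look like (a minimal sketch, not the paper's own code or data), the snippet below runs an off-the-shelf NLI model on the example sentence quoted in the abstract; the model name roberta-large-mnli and the hypothesis are assumptions chosen for illustration.

# Illustrative sketch only -- not the paper's code. It probes an off-the-shelf
# NLI model (assumed: roberta-large-mnli) on the "respectively" example quoted
# in the abstract, in the spirit of the zero-shot setting the paper studies.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

premise = ("Niels Bohr and Kurt Cobain were born in "
           "Copenhagen and Seattle, respectively.")
# A hypothesis that is entailed only if the model resolves the "respectively"
# pairing correctly (hypothetical probe, not drawn from WikiResNLI/NatResNLI).
hypothesis = "Kurt Cobain was born in Seattle."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# roberta-large-mnli label ids: 0 = CONTRADICTION, 1 = NEUTRAL, 2 = ENTAILMENT
predicted = model.config.id2label[logits.argmax(dim=-1).item()]
print(predicted)

Swapping the hypothesis to the crossed pairing (e.g., "Niels Bohr was born in Seattle.") gives a quick check of whether the model distinguishes the respective reading from the distributive one.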

RIS

TY - GEN

T1 - What does the Failure to Reason with “Respectively” in Zero/Few-Shot Settings Tell Us about Language Models?

AU - Cui, Ruixiang

AU - Lee, Seolhwa

AU - Hershcovich, Daniel

AU - Søgaard, Anders

N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.

PY - 2023

Y1 - 2023

N2 - Humans can effortlessly understand the coordinate structure of sentences such as “Niels Bohr and Kurt Cobain were born in Copenhagen and Seattle, respectively”. In the context of natural language inference (NLI), we examine how language models (LMs) reason with respective readings (Gawron and Kehler, 2004) from two perspectives: syntactic-semantic and commonsense-world knowledge. We propose a controlled synthetic dataset WikiResNLI and a naturally occurring dataset NatResNLI to encompass various explicit and implicit realizations of “respectively”. We show that fine-tuned NLI models struggle with understanding such readings without explicit supervision. While few-shot learning is easy in the presence of explicit cues, longer training is required when the reading is evoked implicitly, leaving models to rely on common sense inferences. Furthermore, our fine-grained analysis indicates models fail to generalize across different constructions. To conclude, we demonstrate that LMs still lag behind humans in generalizing to the long tail of linguistic constructions.

AB - Humans can effortlessly understand the coordinate structure of sentences such as “Niels Bohr and Kurt Cobain were born in Copenhagen and Seattle, respectively”. In the context of natural language inference (NLI), we examine how language models (LMs) reason with respective readings (Gawron and Kehler, 2004) from two perspectives: syntactic-semantic and commonsense-world knowledge. We propose a controlled synthetic dataset WikiResNLI and a naturally occurring dataset NatResNLI to encompass various explicit and implicit realizations of “respectively”. We show that fine-tuned NLI models struggle with understanding such readings without explicit supervision. While few-shot learning is easy in the presence of explicit cues, longer training is required when the reading is evoked implicitly, leaving models to rely on common sense inferences. Furthermore, our fine-grained analysis indicates models fail to generalize across different constructions. To conclude, we demonstrate that LMs still lag behind humans in generalizing to the long tail of linguistic constructions.

U2 - 10.18653/v1/2023.acl-long.489

DO - 10.18653/v1/2023.acl-long.489

M3 - Article in proceedings

AN - SCOPUS:85174409678

VL - 1

SP - 8786

EP - 8800

BT - Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Long Papers

PB - Association for Computational Linguistics (ACL)

T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023

Y2 - 9 July 2023 through 14 July 2023

ER -

ID: 371030992