Implications of the Convergence of Language and Vision Model Geometries

Research output: Working paper › Preprint › Research

Standard

Implications of the Convergence of Language and Vision Model Geometries. / Li, Jiaang; Kementchedjhieva, Yova Radoslavova; Søgaard, Anders.

arXiv.org, 2023.

Harvard

Li, J, Kementchedjhieva, YR & Søgaard, A 2023 'Implications of the Convergence of Language and Vision Model Geometries' arXiv.org. <https://arxiv.org/abs/2302.06555>

APA

Li, J., Kementchedjhieva, Y. R., & Søgaard, A. (2023). Implications of the Convergence of Language and Vision Model Geometries. arXiv.org. https://arxiv.org/abs/2302.06555

Vancouver

Li J, Kementchedjhieva YR, Søgaard A. Implications of the Convergence of Language and Vision Model Geometries. arXiv.org. 2023.

Author

Li, Jiaang ; Kementchedjhieva, Yova Radoslavova ; Søgaard, Anders. / Implications of the Convergence of Language and Vision Model Geometries. arXiv.org, 2023.

BibTeX

@techreport{eceb01b73cf2482db4139348cdc36a84,
title = "Implications of the Convergence of Language and Vision Model Geometries",
abstract = "Large-scale pretrained language models (LMs) are said to ``lack the ability to connect [their] utterances to the world'' (Bender and Koller, 2020). If so, we would expect LM representations to be unrelated to representations in computer vision models. To investigate this, we present an empirical evaluation across three different LMs (BERT, GPT2, and OPT) and three computer vision models (VMs, including ResNet, SegFormer, and MAE). Our experiments show that LMs converge towards representations that are partially isomorphic to those of VMs, with dispersion and polysemy both factoring into the alignability of vision and language spaces. We discuss the implications of this finding.",
author = "Jiaang Li and Kementchedjhieva, {Yova Radoslavova} and Anders S{\o}gaard",
year = "2023",
language = "English",
publisher = "arXiv.org",
type = "WorkingPaper",
institution = "arXiv.org",
url = "https://arxiv.org/abs/2302.06555",
}

RIS

TY  - UNPB
T1  - Implications of the Convergence of Language and Vision Model Geometries
AU  - Li, Jiaang
AU  - Kementchedjhieva, Yova Radoslavova
AU  - Søgaard, Anders
PY  - 2023
Y1  - 2023
N2  - Large-scale pretrained language models (LMs) are said to "lack the ability to connect [their] utterances to the world" (Bender and Koller, 2020). If so, we would expect LM representations to be unrelated to representations in computer vision models. To investigate this, we present an empirical evaluation across three different LMs (BERT, GPT2, and OPT) and three computer vision models (VMs, including ResNet, SegFormer, and MAE). Our experiments show that LMs converge towards representations that are partially isomorphic to those of VMs, with dispersion and polysemy both factoring into the alignability of vision and language spaces. We discuss the implications of this finding.
AB  - Large-scale pretrained language models (LMs) are said to "lack the ability to connect [their] utterances to the world" (Bender and Koller, 2020). If so, we would expect LM representations to be unrelated to representations in computer vision models. To investigate this, we present an empirical evaluation across three different LMs (BERT, GPT2, and OPT) and three computer vision models (VMs, including ResNet, SegFormer, and MAE). Our experiments show that LMs converge towards representations that are partially isomorphic to those of VMs, with dispersion and polysemy both factoring into the alignability of vision and language spaces. We discuss the implications of this finding.
UR  - https://arxiv.org/abs/2302.06555
M3  - Preprint
BT  - Implications of the Convergence of Language and Vision Model Geometries
PB  - arXiv.org
ER  -
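
For readers wondering how the abstract's central claim (partial isomorphism between LM and VM representation spaces) can be tested in practice, a minimal sketch follows. It uses orthogonal Procrustes alignment with nearest-neighbour retrieval, a standard generic technique for measuring the alignability of two embedding spaces; it is not necessarily the authors' exact protocol, and the dimensions, split sizes, and random placeholder data are illustrative assumptions only.

# A minimal sketch, assuming a paired set of LM word embeddings and VM image
# embeddings for the same concepts. This illustrates the generic technique
# (orthogonal Procrustes alignment), not the paper's specific experiments;
# the data here is random placeholder.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
n_pairs, dim = 1000, 768                      # hypothetical: 1000 concept pairs, 768-dim
lm = rng.standard_normal((n_pairs, dim))      # stand-in for LM vectors (e.g., from BERT)
vm = rng.standard_normal((n_pairs, dim))      # stand-in for VM vectors (e.g., from ResNet)

train, test = slice(0, 800), slice(800, None)
# Solve min_W ||lm @ W - vm||_F subject to W orthogonal, on the training pairs.
W, _ = orthogonal_procrustes(lm[train], vm[train])

# Precision@1: how often the mapped LM vector's nearest VM vector is its true match.
mapped = lm[test] @ W
mapped /= np.linalg.norm(mapped, axis=1, keepdims=True)
targets = vm[test] / np.linalg.norm(vm[test], axis=1, keepdims=True)
sims = mapped @ targets.T                     # cosine similarities
p_at_1 = float((sims.argmax(axis=1) == np.arange(len(sims))).mean())
print(f"P@1 under orthogonal alignment: {p_at_1:.3f}")

On random data the score sits near chance (about 1/200 here); spaces that are partially isomorphic in the paper's sense would score well above it.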
