Grounding the Vector Space of an Octopus: Word Meaning from Raw Text

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Grounding the Vector Space of an Octopus: Word Meaning from Raw Text. / Søgaard, Anders.

In: Minds and Machines, Vol. 33, No. 1, 2023, pp. 33–54.

Research output: Contribution to journal › Journal article › Research › peer-review

Harvard

Søgaard, A 2023, 'Grounding the Vector Space of an Octopus: Word Meaning from Raw Text', Minds and Machines, vol. 33, no. 1, pp. 33–54. https://doi.org/10.1007/s11023-023-09622-4

APA

Søgaard, A. (2023). Grounding the Vector Space of an Octopus: Word Meaning from Raw Text. Minds and Machines, 33(1), 33–54. https://doi.org/10.1007/s11023-023-09622-4

Vancouver

Søgaard A. Grounding the Vector Space of an Octopus: Word Meaning from Raw Text. Minds and Machines. 2023;33(1):33–54. https://doi.org/10.1007/s11023-023-09622-4

Author

Søgaard, Anders. / Grounding the Vector Space of an Octopus: Word Meaning from Raw Text. In: Minds and Machines. 2023; Vol. 33, No. 1. pp. 33–54.

Bibtex

@article{3e08b95ba7a747c0b46f878120333c85,
title = "Grounding the Vector Space of an Octopus: Word Meaning from Raw Text",
abstract = "Most, if not all, philosophers agree that computers cannot learn what words refers to from raw text alone. While many attacked Searle{\textquoteright}s Chinese Room thought experiment, no one seemed to question this most basic assumption. For how can computers learn something that is not in the data? Emily Bender and Alexander Koller (2020) recently presented a related thought experiment—the so-called Octopus thought experiment, which replaces the rule-based interlocutor of Searle{\textquoteright}s thought experiment with a neural language model. The Octopus thought experiment was awarded a best paper prize and was widely debated in the AI community. Again, however, even its fiercest opponents accepted the premise that what a word refers to cannot be induced in the absence of direct supervision. I will argue that what a word refers to is probably learnable from raw text alone. Here{\textquoteright}s why: higher-order concept co-occurrence statistics are stable across languages and across modalities, because language use (universally) reflects the world we live in (which is relatively stable). Such statistics are sufficient to establish what words refer to. My conjecture is supported by a literature survey, a thought experiment, and an actual experiment.",
keywords = "Chinese room, Grounding, Language models",
author = "Anders S{\o}gaard",
note = "Publisher Copyright: {\textcopyright} 2023, The Author(s).",
year = "2023",
doi = "10.1007/s11023-023-09622-4",
language = "English",
volume = "33",
pages = "33–54",
journal = "Minds and Machines",
issn = "0924-6495",
publisher = "Springer Netherlands",
number = "1",

}

RIS

TY - JOUR

T1 - Grounding the Vector Space of an Octopus

T2 - Word Meaning from Raw Text

AU - Søgaard, Anders

N1 - Publisher Copyright: © 2023, The Author(s).

PY - 2023

Y1 - 2023

N2 - Most, if not all, philosophers agree that computers cannot learn what words refer to from raw text alone. While many attacked Searle’s Chinese Room thought experiment, no one seemed to question this most basic assumption. For how can computers learn something that is not in the data? Emily Bender and Alexander Koller (2020) recently presented a related thought experiment—the so-called Octopus thought experiment, which replaces the rule-based interlocutor of Searle’s thought experiment with a neural language model. The Octopus thought experiment was awarded a best paper prize and was widely debated in the AI community. Again, however, even its fiercest opponents accepted the premise that what a word refers to cannot be induced in the absence of direct supervision. I will argue that what a word refers to is probably learnable from raw text alone. Here’s why: higher-order concept co-occurrence statistics are stable across languages and across modalities, because language use (universally) reflects the world we live in (which is relatively stable). Such statistics are sufficient to establish what words refer to. My conjecture is supported by a literature survey, a thought experiment, and an actual experiment.

AB - Most, if not all, philosophers agree that computers cannot learn what words refer to from raw text alone. While many attacked Searle’s Chinese Room thought experiment, no one seemed to question this most basic assumption. For how can computers learn something that is not in the data? Emily Bender and Alexander Koller (2020) recently presented a related thought experiment—the so-called Octopus thought experiment, which replaces the rule-based interlocutor of Searle’s thought experiment with a neural language model. The Octopus thought experiment was awarded a best paper prize and was widely debated in the AI community. Again, however, even its fiercest opponents accepted the premise that what a word refers to cannot be induced in the absence of direct supervision. I will argue that what a word refers to is probably learnable from raw text alone. Here’s why: higher-order concept co-occurrence statistics are stable across languages and across modalities, because language use (universally) reflects the world we live in (which is relatively stable). Such statistics are sufficient to establish what words refer to. My conjecture is supported by a literature survey, a thought experiment, and an actual experiment.

KW - Chinese room

KW - Grounding

KW - Language models

U2 - 10.1007/s11023-023-09622-4

DO - 10.1007/s11023-023-09622-4

M3 - Journal article

AN - SCOPUS:85146724493

VL - 33

SP - 33

EP - 54

JO - Minds and Machines

JF - Minds and Machines

SN - 0924-6495

IS - 1

ER -

ID: 335693678
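
As a rough illustration of the idea the abstract gestures at (stable higher-order co-occurrence statistics letting words be matched to referents without direct supervision), the toy sketch below uses entirely hypothetical data and is not the paper's actual experiment. It shows how two independently generated "languages" describing the same world yield similar second-order co-occurrence structure, which can then be matched with no bilingual supervision.

```python
import numpy as np

# Toy sketch (hypothetical data, NOT the paper's experiment):
# two "languages" describe the same underlying world, so their word
# co-occurrence statistics share structure even though surface forms differ.

# Shared "world": pairwise association strengths between four concepts.
world = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

rng = np.random.default_rng(0)

def noisy_cooccurrence(noise=0.05):
    # A language-specific co-occurrence matrix: the world's association
    # structure plus language-specific noise.
    return world + noise * rng.standard_normal(world.shape)

cooc_a = noisy_cooccurrence()  # "language A"
cooc_b = noisy_cooccurrence()  # "language B"

def similarity_profiles(cooc):
    # Second-order statistics: cosine similarity of each word's
    # co-occurrence vector to every other word's.
    unit = cooc / np.linalg.norm(cooc, axis=1, keepdims=True)
    return unit @ unit.T

prof_a = similarity_profiles(cooc_a)
prof_b = similarity_profiles(cooc_b)

# Match each word in A to the B word with the closest similarity profile.
# In this toy both vocabularies happen to list concepts in the same order,
# so direct profile comparison suffices; real unsupervised alignment needs
# extra machinery (e.g. optimal transport or adversarial matching).
cost = np.linalg.norm(prof_a[:, None, :] - prof_b[None, :, :], axis=2)
print("recovered word alignment:", cost.argmin(axis=1))  # expect [0 1 2 3]
```

Matching by similarity profiles rather than raw co-occurrence counts is the point of the "higher-order" qualifier: the comparison depends only on the relational structure each language induces over its vocabulary, not on the surface forms themselves.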