On the Limitations of Unsupervised Bilingual Dictionary Induction

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Anders Søgaard
Sebastian Ruder
Ivan Vulic

Unsupervised machine translation (i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but nevertheless, Lample et al. (2018a) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.

Original language	English
Title of host publication	Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics : (Long papers)
Publisher	Association for Computational Linguistics
Publication date	2018
Pages	778–788
Publication status	Published - 2018
Event	56th Annual Meeting of the Association for Computational Linguistics - System Demonstrations - Melbourne, Australia. Duration: 15 Jul 2018 → 20 Jul 2018

Conference

Conference	56th Annual Meeting of the Association for Computational Linguistics - System Demonstrations
Country	Australia
City	Melbourne
Period	15/07/2018 → 20/07/2018

ID: 214756841

Department of Computer Science
