On the Limitations of Unsupervised Bilingual Dictionary Induction

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

On the Limitations of Unsupervised Bilingual Dictionary Induction. / Søgaard, Anders ; Ruder, Sebastian ; Vulic, Ivan.

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2018. p. 778–788.

Harvard

Søgaard, A, Ruder, S & Vulic, I 2018, On the Limitations of Unsupervised Bilingual Dictionary Induction. in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 778–788, 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15/07/2018.

APA

Søgaard, A., Ruder, S., & Vulic, I. (2018). On the Limitations of Unsupervised Bilingual Dictionary Induction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 778–788). Association for Computational Linguistics.

Vancouver

Søgaard A, Ruder S, Vulic I. On the Limitations of Unsupervised Bilingual Dictionary Induction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. 2018. p. 778–788.

Author

Søgaard, Anders ; Ruder, Sebastian ; Vulic, Ivan. / On the Limitations of Unsupervised Bilingual Dictionary Induction. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2018. pp. 778–788.

Bibtex

@inproceedings{f82497c434ec4f1381db65541dd85098,

title = "On the Limitations of Unsupervised Bilingual Dictionary Induction",

abstract = "Unsupervised machine translation (i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but nevertheless, Lample et al. (2018a) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent-marking, and when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.",

author = "Anders S{\o}gaard and Sebastian Ruder and Ivan Vulic",

year = "2018",

language = "English",

pages = "778--788",

booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics",

publisher = "Association for Computational Linguistics",

note = "Conference date: 15-07-2018 through 20-07-2018",

}

RIS

TY - GEN

T1 - On the Limitations of Unsupervised Bilingual Dictionary Induction

AU - Søgaard, Anders

AU - Ruder, Sebastian

AU - Vulic, Ivan

PY - 2018

Y1 - 2018

N2 - Unsupervised machine translation (i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but nevertheless, Lample et al. (2018a) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent-marking, and when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.

AB - Unsupervised machine translation (i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but nevertheless, Lample et al. (2018a) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent-marking, and when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.

M3 - Article in proceedings

SP - 778

EP - 788

BT - Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics

PB - Association for Computational Linguistics

Y2 - 15 July 2018 through 20 July 2018

ER -

ID: 214756841

Department of Computer Science