Factored Translation with Unsupervised Word Clusters

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Factored Translation with Unsupervised Word Clusters. / Rishøj, Christian; Søgaard, Anders.

Proceedings of the Sixth Workshop on Statistical Machine Translation. Edinburgh, Scotland : Association for Computational Linguistics, 2011. p. 447-451.

Harvard

Rishøj, C & Søgaard, A 2011, Factored Translation with Unsupervised Word Clusters. in Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, Edinburgh, Scotland, pp. 447-451.

APA

Rishøj, C., & Søgaard, A. (2011). Factored Translation with Unsupervised Word Clusters. In Proceedings of the Sixth Workshop on Statistical Machine Translation (pp. 447-451). Association for Computational Linguistics.

Vancouver

Rishøj C, Søgaard A. Factored Translation with Unsupervised Word Clusters. In Proceedings of the Sixth Workshop on Statistical Machine Translation. Edinburgh, Scotland: Association for Computational Linguistics. 2011. p. 447-451

Author

Rishøj, Christian ; Søgaard, Anders. / Factored Translation with Unsupervised Word Clusters. Proceedings of the Sixth Workshop on Statistical Machine Translation. Edinburgh, Scotland : Association for Computational Linguistics, 2011. pp. 447-451

Bibtex

@inproceedings{b9b2a9093f5847909b44b58f0e99a55a,

title = "Factored Translation with Unsupervised Word Clusters",

abstract = "Unsupervised word clustering algorithms, which form word clusters based on a measure of distributional similarity, have proven useful in providing features for various natural language processing tasks involving supervised learning. This work explores the utility of such word clusters as factors in statistical machine translation. Although some of the language pairs in this work clearly benefit from the factor augmentation, there is no consistent improvement in translation accuracy across the board. For all language pairs, the word clusters clearly improve translation for some proportion of the sentences in the test set, but have a weak or even detrimental effect on the rest. It is shown that if one could determine whether or not to use a factor when translating a given sentence, rather substantial improvements in precision could be achieved for all of the language pairs evaluated. While such an “oracle” method is not identified, evaluations indicate that unsupervised word clusters are most beneficial in sentences without unknown words.",

author = "Christian Rish{\o}j and Anders S{\o}gaard",

year = "2011",

month = jul,

day = "1",

language = "English",

isbn = "978-1-937284-12-1/1-937284-12-3",

pages = "447--451",

booktitle = "Proceedings of the Sixth Workshop on Statistical Machine Translation",

publisher = "Association for Computational Linguistics",

}

RIS

TY - GEN

T1 - Factored Translation with Unsupervised Word Clusters

AU - Rishøj, Christian

AU - Søgaard, Anders

PY - 2011/7/1

Y1 - 2011/7/1

N2 - Unsupervised word clustering algorithms, which form word clusters based on a measure of distributional similarity, have proven useful in providing features for various natural language processing tasks involving supervised learning. This work explores the utility of such word clusters as factors in statistical machine translation. Although some of the language pairs in this work clearly benefit from the factor augmentation, there is no consistent improvement in translation accuracy across the board. For all language pairs, the word clusters clearly improve translation for some proportion of the sentences in the test set, but have a weak or even detrimental effect on the rest. It is shown that if one could determine whether or not to use a factor when translating a given sentence, rather substantial improvements in precision could be achieved for all of the language pairs evaluated. While such an “oracle” method is not identified, evaluations indicate that unsupervised word clusters are most beneficial in sentences without unknown words.

AB - Unsupervised word clustering algorithms, which form word clusters based on a measure of distributional similarity, have proven useful in providing features for various natural language processing tasks involving supervised learning. This work explores the utility of such word clusters as factors in statistical machine translation. Although some of the language pairs in this work clearly benefit from the factor augmentation, there is no consistent improvement in translation accuracy across the board. For all language pairs, the word clusters clearly improve translation for some proportion of the sentences in the test set, but have a weak or even detrimental effect on the rest. It is shown that if one could determine whether or not to use a factor when translating a given sentence, rather substantial improvements in precision could be achieved for all of the language pairs evaluated. While such an “oracle” method is not identified, evaluations indicate that unsupervised word clusters are most beneficial in sentences without unknown words.

M3 - Article in proceedings

SN - 978-1-937284-12-1/1-937284-12-3

SP - 447

EP - 451

BT - Proceedings of the Sixth Workshop on Statistical Machine Translation

PB - Association for Computational Linguistics

CY - Edinburgh, Scotland

ER -

ID: 34349592

Department of Computer Science