Cross-lingual and cross-domain discourse segmentation of entire documents

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

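Cross-lingual and cross-domain discourse segmentation of entire documents. / Braud, Chloé; Lacroix, Ophélie; Søgaard, Anders.

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Short papers. Vol. 2. Association for Computational Linguistics, 2017. p. 237-243.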

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Short papers. Vol. 2 Association for Computational Linguistics, 2017. p. 237-243.

Harvard

Braud, C, Lacroix, O & Søgaard, A 2017, Cross-lingual and cross-domain discourse segmentation of entire documents. in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Short papers. vol. 2, Association for Computational Linguistics, pp. 237-243, 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, 30/07/2017. https://doi.org/10.18653/v1/P17-2037

APA

Braud, C., Lacroix, O., & Søgaard, A. (2017). Cross-lingual and cross-domain discourse segmentation of entire documents. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Short papers (Vol. 2, pp. 237-243). Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-2037

Vancouver

Braud C, Lacroix O, Søgaard A. Cross-lingual and cross-domain discourse segmentation of entire documents. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Short papers. Vol. 2. Association for Computational Linguistics. 2017. p. 237-243. https://doi.org/10.18653/v1/P17-2037

Author

Braud, Chloé ; Lacroix, Ophélie ; Søgaard, Anders. / Cross-lingual and cross-domain discourse segmentation of entire documents. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Short papers. Vol. 2. Association for Computational Linguistics, 2017. pp. 237-243.

Bibtex

@inproceedings{41d95411625e48929fc3c7c030851f71,
  title = "Cross-lingual and cross-domain discourse segmentation of entire documents",
  abstract = "Discourse segmentation is a crucial step in building end-to-end discourse parsers. However, discourse segmenters only exist for a few languages and domains. Typically they only detect intra-sentential segment boundaries, assuming gold standard sentence and token segmentation, and relying on high-quality syntactic parses and rich heuristics that are not generally available across languages and domains. In this paper, we propose statistical discourse segmenters for five languages and three domains that do not rely on gold pre-annotations. We also consider the problem of learning discourse segmenters when no labeled data is available for a language. Our fully supervised system obtains 89.5% F1 for English newswire, with slight drops in performance on other domains, and we report supervised and unsupervised (cross-lingual) results for five languages in total.",
  author = "Chlo{\'e} Braud and Oph{\'e}lie Lacroix and Anders S{\o}gaard",
  year = "2017",
  month = jan,
  day = "1",
  doi = "10.18653/v1/P17-2037",
  language = "English",
  volume = "2",
  pages = "237--243",
  booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics",
  publisher = "Association for Computational Linguistics",
  note = "55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 ; Conference date: 30-07-2017 Through 04-08-2017",
}

RIS

TY - GEN
T1 - Cross-lingual and cross-domain discourse segmentation of entire documents
AU - Braud, Chloé
AU - Lacroix, Ophélie
AU - Søgaard, Anders
PY - 2017/1/1
Y1 - 2017/1/1
N2 - Discourse segmentation is a crucial step in building end-to-end discourse parsers. However, discourse segmenters only exist for a few languages and domains. Typically they only detect intra-sentential segment boundaries, assuming gold standard sentence and token segmentation, and relying on high-quality syntactic parses and rich heuristics that are not generally available across languages and domains. In this paper, we propose statistical discourse segmenters for five languages and three domains that do not rely on gold pre-annotations. We also consider the problem of learning discourse segmenters when no labeled data is available for a language. Our fully supervised system obtains 89.5% F1 for English newswire, with slight drops in performance on other domains, and we report supervised and unsupervised (cross-lingual) results for five languages in total.
AB - Discourse segmentation is a crucial step in building end-to-end discourse parsers. However, discourse segmenters only exist for a few languages and domains. Typically they only detect intra-sentential segment boundaries, assuming gold standard sentence and token segmentation, and relying on high-quality syntactic parses and rich heuristics that are not generally available across languages and domains. In this paper, we propose statistical discourse segmenters for five languages and three domains that do not rely on gold pre-annotations. We also consider the problem of learning discourse segmenters when no labeled data is available for a language. Our fully supervised system obtains 89.5% F1 for English newswire, with slight drops in performance on other domains, and we report supervised and unsupervised (cross-lingual) results for five languages in total.
UR - http://www.scopus.com/inward/record.url?scp=85040622591&partnerID=8YFLogxK
U2 - 10.18653/v1/P17-2037
DO - 10.18653/v1/P17-2037
M3 - Article in proceedings
AN - SCOPUS:85040622591
VL - 2
SP - 237
EP - 243
BT - Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics
PB - Association for Computational Linguistics
T2 - 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017
Y2 - 30 July 2017 through 4 August 2017
ER -

Department of Computer Science