DaNE: A named entity resource for Danish

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

DaNE: A named entity resource for Danish. / Hvingelby, Rasmus; Pauli, Amalie Brogaard; Barrett, Maria; Rosted, Christina; Lidegaard, Lasse Malm; Søgaard, Anders.

LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings. ed. / Nicoletta Calzolari; Frederic Bechet; Philippe Blache; Khalid Choukri; Christopher Cieri; Thierry Declerck; Sara Goggi; Hitoshi Isahara; Bente Maegaard; Joseph Mariani; Helene Mazo; Asuncion Moreno; Jan Odijk; Stelios Piperidis. European Language Resources Association (ELRA), 2020. p. 4597-4604.


Harvard

Hvingelby, R, Pauli, AB, Barrett, M, Rosted, C, Lidegaard, LM & Søgaard, A 2020, DaNE: A named entity resource for Danish. in N Calzolari, F Bechet, P Blache, K Choukri, C Cieri, T Declerck, S Goggi, H Isahara, B Maegaard, J Mariani, H Mazo, A Moreno, J Odijk & S Piperidis (eds), LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings. European Language Resources Association (ELRA), pp. 4597-4604, 12th International Conference on Language Resources and Evaluation, LREC 2020, Marseille, France, 11/05/2020.

APA

Hvingelby, R., Pauli, A. B., Barrett, M., Rosted, C., Lidegaard, L. M., & Søgaard, A. (2020). DaNE: A named entity resource for Danish. In N. Calzolari, F. Bechet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings (pp. 4597-4604). European Language Resources Association (ELRA).

Vancouver

Hvingelby R, Pauli AB, Barrett M, Rosted C, Lidegaard LM, Søgaard A. DaNE: A named entity resource for Danish. In Calzolari N, Bechet F, Blache P, Choukri K, Cieri C, Declerck T, Goggi S, Isahara H, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J, Piperidis S, editors, LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings. European Language Resources Association (ELRA). 2020. p. 4597-4604.

Author

Hvingelby, Rasmus ; Pauli, Amalie Brogaard ; Barrett, Maria ; Rosted, Christina ; Lidegaard, Lasse Malm ; Søgaard, Anders. / DaNE: A named entity resource for Danish. LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings. editor / Nicoletta Calzolari ; Frederic Bechet ; Philippe Blache ; Khalid Choukri ; Christopher Cieri ; Thierry Declerck ; Sara Goggi ; Hitoshi Isahara ; Bente Maegaard ; Joseph Mariani ; Helene Mazo ; Asuncion Moreno ; Jan Odijk ; Stelios Piperidis. European Language Resources Association (ELRA), 2020. pp. 4597-4604

Bibtex

@inproceedings{5b3947353c4e40648ea191aa67dff396,
title = "DaNE: A named entity resource for danish",
abstract = "We present a named entity annotation for the Danish Universal Dependencies treebank using the CoNLL-2003 annotation scheme: DaNE. It is the largest publicly available, Danish named entity gold annotation. We evaluate the quality of our annotations intrinsically by double annotating the entire treebank and extrinsically by comparing our annotations to a recently released named entity annotation of the validation and test sections of the Danish Universal Dependencies treebank. We benchmark the new resource by training and evaluating competitive architectures for supervised named entity recognition (NER), including FLAIR, monolingual (Danish) BERT and multilingual BERT. We explore cross-lingual transfer in multilingual BERT from five related languages in zero-shot and direct transfer setups, and we show that even with our modestly-sized training set, we improve Danish NER over a recent cross-lingual approach, as well as over zero-shot transfer from five related languages. Using multilingual BERT, we achieve higher performance by fine-tuning on both DaNE and a larger Bokm{\aa}l (Norwegian) training set compared to only using DaNE. However, the highest performance is achieved by using a Danish BERT fine-tuned on DaNE. Our dataset enables improvements and applicability for Danish NER beyond cross-lingual methods. We employ a thorough error analysis of the predictions of the best models for seen and unseen entities, as well as their robustness on un-capitalized text. The annotated dataset and all the trained models are made publicly available.",
keywords = "Cross-lingual transfer, Danish, Named entity recognition, Resource",
author = "Rasmus Hvingelby and Pauli, {Amalie Brogaard} and Maria Barrett and Christina Rosted and Lidegaard, {Lasse Malm} and Anders S{\o}gaard",
year = "2020",
language = "English",
pages = "4597--4604",
editor = "Nicoletta Calzolari and Frederic Bechet and Philippe Blache and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis",
booktitle = "LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings",
publisher = "European Language Resources Association (ELRA)",
note = "12th International Conference on Language Resources and Evaluation, LREC 2020 ; Conference date: 11-05-2020 Through 16-05-2020",

}

RIS

TY - GEN

T1 - DaNE: A named entity resource for Danish

T2 - 12th International Conference on Language Resources and Evaluation, LREC 2020

AU - Hvingelby, Rasmus

AU - Pauli, Amalie Brogaard

AU - Barrett, Maria

AU - Rosted, Christina

AU - Lidegaard, Lasse Malm

AU - Søgaard, Anders

PY - 2020

Y1 - 2020

N2 - We present a named entity annotation for the Danish Universal Dependencies treebank using the CoNLL-2003 annotation scheme: DaNE. It is the largest publicly available, Danish named entity gold annotation. We evaluate the quality of our annotations intrinsically by double annotating the entire treebank and extrinsically by comparing our annotations to a recently released named entity annotation of the validation and test sections of the Danish Universal Dependencies treebank. We benchmark the new resource by training and evaluating competitive architectures for supervised named entity recognition (NER), including FLAIR, monolingual (Danish) BERT and multilingual BERT. We explore cross-lingual transfer in multilingual BERT from five related languages in zero-shot and direct transfer setups, and we show that even with our modestly-sized training set, we improve Danish NER over a recent cross-lingual approach, as well as over zero-shot transfer from five related languages. Using multilingual BERT, we achieve higher performance by fine-tuning on both DaNE and a larger Bokmål (Norwegian) training set compared to only using DaNE. However, the highest performance is achieved by using a Danish BERT fine-tuned on DaNE. Our dataset enables improvements and applicability for Danish NER beyond cross-lingual methods. We employ a thorough error analysis of the predictions of the best models for seen and unseen entities, as well as their robustness on un-capitalized text. The annotated dataset and all the trained models are made publicly available.

AB - We present a named entity annotation for the Danish Universal Dependencies treebank using the CoNLL-2003 annotation scheme: DaNE. It is the largest publicly available, Danish named entity gold annotation. We evaluate the quality of our annotations intrinsically by double annotating the entire treebank and extrinsically by comparing our annotations to a recently released named entity annotation of the validation and test sections of the Danish Universal Dependencies treebank. We benchmark the new resource by training and evaluating competitive architectures for supervised named entity recognition (NER), including FLAIR, monolingual (Danish) BERT and multilingual BERT. We explore cross-lingual transfer in multilingual BERT from five related languages in zero-shot and direct transfer setups, and we show that even with our modestly-sized training set, we improve Danish NER over a recent cross-lingual approach, as well as over zero-shot transfer from five related languages. Using multilingual BERT, we achieve higher performance by fine-tuning on both DaNE and a larger Bokmål (Norwegian) training set compared to only using DaNE. However, the highest performance is achieved by using a Danish BERT fine-tuned on DaNE. Our dataset enables improvements and applicability for Danish NER beyond cross-lingual methods. We employ a thorough error analysis of the predictions of the best models for seen and unseen entities, as well as their robustness on un-capitalized text. The annotated dataset and all the trained models are made publicly available.

KW - Cross-lingual transfer

KW - Danish

KW - Named entity recognition

KW - Resource

UR - http://www.scopus.com/inward/record.url?scp=85092317937&partnerID=8YFLogxK

M3 - Article in proceedings

AN - SCOPUS:85092317937

SP - 4597

EP - 4604

BT - LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings

A2 - Calzolari, Nicoletta

A2 - Bechet, Frederic

A2 - Blache, Philippe

A2 - Choukri, Khalid

A2 - Cieri, Christopher

A2 - Declerck, Thierry

A2 - Goggi, Sara

A2 - Isahara, Hitoshi

A2 - Maegaard, Bente

A2 - Mariani, Joseph

A2 - Mazo, Helene

A2 - Moreno, Asuncion

A2 - Odijk, Jan

A2 - Piperidis, Stelios

PB - European Language Resources Association (ELRA)

Y2 - 11 May 2020 through 16 May 2020

ER -

ID: 258327332
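
The abstract above reports benchmarking DaNE by fine-tuning transformer models (monolingual Danish BERT and multilingual BERT) for supervised NER. As a rough illustration only, not the authors' code, the sketch below shows a standard token-classification fine-tuning setup with the Hugging Face datasets/transformers libraries; the dataset identifier "dane", its column names ("tokens", "ner_tags"), the split names, and the checkpoint "bert-base-multilingual-cased" are assumptions made for the example.

# Minimal sketch (assumptions noted above), not the authors' implementation.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer,
                          DataCollatorForTokenClassification)

# Assumed Hub identifier for DaNE; the dataset is also distributed via the danlp package.
dataset = load_dataset("dane")
label_names = dataset["train"].features["ner_tags"].feature.names

checkpoint = "bert-base-multilingual-cased"  # assumed mBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(label_names))

def tokenize_and_align(batch):
    # Re-align word-level NER tags to word-piece tokens; only the first
    # sub-token of each word keeps its label, the rest are masked with -100.
    tokenized = tokenizer(batch["tokens"], truncation=True,
                          is_split_into_words=True)
    aligned = []
    for i, tags in enumerate(batch["ner_tags"]):
        previous, row = None, []
        for wid in tokenized.word_ids(batch_index=i):
            if wid is None or wid == previous:
                row.append(-100)  # ignored by the cross-entropy loss
            else:
                row.append(tags[wid])
            previous = wid
        aligned.append(row)
    tokenized["labels"] = aligned
    return tokenized

encoded = dataset.map(tokenize_and_align, batched=True,
                      remove_columns=dataset["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dane-ner", num_train_epochs=3),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()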