Named-Entity Tagging a Very Large Unbalanced Corpus. Training and Evaluating NE classifiers.

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Named-Entity Tagging a Very Large Unbalanced Corpus. Training and Evaluating NE classifiers. / Bingel, Joachim; Haider, Thomas.

Proceedings of the Ninth International Conference on Language Resources and Evaluation: LREC '14. 2014. p. 2578-2583 967.

Harvard

Bingel, J & Haider, T 2014, Named-Entity Tagging a Very Large Unbalanced Corpus. Training and Evaluating NE classifiers. in Proceedings of the Ninth International Conference on Language Resources and Evaluation: LREC '14., 967, pp. 2578-2583. <http://www.lrec-conf.org/proceedings/lrec2014/pdf/967_Paper.pdf>

APA

Bingel, J., & Haider, T. (2014). Named-Entity Tagging a Very Large Unbalanced Corpus. Training and Evaluating NE classifiers. In Proceedings of the Ninth International Conference on Language Resources and Evaluation: LREC '14 (pp. 2578-2583). [967] http://www.lrec-conf.org/proceedings/lrec2014/pdf/967_Paper.pdf

Vancouver

Bingel J, Haider T. Named-Entity Tagging a Very Large Unbalanced Corpus. Training and Evaluating NE classifiers. In Proceedings of the Ninth International Conference on Language Resources and Evaluation: LREC '14. 2014. p. 2578-2583. 967

Author

Bingel, Joachim ; Haider, Thomas. / Named-Entity Tagging a Very Large Unbalanced Corpus. Training and Evaluating NE classifiers. Proceedings of the Ninth International Conference on Language Resources and Evaluation: LREC '14. 2014. pp. 2578-2583

Bibtex

@inproceedings{e2c03ca9ef654c6896d638bf34113caa,

title = "Named-Entity Tagging a Very Large Unbalanced Corpus. Training and Evaluating NE classifiers.",

abstract = "We describe a systematic and application-oriented approach to training and evaluating named entity recognition and classification (NERC) systems, the purpose of which is to identify an optimal system and to train an optimal model for named entity tagging DeReKo, a very large general-purpose corpus of contemporary German (Kupietz et al., 2010). DeReKo's strong dispersion with respect to genre, register and time forces us to base our decision for a specific NERC system on an evaluation performed on a representative sample of DeReKo instead of performance figures that have been reported for the individual NERC systems when evaluated on more uniform and less diverse data. We create and manually annotate such a representative sample as evaluation data for three different NERC systems, for each of which various models are learnt on multiple training data. The proposed sampling method can be viewed as a generally applicable method for sampling evaluation data from an unbalanced target corpus for any sort of natural language processing.",

author = "Joachim Bingel and Thomas Haider",

year = "2014",

language = "English",

isbn = "978-2-9517408-8-4",

pages = "2578--2583",

booktitle = "Proceedings of the Ninth International Conference on Language Resources and Evaluation",

}

RIS

TY - GEN

T1 - Named-Entity Tagging a Very Large Unbalanced Corpus. Training and Evaluating NE classifiers.

AU - Bingel, Joachim

AU - Haider, Thomas

PY - 2014

Y1 - 2014

N2 - We describe a systematic and application-oriented approach to training and evaluating named entity recognition and classification (NERC) systems, the purpose of which is to identify an optimal system and to train an optimal model for named entity tagging DeReKo, a very large general-purpose corpus of contemporary German (Kupietz et al., 2010). DeReKo's strong dispersion with respect to genre, register and time forces us to base our decision for a specific NERC system on an evaluation performed on a representative sample of DeReKo instead of performance figures that have been reported for the individual NERC systems when evaluated on more uniform and less diverse data. We create and manually annotate such a representative sample as evaluation data for three different NERC systems, for each of which various models are learnt on multiple training data. The proposed sampling method can be viewed as a generally applicable method for sampling evaluation data from an unbalanced target corpus for any sort of natural language processing.

AB - We describe a systematic and application-oriented approach to training and evaluating named entity recognition and classification (NERC) systems, the purpose of which is to identify an optimal system and to train an optimal model for named entity tagging DeReKo, a very large general-purpose corpus of contemporary German (Kupietz et al., 2010). DeReKo's strong dispersion with respect to genre, register and time forces us to base our decision for a specific NERC system on an evaluation performed on a representative sample of DeReKo instead of performance figures that have been reported for the individual NERC systems when evaluated on more uniform and less diverse data. We create and manually annotate such a representative sample as evaluation data for three different NERC systems, for each of which various models are learnt on multiple training data. The proposed sampling method can be viewed as a generally applicable method for sampling evaluation data from an unbalanced target corpus for any sort of natural language processing.

M3 - Article in proceedings

SN - 978-2-9517408-8-4

SP - 2578

EP - 2583

BT - Proceedings of the Ninth International Conference on Language Resources and Evaluation

ER -

ID: 154008746

Department of Computer Science