Crowdsourcing and annotating NER for Twitter #drift

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Crowdsourcing and annotating NER for Twitter #drift. / Fromreide, Hege; Hovy, Dirk; Søgaard, Anders.

Proceedings of the 9th International Conference on Language Resources and Evaluation: LREC2014. European Language Resources Association, 2014.

Harvard

Fromreide, H, Hovy, D & Søgaard, A 2014, Crowdsourcing and annotating NER for Twitter #drift. in Proceedings of the 9th International Conference on Language Resources and Evaluation: LREC2014. European Language Resources Association.

APA

Fromreide, H., Hovy, D., & Søgaard, A. (2014). Crowdsourcing and annotating NER for Twitter #drift. In Proceedings of the 9th International Conference on Language Resources and Evaluation: LREC2014. European Language Resources Association.

Vancouver

Fromreide H, Hovy D, Søgaard A. Crowdsourcing and annotating NER for Twitter #drift. In Proceedings of the 9th International Conference on Language Resources and Evaluation: LREC2014. European Language Resources Association. 2014.

Author

Fromreide, Hege ; Hovy, Dirk ; Søgaard, Anders. / Crowdsourcing and annotating NER for Twitter #drift. Proceedings of the 9th International Conference on Language Resources and Evaluation: LREC2014. European Language Resources Association, 2014.

Bibtex

@inproceedings{a456905f3d6a4e2197d4eca23f890a7a,
title = "Crowdsourcing and annotating NER for Twitter #drift",
abstract = "We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER-annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets; and (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible to “catch up” with language drift.",
author = "Hege Fromreide and Dirk Hovy and Anders S{\o}gaard",
year = "2014",
language = "English",
booktitle = "Proceedings of the 9th International Conference on Language Resources and Evaluation",
publisher = "European Language Resources Association",
}
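
The BibTeX abstract above reports an inter-annotator agreement of kappa=0.942 on the manually annotated set. As a minimal illustration of that statistic (not the authors' code; the function and the BIO-style example tags below are hypothetical), Cohen's kappa over two annotators' token-level NER labels can be computed as follows:

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' label sequences (illustrative sketch)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of tokens where the annotators agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected (chance) agreement, estimated from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical BIO-tagged tokens from two annotators:
annotator_1 = ["O", "B-PER", "I-PER", "O", "B-LOC"]
annotator_2 = ["O", "B-PER", "O", "O", "B-LOC"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # ~0.706

Kappa corrects raw agreement for chance, which matters for NER annotation because the "O" tag dominates and would otherwise inflate raw agreement.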

RIS

TY - GEN

T1 - Crowdsourcing and annotating NER for Twitter #drift

AU - Fromreide, Hege

AU - Hovy, Dirk

AU - Søgaard, Anders

PY - 2014

Y1 - 2014

N2 - We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER-annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets; and (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible to “catch up” with language drift.

AB - We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER-annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets; and (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible to “catch up” with language drift.

M3 - Article in proceedings

BT - Proceedings of the 9th International Conference on Language Resources and Evaluation

PB - European Language Resources Association

ER -

ID: 105105333