Evaluating hypotheses in geolocation on a very large sample of Twitter

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Bahar Salehi
Anders Søgaard

Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from. In this paper, we examine six hypotheses against a corpus consisting of all geo-tagged tweets from the US, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks.

Original language	English
Title of host publication	Proceedings of the 3rd Workshop on Noisy User-generated Text
Number of pages	6
Publisher	Association for Computational Linguistics
Publication date	2017
Pages	62-67
ISBN (Print)	978-1-945626-94-4
Publication status	Published - 2017
Event	3rd Workshop on Noisy User-generated Text - Copenhagen, Denmark. Duration: 7 Sep 2017 → 7 Sep 2017

Conference

Conference	3rd Workshop on Noisy User-generated Text
Country	Denmark
City	Copenhagen
Period	07/09/2017 → 07/09/2017

Department of Computer Science
