1. Introduction
Knowledge is one of the most important values of contemporary society, which is therefore often called the Knowledge Society. Applying advanced software technologies in this context is a bold contribution of the software engineering scientific community and part of a joint vision for applied humanistic computing.
In recent years, Twitter has become a popular microblogging platform, with over 400 million tweets posted daily according to tweespeed.com. Several properties make it a tool that can contribute significantly to the knowledge society: it is quick to read (no more than 140 characters), dynamic (information is available in real time), accessible (from almost any device connected to the Internet), functional (pictures, videos and links to other content can be embedded), organized (hashtags represent subjects and posts are ordered by publication date), interactive (users can view other people's posts, follow them, reply, share posts by retweeting, or mark them as favorites), non-invasive (there is no instant-messaging chat) and compatible with anonymity (through nicknames or impersonal user names) (Pérez et al., 2012; Kassens, 2012; Welch & Bonnan, 2012).
This has prompted many research efforts to exploit this information for tasks such as event detection (Agarwal et al., 2012; Atefeh & Khreich, 2015), health monitoring (Nielsen et al., 2015), and emergency detection (Seol et al., 2013), among others. Many of these applications can benefit from knowing where events occur, but location information is scarce: only 1% of tweets contain geo-tags (Takhteyev et al., 2012).
Extracting information from tweets presents several challenges: the information is completely unstructured and limited to 140 characters; tweets can contain grammatical errors and abbreviations; and each user has their own writing style. As a result, the information can be incomplete, false or not credible (Ritter, 2012).
However, Gutierrez et al. (2015) and Oussalah et al. (2013) established that the textual content of tweets provides geographic information, because the texts commonly refer to locations. Tweet analysis therefore makes it possible to identify and evaluate social and natural events. Geocoding methods are used to translate the geographic locations mentioned in the text into spatial representations (e.g., to detect and locate events in a geographic area). These methods have focused on point features (Quin et al., 2013; Hart & Zandbergen, 2013; Krumm & Horvitz, 2015), and there are no approaches oriented towards polygon representation.
Thus, this paper proposes a methodology for geocoding events that appear in tweets about traffic in Mexico City. The work consists of identifying events, geographic features and their spatial relationships, supported by conceptual representations, Natural Language Processing techniques and classification algorithms. Unlike other works, this paper does not address the detection of trending topics.
This paper is organized as follows. Section 2 describes related work on geocoding based on short texts. Section 3 presents the proposed methodology. Section 4 presents the evaluation and a comparison of the proposed method with other approaches. Section 5 outlines the conclusions and future work.