Expanding the vitality of ALiB terms through Geolocated Information Extraction on social media
Abstract
The ALiB Project (Atlas Linguístico do Brasil, the Linguistic Atlas of Brazil) documents contemporary Brazilian Portuguese from a geolinguistic perspective, prioritizing diatopic (regional) variation. Its terms were collected between 1996 and 2013. With the rise of social networks, the need arose to analyze the vitality of these terms, that is, whether they remain in use. Social networks pose several challenges for this analysis, among them that geolocation is optional and that Internet slang is widespread. This work presents a new approach that extracts geolocation information directly from tweets, with the aim of expanding location coverage. BERTimbau was trained to perform Named Entity Recognition and used to extract user geolocation content. The results of this approach were compared with a manual analysis of the vitality of ALiB terms. They indicate that location extraction can expand and improve the analysis of the vitality of ALiB terms.
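As a rough illustration of the location-extraction step, the sketch below groups token-level Named Entity Recognition tags into location strings. It is not the authors' implementation: the BIO tag names (`B-LOC`, `I-LOC`, `O`), the function name `extract_locations`, and the example tweet are assumptions made for illustration, and the tags are written by hand rather than predicted by BERTimbau.

```python
from typing import List


def extract_locations(tokens: List[str], tags: List[str]) -> List[str]:
    """Group BIO-tagged tokens into location strings.

    Assumes a hypothetical tag set: "B-LOC" opens a location,
    "I-LOC" continues it, and any other tag closes it.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    locations: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-LOC":
            # A new location starts; flush the previous one if still open.
            if current:
                locations.append(" ".join(current))
            current = [token]
        elif tag == "I-LOC" and current:
            # Continue the location that is currently open.
            current.append(token)
        else:
            # Any other tag (or a stray I-LOC with no open span) closes the span.
            if current:
                locations.append(" ".join(current))
            current = []
    if current:
        locations.append(" ".join(current))
    return locations


# Hand-written example; a real system would obtain the tags from the NER model.
tokens = ["Moro", "em", "Feira", "de", "Santana", "na", "Bahia"]
tags = ["O", "O", "B-LOC", "I-LOC", "I-LOC", "O", "B-LOC"]
print(extract_locations(tokens, tags))
```

In a full pipeline, the extracted strings would still need to be matched against a list of known places before a tweet could be assigned to a region; that step is not shown here.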
