Automated Statistics Extraction of Public Security Events Reported Through Microtexts on Social Networks


In recent years, the State of Rio de Janeiro has been marked by a succession of public security events (shootings, assaults, robberies, etc.) that cause great insecurity, affect the daily lives of the population, and concern the public security agencies fighting crime. Although public security indicators have recently decreased, a feeling of insecurity persists, and the population uses social networks to report illegal acts that occur in their vicinity. Even though this collaboration is restricted to crimes that have already occurred, many of the published messages are difficult to interpret. Knowledge Discovery is the process of extracting implicit, previously unknown, and potentially useful information from data, which can then be applied for different purposes. In this context, Natural Language Processing is a powerful tool for extracting information from such unstructured data. This work proposes a methodology for automatic knowledge extraction, in the form of statistics on public security events posted on social networks, particularly those that occurred in Rio de Janeiro. The main contribution of this work is a methodology for building an Information System that collects statistics on reported public security events. Besides this methodology, which can also be used to build other Information Systems, this work contributes a public security event recognition model with a performance of 95% and a dataset made available to support further research, such as the identification of new behavior patterns and the discovery of hidden knowledge, among other lines of inquiry.
Keywords: Machine Learning, Natural Language Processing, Artificial Intelligence, Public Security, Text Classification, Data Mining, Text Mining, Twitter
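The abstract does not describe how the recognition model works, so the following is only a minimal sketch of the general idea of turning microtexts into event statistics: classify each post into a public security event category, then count the predicted categories. It assumes a multinomial Naive Bayes classifier with add-one smoothing, swapped in purely for illustration; it is not the authors' model and does not attempt to reproduce its reported 95% performance. The `NaiveBayes` class, the `event_statistics` helper, the category labels, and the toy posts are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        # Total token count per class, used in the smoothed denominator.
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior P(class)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # words never seen in training are ignored
                numerator = self.word_counts[label][tok] + 1
                score += math.log(numerator / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def event_statistics(model: NaiveBayes, posts: list[str]) -> Counter:
    """Classify each post and count how many fall into each category."""
    return Counter(model.predict(p) for p in posts)


# Hypothetical toy training data (not from the paper's dataset).
train_texts = [
    "heavy shooting near the station right now",
    "gunshots heard in the community, avoid the area",
    "bus robbed on the avenue, thieves took phones",
    "robbery at the bakery, two armed men",
    "beautiful sunset at the beach today",
    "traffic is slow but no incidents",
]
train_labels = ["shooting", "shooting", "robbery", "robbery", "none", "none"]

model = NaiveBayes().fit(train_texts, train_labels)
posts = [
    "shooting reported near the station",
    "armed robbery on the bus",
    "sunset at the beach",
    "gunshots in the area again",
]
stats = event_statistics(model, posts)
print(stats)  # Counter({'shooting': 2, 'robbery': 1, 'none': 1})
```

Counting predicted labels is the simplest form of the statistics the abstract refers to. In practice the posts would be in Portuguese and the counts would typically be grouped further, for example by date or neighborhood; those steps are outside this sketch.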


FERREIRA, Flávio; DUARTE, Julio; UGULINO, Wallace. Automated Statistics Extraction of Public Security Events Reported Through Microtexts on Social Networks. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 18., 2022, Curitiba. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022.