Analysis of Criminal Patterns in Police Report Narratives using Spectral Clustering with K-means

Abstract


The high volume and heterogeneity of police report narratives in Brazil pose challenges for manual analysis and investigative prioritization. This work proposes an approach for identifying criminal patterns using clustering techniques applied to unstructured textual data. The methodology integrates Spectral Clustering with K-means, leveraging MPNet embeddings for vector representation and UMAP for dimensionality reduction. The resulting six clusters revealed thematic coherence, highlighting patterns such as bank fraud, judicial scams, social media crimes, and account hacking. Comparative experiments with Agglomerative Clustering were conducted using different linkage strategies, with Spectral Clustering achieving the highest silhouette score (0.38), indicating better-defined groups. A manual qualitative analysis of samples from each cluster supported the thematic distinctions. The study demonstrates that automatic clustering can contribute to investigative triage, offering relevant insights for public security applications.
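The silhouette score used above to compare clusterings can be illustrated with a small worked example. The sketch below is not the study's code: it is a plain-Python implementation of the mean silhouette coefficient with Euclidean distance, applied to hypothetical 2-D points that stand in for dimensionality-reduced embeddings. The function name `silhouette_score`, the toy points, and both label assignments are assumptions made for illustration.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    # Group points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Convention: a point alone in its cluster contributes 0.
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from the point to itself is 0, so it drops out).
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two tight, well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
matching_labels = [0, 0, 0, 1, 1, 1]   # partition that follows the groups
mixed_labels = [0, 1, 0, 1, 0, 1]      # partition that cuts across them

print("matching partition:", silhouette_score(points, matching_labels))
print("mixed partition:   ", silhouette_score(points, mixed_labels))
```

The coefficient lies between -1 and 1. A partition that follows the natural groups scores close to 1, while one that cuts across them scores much lower, which is why a higher score is read as a sign of better-defined clusters.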

Keywords: Spectral clustering, Police report, Public security, Clustering, UMAP

Published
29/09/2025
BARCELAR, Ricardo Rodrigues; LUIS, Flávia Rosane de Mendonça; MARTINS, Claudia Aparecida; GOMES, Raphael de Souza Rosa; OLIVEIRA, Anderson Castro Soares de; VENTURA, Thiago Meirelles. Analysis of Criminal Patterns in Police Report Narratives using Spectral Clustering with K-means. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 13., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 89-96. ISSN 2763-8944. DOI: https://doi.org/10.5753/kdmile.2025.247764.