Assessing the Impact of Mining Techniques on Criminal Data Quality

  • Lucas Zanco Ladeira
  • Matheus Ferraroni Sanches
  • Cassio Viana
  • Leonardo Castro Botega


Crime data consists of crime events reported in natural language to the emergency response centers of police forces. It comprises the textual description of each event and can be used to understand what characterizes crime situations, such as the weapon used, the objects stolen, and the criminal activity involved. In this work, data pre-processing, transformation, and mining techniques are applied to discover hidden crime details in the dataset by relating similar records. The crime records are then classified into three groups according to the sophistication of the criminal action: A (low sophistication), B (medium sophistication), or C (high sophistication). To assess the impact of applying or omitting pre-processing techniques, and to determine which data mining technique achieves the best results, two experiments were performed and their mean accuracies compared. The combination of pre-processing and the Random Forest algorithm achieved better results and was also able to handle high-dimensional, dynamic data. Combining these techniques can therefore provide better information to police forces.
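As a rough illustration of the pre-processing and transformation steps mentioned above, the sketch below cleans free-text reports and turns them into term-count vectors that a classifier such as Random Forest could consume. The abstract does not specify the actual steps used, so the stop-word list, the helper names (`preprocess`, `build_vocabulary`, `to_count_vector`), and the two sample reports are all hypothetical and serve only as an example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which one was used.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "with", "was"}


def preprocess(report):
    """Lowercase the text, keep alphabetic tokens only, drop stop words."""
    tokens = re.findall(r"[a-z]+", report.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in alphabetical order."""
    terms = sorted({t for tokens in token_lists for t in tokens})
    return {term: index for index, term in enumerate(terms)}


def to_count_vector(tokens, vocabulary):
    """Count how often each vocabulary term occurs in one report."""
    counts = Counter(tokens)
    # Dicts preserve insertion order, so columns follow the vocabulary indices.
    return [counts.get(term, 0) for term in vocabulary]


# Made-up reports for illustration only; they are not from the paper's dataset.
reports = [
    "The suspect threatened the victim with a knife and stole a phone.",
    "Armed group with rifles robbed the bank and stole the money.",
]

token_lists = [preprocess(r) for r in reports]
vocabulary = build_vocabulary(token_lists)
vectors = [to_count_vector(tokens, vocabulary) for tokens in token_lists]
```

Each row of `vectors` has one column per vocabulary term, which is why the feature space of real crime reports quickly becomes high-dimensional. Skipping `preprocess` and counting raw whitespace-separated words would give the "no pre-processing" baseline for a comparison like the one the abstract describes.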
LADEIRA, Lucas Zanco; SANCHES, Matheus Ferraroni; VIANA, Cassio; BOTEGA, Leonardo Castro. Assessing the Impact of Mining Techniques on Criminal Data Quality. In: WORKSHOP DE COMPUTAÇÃO URBANA (COURB), 2., 2018, Campos do Jordão. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. ISSN 2595-2706.