Understanding the effects of removing common blocks on Approximate Matching scores under different scenarios for digital forensic investigations

  • Vitor Hugo Moia (UNICAMP)
  • Frank Breitinger (University of New Haven)
  • Marco Aurélio Henriques (UNICAMP)


Approximate Matching (AM) functions can help investigators find similar objects in digital forensic investigations. These algorithms create small, compact representations of objects (similar to hashes), which can then be compared to identify similarity. However, results are often biased by common blocks (data structures found in many different files regardless of content). In this paper, we evaluate the precision and recall of AM functions when common blocks are removed. In detail, we analyze how the similarity score changes and how that change impacts different investigation scenarios. Results show that many irrelevant matches can be filtered out and that a new interpretation of the score allows better similarity detection.
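
To illustrate the idea of common blocks biasing a similarity score, the following is a minimal sketch and not the method evaluated in the paper. It assumes a toy AM function that hashes fixed-size blocks and scores two objects by the share of block hashes they have in common. The names `block_digest`, `similarity`, and `BLOCK_SIZE`, the 0-100 score scale, and the sample data are all hypothetical choices made for illustration; real AM tools select features and compute scores differently.

```python
import hashlib

# Hypothetical fixed block size, chosen only to keep the example small.
BLOCK_SIZE = 8


def block_digest(data: bytes, block_size: int = BLOCK_SIZE) -> set[str]:
    """Split data into fixed-size blocks and return the set of block hashes."""
    return {
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def similarity(a: set[str], b: set[str],
               common: frozenset[str] = frozenset()) -> int:
    """Toy score in 0-100: shared blocks relative to the smaller digest.

    Blocks listed in `common` are removed from both digests first.
    """
    a, b = a - common, b - common
    if not a or not b:
        return 0
    return round(100 * len(a & b) / min(len(a), len(b)))


# Stand-in for a structure shared by many files regardless of content.
header = b"HEADER00" * 3
file_a = header + b"contentA"
file_b = header + b"contentB"

digest_a = block_digest(file_a)
digest_b = block_digest(file_b)
common_blocks = frozenset(block_digest(header))

raw_score = similarity(digest_a, digest_b)
filtered_score = similarity(digest_a, digest_b, common_blocks)

print(raw_score, filtered_score)
```

In this toy setting the two files share only the header, so the unfiltered score reflects the common structure rather than the content, and the score after removing the common blocks drops accordingly.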


MOIA, Vitor Hugo; BREITINGER, Frank; HENRIQUES, Marco Aurélio. Understanding the effects of removing common blocks on Approximate Matching scores under different scenarios for digital forensic investigations. In: SIMPÓSIO BRASILEIRO DE SEGURANÇA DA INFORMAÇÃO E DE SISTEMAS COMPUTACIONAIS (SBSEG), 19., 2019, São Paulo. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 113-126. DOI: https://doi.org/10.5753/sbseg.2019.13966.