An operational costs analysis of similarity digest search strategies using approximate matching tools

  • Vitor Hugo Galhardo Moia UNICAMP
  • Marco Aurélio Amaral Henriques UNICAMP


Approximate matching functions are suitable tools for forensic investigators to detect similarity between two digital objects. With the rapid increase in data storage capacity, these functions are candidates for performing Known File Filtering (KFF) efficiently, separating relevant from irrelevant information. However, comparing sets of approximate matching digests can be overwhelming, since the usual approach is brute force (all-against-all comparison). In this paper, we evaluate several strategies for performing KFF more efficiently with approximate matching tools and present a detailed analysis of their operational costs on large data sets. Our results show significant improvements over brute force and indicate how the strategies scale with database size.


MOIA, Vitor Hugo Galhardo; HENRIQUES, Marco Aurélio Amaral. An operational costs analysis of similarity digest search strategies using approximate matching tools. In: SIMPÓSIO BRASILEIRO DE SEGURANÇA DA INFORMAÇÃO E DE SISTEMAS COMPUTACIONAIS (SBSEG), 17., 2017, Brasília. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 154-167. DOI:

