An operational costs analysis of similarity digest search strategies using approximate matching tools

  • Vitor Hugo Galhardo Moia UNICAMP
  • Marco Aurélio Amaral Henriques UNICAMP


Approximate matching functions are suitable tools for forensic investigators to detect similarity between two digital objects. With the rapid increase in data storage capacity, these functions appear as candidates to perform Known File Filtering (KFF) efficiently, separating relevant from irrelevant information. However, comparing sets of approximate matching digests can be overwhelming, since the usual approach is by brute force (all-against-all). In this paper, we evaluate some strategies to better perform KFF using approximate matching tools. A detailed analysis of their operational costs when performing over large data sets is done. Our results show significant improvements over brute force and how the strategies scale for different database sizes.


Breitinger, F. and Baier, H. (2012). Performance Issues About Context-Triggered Piecewise Hashing, pages 141–155. Springer Berlin Heidelberg, Berlin, Heidelberg.

Breitinger, F., Baier, H., and White, D. (2014a). On the database lookup problem of approximate matching. Digital Investigation, 11:S1–S9.

Breitinger, F., Guttman, B., McCarrin, M., Roussev, V., and White, D. (2014b). Approximate matching: definition and terminology. NIST Special Publication, 800:168.

Breitinger, F., Rathgeb, C., and Baier, H. (2014c). An efficient similarity digests database lookup-a logarithmic divide & conquer approach. The Journal of Digital Forensics, Security and Law: JDFSL, 9(2):155.

Harichandran, V. S., Breitinger, F., and Baggili, I. (2016). Bytewise approximate matching: The good, the bad, and the unknown. The Journal of Digital Forensics, Security and Law: JDFSL, 11(2):59.

Kornblum, J. (2006). Identifying almost identical files using context triggered piecewise hashing. Digital investigation, 3:91–97.

Martínez, V. G., Álvarez, F. H., and Encinas, L. H. (2014). State of the art in similarity preserving hashing functions. In Proceedings of the International Conference on Security and Management (SAM), page 1. The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp).

Moia, V. H. G. and Henriques, M. A. A. (2017). A comparative analysis about similarity search strategies for digital forensics investigations. In XXXV Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT 2017), São Pedro, Brazil.

NIST (2016). National software reference library. Accessed 2016 Set 13.

Roussev, V. (2010). Data fingerprinting with similarity digests. In IFIP International Conf. on Digital Forensics, pages 207–226. Springer.

Winter, C., Schneider, M., and Yannikos, Y. (2013). F2s2: Fast forensic similarity search through indexing piecewise hash signatures. Digital Investigation, 10(4):361–371.
Como Citar
MOIA, Vitor Hugo Galhardo; HENRIQUES, Marco Aurélio Amaral. An operational costs analysis of similarity digest search strategies using approximate matching tools. Anais do Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais (SBSeg), [S.l.], p. 154-167, nov. 2017. ISSN 0000-0000. Disponível em: <>. Acesso em: 18 maio 2024. doi:


1 2 > >>