Evaluation of the impact of removing common features on similar files search strategies

Abstract


Digital forensic investigations face an important problem: the large number of files stored on seized devices. To better assess these devices, investigators can use similarity search strategies, which apply approximate matching techniques to find files that are identical, or merely similar, to a given set of files. However, this search can be impaired by common blocks: pieces of similar or identical information (such as headers and templates) that are present in different files. This paper assesses the impact of removing common blocks on the strategies' performance. The results show a significant reduction in false positive rates, with an acceptable increase in runtime.
Keywords: Digital Forensics, Approximate Matching, Similarity Search Strategies, Common Blocks
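
To make the notion of a common block concrete, the following toy sketch shows how a header shared by several unrelated files can inflate a similarity score, and how excluding it removes the spurious match. It is only an illustration under simplifying assumptions and not the method evaluated in the paper: the fixed block size, the `min_files` threshold, the Jaccard score, and all function names are hypothetical choices, whereas real approximate matching tools extract features in more elaborate ways.

```python
from __future__ import annotations

import hashlib
from collections import Counter

BLOCK_SIZE = 8  # hypothetical; tiny so the toy example stays readable


def block_hashes(data: bytes, size: int = BLOCK_SIZE) -> set[str]:
    """Return the set of hashes of each fixed-size block of the input."""
    return {
        hashlib.sha1(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    }


def find_common_blocks(corpus: dict[str, bytes], min_files: int = 3) -> set[str]:
    """Return block hashes that occur in at least `min_files` distinct files."""
    counts: Counter[str] = Counter()
    for data in corpus.values():
        # block_hashes returns a set, so each file counts a block once.
        counts.update(block_hashes(data))
    return {h for h, n in counts.items() if n >= min_files}


def similarity(a: bytes, b: bytes, common: frozenset | set = frozenset()) -> float:
    """Jaccard similarity over block hashes, ignoring blocks in `common`."""
    fa = block_hashes(a) - common
    fb = block_hashes(b) - common
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


# Three unrelated payloads that share only an 8-byte header (assumed data).
HEADER = b"HEADER__"
corpus = {
    "a": HEADER + b"AAAAAAAA",
    "b": HEADER + b"BBBBBBBB",
    "c": HEADER + b"CCCCCCCC",
}

common = find_common_blocks(corpus)
score_with_common = similarity(corpus["a"], corpus["b"])
score_without_common = similarity(corpus["a"], corpus["b"], common)

print(round(score_with_common, 3))  # 0.333: the shared header alone causes a match
print(score_without_common)         # 0.0: no match once the header is excluded
```

In this sketch the extra pass over the corpus in `find_common_blocks` is where an added runtime cost would come from, while the drop from a nonzero score to zero corresponds to a false positive being avoided.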

Published
2021-10-04
VELHO, João P. B.; MOIA, Vitor H. G.; HENRIQUES, Marco A. A. Evaluation of the impact of removing common features on similar files search strategies. In: BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 21., 2021, Belém. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 239-252. DOI: https://doi.org/10.5753/sbseg.2021.17319.