Understanding and improving the detection capability of similarity search strategies in forensic investigations
Abstract
Digital forensic practitioners face two main challenges: the growing number of digital devices in use and the difficulty of analyzing them. Approximate Matching (AM) can help find relevant data by efficiently assessing the similarity of objects. However, commonly used procedures for assessing the similarity between data sets (e.g., brute force) consume too much time and too many resources. To tackle this problem, Similarity Digest Search Strategies allow faster comparisons by performing smart searches with AM. This paper compares some strategies from the literature with related brute-force approaches, reporting their precision and recall rates. We also present the time taken by the strategies, analyze the impact of file types on similarity, and propose improvements.
References
Breitinger, F. and Baier, H. (2013). Similarity preserving hashing: Eligible properties and a new algorithm mrsh-v2. In Digital Forensics and Cyber Crime: 4th International Conference, ICDF2C 2012, Lafayette, IN, USA, October 25-26, 2012, Revised Selected Papers, pages 167–182, Berlin, Heidelberg. Springer Berlin Heidelberg.
Breitinger, F., Baier, H., and White, D. (2014a). On the database lookup problem of approximate matching. Digital Investigation, 11:S1–S9.
Breitinger, F., Guttman, B., McCarrin, M., Roussev, V., and White, D. (2014b). Approximate matching: definition and terminology. NIST Special Publication, 800:168.
Breitinger, F. and Roussev, V. (2014). Automated evaluation of approximate matching algorithms on real data. Digital Investigation, 11:S10–S17.
Harichandran, V. S., Breitinger, F., and Baggili, I. (2016). Bytewise approximate matching: The good, the bad, and the unknown. The Journal of Digital Forensics, Security and Law: JDFSL, 11(2):59.
Kornblum, J. (2006). Identifying almost identical files using context triggered piecewise hashing. Digital Investigation, 3:91–97.
Lillis, D., Breitinger, F., and Scanlon, M. (2017). Expediting mrsh-v2 approximate matching with hierarchical bloom filter trees. In International Conference on Digital Forensics and Cyber Crime, pages 144–157. Springer.
Moia, V. H. G., Breitinger, F., and Henriques, M. A. A. (2020). The impact of excluding common blocks for approximate matching. Computers & Security, 89:101676.
Moia, V. H. G. and Henriques, M. A. A. (2017a). Fast similarity digest search: a new strategy for performing queries efficiently with approximate matching. XVII Brazilian Symposium on information and computational systems security, Brazilian Computer Society (SB).
Moia, V. H. G. and Henriques, M. A. A. (2017b). Similarity digest search: A survey and comparative analysis of strategies to perform known file filtering using approximate matching. Security and Communication Networks, pages 1–17.
Noll, L. C. (2012). Fowler/Noll/Vo (FNV) hash. Available at: http://www.isthe.com/chongo/tech/comp/fnv/index.html. Accessed 15 Sep 2020.
Roussev, V. (2010). Data fingerprinting with similarity digests. In IFIP International Conf. on Digital Forensics, pages 207–226. Springer.
Roussev, V. (2011). An evaluation of forensic similarity hashes. Digital Investigation, 8:34–41.
Tridgell, A. (2002). Spamsum. Available at: http://samba.org/ftp/unpacked/junkcode/spamsum. Accessed 15 Sep 2020.
Winter, C., Schneider, M., and Yannikos, Y. (2013). F2s2: Fast forensic similarity search through indexing piecewise hash signatures. Digital Investigation, 10(4):361–371.
Published
2020-10-13
How to Cite
VELHO, João P. B.; MOIA, Vitor H. G.; HENRIQUES, Marco A. A. Entendendo e melhorando a capacidade de detecção de estratégias de busca de similaridade em investigações forenses. In: BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 20., 2020, Petrópolis. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 436-449. DOI: https://doi.org/10.5753/sbseg.2020.19255.
