Understanding the effects of removing common blocks on Approximate Matching scores under different scenarios for digital forensic investigations

Vitor Hugo Moia; Frank Breitinger; Marco Aurélio Henriques

doi:10.5753/sbseg.2019.13966

Vitor Hugo Moia, UNICAMP
Frank Breitinger, University of New Haven
Marco Aurélio Henriques, UNICAMP


Abstract

Approximate Matching (AM) functions can help investigators find similar objects during digital forensic investigations. These algorithms create small, compact representations of objects (similar to hashes) that can be compared to identify similarity. However, results are often biased by common blocks: data structures found in many different files regardless of their content. In this paper, we evaluate the precision and recall of AM functions when common blocks are removed. Specifically, we analyze how the similarity score changes and how this affects different investigation scenarios. Results show that many irrelevant matches can be filtered out and that a new interpretation of the score allows better similarity detection.
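The pipeline summarized above (compact digests, digest comparison, removal of common blocks, and evaluation by precision and recall) can be illustrated with a deliberately simplified sketch. It is not the authors' implementation and not any real AM tool; tools such as ssdeep or sdhash use more elaborate feature selection and scoring. The fixed 64-byte blocks, SHA-1 block hashes, Jaccard-style 0-100 score, the `min_files` rule for declaring a block "common", the threshold of 10, the synthetic corpus, and every function name below are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter
from collections.abc import Set

BLOCK_SIZE = 64  # hypothetical fixed block size in bytes; real AM tools differ


def digest(data: bytes, block_size: int = BLOCK_SIZE) -> set[str]:
    """Toy AM digest: the set of SHA-1 hashes of non-overlapping fixed-size blocks."""
    return {
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def common_blocks(digests: list[set[str]], min_files: int) -> set[str]:
    """Hypothetical rule: a block is 'common' if it occurs in >= min_files files."""
    counts = Counter(h for d in digests for h in d)
    return {h for h, n in counts.items() if n >= min_files}


def similarity(a: Set[str], b: Set[str], ignore: Set[str] = frozenset()) -> float:
    """Jaccard-style score in [0, 100], computed after dropping blocks in `ignore`."""
    a, b = a - ignore, b - ignore
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def precision_recall(predicted: set[str], relevant: set[str]) -> tuple[float, float]:
    """Standard precision and recall of a set of reported matches."""
    tp = len(predicted & relevant)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


def block(tag: int) -> bytes:
    """A synthetic block whose content is fully determined by `tag`."""
    return bytes([tag]) * BLOCK_SIZE


# Synthetic corpus: block(0) plays the role of a header shared by every file.
files = {
    "a": block(0) + block(1) + block(2) + block(3),
    "a2": block(0) + block(1) + block(2) + block(9),  # modified copy of "a"
    "b": block(0) + block(4) + block(5) + block(6),
    "c": block(0) + block(7) + block(8) + block(10),
}
digests = {name: digest(data) for name, data in files.items()}
common = common_blocks(list(digests.values()), min_files=3)

THRESHOLD = 10.0   # hypothetical match threshold on the 0-100 score
relevant = {"a2"}  # ground truth: only "a2" is really related to "a"


def matches(query: str, ignore: Set[str]) -> dict[str, float]:
    """Score every other file against `query`; keep those at or above THRESHOLD."""
    scores = {
        name: similarity(digests[query], d, ignore)
        for name, d in digests.items() if name != query
    }
    return {name: s for name, s in scores.items() if s >= THRESHOLD}


raw = matches("a", ignore=set())
filtered = matches("a", ignore=common)
print("without filtering:", raw, precision_recall(set(raw), relevant))
print("common blocks removed:", filtered, precision_recall(set(filtered), relevant))
```

On this toy corpus, the shared header alone gives the unrelated files "b" and "c" a score of about 14 against "a", so they pass the threshold and precision is 1/3. Once the common block is ignored, only "a2" is reported (precision and recall both 1.0), but its own score also drops from 60 to 50. In miniature, this mirrors the two effects the abstract describes: fewer irrelevant matches, and a shift in what a given score value means.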
