Classificação de Fragmentos de Arquivos com Técnica de Aprendizagem de Máquina baseada em Árvores de Decisão

  • Juliano K. M. Oya, Polícia Civil do Distrito Federal
  • Bruno W. P. Hoelz, Polícia Federal

Abstract


The classification of file fragments is an important problem in computer forensics. This paper describes a flexible method for classifying file fragments using machine learning techniques. We used evidence files from real forensic cases to generate the training and testing fragments. From 12,153 evidence files of 21 different types, we generated and selected over a million fragments of 1, 2, and 4 kilobytes in size. For each fragment, we extracted 45 attributes, which were used as input to machine learning techniques based on decision trees. The resulting classifiers achieved an average hit rate of 98.78% for binary classification and 86.05% for multinomial classification.
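As a rough illustration of the fragment generation and attribute extraction steps mentioned in the abstract, the following Python sketch cuts a byte string into fixed-size fragments and computes a few byte-level statistics per fragment. The abstract does not list the 45 attributes, so the four shown here (entropy, mean byte value, printable ratio, zero ratio) are hypothetical stand-ins, and the function names are illustrative only. It also assumes that "kilobyte" means 1024 bytes and that a short trailing fragment is discarded.

```python
import math
from collections import Counter

# Assumption: 1, 2 and 4 kilobytes are taken as 1024, 2048 and 4096 bytes.
FRAGMENT_SIZES = (1024, 2048, 4096)


def split_into_fragments(data: bytes, size: int) -> list[bytes]:
    """Cut data into consecutive fragments of exactly `size` bytes.

    A trailing piece shorter than `size` is dropped (an assumption of this sketch).
    """
    full = len(data) - len(data) % size
    return [data[i:i + size] for i in range(0, full, size)]


def shannon_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_attributes(fragment: bytes) -> dict[str, float]:
    """Compute a small, hypothetical attribute set for one fragment.

    These are NOT the paper's 45 attributes; they only show the general shape
    of a per-fragment feature vector.
    """
    if not fragment:
        raise ValueError("fragment must not be empty")
    total = len(fragment)
    printable = sum(1 for b in fragment if 32 <= b <= 126)
    return {
        "entropy": shannon_entropy(fragment),
        "mean_byte": sum(fragment) / total,
        "printable_ratio": printable / total,
        "zero_ratio": fragment.count(0) / total,
    }
```

Each resulting dictionary could be written out as one row of a training set, labeled with the source file type, and passed to a decision tree learner.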

Published
2016-11-07
OYA, Juliano K. M.; HOELZ, Bruno W. P. Classificação de Fragmentos de Arquivos com Técnica de Aprendizagem de Máquina baseada em Árvores de Decisão. In: BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 16., 2016, Niterói. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 86-99. DOI: https://doi.org/10.5753/sbseg.2016.19300.