Static Analysis on Disassembled Files: A Deep Learning Approach to Malware Classification
Abstract
The cyber environment is hostile. A vast number of devices with access to fast networks, combined with the massive use of social networks, has considerably increased the number of malware propagation vectors. Deep learning models have achieved strong results in many different areas, including security-related tasks such as static and dynamic malware analysis. This paper details a deep learning approach to malware classification that uses only the disassembled code of each artifact as input. We show competitive performance compared with other solutions that rely on a higher degree of domain knowledge.
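The abstract does not spell out how disassembled code becomes network input. As a minimal sketch only, and not the authors' method, the snippet below shows one common way to do it: pull the instruction mnemonic out of each line of an IDA Pro-style listing and weight the mnemonics with TF-IDF to get a fixed-length vector per file. The listing format (a `section:address` prefix, raw hex bytes, then the mnemonic), the ten-opcode `VOCAB`, and the names `extract_opcodes` and `tfidf_vectors` are all hypothetical assumptions for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical opcode vocabulary; a real pipeline would build it from the training corpus.
VOCAB = ["mov", "push", "pop", "call", "jmp", "xor", "add", "sub", "cmp", "retn"]
_VOCAB_SET = set(VOCAB)
# Raw instruction bytes as an IDA-style listing prints them, e.g. "8B" or "04+" (truncated).
_BYTE_RE = re.compile(r"[0-9A-F]{2}\+?")


def extract_opcodes(asm_lines):
    """Return, in order, the mnemonic of every instruction line whose mnemonic is in VOCAB."""
    opcodes = []
    for line in asm_lines:
        code = line.split(";", 1)[0]  # drop listing comments
        tokens = code.split()
        # tokens[0] is the assumed "section:address" prefix, e.g. ".text:00401000"
        for tok in tokens[1:]:
            if _BYTE_RE.fullmatch(tok):
                continue  # skip raw instruction bytes
            if tok in _VOCAB_SET:
                opcodes.append(tok)
            break  # first non-byte token is the mnemonic (or a label we ignore)
    return opcodes


def tfidf_vectors(docs, vocab=VOCAB):
    """Map each opcode sequence to an L2-normalised TF-IDF vector over vocab."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1, which never divides by zero.
    idf = {t: math.log((1 + n) / (1 + sum(t in s for s in doc_sets))) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vec = [counts[t] / total * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


# Two tiny made-up listings in the assumed format.
SAMPLE = [
    ".text:00401000 56                  push    esi",
    ".text:00401001 8B 74 24 08         mov     esi, [esp+4+arg_0]",
    ".text:00401005 33 C0               xor     eax, eax",
    ".text:00401007 ; ---------------------------------------",
    ".text:00401007 loc_401007:",
    ".text:00401007 E8 F4 FF FF FF      call    sub_401000",
    ".text:0040100C 5E                  pop     esi",
    ".text:0040100D C3                  retn",
]
OTHER = [
    ".text:00402000 55                  push    ebp",
    ".text:00402001 8B EC               mov     ebp, esp",
    ".text:00402003 EB 02               jmp     short loc_402007",
]

if __name__ == "__main__":
    docs = [extract_opcodes(SAMPLE), extract_opcodes(OTHER)]
    print(docs[0])  # ['push', 'mov', 'xor', 'call', 'pop', 'retn']
    for row in tfidf_vectors(docs):
        print([round(v, 3) for v in row])
```

Each row of the resulting matrix could then feed the input layer of a neural classifier. Opcodes shared by every file (here `push` and `mov`) receive lower weight than rarer ones such as `xor`, which is the usual motivation for IDF weighting; the vocabulary, smoothing and line parsing would all need adapting to real data.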
Published
6 November 2017
How to Cite
PINTO, Dhiego Ramos; DUARTE, Julio Cesar. Static Analysis on Disassembled Files: A Deep Learning Approach to Malware Classification. In: SIMPÓSIO BRASILEIRO DE SEGURANÇA DA INFORMAÇÃO E DE SISTEMAS COMPUTACIONAIS (SBSEG), 17., 2017, Brasília. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 471-478.
DOI: https://doi.org/10.5753/sbseg.2017.19520.