Bifocal Agent: automatically identifying malicious functions to enhance malware analyst focus
Abstract
Although various solutions exist for automatically detecting malicious components in an analyzed executable, malware analysis is still predominantly a manual process, with the human analyst as its bottleneck. Recent works identify suspicious regions in code, reducing the analyst’s effort. However, such solutions are either signature-based or generate many false positives. To overcome this challenge, we propose the Bifocal Agent, which operates at two distinct levels of granularity (function and basic block). The solution also uses new features to improve the detection of malicious functions. Experiments show that the solution increases the area under the ROC curve by 17% relative to state-of-the-art related works and reduces false positives by over a third.
References
Alrawi, O., Ike, M., Pruett, M., Kasturi, R. P., Barua, S., Hirani, T., Hill, B., and Saltaformaggio, B. (2021). Forecasting malware capabilities from cyber attack memory images. In 30th USENIX Security Symposium (USENIX Security 21), pages 3523–3540. USENIX Association.
Andriesse, D., Slowinska, A., and Bos, H. (2017). Compiler-agnostic function detection in binaries. In 2017 IEEE European Symposium on Security and Privacy (EuroS&P), pages 177–189.
Coscia, A., Dentamaro, V., Galantucci, S., Maci, A., and Pirlo, G. (2023). Yamme: a yara-byte-signatures metamorphic mutation engine. IEEE Transactions on Information Forensics and Security, 18:4530–4545.
David, O. E. and Netanyahu, N. S. (2015). Deepsign: Deep learning for automatic malware signature generation and classification. In 2015 International Joint Conference on Neural Networks (IJCNN), pages 1–8.
Downing, E., Mirsky, Y., Park, K., and Lee, W. (2021). DeepReflect: Discovering malicious functionality through binary reconstruction. In 30th USENIX Security Symposium (USENIX Security 21), pages 3469–3486. USENIX Association.
Gutman, Y. (2019). Stop the churn, avoid burnout: How to keep your cybersecurity personnel. [link]. Accessed: 2024-09-30.
Jones, L., Sellers, A., and Carlisle, M. (2016). Cardinal: similarity analysis to defeat malware compiler variations. In 2016 11th International Conference on Malicious and Unwanted Software (MALWARE), pages 1–8.
Kaspersky (2023). Kaspersky Security Bulletin 2022. Statistics. securelist.com. [link]. Accessed: 2024-05-20.
Lester, M. (2021). PE malware machine learning dataset. [link]. Accessed: 2024-09-30.
Li, S., Ming, J., Qiu, P., Chen, Q., Liu, L., Bao, H., Wang, Q., and Jia, C. (2023). PackGenome: Automatically generating robust yara rules for accurate malware packer detection. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, CCS ’23, pages 3078–3092, New York, NY, USA. Association for Computing Machinery.
Molloy, C., Charland, P., Ding, S. H. H., and Fung, B. C. M. (2022). Jarv1s: Phenotype clone search for rapid zero-day malware triage and functional decomposition for cyber threat intelligence. In 2022 14th International Conference on Cyber Conflict: Keep Moving! (CyCon), volume 700, pages 385–403.
Novkovic, I. and Groš, S. (2016). Can malware analysts be assisted in their work using techniques from machine learning? In 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pages 1408–1413.
Raff, E., Barker, J., Sylvester, J., Brandon, R., Catanzaro, B., and Nicholas, C. (2017). Malware detection by eating a whole exe.
Royal, P., Halpin, M., Dagon, D., Edmonds, R., and Lee, W. (2006). Polyunpack: Automating the hidden-code extraction of unpack-executing malware. In 2006 22nd Annual Computer Security Applications Conference (ACSAC’06), pages 289–300.
Ruaro, N., Pagani, F., Ortolani, S., Kruegel, C., and Vigna, G. (2022). Symbexcel: Automated analysis and understanding of malicious excel 4.0 macros. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1066–1081.
Yong Wong, M., Landen, M., Antonakakis, M., Blough, D. M., Redmiles, E. M., and Ahamad, M. (2021). An inside look into the practice of malware analysis. CCS ’21, pages 3053–3069, New York, NY, USA. Association for Computing Machinery.
Zhong, Y., Yamaki, H., Yamaguchi, Y., and Takakura, H. (2013). Ariguma code analyzer: Efficient variant detection by identifying common instruction sequences in malware families. In 2013 IEEE 37th Annual Computer Software and Applications Conference, pages 11–20.
Published
2024-09-16
How to Cite
CHAHUD, Leonardo Gonçalves; ROCHA, Rafael Oliveira da; PEREIRA JR., Lourenço Alves; DRAGO, Idilio. Bifocal Agent: automatically identifying malicious functions to enhance malware analyst focus. In: BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 24., 2024, São José dos Campos/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 60-75. DOI: https://doi.org/10.5753/sbseg.2024.241689.
