Multi-Objective Feature Selection for Android Malware Detection
Abstract
This paper proposes an Android malware detection model that combines a multi-view technique with multi-objective feature selection. First, multiple feature sets, referred to as views, are extracted from an Android application and used to build a feature vector for the classification task. Next, a multi-objective optimization algorithm selects a subset of features that reduces both the model’s error rate and its inference time. Two classification models are applied to each feature subset, and their outputs are combined in an ensemble with majority voting. Experiments demonstrated the feasibility of our proposal. Compared to a single-view model without feature selection, our method improved the true positive rates by an average of 4.4 while reducing inference processing costs by up to 65%.
Keywords:
Android, Malware Detection, Machine Learning, Multi-objective Optimization
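The two ideas in the abstract, keeping only feature subsets that trade off error rate against inference time, and combining classifier outputs by majority voting, can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only a plain non-dominance (Pareto) filter rather than a full evolutionary algorithm such as NSGA-II, and the feature names, error rates, and timings are hypothetical values chosen for illustration.

```python
from collections import Counter


def dominates(a, b):
    """True if objectives a are no worse than b everywhere and strictly
    better somewhere (both objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates whose objectives no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(o["objectives"], c["objectives"]) for o in candidates)
    ]


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical feature subsets with (error rate, inference time in ms).
candidates = [
    {"features": ("permissions", "api_calls"), "objectives": (0.08, 12.0)},
    {"features": ("permissions",), "objectives": (0.12, 5.0)},
    {"features": ("permissions", "opcodes"), "objectives": (0.10, 15.0)},
    {"features": ("api_calls", "opcodes"), "objectives": (0.08, 20.0)},
]

front = pareto_front(candidates)
for c in front:
    print(c["features"], c["objectives"])

# Hypothetical votes from classifiers trained on the selected subsets.
print(majority_vote(["malware", "benign", "malware"]))
```

In this toy data the third and fourth subsets are dropped because the first one is at least as accurate and faster than both; the remaining two represent different trade-offs between accuracy and cost.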
References
Allix, K., Bissyandé, T. F., Klein, J., and Traon, Y. L. (2016). Androzoo: Collecting millions of android apps for the research community. 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pages 468–471.
AndroidStats (2024). Android statistics. [link]. [online: accessed 2 June 2024].
Azad, M. A., Riaz, F., Aftab, A., Rizvi, S. K. J., Arshad, J., and Atlam, H. F. (2022). Deepsel: A novel feature selection for early identification of malware in mobile applications. Future Generation Computer Systems, 129:54–63.
Darwaish, A. and Nait-Abdesselam, F. (2020). Rgb-based android malware detection and classification using convolutional neural network. In IEEE Global Communications Conference.
Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197.
dos Santos, R. R., Viegas, E. K., and Santin, A. O. (2021). A reminiscent intrusion detection model based on deep autoencoders and transfer learning. In 2021 IEEE Global Communications Conference (GLOBECOM). IEEE.
dos Santos, R. R., Viegas, E. K., Santin, A. O., and Tedeschi, P. (2023). Federated learning for reliable model updates in network-based intrusion detection. Computers & Security, 133:103413.
Geremias, J., Viegas, E. K., Santin, A. O., Britto, A., and Horchulhack, P. (2022). Towards multi-view android malware detection through image-based deep learning. In 2022 International Wireless Communications and Mobile Computing (IWCMC). IEEE.
Geremias, J., Viegas, E. K., Santin, A. O., Britto, A., and Horchulhack, P. (2023). Towards a reliable hierarchical android malware detection through image-based cnn. In 2023 IEEE 20th Consumer Communications & Networking Conference (CCNC). IEEE.
Horchulhack, P., Viegas, E. K., Santin, A. O., and Simioni, J. A. (2024). Network-based intrusion detection through image-based cnn and transfer learning. In 2024 International Wireless Communications and Mobile Computing (IWCMC). IEEE.
Kaspersky (2023). Attacks on mobile devices significantly increase in 2023. [link]. [online: accessed 2 June 2024].
Martín, A., Lara-Cabrera, R., and Camacho, D. (2019). Android malware detection through hybrid features fusion and ensemble classifiers: The andropytool framework and the omnidroid dataset. Information Fusion, 52:128–142.
Millar, S., McLaughlin, N., Martinez del Rincon, J., and Miller, P. (2021). Multi-view deep learning for zero-day android malware detection. Journal of Information Security and Applications, 58:102718.
Pektaş, A. and Acarman, T. (2020). Learning to detect android malware via opcode sequences. Neurocomputing, 396:599–608.
Qiu, J., Zhang, J., Luo, W., Pan, L., Nepal, S., and Xiang, Y. (2020). A survey of android malware detection with deep neural models. ACM Computing Surveys, 53(6):1–36.
Ravi, V., Alazab, M., Selvaganapathy, S., and Chaganti, R. (2022). A multi-view attention-based deep learning framework for malware detection in smart healthcare systems. Computer Communications, 195:73–81.
Santos, R. R. d., Viegas, E. K., Santin, A. O., and Cogo, V. V. (2023). Reinforcement learning for intrusion detection: More model longness and fewer updates. IEEE Transactions on Network and Service Management, 20(2):2040–2055.
Seraj, S., Khodambashi, S., Pavlidis, M., and Polatidis, N. (2022). Hamdroid: permission-based harmful android anti-malware detection using neural networks. Neural Computing and Applications, 34(18):15165–15174.
Smith, M. R., Johnson, N. T., Ingram, J. B., Carbajal, A. J., Haus, B. I., Domschot, E., Ramyaa, R., Lamb, C. C., Verzi, S. J., and Kegelmeyer, W. P. (2020). Mind the gap: On bridging the semantic gap between machine learning and malware analysis. In Proceedings of the 13th ACM Workshop on Artificial Intelligence and Security, CCS ’20. ACM.
Virustotal (2024). Analyze suspicious files. [link]. [online: accessed 2 June 2024].
Wu, Y., Li, M., Zeng, Q., Yang, T., Wang, J., Fang, Z., and Cheng, L. (2023). Droidrl: Feature selection for android malware detection with reinforcement learning. Computers & Security, 128:103126.
Şahin, D. O., Kural, O. E., Akleylek, S., and Kılıç, E. (2021). A novel permission-based android malware detection system using feature selection based on linear regression. Neural Computing and Applications, 35(7):4903–4918.
Published
2024-09-16
How to Cite
FRANSOZI, Philipe; GEREMIAS, Jhonatan; VIEGAS, Eduardo K.; SANTIN, Altair O. Multi-Objective Feature Selection for Android Malware Detection. In: WORKSHOP ON SCIENTIFIC INITIATION AND UNDERGRADUATE WORKS - BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 24., 2024, São José dos Campos/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 292-302.
DOI: https://doi.org/10.5753/sbseg_estendido.2024.241836.
