Machine Learning-based Spyware Detection Systems: An Undersampling Performance Analysis

  • Diogo Santos de Oliveira M. de Souza (FATEC)
  • Maria Clara Souza Ramos (FATEC)
  • Wilson Cabral de Oliveira Junior (FATEC)
  • Thiago José Lucas (FATEC)
  • Tiago Martins Ferreira (UNESP)
  • Kelton Augusto Pontara da Costa (UNESP)

Abstract


This study examines Machine Learning (ML) techniques, applied with and without undersampling, to improve spyware identification in Intrusion Detection Systems (IDS). It discusses the problem of class imbalance and how undersampling can help select more relevant and diverse training subsets. The research follows a methodical approach: a systematic literature review, selection of relevant data, appropriate preprocessing, and performance evaluation of ML classifiers. The findings suggest that undersampling can significantly influence IDS effectiveness, with certain classifiers showing notable improvements after being trained on undersampled data.
Keywords: Spyware, Intrusion Detection, Machine Learning
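The abstract does not say which undersampling technique was used. As a minimal sketch, assuming plain random undersampling (one common variant, not necessarily the authors' method), the idea is to keep every minority-class sample (e.g. spyware) and draw an equally sized random subset from each larger class. The function name `random_undersample` and the toy "benign"/"spyware" labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Hypothetical helper: randomly drop samples from larger classes until
    every class has as many samples as the smallest class.

    samples: list of feature vectors; labels: parallel list of class labels.
    Returns (balanced_samples, balanced_labels) in shuffled order.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if not samples:
        return [], []
    rng = random.Random(seed)  # fixed seed so the selected subset is reproducible

    # Group feature vectors by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Target size is the count of the smallest (minority) class.
    target = min(len(xs) for xs in by_class.values())

    # Draw `target` samples per class without replacement; the minority
    # class is therefore kept in full.
    pairs = []
    for y, xs in by_class.items():
        pairs.extend((x, y) for x in rng.sample(xs, target))

    # Shuffle so the output is not grouped by class.
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


if __name__ == "__main__":
    # Toy imbalanced data: 95 "benign" records and 5 "spyware" records.
    X = [[i, i % 7] for i in range(100)]
    y = ["benign"] * 95 + ["spyware"] * 5
    Xb, yb = random_undersample(X, y, seed=42)
    print(Counter(y))   # original class counts
    print(Counter(yb))  # both classes now have 5 samples
```

In a typical evaluation, undersampling is applied only to the training split, while the test split keeps its original class distribution so that reported metrics reflect realistic traffic.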

Published: 22/10/2025
SOUZA, Diogo Santos de Oliveira M. de; RAMOS, Maria Clara Souza; OLIVEIRA JUNIOR, Wilson Cabral de; LUCAS, Thiago José; FERREIRA, Tiago Martins; COSTA, Kelton Augusto Pontara da. Machine Learning-based Spyware Detection Systems: An Undersampling Performance Analysis. In: CONGRESSO LATINO-AMERICANO DE SOFTWARE LIVRE E TECNOLOGIAS ABERTAS (LATINOWARE), 22., 2025, Foz do Iguaçu/PR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 179-186. DOI: https://doi.org/10.5753/latinoware.2025.15967.