Predictive Models for Data Breach Detection: A Comparative Approach Between Classical and Deep Learning Techniques
Abstract
Given the rise in data breaches and their high costs, this study compares forecasting algorithms applied to information security across different organizational sectors. The LSTM, TCN, Prophet, SARIMA, and XGBoost models were evaluated on incident data made available by the Privacy Rights Clearinghouse. The comparison used the MAE, RMSE, and MAPE metrics. TCN achieved the best MAPE for the General Total (10.21%), Healthcare (23.52%), and Other Businesses (19.39%). LSTM achieved the best MAPE for the Unknown (11.95%) and Financial Services (21.14%) sectors and was also competitive on the General Total (12.13%). These results indicate that neural-network-based models perform well on this task.
References
Africk, E. and Levy, Y. (2021). An examination of historic data breach incidents: What cybersecurity big data visualization and analytics can tell us? Online Journal of Applied Knowledge Management (OJAKM), 9(1):31–45.
Ahmed, S., Nielsen, I. E., Tripathi, A., Siddiqui, S., Ramachandran, R. P., and Rasool, G. (2023). Transformers in time-series analysis: A tutorial. Circuits, Systems, and Signal Processing, 42(12):7433–7466.
Alahmari, A. and Duncan, B. (2020). Cybersecurity risk management in small and medium-sized enterprises: A systematic review of recent evidence. In 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), pages 1–5. IEEE.
Almulihi, A. H., Alassery, F., Khan, A. I., Shukla, S., Gupta, B. K., and Kumar, R. (2022). Analyzing the implications of healthcare data breaches through computational technique. Intelligent Automation & Soft Computing, 32(3).
Avanzi, B., Eling, M., Hurley, M., and Schanz, K.-U. (2025). On the evolution of data breach reporting patterns and frequency in the United States: A cross-state analysis. North American Actuarial Journal, pages 1–32.
Barati, M. and Yankson, B. (2022). Predicting the occurrence of a data breach. International Journal of Information Management Data Insights, 2(2):100128.
Carfora, M. F. and Orlando, A. (2022). Some remarks on malicious and negligent data breach distribution estimates. Computation, 10(208).
Duggineni, S. (2023). Impact of controls on data integrity and information systems. Science and Technology, 13(2):29–35.
Foerderer, J. and Schuetz, S. W. (2022). Data breach announcements and stock market reactions: a matter of timing? Management Science, 68(10):7298–7322.
Gong, X., Chen, Y., Wang, Q., Wang, M., and Li, S. (2022). Private data inference attacks against cloud: Model, technologies, and research directions. IEEE Communications Magazine, 60(9):46–52.
IBM (2024). Cost of a data breach report 2024. Accessed: 2025-04-21.
Janiesch, C., Zschech, P., and Heinrich, K. (2021). Machine learning and deep learning. Electronic Markets, 31(3):685–695.
Kumar, S., Kaur, A., and Kumar, R. (2023). A hybrid oversampling approach to deal with data imbalance and outliers for credit card fraud detection. Applications in Computing and Mathematics for Engineering, 2(1):100004.
Lewis, C. D. (1982). Industrial and Business Forecasting Methods: A Practical Guide to Exponential Smoothing and Curve Fitting. Butterworth Scientific, Oxford, United Kingdom.
Mangku, D. G. S., Yuliartini, N. P. R., Suastika, I. G. N., and Wirawan, I. G. M. A. S. (2021). The personal data protection of internet users in Indonesia. Journal of Southwest Jiaotong University, 56(1):202–209.
Partners, M. (2022). Detecting trends and mean reversion with the Hurst exponent. Accessed: 2025-04-07.
Perera, S., Jin, X., Maurushat, A., and Opoku, D. G. J. (2022). Factors affecting reputational damage to organisations due to cyberattacks. Informatics, 9(1):28.
Pimenta Rodrigues, G. A., Marques Serrano, A. L., Lopes Espiñeira Lemos, A. N., Canedo, E. D., Mendonça, F. L. L. D., de Oliveira Albuquerque, R., and García Villalba, L. J. (2024). Understanding data breach from a global perspective: Incident visualization and data protection law review. Data, 9(2):27.
Privacy Rights Clearinghouse (2025). Privacy rights clearinghouse: Chronology of data breaches. Accessed: 2025-04-21.
Silveira, M., Portela, A., Souza, M., Silva, D., Mesquita, M., Silva, D., Menezes, R., and Gomes, R. (2023). Aplicação de técnicas de encriptação e anonimização em nuvem para proteção de dados. In Anais do XXIII Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais (SBSEG 2023), pages 111–124, Porto Alegre. SBC.
Sun, H., Xu, M., and Zhao, P. (2020). Modeling malicious hacking data breach risks. North American Actuarial Journal, 25(4):484–502.
Varshney, S., Munjal, D., Bhattacharya, O., Saboo, S., and Aggarwal, N. (2020). Big data privacy breach prevention strategies. In 2020 IEEE International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC), pages 1–6. IEEE.
Yu, J., Moon, H., Chua, B.-L., and Han, H. (2022). Hotel data privacy: strategies to reduce customers’ emotional violations, privacy concerns, and switching intention. Journal of Travel & Tourism Marketing, 39(2):213–225.
Published
2025-09-01
How to Cite
SANTOS, Evanei Gomes Dos; RODRIGUES, Gabriel Arquelau Pimenta; SERRANO, André Luiz Marques; ROCHA FILHO, Geraldo Pereira; OLIVEIRA, Felipe Barreto De; GONCALVES, Vinicius Pereira. Predictive Models for Data Breach Detection: A Comparative Approach Between Classical and Deep Learning Techniques. In: BRAZILIAN SYMPOSIUM ON CYBERSECURITY (SBSEG), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 626-642. DOI: https://doi.org/10.5753/sbseg.2025.10490.
