A Framework for Efficient Pre-Processing of HTTP Requests Using Machine Learning-Based Web Application Firewalls

Lucas G. Cilento (UFPE); Paulo S. G. de Mattos Neto (UFPE); Daniel C. Cunha (UFPE)

doi:10.5753/sbseg.2025.9793

Abstract

The choice of request pre-processing method in web application firewalls (WAFs) that use machine learning is critical, because it determines the detection performance, latency, and computational resource consumption of web attack detection. Among pre-processing techniques, the combination of N-gram extraction and term frequency-inverse document frequency (TF-IDF) weighting is one of the most popular. However, this approach often produces high-dimensional feature vectors, which can increase latency and resource usage. The key challenge is to balance effective attack detection against resource efficiency. This article proposes a framework for evaluating WAF architectures that preserves detection performance while reducing the number of variables (features) by at least 80%.
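To make the dimensionality problem concrete, here is a minimal sketch in plain Python (standard library only) of character-level N-gram plus TF-IDF vectorization of HTTP request strings, followed by a selection step that keeps 20% of the columns, mirroring the "at least 80%" reduction mentioned above. Several details are assumptions, not the authors' method: n = 3 character n-grams, the smoothed IDF variant idf(g) = ln((1 + N) / (1 + df(g))) + 1 with L2-normalized rows, and a simple class-mean-difference score used as a hypothetical stand-in for whatever feature-selection technique the framework actually evaluates. The request strings and labels are toy examples.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams of one request string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf(docs: list[str], n: int = 3) -> tuple[list[str], list[list[float]]]:
    """Dense TF-IDF matrix over character n-grams (smoothed IDF, L2-normalized rows)."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    vocab = sorted(set().union(*counts))           # one column per distinct n-gram
    df = Counter(g for c in counts for g in c)      # number of requests containing each n-gram
    num_docs = len(docs)
    idf = {g: math.log((1 + num_docs) / (1 + df[g])) + 1 for g in vocab}
    rows = []
    for c in counts:
        row = [c[g] * idf[g] for g in vocab]        # raw term count times IDF
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


def select_top_features(vocab, rows, labels, keep_ratio=0.2):
    """Hypothetical selection step: keep the columns whose mean weight differs
    most between attack (1) and benign (0) requests. Assumes both classes occur."""
    def class_mean(j, label):
        vals = [r[j] for r, y in zip(rows, labels) if y == label]
        return sum(vals) / len(vals)

    scores = [abs(class_mean(j, 1) - class_mean(j, 0)) for j in range(len(vocab))]
    n_keep = max(1, int(len(vocab) * keep_ratio))
    ranked = sorted(range(len(vocab)), key=lambda j: -scores[j])
    keep = sorted(ranked[:n_keep])                  # restore original column order
    return [vocab[j] for j in keep], [[r[j] for j in keep] for r in rows]


# Toy requests and labels (0 = benign, 1 = attack), for illustration only.
requests = [
    "GET /index.html?id=10",
    "GET /index.html?id=11",
    "GET /index.html?id=1' OR '1'='1",
    "GET /search?q=<script>alert(1)</script>",
]
labels = [0, 0, 1, 1]

vocab, X = tfidf(requests)
kept_vocab, X_small = select_top_features(vocab, X, labels, keep_ratio=0.2)
print(f"{len(vocab)} n-gram features -> {len(kept_vocab)} after selection")
```

Even these four short requests yield dozens of trigram columns, and in real traffic the vocabulary keeps growing with every new path, parameter, and payload; each extra column adds work to every classification, which is the latency and resource cost the abstract refers to.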

