Different pre-processing techniques applied to Machine Learning algorithms for SQL Injection detection

Abstract


This work uses machine learning algorithms to detect SQL Injection, comparing five different methods of pre-processing the input data. Four algorithms were evaluated: Naive Bayes, SVM, Gradient Boosting Tree (GBT), and Random Forest (RF). The results indicate that the two ensemble algorithms, GBT and RF, outperform the others. GBT achieved the best result, an accuracy of 98.46%, using the G-Test and Entropy metrics computed on input that was tokenized and transformed with regular expressions. Naive Bayes performed worst on every evaluated set.
Keywords: SQL Injection, Database, Security, Data Mining
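
The abstract names G-Test and Entropy as metrics computed on tokenized input but does not define how they are calculated. As a rough illustration only, the sketch below computes Shannon entropy and a G statistic over the token frequencies of a query string. The tokenizer pattern `TOKEN_RE`, the `reference_probs` table, and the `floor` value are assumptions made for this example, not the authors' definitions.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not taken from the paper):
# runs of letters/underscores, runs of digits, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]+|\d+|\S")


def tokenize(query):
    """Split a query string into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(query)]


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the token frequency distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def g_statistic(tokens, reference_probs, floor=1e-6):
    """G = 2 * sum(O_i * ln(O_i / E_i)) over the observed tokens.

    E_i is the count expected under `reference_probs`, a hypothetical
    table of token probabilities (for example, estimated from benign
    queries). Tokens missing from the table get the probability `floor`.
    """
    counts = Counter(tokens)
    total = len(tokens)
    g = 0.0
    for tok, observed in counts.items():
        expected = total * reference_probs.get(tok, floor)
        g += observed * math.log(observed / expected)
    return 2.0 * g
```

Under these assumptions, the two values for each query could be used as numeric features for a classifier such as Gradient Boosting Tree or Random Forest.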

Published
2021-07-18
SOUZA, Vanessa C. O.; SILVA, Erick T. A.; D., Rafael M.; PAULA, Melise M. V. Different pre-processing techniques applied to Machine Learning algorithms for SQL Injection detection. In: INTEGRATED SOFTWARE AND HARDWARE SEMINAR (SEMISH), 48., 2021, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 257-264. ISSN 2595-6205. DOI: https://doi.org/10.5753/semish.2021.15830.