Using Machine Learning to Automatically Detect Malicious URLs in Brazil
Abstract
Phishing is an attack that uses social engineering and other techniques to steal victims' personal or financial information. Brazil ranks first in the share of users targeted by phishing, and over 77% of these attacks are carried out via URLs. Although databases and techniques for detecting malicious URLs exist, they are not effective against URLs targeted at Brazilian users, which have different characteristics. This paper presents an effective machine learning method for detecting malicious URLs aimed at Brazilian users. More than 110 features (lexical, network, reputation, and others) and several classifiers were used to evaluate the effectiveness of the proposed method. The evaluation was performed with real data extracted from the fraud catalog of the Brazilian academic network and from other sources. Results show accuracy rates above 96%.
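To give a sense of what a lexical feature looks like, the following minimal sketch computes a handful of them from the URL string alone. It is an illustration under assumptions, not the feature set used in the paper: the function name `lexical_features` and the specific features chosen (lengths, digit count, IPv4 host, `.br` suffix, path depth) are hypothetical, and the network and reputation features mentioned above are not covered.

```python
import re
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Return a few illustrative lexical features of a URL.

    Hypothetical feature set for illustration only; the paper's
    110+ features are not listed in the abstract.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Overall size of the URL and of its host part
        "url_length": len(url),
        "host_length": len(host),
        # Many dots in the host can indicate stacked subdomains
        "num_dots_in_host": host.count("."),
        # Count of digit characters anywhere in the URL
        "num_digits": sum(ch.isdigit() for ch in url),
        # '@' can be used to hide the real host from the reader
        "has_at_symbol": "@" in url,
        # Host written as a dotted-decimal IPv4 address instead of a name
        "host_is_ipv4": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # Host under the Brazilian country-code top-level domain
        "ends_with_br": host.endswith(".br"),
        # Number of non-empty path segments
        "path_depth": len([p for p in parsed.path.split("/") if p]),
    }
```

A dictionary like this, computed for each URL, could be turned into a numeric vector and passed to any standard classifier. Which features and classifiers work best for Brazilian URLs is the question the paper evaluates.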
