Evolution of threats on Dark Web and Surface Web forums: a study based on topic modeling and time series
Abstract
This work investigates how discussions of cyber threats evolved over time on Dark Web and Surface Web forums between 2015 and 2024, with the aim of identifying trends, seasonal patterns, and differences between these environments. The study analyzes more than 52,000 posts using natural language processing and topic modeling with Latent Dirichlet Allocation (LDA), and also examines the dynamics within these online communities. The analysis showed that Surface Web forums had high topic variability. By contrast, the Portuguese-language Dark Web was dominated by the trade in personal data, while the English-language Dark Web sustained offensive technical topics, such as phishing and malware creation, at a continuous frequency.
References
Avanzi, B., Tan, X., Taylor, G., and Wong, B. (2023). On the evolution of data breach reporting patterns and frequency in the United States: a cross-state analysis. arXiv preprint arXiv:2310.04786.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
Cascavilla, G. (2025). The rise of cybercrime and cyber-threat intelligence: Perspectives and challenges from law enforcement. IEEE Security & Privacy, 23(1):17–26.
Cimpanu, C. (2020). University of Utah pays $457,000 to ransomware gang. Accessed: 12-04-2023.
Crawly (2021). O que é crawler e como funcionam os robôs para coleta de dados. Accessed: 25-10-2024.
de Jesus Filho, S. A. (2024). Identificação de posts maliciosos na dark web utilizando aprendizado de máquina supervisionado. Master's thesis, Universidade Federal de Uberlândia, Uberlândia, Brazil. Advisor: Rodrigo Sanches Miani.
Fu, T., Abbasi, A., and Chen, H.-c. (2010). A focused crawler for dark web forums. JASIST, 61:1213–1231.
Hickman, L., Thapa, S., Tay, L., Cao, M., and Srinivasan, P. (2022). Text preprocessing for text mining in organizational research: Review and recommendations. Organizational Research Methods, 25(1):114–146.
Kavallieros, D., Myttas, D., Kermitsis, E., Lissaris, E., Giataganas, G., and Darra, E. (2021). Understanding the Dark Web, pages 3–26. Springer International Publishing, Cham.
Koloveas, P., Chantzios, T., Alevizopoulou, S., Skiadopoulos, S., and Tryfonopoulos, C. (2021). inTIME: A machine learning-based framework for gathering and leveraging web data to cyber-threat intelligence. Electronics, 10(7).
Kühn, P., Wittorf, K., and Reuter, C. (2024). Navigating the shadows: Manual and semi-automated evaluation of the dark web for cyber threat intelligence. IEEE Access, 12:118903–118922.
Labs, F. (2024). Pesquisa de ameaças da Fortinet descobre que os cibercriminosos estão explorando novas vulnerabilidades do setor 43% mais rápido do que no 1º semestre de 2023. Accessed: 13-04-2025.
Liakos, P., Ntoulas, A., Labrinidis, A., and Delis, A. (2015). Focused crawling for the hidden web. World Wide Web, 19.
Najork, M. (2009). Web Crawler Architecture, pages 3462–3465. Springer US, Boston, MA.
Nunes, E., Diab, A., Gunn, A., Marin, E., Mishra, V., Paliath, V., Robertson, J., Shakarian, J., Thart, A., and Shakarian, P. (2016). Darknet and deepnet mining for proactive cybersecurity threat intelligence. In 2016 IEEE Conference on Intelligence and Security Informatics (ISI), pages 7–12.
Rahman, M. R., Hezaveh, R. M., and Williams, L. (2023). What are the attackers doing now? automating cyberthreat intelligence extraction from text on pace with the changing threat landscape: A survey. ACM Comput. Surv., 55(12).
Sapienza, A., Bessi, A., Damodaran, S., Shakarian, P., Lerman, K., and Ferrara, E. (2017). Early warnings of cyber threats in online discussions. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pages 667–674.
Sarkar, S., Almukaynizi, M., Shakarian, J., and Shakarian, P. (2018). Predicting enterprise cyber incidents using social network analysis on the darkweb hacker forums.
Sun, N., Ding, M., Jiang, J., Xu, W., Mo, X., Tai, Y., and Zhang, J. (2023). Cyber threat intelligence mining for proactive cybersecurity defense: A survey and new perspectives. IEEE Communications Surveys & Tutorials, 25(3):1748–1774.
Syed, S. and Spruit, M. (2017). Full-text or abstract? examining topic coherence scores using latent Dirichlet allocation. In 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 165–174.
Tong, Z. and Zhang, H. (2016). A text mining research based on LDA topic modelling. In International conference on computer science, engineering and information technology, pages 201–210.
Wagner, T. D., Mahbub, K., Palomar, E., and Abdallah, A. E. (2019). Cyber threat intelligence sharing: Survey and research directions. Computers & Security, 87:101589.
Řehůřek, R. (2024). What is gensim. Accessed: 27-04-2025.
Published
2025-09-01
How to Cite
PEREIRA, Miguel Henrique de Brito; JESUS FILHO, Sebastião Alves de; GABRIEL, Paulo Henrique Ribeiro; MIANI, Rodrigo Sanches. Evolução de ameaças em fóruns da Dark Web e Surface Web: um estudo baseado em modelagem de tópicos e séries temporais. In: SIMPÓSIO BRASILEIRO DE CIBERSEGURANÇA (SBSEG), 25., 2025, Foz do Iguaçu/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 401-416. DOI: https://doi.org/10.5753/sbseg.2025.11375.
