An Evaluation of Low-Quality Content Detection Strategies: Which Attributes Are Still Relevant?

Júlio César Resende; Igor Moraes; Leonardo Rocha

Júlio César Resende Universidade Federal de São João del Rei
Igor Moraes Universidade Federal de São João del Rei
Leonardo Rocha Universidade Federal de São João del Rei

Abstract

Millions of users have come to rely on the wide range of services provided by social networks. However, the ease of using these networks for communication has made them vulnerable to malicious users (spammers), who aim to spread different types of malicious data or disseminate low-quality content (spam). One of the main examples of these applications is Twitter, for which several spam detection strategies have been proposed. In this work, we conduct a literature survey of these strategies. Through an experimental evaluation, we identify which of them remain applicable in the current scenario, given that Twitter has been changing constantly.

Keywords: Spam, Low-Quality Content Detection
