A Survey and a Preliminary Evaluation of Low-quality Content Detection Strategies: Which Attributes Are Still Relevant, Which Are Not?

Júlio Resende; Igor Moraes; Nícollas Silva; Vinícius Durelli; Diego Dias; Leonardo Rocha

doi:10.5753/kdmile.2019.8784

Júlio Resende Universidade Federal de São João del Rei (UFSJ)
Igor Moraes Universidade Federal de São João del Rei (UFSJ)
Nícollas Silva Universidade Federal de Minas Gerais (UFMG)
Vinícius Durelli Universidade Federal de São João del Rei (UFSJ)
Diego Dias Universidade Federal de São João del Rei (UFSJ)
Leonardo Rocha Universidade Federal de São João del Rei (UFSJ)

Abstract

Online social networks have gone mainstream: millions of users have come to rely on the wide range of services they provide. However, the ease of use of social networks for communicating information also makes them particularly vulnerable to social spammers, i.e., ill-intentioned users whose main purpose is to degrade the information quality of social networks through the proliferation of different types of malicious data (e.g., social spam, malware downloads, and phishing), collectively called low-quality content or spam. Since Twitter is also rife with low-quality content, several researchers have devised low-quality content detection strategies that inspect tweets for spam content. We carried out a literature survey of these detection strategies, examining which of them are still applicable in the current scenario, taking into account that Twitter has undergone many changes in the last few years. To gather evidence of the usefulness of the attributes used by these strategies, we carried out a preliminary evaluation of these attributes.

Keywords: Spam Detection, Data Mining, Machine Learning
