A Survey and a Preliminary Evaluation of Low-quality Content Detection Strategies: Which Attributes Are Still Relevant, Which Are Not?

  • Júlio Resende Universidade Federal de São João del Rei (UFSJ)
  • Igor Moraes Universidade Federal de São João del Rei (UFSJ)
  • Nícollas Silva Universidade Federal de Minas Gerais (UFMG)
  • Vinícius Durelli Universidade Federal de São João del Rei (UFSJ)
  • Diego Dias Universidade Federal de São João del Rei (UFSJ)
  • Leonardo Rocha Universidade Federal de São João del Rei (UFSJ)

Abstract


Online social networks have gone mainstream: millions of users have come to rely on the wide range of services they provide. However, the ease of use of social networks for communicating information also makes them particularly vulnerable to social spammers, i.e., ill-intentioned users whose main purpose is to degrade the information quality of social networks through the proliferation of different types of malicious data (e.g., social spam, malware downloads, and phishing), collectively called low-quality content or spam. Since Twitter is also rife with low-quality content, researchers have devised various low-quality content detection strategies that inspect tweets for spam content. We carried out a literature survey of these detection strategies, examining which of them are still applicable in the current scenario, taking into account that Twitter has undergone many changes in the last few years. To gather evidence of the usefulness of the attributes these strategies rely on, we carried out a preliminary evaluation of these attributes.

Keywords: Spam Detection, Data Mining, Machine Learning

Published
18 November 2019
How to Cite

RESENDE, Júlio; MORAES, Igor; SILVA, Nícollas; DURELLI, Vinícius; DIAS, Diego; ROCHA, Leonardo. A Survey and a Preliminary Evaluation of Low-quality Content Detection Strategies: Which Attributes Are Still Relevant, Which Are Not?. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 2019, Fortaleza. Anais do VII Symposium on Knowledge Discovery, Mining and Learning. Porto Alegre: Sociedade Brasileira de Computação, Nov. 2019. p. 17-24. DOI: https://doi.org/10.5753/kdmile.2019.8784.