A survey of low-quality content detection strategies
Abstract
Millions of users have come to rely on the wide range of services provided by social networks. However, the ease of using social networks to communicate information also makes them particularly vulnerable to ill-intentioned users (spammers), whose main purpose is to proliferate different types of malicious data and low-quality content (spam). Since Twitter is also rife with low-quality content, researchers have devised various low-quality content detection strategies that inspect tweets for spam content. We carried out a literature survey of these detection strategies, evaluating which of them remain applicable in the current scenario, taking into account that Twitter has undergone many changes in the last few years.