Combining clustering and classification algorithms for automatic bot detection: a case study on posts about COVID-19
In the last decade, bots have proliferated across many social media platforms. Their potentially harmful effects include the spread of computer viruses and various internet scams, as well as the spread of fake news, particularly political-electoral and public health-related news. This work presents a new approach to bot detection on Twitter that combines feature selection, clustering, and classification algorithms. The proposed approach was compared with more conventional ones (for example, approaches that do not use clustering), and the results confirmed the premise of this work: using clustering together with feature selection produced better classification models, able to identify not only bots with an activity profile considered non-human (extremely active on Twitter) but also bots whose profiles are more similar to those of humans. The best automatic bot detection results reached an overall accuracy of 96.8% and an F1 score of 0.622. As an additional advantage, these values were achieved by decision-tree models, which can be considered explainable artificial intelligence models.
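The general shape of the approach (select features, cluster the accounts, then train one classifier per cluster) can be sketched as follows. This is a minimal, standard-library-only illustration under our own assumptions, not the authors' implementation: feature selection here is a simple absolute-correlation filter, clustering is k-means with deterministic farthest-first initialisation on unscaled features, and each per-cluster model is a depth-1 decision tree (a stump). All function names and the toy features (`tweets_per_day`, `retweet_pct`, `profile_flag`) are hypothetical.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(X, y, n):
    """Indices of the n features with the largest |correlation| to the label.
    A simplified filter for illustration, not CFS or the authors' method."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:n]


def nearest(centroids, row):
    """Index of the centroid closest to row (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(row, centroids[c]))


def kmeans(X, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        far = max(X, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        assign = [nearest(centroids, p) for p in X]
        for c in range(k):
            members = [p for p, a in zip(X, assign) if a == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [mean(col) for col in zip(*members)]
    return centroids


def fit_stump(X, y):
    """Depth-1 decision tree: the split with the best training accuracy."""
    best = (-1.0, 0, 0, 1)  # (accuracy, feature, threshold, label when value > threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for hi in (0, 1):
                hits = sum((hi if row[j] > t else 1 - hi) == lab for row, lab in zip(X, y))
                if hits / len(y) > best[0]:
                    best = (hits / len(y), j, t, hi)
    _, j, t, hi = best
    return lambda row: hi if row[j] > t else 1 - hi


def fit_cluster_then_classify(X, y, n_features, n_clusters):
    """Feature selection, then k-means, then one decision stump per cluster."""
    feats = select_features(X, y, n_features)

    def project(row):
        return [row[j] for j in feats]

    Xs = [project(row) for row in X]
    centroids = kmeans(Xs, n_clusters)
    models = {}
    for c in range(n_clusters):
        idx = [i for i, r in enumerate(Xs) if nearest(centroids, r) == c]
        if idx:
            models[c] = fit_stump([Xs[i] for i in idx], [y[i] for i in idx])

    def predict(row):
        r = project(row)
        model = models.get(nearest(centroids, r))
        return model(r) if model else 0  # assumption: empty cluster -> "human"

    return predict


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Overall accuracy and F1 = 2TP / (2TP + FP + FN), with bots as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    return acc, (2 * tp / (2 * tp + fp + fn) if tp else 0.0)


# Hypothetical toy accounts: [tweets_per_day, retweet_pct, profile_flag]; label 1 = bot.
X = [
    [300, 10, 1], [320, 15, 1], [310, 80, 1], [290, 90, 1],  # extremely active accounts
    [10, 95, 1], [12, 90, 1], [11, 10, 1], [9, 20, 1],       # human-like activity levels
]
y = [1, 1, 0, 0, 1, 1, 0, 0]

predict = fit_cluster_then_classify(X, y, n_features=2, n_clusters=2)
preds = [predict(row) for row in X]
print(accuracy_and_f1(y, preds))  # prints (1.0, 1.0)
```

In this toy data no single stump trained on all accounts separates bots from humans, because the retweet percentage points in opposite directions in the two activity groups; once the accounts are split by activity level, one threshold per cluster is enough. This is only a miniature analogue of the premise stated above, not a reproduction of the reported results.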