A Rule-Based Model for Bot Detection on Twitter

Maria Alice Gomes Lopes Leite; Marcus Vinicius Carvalho Guelpeli; Caroline Queiroz Santos

doi:10.5753/brasnam.2020.11161

Maria Alice Gomes Lopes Leite IFNMG
Marcus Vinicius Carvalho Guelpeli UFVJM
Caroline Queiroz Santos UFVJM

Abstract

The growing use of online social networks has made them important sources for studies in many fields, from stock markets and election forecasting to human behavior. However, data samples extracted from these networks have become vulnerable to the activity of bot accounts. This work therefore proposes a supervised approach to knowledge extraction from a dataset from the literature, using techniques that aim not only to classify but also to describe the main characteristics of bots and genuine accounts on Twitter. The rule-based classification model was built to contribute to the construction of a framework for collecting Twitter data with little interference from malicious accounts. The results were considered satisfactory when compared with other related work.

Keywords: Twitter, Bots, Bot filter, Social Networks, Rule Induction
