Convolutional Neural Networks in Social Bot Detection: A Method Based on Clustering Textual Messages
Abstract
Social bots increasingly carry out malicious activities on social networks. State-of-the-art detection of this kind of malware relies, among other information, on statistical measures computed from the content of the messages posted on these networks. Because such computations can discard information, this article seeks experimental evidence for the hypothesis that using the original textual content of the messages can improve detection precision. To that end, it proposes a method that uses convolutional neural networks to identify suspicious messages and accounts. These networks are trained on samples obtained by clustering the original message texts. Experiments with Twitter confirm the hypothesis.
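The abstract does not say which clustering algorithm or similarity measure is used, so the following is only a minimal sketch of the general idea of grouping raw message texts before building training samples. It assumes character trigrams, Jaccard similarity, and a greedy single-pass grouping; the function names, the 0.5 threshold, and the example messages are all hypothetical and are not taken from the paper.

```python
from typing import List, Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of character n-grams of the lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_messages(messages: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Greedy single-pass clustering (an illustrative assumption).

    Each message joins the first cluster whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new cluster.
    Returns clusters as lists of message indices.
    """
    clusters: List[List[int]] = []
    representatives: List[Set[str]] = []
    for idx, message in enumerate(messages):
        grams = char_ngrams(message)
        for members, rep in zip(clusters, representatives):
            if jaccard(grams, rep) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
            representatives.append(grams)
    return clusters


# Hypothetical messages, for illustration only.
messages = [
    "Win a FREE phone now! Click http://example.com/a",
    "Win a FREE phone now! Click http://example.com/b",
    "Having coffee with friends this morning",
    "win a free phone now!  click http://example.com/c",
]
clusters = cluster_messages(messages)
# Clusters with several near-identical messages are candidates for labeling.
candidates = [members for members in clusters if len(members) > 1]
print(clusters, candidates)
```

In a pipeline of this kind, the grouped texts (not summary statistics derived from them) would be labeled and passed on as training samples for the convolutional network, which is the point the hypothesis above turns on.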