An Unsupervised Topic-Based Method for Identifying Reputation Dimensions in Microblogs

Brayan Neves; Anderson A. Ferreira

doi:10.5753/sbsi.2016.5943

Affiliation (both authors): Universidade Federal de Ouro Preto

Abstract

Social networks have become major sources for research, because they expose a wide range of information about their users' tastes, interests, desires, and opinions. Identifying reputation dimensions is a task within online reputation management: it consists of segmenting a collection of opinions about an entity into reputation dimensions, which reflect the affective and cognitive perceptions that different groups of people hold about that entity. Supervised techniques have been proposed for this task, but they become impractical in real applications because they depend on a set of training examples that are usually labeled by hand. This work presents a new unsupervised method, based on topic modeling, for identifying reputation dimensions in microblogs. Since most topic modeling algorithms perform poorly at identifying these dimensions in short texts, the proposed approach aims to improve that performance. The method was evaluated on the RepLab 2014 challenge dataset and outperformed the winner of that challenge, which is a supervised method; unlike that winner, the method proposed in this paper does not require manually labeled training examples.
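The abstract does not spell out the algorithm itself. Since the keywords point to community analysis and topic modeling, the sketch below illustrates one generic unsupervised way to group short posts without labeled examples: build a word co-occurrence network over all posts, find word communities with label propagation (a swapped-in community detection technique chosen for brevity, not necessarily the one used in the paper), and assign each post to the community most of its words belong to. This is an assumption for illustration, not the authors' method; the stopword list, the function names (`tokenize`, `cooccurrence_graph`, `label_propagation`, `assign_dimensions`) and the toy posts are all hypothetical, and a "dimension" here is just a community identifier, so mapping it to a named reputation dimension would be a further step not shown.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical, tiny English stopword list used only for this illustration.
STOPWORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "for", "on", "with", "at", "its"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and words of 1-2 letters."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS and len(w) > 2]


def cooccurrence_graph(docs):
    """Undirected weighted graph: weight(u, v) = number of posts containing both words."""
    graph = defaultdict(Counter)
    for tokens in docs:
        for u, v in combinations(sorted(set(tokens)), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def label_propagation(graph, max_iter=20):
    """Deterministic weighted label propagation over the word graph.

    Each word repeatedly adopts the label with the largest total edge weight among
    its neighbours; ties go to the smallest label so that runs are reproducible.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            scores = Counter()
            # Every node in the graph has at least one neighbour, so scores is non-empty.
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            best = max(scores.values())
            new_label = min(lbl for lbl, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def assign_dimensions(posts):
    """Map each post to the word community most of its words belong to (None if no votes)."""
    docs = [tokenize(p) for p in posts]
    labels = label_propagation(cooccurrence_graph(docs))
    result = []
    for tokens in docs:
        votes = Counter(labels[t] for t in tokens if t in labels)
        # Most votes wins; ties go to the alphabetically smallest community label.
        result.append(min(votes, key=lambda lbl: (-votes[lbl], lbl)) if votes else None)
    return result


if __name__ == "__main__":
    posts = [
        "New phone battery lasts long",
        "Phone battery charger broke",
        "Battery on the phone, screen is great",
        "CEO resigned after scandal",
        "CEO scandal and board fraud",
        "Board fraud investigation of the CEO",
    ]
    print(assign_dimensions(posts))  # prints ['phone', 'phone', 'phone', 'ceo', 'ceo', 'ceo']
```

On the toy posts, the product-related tweets and the governance-related tweets fall into two separate word communities. Pooling co-occurrences across the whole collection, rather than relying on the few words inside each tweet, is a common motivation for graph- and co-occurrence-based topic models on short texts.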

Keywords: Communities of Interest, Topic Analysis, Twitter, Social Network Analysis, Community Analysis, Topic Modeling, Social Media