Data mining of social manifestations in Twitter: An ETL approach focused on sentiment analysis
Abstract
The objective of this study was to analyze the sentiments of users of the online social network Twitter. The goals were to understand how people reacted to the article "bela, recatada e do lar" (beautiful, demure and homemaker), published by Veja magazine on April 18, 2016, to follow how this reaction evolved over two weeks, and to identify which events provoked the strongest reaction. To this end, sentiment analysis, a data mining technique, was applied using the ETL (Extract, Transform and Load) methodology and the Naive Bayes probabilistic learning algorithm. In addition, a null hypothesis was formulated and tested to determine whether two events that took place during the collection period actually influenced the polarity of the sentiments analyzed in the resulting database.
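The abstract names the methods but does not describe how they fit together. The following is a minimal, hypothetical sketch of how ETL stages could feed a multinomial Naive Bayes sentiment classifier. It is not the authors' pipeline. The function names `extract`, `transform` and `load`, the regex cleaning rules, the add-one smoothing, the `pos`/`neg` labels, the list used in place of a database, and the toy training tweets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical ETL + Naive Bayes sketch. Names, cleaning rules and toy data
# are illustrative assumptions, not the study's actual pipeline.

def extract(raw_records):
    """Extract: keep the tweet text from raw records (plain dicts here)."""
    return [r["text"] for r in raw_records if r.get("text")]

def transform(text):
    """Transform: lowercase, drop URLs and @mentions, split into word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"\w+", text)

def load(texts, labels, table):
    """Load: append (text, polarity) rows to a list standing in for a database table."""
    table.extend(zip(texts, labels))
    return table

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # log P(c) plus, for each known token, log P(w | c)
            score = math.log(n_c / n_docs)
            for w in tokens:
                if w in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best

# Toy, hand-labeled training tweets (invented for illustration only).
train_raw = [
    {"text": "Amei, ótimo texto"},
    {"text": "Que lindo, amei"},
    {"text": "Texto ridículo e absurdo"},
    {"text": "Péssimo, absurdo @fulano"},
]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit([transform(t) for t in extract(train_raw)], train_labels)

new_tweets = extract([{"text": "Que absurdo http://x.co"}, {"text": "amei"}])
preds = [model.predict(transform(t)) for t in new_tweets]
table = load(new_tweets, preds, [])
print(table)  # [('Que absurdo http://x.co', 'neg'), ('amei', 'pos')]
```

Scores are summed as log-probabilities to avoid floating-point underflow on long tweets. Add-one smoothing keeps a word that never appeared in one class from forcing that class's probability to zero.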