Topic and Aspect Mining in Microblogs about Dengue, Chikungunya, Zika and Microcephaly
Abstract
Correctly analyzing opinionated texts, including those posted on microblogs and social networks, requires identifying the topic the author is commenting on. Topic analysis can be carried out with a set of techniques for identifying what we call 'aspect terms'. In this article, we show how aspect terms can be identified in Portuguese microblog posts using frequency-based methods and word vector representations (word2vec). We obtained a list of n-grams that we consider suitable indicators of the topics being discussed. We focus on texts about Dengue, Chikungunya and Zika, as well as Microcephaly, which are currently serious health threats.
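The frequency-based step can be illustrated with a small sketch. This is a minimal example under our own assumptions, not the exact procedure used in this work: the stopword list, the `max_n` and `min_count` thresholds, the helper names (`tokenize`, `ngrams`, `candidate_aspect_terms`) and the example posts are all hypothetical. Candidate n-grams are counted across posts, n-grams that begin or end with a stopword are discarded, and only those occurring at least `min_count` times are kept. The word2vec stage, which would relate these candidates through their vector representations, is not shown.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny Portuguese stopword list (illustration only).
STOPWORDS = {"a", "o", "as", "os", "de", "da", "do", "das", "dos", "e", "é",
             "em", "no", "na", "que", "um", "uma", "com", "para", "por", "se"}

def tokenize(text):
    """Lowercase the text and keep runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def candidate_aspect_terms(posts, max_n=3, min_count=2):
    """Count 1..max_n-grams over all posts and keep the frequent ones.

    N-grams starting or ending with a stopword are skipped, so
    'mosquito da dengue' survives but 'mosquito da' does not.
    """
    counts = Counter()
    for post in posts:
        tokens = tokenize(post)
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    # most_common() sorts by frequency; the threshold drops rare n-grams.
    return [(term, c) for term, c in counts.most_common() if c >= min_count]

# Made-up example posts (not from this study's data).
EXAMPLE_POSTS = [
    "Mosquito da dengue no bairro de novo",
    "Cuidado com o mosquito da dengue",
    "Vacina contra dengue ainda não chegou",
]

if __name__ == "__main__":
    print(candidate_aspect_terms(EXAMPLE_POSTS))
```

On the example posts this keeps "dengue", "mosquito" and the trigram "mosquito da dengue". Filtering only the n-gram boundaries, rather than removing stopwords before building n-grams, lets multiword terms with internal prepositions survive, which matters for Portuguese noun phrases.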