Emotions in Brazilian Portuguese: a dataset and baseline results
Abstract
This paper presents a new dataset for sentiment analysis in Brazilian Portuguese. The texts were extracted from a Brazilian social network called Meu Querido Diário, where users frequently share feelings and emotions associated with their daily lives. The main distinguishing feature of this dataset is that users themselves can indicate the emotion associated with each entry. Preliminary experiments were carried out with several classification models, establishing the first baseline results. The best-performing model was an SVM with a linear kernel using bigrams.
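To make the best-performing configuration concrete, the following is a minimal sketch of a linear SVM over bigram counts. It is not the authors' implementation. It assumes word-level bigrams, whitespace tokenization, a binary label (+1 or -1) instead of the dataset's emotion labels, and a Pegasos-style subgradient trainer without a bias term. The function names (`bigram_counts`, `train_linear_svm`, `predict`) and the four training sentences are invented for illustration and do not come from the dataset.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def train_linear_svm(samples, labels, epochs=20, lam=0.01):
    """Pegasos-style subgradient descent on the hinge loss (labels are +1/-1)."""
    weights = {}
    step = 0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            score = sum(weights.get(f, 0.0) * v for f, v in features.items())
            margin = label * score
            # L2 regularization: shrink every existing weight.
            for f in weights:
                weights[f] *= 1.0 - eta * lam
            # Hinge loss is active only when the margin is below 1.
            if margin < 1.0:
                for f, v in features.items():
                    weights[f] = weights.get(f, 0.0) + eta * label * v
    return weights


def predict(weights, text):
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(weights.get(f, 0.0) * v for f, v in bigram_counts(text).items())
    return 1 if score >= 0 else -1


# Hypothetical toy entries (not taken from the dataset): +1 = positive, -1 = negative.
train_texts = ["que dia maravilhoso", "estou muito feliz",
               "me sinto triste", "tudo deu errado"]
train_labels = [1, 1, -1, -1]

weights = train_linear_svm([bigram_counts(t) for t in train_texts], train_labels)
print(predict(weights, "ontem estou muito feliz"))
print(predict(weights, "hoje tudo deu errado"))
```

A multi-class emotion classifier could be built from this by training one such binary model per emotion (one-vs-rest). That is a common choice, though the abstract does not state which scheme was used.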
