Evaluating the Brazilian Portuguese version of the 2015 LIWC Lexicon with sentiment analysis in social networks

Flavio Carvalho; Rafael Guimarães Rodrigues; Gabriel Santos; Pedro Cruz; Lilian Ferrari; Gustavo Paiva Guedes

doi:10.5753/brasnam.2019.6545

Flavio Carvalho, CEFET-RJ
Rafael Guimarães Rodrigues, CEFET-RJ
Gabriel Santos, CEFET-RJ
Pedro Cruz, CEFET-RJ
Lilian Ferrari, UFRJ
Gustavo Paiva Guedes, CEFET-RJ


Abstract

LIWC is a text analysis program that groups words into categories derived from grammar and psychology. The LIWC lexicon currently available for Brazilian Portuguese (LIWC 2007pt) is based on the 2007 version of the LIWC program. As several studies have indicated, LIWC 2007pt shows performance and categorization problems. Against this background, this work highlights a new Brazilian LIWC lexicon (LIWC 2015pt), based on the LIWC 2015 program, and compares the performance of LIWC 2007pt and LIWC 2015pt in classification tasks. Three experiments were conducted, and the results indicate that LIWC 2015pt outperforms LIWC 2007pt in all three tasks.

Keywords: Natural Language Processing, Emotion Detection, Linguistic Inquiry and Word Count (LIWC)
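
The word-to-category mapping described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the LIWC program or the authors' experimental pipeline: the lexicon entries, the category labels, and the function name `category_proportions` are all hypothetical and are not taken from LIWC 2007pt or LIWC 2015pt.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: entries and category names are illustrative only
# and are NOT taken from LIWC 2007pt or LIWC 2015pt.
TOY_LEXICON = {
    "feliz": {"posemo"},
    "bom": {"posemo"},
    "triste": {"negemo"},
    "ruim": {"negemo"},
    "eu": {"pronoun"},
    "você": {"pronoun"},
}


def category_proportions(text, lexicon):
    """Return, for each category, the share of tokens in `text` that belong to it."""
    # Lowercase and split into word tokens (Unicode-aware, so accents are kept).
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for token in tokens:
        # A word may belong to several categories; count each one.
        for category in lexicon.get(token, ()):
            counts[category] += 1
    total = len(tokens)
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


features = category_proportions("Eu estou feliz, mas você está triste", TOY_LEXICON)
```

Per-category proportions like these are one plausible way to turn a lexicon into features for a classifier. The abstract does not say which features or classifiers were used, so this choice is an assumption.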


