Sentiment Analysis in Portuguese Texts from Online Health Community Forums: Data, Model and Evaluation
Abstract
This study presents data and models for Sentiment Analysis of Portuguese texts about Diabetes Mellitus. The corpus comprises 1290 posts, extracted from online forums on health topics and annotated by two students according to three categories (Positive, Neutral and Negative). An evaluation of traditional Machine Learning classifiers (Support Vector Machine, Decision Tree, Random Forest and Logistic Regression) and state-of-the-art ones (BERT-based models) showed, as expected, that the latter perform better. The data and models are available to the community upon request.
References
Cignarelli, A., Sansone, A., Caruso, I., Perrini, S., Natalicchio, A., Laviola, L., Jannini, E. A., and Giorgino, F. (2020). Diabetes in the time of COVID-19: A Twitter-based sentiment analysis. Journal of Diabetes Science and Technology, 14(6):1131–1132.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.
Gabarron, E., Bradway, M., Fernandez-Luque, L., Chomutare, T., Hansen, A. H., Wynn, R., and Arsand, E. (2018). Social media for health promotion in diabetes: study protocol for a participatory public health intervention design. BMC Health Services Research, 18(1):414.
Gabarron, E., Dorronzoro, E., Rivera-Romero, O., and Wynn, R. (2019). Diabetes on Twitter: A sentiment analysis. Journal of Diabetes Science and Technology, 13(3):439–444.
Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174.
Liu, B. (2012). Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers.
Liu, X., Sun, M., and Li, J. (2018). Research on gender differences in online health communities. International Journal of Medical Informatics, 111:172–181.
Liu, Y., Stouffs, R., and Theng, Y. L. (2020). Sentiment analysis on social media for identifying public awareness of type 2 diabetes. In The 54th International Conference of the Architectural Science Association (ANZAScA).
Lu, Y., Wu, Y., Liu, J., Li, J., and Zhang, P. (2017). Understanding health care social media use from different stakeholder perspectives: A content analysis of an online health community. Journal of Medical Internet Research, 19(4):e109.
Saeedi, P., Petersohn, I., Salpea, P., Malanda, B., Karuranga, S., Unwin, N., Colagiuri, S., Guariguata, L., Motala, A. A., Ogurtsova, K., Shaw, J. E., Bright, D., and Williams, R. (2019). Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: Results from the International Diabetes Federation Diabetes Atlas, 9th edition. Diabetes Research and Clinical Practice, 157:107843.
Salas-Zárate, M. d. P., Medina-Moreira, J., Lagos-Ortiz, K., Luna-Aveiga, H., Rodríguez-García, M. Á., and Valencia-García, R. (2017). Sentiment analysis on tweets about diabetes: An aspect-level approach. Computational and Mathematical Methods in Medicine, 2017:5140631.
Schneider, E. T. R., de Souza, J. V. A., Knafou, J., Oliveira, L. E. S. e., Copara, J., Gumiel, Y. B., Oliveira, L. F. A. d., Paraiso, E. C., Teodoro, D., and Barra, C. M. C. M. (2020). BioBERTpt - a Portuguese neural language model for clinical named entity recognition. In Proceedings of the 3rd Clinical Natural Language Processing Workshop, pages 65–72. Association for Computational Linguistics.
Souza, F., Nogueira, R., and Lotufo, R. (2020). BERTimbau: Pretrained BERT models for Brazilian Portuguese. In Cerri, R. and Prati, R. C., editors, Intelligent Systems, pages 403–417, Cham. Springer International Publishing.
Tkachenko, M., Malyuk, M., Shevchenko, N., Holmanyuk, A., and Liubimov, N. (2020-2021). Label Studio: Data labeling software. Open source software available from https://github.com/heartexlabs/label-studio.
Yadav, A. and Vishwakarma, D. K. (2020). Sentiment analysis using deep learning architectures: a review. Artificial Intelligence Review, 53(6):4335–4385.
Yue, L., Chen, W., Li, X., Zuo, W., and Yin, M. (2019). A survey of sentiment analysis in social media. Knowledge and Information Systems, 60(2):617–663.
Published
29/11/2021
How to Cite
GUMIEL, Yohan Bonescki; LEE, Isabela; SOARES, Tayane Arantes; FERREIRA, Thiago Castro; PAGANO, Adriana. Sentiment Analysis in Portuguese Texts from Online Health Community Forums: Data, Model and Evaluation. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 13., 2021, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 64-72. DOI: https://doi.org/10.5753/stil.2021.17785.