DepreRedditBR: A Textual Dataset of Depressive Posts in Brazilian Portuguese
Abstract
Depression is a mental disorder whose symptoms are often disabling. Monitoring users' activity on social networks can help identify depression early. Research has sought textual data to train models and build computational solutions, but most studies still rely on English-language data. In this context, this work built DepreRedditBR, a textual dataset of 509,675 posts with depressive content collected from Reddit in Brazilian Portuguese. DepreRedditBR was used to pre-train an LLM, and the knowledge acquired in this step allowed the model, once fine-tuned, to classify posts by degree of depression.
Keywords:
Textual dataset, Depression, Mental health, Reddit
Published
14/10/2024
How to Cite
HERCULANO, Ayrton Douglas Rodrigues; DE PAULA, Taw-Ham Almeida Balbino; FERNANDES, Damires Yluska de Souza; REGO, Alex Sandro da Cunha.
DepreRedditBR: Um conjunto de dados textuais com postagens depressivas no idioma português brasileiro. In: DATASET SHOWCASE WORKSHOP (DSW), 6., 2024, Florianópolis/SC.
Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 77-90.
DOI: https://doi.org/10.5753/dsw.2024.243994.