DepressSet: a dataset of textual analyses on depressive posts
Abstract
Social media can help people seek support or guidance on coping with, or better understanding, depressive disorder. However, working with data on this disorder is challenging because the content is sensitive and such data are hard to find. In this work, we present a dataset collected from depression-related communities on Facebook in September 2022. We describe how the data were extracted, processed, stored, and released, along with the challenges encountered and lessons learned. We enriched the collected data with linguistic analyses of the posts and with a prediction for each post from a text classification model. Finally, we propose potential applications of the dataset and discuss its limitations.
Published
2024-07-21
How to Cite
LIMA FILHO, Silas; SILVA, Eliel Roger da; OLIVEIRA, Jonice; SILVA, Mônica Ferreira da. DepressSet: a dataset of textual analyses on depressive posts. In: BRAZILIAN WORKSHOP ON SOCIAL NETWORK ANALYSIS AND MINING (BRASNAM), 13., 2024, Brasília/DF. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 214-220. ISSN 2595-6094. DOI: https://doi.org/10.5753/brasnam.2024.2774.