Identifying Depressive Content in Decentralized Social Networks: A Case Study on Bluesky
Abstract
Depression is a global mental health concern, yet scalable screening remains challenging. Prior work focuses on centralized platforms, and little is known about depressive signals in newer networks such as Bluesky. We present a manually annotated dataset of 4,898 public English Bluesky posts, 14.5% of which are labeled as depressive. We evaluate a hybrid pipeline combining XGBoost and fine-tuned transformers (BERT-base, RoBERTa-base, and MentalBERT); our best model achieves an F1-score of 80.5% on Bluesky and 90.8% on Twitter. These results suggest that core linguistic cues of depression remain informative on a decentralized platform. The proposed system is intended for decision support rather than diagnosis, and it requires clinical validation and ethical oversight before any deployment.
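Because only 14.5% of the posts are labeled as depressive, accuracy alone would be a misleading metric, which is presumably why F1 is reported. The sketch below illustrates this with the dataset size and class ratio stated above. It assumes F1 is computed on the depressive (positive) class; the abstract does not state the averaging scheme, and the positive count is approximate because 14.5% is a rounded figure. The helper name `f1_from_counts` is hypothetical.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures taken from the abstract; the positive count is an approximation
# derived from the rounded 14.5% rate (assumption).
TOTAL_POSTS = 4898
POSITIVE_RATE = 0.145
n_pos = round(TOTAL_POSTS * POSITIVE_RATE)
n_neg = TOTAL_POSTS - n_pos

# Trivial baseline that labels every post as non-depressive:
# it never produces a true positive, so every positive is a false negative.
baseline_accuracy = n_neg / TOTAL_POSTS
baseline_f1 = f1_from_counts(tp=0, fp=0, fn=n_pos)

print(f"approx. depressive posts: {n_pos}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"baseline F1 (positive class): {baseline_f1:.3f}")
```

Under these assumptions, the all-negative baseline reaches roughly 85.5% accuracy while its positive-class F1 is zero, so an F1-score gives a more informative view of performance on the minority class.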
