Telegram4DS: A Dataset of Questions from Brazilian Telegram Groups on Data Science and AI

  • Leonardo Gargano UFRJ
  • Adriana S. Vivacqua UFRJ

Abstract


Instant messaging applications host technical discussions that are rarely captured by traditional question-and-answer platforms. This work presents Telegram4DS, a manually labeled dataset of 2,000 questions collected from 10 public Telegram groups focused on Data Science and Artificial Intelligence in Brazil. Starting from 631,014 raw messages, questions were selected at random and thematic analysis was applied to classify them into four categories: market, courses, general questions, and material.
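
The sampling and labeling steps summarized in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the abstract does not say how questions were identified among the raw messages, so the `looks_like_question` filter (a simple question-mark check), the function names, and the fixed seed are all assumptions made for the example. Only the four category names come from the abstract.

```python
import random
from collections import Counter

# The four categories named in the abstract.
CATEGORIES = ("market", "courses", "general questions", "material")


def looks_like_question(text: str) -> bool:
    """Hypothetical filter (an assumption): a message with '?' is a question."""
    return "?" in text


def sample_questions(messages, k, seed=0):
    """Draw k question candidates uniformly at random, without replacement."""
    candidates = [m for m in messages if looks_like_question(m)]
    if k > len(candidates):
        raise ValueError("not enough question candidates to sample from")
    # A seeded generator keeps the sample reproducible across runs.
    return random.Random(seed).sample(candidates, k)


def category_counts(labels):
    """Count manually assigned labels, rejecting anything outside CATEGORIES."""
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(labels)
    return {c: counts[c] for c in CATEGORIES}
```

Under these assumptions, `sample_questions(raw_messages, 2000)` would yield the pool handed to human annotators, and `category_counts` would summarize the labels they assign. The thematic analysis itself is manual and is not something this sketch attempts to automate.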

Published
19/07/2026
GARGANO, Leonardo; VIVACQUA, Adriana S. Telegram4DS: Um Conjunto de Dados de Perguntas de Grupos Brasileiros do Telegram sobre Ciência de Dados e IA. In: BRAZILIAN WORKSHOP ON SOCIAL NETWORK ANALYSIS AND MINING (BRASNAM), 15., 2026, Gramado/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 277-283. ISSN 2595-6094. DOI: https://doi.org/10.5753/brasnam.2026.23671.
