Exploring Brazilian TikTok and YouTube Shorts: A Public Dataset for Video Characterization

  • Tomas Lacerda UFMG
  • Marcelo Sartori Locatelli UFMG
  • Igor Costa UFMG
  • Lorenzo Carneiro UFMG
  • Virgílio Almeida UFMG
  • Wagner Meira Jr. UFMG

Abstract


Short video platforms have garnered significant attention in recent years, with discussions ranging from concerns about inappropriate content and addiction to strategies for maximizing user engagement and screen time. Despite the large user base and growing relevance of these platforms, there is still a notable lack of comprehensive datasets suited to broad studies of recommendation and moderation. This is especially true for TikTok, where API access is limited and collecting unbiased data is challenging. In this work, we present a diverse dataset from the main feeds of YouTube Shorts and TikTok in Brazil, comprising over 35,000 videos. The dataset includes detailed engagement statistics, extensive video metadata, over 170,000 keyframes for visual analysis, and Safe Search API assessments for each keyframe. This resource fills a critical data gap, supporting research on content categorization, user behavior analysis, and platform engagement strategies.

Published
20/07/2025
LACERDA, Tomas; LOCATELLI, Marcelo Sartori; COSTA, Igor; CARNEIRO, Lorenzo; ALMEIDA, Virgílio; MEIRA JR., Wagner. Exploring Brazilian TikTok and YouTube Shorts: A Public Dataset for Video Characterization. In: BRAZILIAN WORKSHOP ON SOCIAL NETWORK ANALYSIS AND MINING (BRASNAM), 14., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 290-296. ISSN 2595-6094. DOI: https://doi.org/10.5753/brasnam.2025.9244.