Social Network Data Management with Network Analysis and Topic Modeling
Abstract
This article addresses the management, handling, and analysis of data obtained from digital social networks. We first explain how the collected information was used in our analyses, detailing the use of software dedicated to network analysis, the identification of possible communities, and the use of topic modeling algorithms. We present practical results for each of these data enrichment processes, drawn from research on posts about vaccination in Brazil and on responses to women candidates in the 2022 Brazilian elections. Among the challenges encountered, we highlight handling large volumes of data, applying network analysis concepts, and inferring topics from the algorithms' results. Although the examples and experiments relate to Twitter, the subjects investigated and discussed here can apply to other social networks.
Keywords:
Data cleaning, information filtering, and publishing; Data mining and analytics; Social networks and crowdsourcing
Published
2023-09-25
How to Cite
CARMO, Isabella; L. C. RÊGO, André; BARRETO, Mariana; SCHULER, Marina; HEINE, Alexandre; VILLAS, Marcos V.; LIFSCHITZ, Sérgio. Social Network Data Management with Network Analysis and Topic Modeling. In: WORKSHOP ON UNDERGRADUATE STUDENT WORK (WTAG) - BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 38., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 64-70. DOI: https://doi.org/10.5753/sbbd_estendido.2023.233417.
