Exploring the Limits of Reproducibility in the Task of Community Detection in Network Models

  • Rainara Araújo Mateus (UFMG)
  • Carlos H. G. Ferreira (UFOP)
  • Ana Paula Couto da Silva (UFMG)

Abstract


Context: Network model-based studies are applied across various fields, including social media. Problem: Data availability and reproducibility are challenging because of restrictive data policies and the significant computational resources required to process large, complex networks. Sampling techniques offer a viable alternative by selecting representative sub-networks that preserve the essential structural properties of the original network. Despite this potential, few studies investigate how sampling methods can generate networks at different scales or quantify their limitations in detecting communities, especially in conjunction with backbone extraction, a crucial step that can significantly affect the network’s probabilistic properties. Solution: This paper evaluates the effectiveness of different sampling methods in improving the availability and reproducibility of network analysis, with a focus on community detection and backbone extraction. IS Theory: Our work is supported by Social Network Theory, which emphasizes relationships and connections among actors within a network over individual attributes. Method: We quantify the limits, properties, and scenarios in which smaller versions of a network yield community structures comparable to those of the original network. Summarization of Results: Our results show that certain sampling methods can effectively capture community structures even in reduced network representations. Contributions and Impact in the IS area: This research facilitates the reproducibility and democratization of network studies and provides guidelines for creating networks at different scales that allow researchers to replicate certain studies.

Keywords: Reproducibility, Network community detection, Backbone extraction, Network sampling

Published
19/05/2025
MATEUS, Rainara Araújo; FERREIRA, Carlos H. G.; SILVA, Ana Paula Couto da. Explorando os Limites da Reprodutibilidade na Tarefa de Detecção de Comunidades em Modelos de Redes. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 21., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 515-524. DOI: https://doi.org/10.5753/sbsi.2025.246558.
