A Multi-Strategy Approach to Overcoming Bias in Community Detection Evaluation

  • Jeancarlo C. Leão IFNMG
  • Alberto H. F. Laender UFMG
  • Pedro O. S. Vaz de Melo UFMG

Resumo


Community detection is key to understand the structure of complex networks. However, the lack of appropriate evaluation strategies for this specific task may produce biased and incorrect results that might invalidate further analyses or applications based on such networks. In this context, the main contribution of this paper is an approach that supports a robust quality evaluation when detecting communities in real-world networks. In our approach, we use multiple strategies that capture distinct aspects of the communities. The conclusion on the quality of these communities is based on the consensus among the strategies adopted for the structural evaluation, as well as on the comparison with communities detected by different methods and with their existing ground truths. In this way, our approach allows one to overcome biases in network data, detection algorithms and evaluation metrics, thus providing more consistente conclusions about the quality of the detected communities. Experiments conducted with several real and synthetic networks provided results that show the effectiveness of our approach.

Palavras-chave: Community detection, complex networks, complex networks communities, overcome biases, network data, multi-strategy approach.

Referências

Almeida, H., Guedes, D., Meira Jr, W., and Zaki, M. J. (2012). Towards a Better Quality Metric for Graph Cluster Evaluation. Journal of Information and Data Management, 3(3):378–393.

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008. DOI: https://doi.org/10.1088/1742-5468/2008/10/p10008

Clauset, A., Newman, M. E. J., and Moore, C. (2004). Finding community structure in very large networks. Phys. Rev. E, 70:066111. DOI: https://doi.org/10.1103/PhysRevE.70.066111

Coscia, M., Giannotti, F., and Pedreschi, D. (2011). A classification for community discovery methods in complex networks. Statistical Analysis and Data Mining: The ASA Data Science Journal, 4(5):512–546. DOI: https://doi.org/10.1002/sam.10133

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3–5):75–174. DOI: https://doi.org/10.1016/j.physrep.2009.11.002

Gemmetto, V., Barrat, A., and Cattuto, C. (2014). Mitigation of infectious disease at school: targeted class closure vs school closure. BMC Infectious Diseases, 14(1):695. DOI: https://doi.org/10.1186/s12879-014-0695-9

Hric, D., Darst, R. K., and Fortunato, S. (2014). Community detection in networks: Structural communities versus ground truth. Phys. Rev. E, 90(6):62805. DOI: https://doi.org/10.1103/PhysRevE.90.062805

Lancichinetti, A. and Fortunato, S. (2012). Consensus clustering in complex networks. Scientific Reports, 2:336. DOI: https://doi.org/10.1038/srep00336

Leão, J. C. (2018). An Approach for Detecting Communities from Sequences of Social Interactions. Master’s thesis, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil (in Portuguese).

Leão, J. C., Brandão, M. A., Vaz de Melo, P. O. S., and Laender, A. H. F. (2018). Who is really in my social circle? Mining social relationships to improve detection of real communities. Journal of Internet Services and Applications, 9(1):20:1–20:17.

Newman, M. E. (2006). Modularity and community structure in networks. Proc. Nat. Acad. Sci., 103(23):8577–8582. DOI: https://doi.org/10.1103/PhysRevE.69.026113

Newman, M. E. J. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E, 69(2):26113. DOI: https://doi.org/10.1073/pnas.0601602103

Nunes, I. O., Celes, C., Silva, M., Vaz de Melo, P. O. S., and Loureiro, A. A. F. (2017). GRM: Group Regularity Mobility Model. In Proceedings of the 20th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pages 85–89, New York, NY, USA. DOI: https://doi.org/10.1145/3127540.3127570

O’Donoghue, T. and K., P. (2003). Qualitative Educational Research in Action: Doing and Reflecting. Routledge, Abingdon, UK. DOI: https://doi.org/10.4324/9780203506301

Peel, L., Larremore, D. B., and Clauset, A. (2017). The ground truth about metadata and community detection in networks. Science Advances, 3(5):1–8. DOI: https://doi.org/10.1126/sciadv.1602548

Pons, P. and Latapy, M. (2005). Computing Communities in Large Networks Using Random Walks, pages 284–293. Springer Berlin Heidelberg, Berlin, Heidelberg. DOI: https://doi.org/10.1007/11569596_31

Raghavan, U. N., Albert, R., and Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3):1–12. DOI: https://doi.org/10.1103/PhysRevE.76.036106

Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846––850.

Rocha, L. E. C., Masuda, N., and Holme, P. (2017). Sampling of temporal networks: Methods and biases. Phys. Rev. E, 96(5):52302. DOI: https://doi.org/10.1103/PhysRevE.96.052302

Rosvall, M. and Bergstrom, C. T. (2011). Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLOS ONE, 6(4):1-10. DOI: https://doi.org/10.1371/journal.pone.0018209

Yang, J. and Leskovec, J. (2015). Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213. DOI: https://doi.org/10.1007/s10115-013-0693-z

Zachary, W. W. (1977). An Information Flow Model for Conflict and Fission in Small Groups. Journal of Anthropological Research, 33(4):452–473. DOI: https://doi.org/10.1086/jar.33.4.3629752

Zaki, M. J. and Meira Jr., W. (2014). Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, Cambridge, UK. DOI: https://doi.org/10.1017/CBO9780511810114

Zhao, Y. (2017). A survey on theoretical advances of community detection in networks. Computational Statistics, 9(5):e1403. DOI: https://doi.org/10.1002/wics.1403
Publicado
07/10/2019
LEÃO, Jeancarlo C.; LAENDER, Alberto H. F.; DE MELO, Pedro O. S. Vaz. A Multi-Strategy Approach to Overcoming Bias in Community Detection Evaluation. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 34. , 2019, Fortaleza. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019 . p. 13-24. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2019.8804.