A Multi-Strategy Approach to Overcoming Bias in Community Detection Evaluation
Resumo
Community detection is key to understand the structure of complex networks. However, the lack of appropriate evaluation strategies for this specific task may produce biased and incorrect results that might invalidate further analyses or applications based on such networks. In this context, the main contribution of this paper is an approach that supports a robust quality evaluation when detecting communities in real-world networks. In our approach, we use multiple strategies that capture distinct aspects of the communities. The conclusion on the quality of these communities is based on the consensus among the strategies adopted for the structural evaluation, as well as on the comparison with communities detected by different methods and with their existing ground truths. In this way, our approach allows one to overcome biases in network data, detection algorithms and evaluation metrics, thus providing more consistente conclusions about the quality of the detected communities. Experiments conducted with several real and synthetic networks provided results that show the effectiveness of our approach.
Referências
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008. DOI: https://doi.org/10.1088/1742-5468/2008/10/p10008
Clauset, A., Newman, M. E. J., and Moore, C. (2004). Finding community structure in very large networks. Phys. Rev. E, 70:066111. DOI: https://doi.org/10.1103/PhysRevE.70.066111
Coscia, M., Giannotti, F., and Pedreschi, D. (2011). A classification for community discovery methods in complex networks. Statistical Analysis and Data Mining: The ASA Data Science Journal, 4(5):512–546. DOI: https://doi.org/10.1002/sam.10133
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3–5):75–174. DOI: https://doi.org/10.1016/j.physrep.2009.11.002
Gemmetto, V., Barrat, A., and Cattuto, C. (2014). Mitigation of infectious disease at school: targeted class closure vs school closure. BMC Infectious Diseases, 14(1):695. DOI: https://doi.org/10.1186/s12879-014-0695-9
Hric, D., Darst, R. K., and Fortunato, S. (2014). Community detection in networks: Structural communities versus ground truth. Phys. Rev. E, 90(6):62805. DOI: https://doi.org/10.1103/PhysRevE.90.062805
Lancichinetti, A. and Fortunato, S. (2012). Consensus clustering in complex networks. Scientific Reports, 2:336. DOI: https://doi.org/10.1038/srep00336
Leão, J. C. (2018). An Approach for Detecting Communities from Sequences of Social Interactions. Master’s thesis, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil (in Portuguese).
Leão, J. C., Brandão, M. A., Vaz de Melo, P. O. S., and Laender, A. H. F. (2018). Who is really in my social circle? Mining social relationships to improve detection of real communities. Journal of Internet Services and Applications, 9(1):20:1–20:17.
Newman, M. E. (2006). Modularity and community structure in networks. Proc. Nat. Acad. Sci., 103(23):8577–8582. DOI: https://doi.org/10.1103/PhysRevE.69.026113
Newman, M. E. J. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E, 69(2):26113. DOI: https://doi.org/10.1073/pnas.0601602103
Nunes, I. O., Celes, C., Silva, M., Vaz de Melo, P. O. S., and Loureiro, A. A. F. (2017). GRM: Group Regularity Mobility Model. In Proceedings of the 20th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pages 85–89, New York, NY, USA. DOI: https://doi.org/10.1145/3127540.3127570
O’Donoghue, T. and K., P. (2003). Qualitative Educational Research in Action: Doing and Reflecting. Routledge, Abingdon, UK. DOI: https://doi.org/10.4324/9780203506301
Peel, L., Larremore, D. B., and Clauset, A. (2017). The ground truth about metadata and community detection in networks. Science Advances, 3(5):1–8. DOI: https://doi.org/10.1126/sciadv.1602548
Pons, P. and Latapy, M. (2005). Computing Communities in Large Networks Using Random Walks, pages 284–293. Springer Berlin Heidelberg, Berlin, Heidelberg. DOI: https://doi.org/10.1007/11569596_31
Raghavan, U. N., Albert, R., and Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3):1–12. DOI: https://doi.org/10.1103/PhysRevE.76.036106
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846––850.
Rocha, L. E. C., Masuda, N., and Holme, P. (2017). Sampling of temporal networks: Methods and biases. Phys. Rev. E, 96(5):52302. DOI: https://doi.org/10.1103/PhysRevE.96.052302
Rosvall, M. and Bergstrom, C. T. (2011). Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLOS ONE, 6(4):1-10. DOI: https://doi.org/10.1371/journal.pone.0018209
Yang, J. and Leskovec, J. (2015). Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213. DOI: https://doi.org/10.1007/s10115-013-0693-z
Zachary, W. W. (1977). An Information Flow Model for Conflict and Fission in Small Groups. Journal of Anthropological Research, 33(4):452–473. DOI: https://doi.org/10.1086/jar.33.4.3629752
Zaki, M. J. and Meira Jr., W. (2014). Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, Cambridge, UK. DOI: https://doi.org/10.1017/CBO9780511810114
Zhao, Y. (2017). A survey on theoretical advances of community detection in networks. Computational Statistics, 9(5):e1403. DOI: https://doi.org/10.1002/wics.1403