# A Distributed GPU-based Correlation Clustering Algorithm for Large-scale Signed Social Networks

### Resumo

When applied to signed networks, the Correlation Clustering (CC) problem consists of an important tool to study how balanced a social group behaves and if this group might evolve to a possible balanced state. Solving such combinatorial optimization problem is a challenging task, which heavily relies on heuristic procedures, one of the few solution methods capable of analyzing large network instances. In this work, we present a novel approach to solve the CC problem on large-scale signed networks. A distributed GPU-powered version of the ILS metaheuristic, which benefits from data parallelism, has been developed. This technique provides good quality clustering results when compared to non-distributed methods. Experiments were conducted on both synthetic and real datasets. The proposed algorithm achieved improved solution values when compared to the existing parallel solution method. In particular, one of the largest graphs we have considered in our experiments contains 1 million nodes and 8 million edges – such graph can be clustered in two hours using our algorithm. The new method can process networks for which there is no efficient solution using the existing algorithms found in the literature.

### Referências

Aronson, E. and Cope, V. (1968). My enemy's enemy is my friend. Journal of personality and social psychology, 8(1p1):8.

Bansal, N., Blum, A., and Chawla, S. (2002). Correlation clustering. In The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings., pages 238–250, Vancouver, Canada. IEEE.

Bhattacharya, A. and De, R. K. (2008). Divisive correlation clustering algorithm (dcca) for grouping of genes: detecting varying patterns in expression proles. bioinformatics, 24(11):1359–1366.

Brandes, U., Delling, D., Gaertler, M., Gorke, R., Hoefer, M., Nikoloski, Z., and Wagner, D. (2008). On modularity clustering. Knowledge and Data Engineering, IEEE Transactions on, 20(2):172–188.

Brunato, M., Hoos, H. H., and Battiti, R. (2007). On effectively nding maximal quasi-cliques in graphs. In International conference on learning and intelligent optimization, pages 41–55. Springer.

Cartwright, D. and Harary, F. (1956). Structural balance: A generalization of heider's theory. Psychological Review, 63(5):277–293.

Charikar, M., Guruswami, V., and Wirth, A. (2003). Clustering with qualitative information. In Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on, pages 524–533. IEEE.

Chierichetti, F., Dalvi, N., and Kumar, R. (2014). Correlation clustering in mapreduce. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM.

DasGupta, B., Encisob, G. A., Sontag, E., and Zhanga, Y. (2007). Algorithmic and complexity results for decompositions of biological networks into monotone subsystems. BioSystems, 90(1):161–178.

Davis, J. (1967). Clustering and structural balance in graphs. Human Relations, 20(2):181–187. Demaine, E. D., Emanuel, D., Fiat, A., and Immorlica, N. (2006). Correlation clustering in general weighted graphs. Theoretical Computer Science, 361(2):172–187.

Doreian, P. and Mrvar, A. (2015). Structural balance and signed international relations. Journal of Social Structure, 16:1.

Drummond, L., Figueiredo, R., Frota, Y., and Levorato, M. (2013). Efcient solution of the correlation clustering problem: An application to structural balance. In Lecture Notes in Computer Science, pages 674–683. Springer Nature.

Duch, J. and Arenas, A. (2005). Community detection in complex networks using extremal optimization. Physical review E, 72(2):027104.

Elsner, M. and Schudy, W. (2009). Bounding and comparing methods for correlation clustering beyond ilp. In Proceedings of the Workshop on Integer Linear Programming for Natural Langauge Processing, ILP '09, pages 19–27, Stroudsburg, PA, USA.

Epinions (1999). Website. URL http://www.epinions.com. Accessed on March 2015. Facchetti, G., Iacono, G., and Altani, C. (2011). Computing global structural balance in large-scale signed social networks. In Proceedings of the National Academy of Sciences of the United States of America, volume 108, pages 20953–20958.

Feo, T. A. and Resende, M. G. (1995). Greedy randomized adaptive search procedures. Journal of Global Optimization, 6(2):109–133.

Gregor, D. and Lumsdaine, A. (2005). The parallel bgl: A generic library for distributed graph computa- tions. Parallel Object-Oriented Scientic Computing (POOSC), 2:1–18.

GroupLens (2017). Movielens dataset collection. https://grouplens.org/datasets/movielens.

Gülpinar, N., Gutin, G., Mitra, G., and Zverovitch, A. (2004). Extracting pure network submatrices in linear programs using signed graphs. Discrete Applied Mathematics, 137:359–372.

Heider, F. (1946). Attitudes and cognitive organization. Journal of Psychology, 21(1):107–112. Huffner, F., Betzler, N., and Niedermeier, R. (2009). Separator-based data reduction for signed graph balancing. Journal of Combinatorial Optimization, 20(4):335–360.

Kim, S., Yoo, C. D., Nowozin, S., and Kohli, P. (2014). Image segmentation UsingHigher-order correlation clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(9):1761–1774.

Kunegis, J. (2013). Konect: the koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web, pages 1343–1350. ACM.

Leskovec, J., Huttenlocher, D., and Kleinberg, J. (2010). Signed networks in social media. In Proceedings of the 28th international conference on Human factors in computing systems - CHI '10, pages 1361–1370. Association for Computing Machinery (ACM).

Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.

Levorato, M., Drummond, L., Frota, Y., and Figueiredo, R. (2015a). A GPU-accelerated local search algorithm for the Correlation Clustering problem. In Proceedings of the XLVII Brazilian Symposium on Operations Research, Porto de Galinhas, PE, Brazil.

Levorato, M., Drummond, L., Frota, Y., and Figueiredo, R. (2015b). An ils algorithm to evaluate structural In Proceedings of the 30th Annual ACM Symposium on Applied balance in signed social networks. Computing, pages 1117–1122.

Levorato, M., Figueiredo, R., Frota, Y., and Drummond, L. (2017). Evaluating balancing on social networks through the efcient solution of correlation clustering problems. EURO Journal on Computational Optimization, pages 1–32.

LNCC (2017). Santos dumont supercomputer. http://sdumont.lncc.br.

Lourenço, H. R., Martin, O. C., and Sttzle, T. (2003). Iterated local search. In Handbook of Metaheuristics, pages 320–353. Springer Nature.

Macon, K., Mucha, P., and Porter, M. (2012). Community structure in the united nations general assembly. Physica A: Statistical Mechanics and its Applications, 391(1-2):343–361.

Mendonça, I., Figueiredo, R., Labatut, V., and Michelon, P. (2015). Relevance of negative links in graph partitioning: A case study using votes from the european parliament. In 2015 Second European Network Intelligence Conference, pages 122–129. IEEE.

Mladenoviíc, N. and Hansen, P. (1997). Variable neighborhood search. Computers & Operations Research, 24(11):1097–1100.

Montgomery, D. (2005). Design and analysis of experiments (6th ed) john wiley and sons. New York, NY. Newman, M. E. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582.

NVIDIA, C. (2014). Toolkit v6. 5. URL http://docs.nvidia.com/cuda/cuda-c-programming-guide. Pan, X., Papailiopoulos, D., Oymak, S., Recht, B., Ramchandran, K., and Jordan, M. I. (2015). Parallel In Advances in Neural Information Processing Systems, pages correlation clustering on big graphs. 82–90.

Schwartz, T. (2010). The friend of my enemy is my enemy, the enemy of my enemy is my friend: Axioms for structural balance and bi-polarity. Mathematical Social Sciences, 60(1):39–45.

Srinivasan, A. (2011). Local balancing inuences global structure in social networks. In Proceedings of the National Academy of Sciences of the United States of America, volume 108, pages 1751–1752.

Wang, N. and Li, J. (2013). Restoring: A greedy heuristic approach based on neighborhood for correlation clustering. In Advanced Data Mining and Applications, pages 348–359. Springer.

Yang, B., Cheung, W., and Liu, J. (2007). Community mining from signed social networks. IEEE Transactions on Knowledge and Data Engineering, 19(10):1333–1348.

Zhang, Z., Cheng, H., Chen, W., Zhang, S., and Fang, Q. (2008). Correlation clustering based on genetic algorithm for documents clustering. In 2008 IEEE Congress on Evolutionary Computation, pages 3193– 3198.

*In*: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (WSCAD), 18. , 2017, Campinas.

**Anais**[...]. Porto Alegre: Sociedade Brasileira de Computação, 2017 . p. 280-291. DOI: https://doi.org/10.5753/wscad.2017.256.