MapReduce vs BSP for Computing Centrality Measures in Large Graphs
Abstract
Big Data techniques such as MapReduce have been used to compute centrality measures of large graphs. MapReduce, however, has proven ill-suited for this kind of problem, in which the communication rate among nodes is high at every iteration of the algorithm. Other models, such as Bulk Synchronous Parallel (BSP), have been considered in recent literature to be more appropriate for this task. This paper compares a BSP-based algorithm, developed by the authors to compute centrality measures in large graphs, with HEDA (Hadoop-based Exact Diameter Algorithm) and HADI (HAdoop-based DIameter estimator), two algorithms based on the MapReduce model.
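To make the contrast concrete, the following is a minimal sketch (not the authors' algorithm) of the vertex-centric BSP style of computation the paper alludes to: each superstep, every active vertex consumes incoming messages, updates its state, and emits messages to its neighbours, with a global barrier between supersteps. Here the per-vertex state is a shortest-path distance, the building block of closeness-style centrality measures; the graph representation and function names are illustrative assumptions.

```python
from collections import defaultdict

def bsp_sssp(adj, source):
    """BSP-style single-source shortest paths on an unweighted graph.

    adj: dict mapping each vertex to a list of neighbours.
    Each iteration of the while loop is one superstep; the sequential
    loop stands in for the parallel compute phase plus the barrier.
    The algorithm halts when no messages are in flight.
    """
    dist = {v: float("inf") for v in adj}
    inbox = {source: [0]}          # initial message to the source vertex
    while inbox:                   # superstep: runs while any vertex is active
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            best = min(msgs)       # shortest candidate distance received
            if best < dist[v]:     # improvement found: update and notify
                dist[v] = best
                for w in adj[v]:
                    outbox[w].append(best + 1)
        inbox = outbox             # barrier: messages delivered next superstep
    return dist

def closeness(adj, v):
    """Closeness centrality of v: (n - 1) / sum of distances from v."""
    dist = bsp_sssp(adj, v)
    total = sum(d for u, d in dist.items() if u != v)
    return (len(adj) - 1) / total

# Usage on a 4-vertex path graph 0-1-2-3:
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(bsp_sssp(path, 0))   # distances from vertex 0
print(closeness(path, 1))  # inner vertices are more central
```

Under a MapReduce formulation, each of these supersteps would be a full map-shuffle-reduce job with its intermediate state written to disk, which is the communication overhead the abstract refers to; a BSP engine keeps vertex state resident in memory across supersteps.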
References
Bondy, J., & Murty, U. (2008). Graph Theory. Springer.
Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, 51(1), 107-113.
Erdős, P., & Rényi, A. (1959). On random graphs I. Publicationes Mathematicae Debrecen, 6, 290-297.
Kajdanowicz, T., Kazienko, P., & Indyk, W. (2014). Parallel Processing of Large Graphs. Future Generation Computer Systems 32, 324-337.
Kang, U., Tsourakakis, C. E., Appel, A. P., Faloutsos, C., & Leskovec, J. (2011). HADI: Mining Radii of Large Graphs. ACM Transactions on Knowledge Discovery from Data (TKDD), Article 8.
Nascimento, J. P., & Murta, C. (2012). Um algoritmo paralelo em Hadoop para Cálculo de Centralidade em Grafos Grandes. XXX Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos. Ouro Preto: SBC.
Pace, M. F. (2012). BSP vs MapReduce. Procedia Computer Science. v. 9, 246-255.
Valiant, L. G., Reeve, M., Zenith, S., & Wiley, E. (1989). Bulk-synchronous parallel computers. In Parallel Processing and Artificial Intelligence, 15-22.
