Paralelização do Algoritmo Floyd-Warshall usando GPU

Roussian R. A. Gaioso; Walid A. R. Jradi; Lauro C. M. de Paula; Wanderley de S. Alencar; Wellington S. Martins; Hugo Alexandre D. do Nascimento; Edson N. Cáceres

doi:10.5753/wscad.2013.16769

Roussian R. A. Gaioso UFG
Walid A. R. Jradi UFG
Lauro C. M. de Paula UFG
Wanderley de S. Alencar UFG
Wellington S. Martins UFG
Hugo Alexandre D. do Nascimento UFG
Edson N. Cáceres UFMS

DOI: https://doi.org/10.5753/wscad.2013.16769

Resumo

Este artigo apresenta uma implementação paralela baseada em Graphics Processing Unit (GPU) para o problema da identiﬁcação dos caminhos mínimos entre todos os pares de vértices em um grafo. A implementação é baseada no algoritmo Floyd-Warshall e tira o máximo proveito da arquitetura multithreaded das GPUs atuais. Nossa solução reduz a comunicação entre a Central Processing Unit (CPU) e a GPU, melhora a utilização dos Streaming Multiprocessors (SMs) e faz um uso intensivo de acesso aglutinado em memória para otimizar o acesso de dados do grafo. A vantagem da implementação proposta é demonstrada por vários grafos gerados aleatoriamente utilizando a ferramenta GTgraph. Grafos contendo milhares de vértices foram gerados e utilizados nos experimentos. Os resultados mostraram um excelente desempenho em diversos grafos, alcançando ganhos de até 149x, quando comparado com uma implementação sequencial, e superando implementações tradicionais por um fator de quase quatro vezes. Nossos resultados conﬁrmam que implementações baseadas em GPU podem ser viáveis mesmo para algoritmos de grafos cujo acessos à memória e distribuição de trabalho são irregulares e causam dependência de dados.

Palavras-chave: GPU, Caminho mínimo, Floyd-Warshall, Computação paralela

Referências

F. Dehne and K. Yogaratnam, “Exploring the limits of gpus with parallel graph algorithms,” CoRR, vol. abs/1002.4482, 2010.

A. Aini and A. Salehipour, “Speeding up the floyd–warshall algorithm for the cycled shortest path problem,” Applied Mathematics Letters, vol. 25, no. 1, pp. 1–5, 2012.

G. Venkataraman, S. Sahni, and S. Mukhopadhyaya, “A blocked all-pairs shortest-paths algorithm,” Journal of Experimental Algorithmics (JEA), vol. 8, 2003.

P. Harish and P. J. Narayanan, “Accelerating large graph algorithms on the gpu using cuda,” High performance computing–HiPC, pp. 197–208, 2007.

G. J. Katz and J. T. Kider, “All-pairs shortest-paths for large graphs on the gpu,” Proceedings of the 23rd ACM SIGGRAPHEUROGRAPHICS symposium on Graphics Hardware, pp. 47–55, 2008.

L. Ridi, J. Torrini, and E. Vicario, “Developing a scheduler with difference-bound matrices and the floyd-warshall algorithm,” Software, IEEE, vol. 29, no. 1, pp. 76–83, 2012.

K. M. Borgwardt and H.-P. Kriegel, “Shortest-path kernels on graphs,” in Data Mining, Fifth IEEE International Conference on. IEEE, 2005, pp. 8–pp.

J. Ma, K.-p. Li, and L.-y. Zhang, “A parallel floyd-warshall algorithm based on tbb,” in Information Management and Engineering (ICIME), 2010 The 2nd IEEE International Conference on. IEEE, 2010, pp. 429–433.

S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron, “A Performance Study of General-Purpose Applications on Graphics Processors Using CUDA,” Journal of Parallel and Distributed Computing, vol. 68, no. 10, pp. 1370–1380, 2008.

N. CUDA, NVIDIA CUDA C Programming Guide, 4th ed. 2701 San Tomas Expressway Santa Clara, CA 95050: NVIDIA Corporation, 2011.

L. C. M. de Paula, L. B. S. de Souza, L. B. S. de Souza, and W. S. Martins, “Aplicação de processamento paralelo em método iterativo para solução de sistemas lineares,” X Encontro Anual de Computação, pp. 129–136, 2013.

L. C. M. de Paula, A. da Silva Soares, T. W. Soares, W. S. Martins, A. R. G. Filho, and C. J. Coelho, “Partial parallelization of the successive projections algorithm using compute unified device architecture,” 19th International Conference on Parallel and Distributed Processing Techniques and Applications, 2013.

D. Luebke and G. Humphreys, “How gpus work,” Computer, vol. 40, pp. 96–100, February 2007.

R. Tsuchiyama, T. Nakamura, T. Iizuka, A. Asahara, J. Son, and S. Miki, The OpenCL Programming Book. Fixstars, 2010.

S. Hougardy, “The floyd-warshall algorithm on graphs with negative cycles,” Information Processing Letters, vol. 110, no. 8, pp. 279–281, 2010.

B. M. P. S. Carvalho, “Algoritmo de dijkstra,” Master’s thesis, Departamento de Engenharia Inform´atica Universidade de Coimbra, Portugal, 2003.

A. V. Goldberg and T. Radzik, “A heuristic improvement of the bellmanford algorithm,” Applied Mathematics Letters, vol. 6, no. 3, pp. 3–6, 1993.

R. Bellman, Dynamic Programming, ser. Dover Books on Computer Science Series. Dover Publications, Incorporated, 2003.

S. R. Eddy, “What is dynamic programming?” Nature Biotechnology, vol. 22, no. 7, pp. 909–910, Jul. 2004. [Online]. Available: http://dx.doi.org/10.1038/nbt0704-909

C. Papadimitriou and M. Sideri, “On the floyd-warshall algorithm for logic programs,” The journal of logic programming, vol. 41, no. 1, pp. 129–137, 1999.

S. Pettie and V. Ramachandran, “Computing shortest paths with comparisons and additions,” in Proceedings of the thirteenth annual ACMSIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2002, pp. 267–276.

N. CUDATM, NVIDIA CUDA C Programming Best Practices Guide. 2701 San Tomas Expressway Santa Clara, CA 95050: NVIDIA Corporation, 2009.

S. Xiao and W.-c. Feng, “Inter-block gpu communication via fast barrier synchronization,” in Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on. IEEE, 2010, pp. 1–12.

D. A. Bader and K. Madduri, “Gtgraph: A synthetic graph generator suite,” Atlanta, GA, February, 2006.

W. Jradi, H. do Nascimento, H. Longo, and B. Hall, “Simulation and analysis of urban traffic the architecture of a web-based interactive decision support system,” in Intelligent Transportation Systems, 2009. ITSC ’09. 12th International IEEE Conference on, 2009, pp. 1–6.