Towards a Single-Host Many-GPU System

  • Ming-Hung Chen, IBM Research
  • I.-Hsin Chung, IBM Research
  • Bulent Abali, IBM Research
  • Paul Crumley, IBM Research

Abstract
As computation-intensive tasks such as deep learning and big data analysis increasingly rely on GPU-based accelerators, the interconnection links may become a bottleneck. In this paper, we investigate the emerging performance bottlenecks of multi-accelerator systems as the number of accelerators attached to a single host grows. We instrumented the host PCIe fabric to measure data transfers and compared these measurements with those reported by a software tool. The results show that peer-to-peer (P2P) data transfer helps avoid the bottleneck on the interconnection links, but multi-GPU performance still does not scale as expected because of host control messages. We quantify the impact of these control messages and suggest remedies for the resulting scalability bottlenecks. To validate the concept, we also implemented the proposed strategy on Lulesh. The results show that our strategy reduces kernel time by 59.86% and PCIe host-to-device (H2D) payload by 13.32%.
Keywords: Graphics processing units, Fabrics, Topology, Hardware, Kernel, Deep learning, Bandwidth, Measurement, Peer-to-peer, GPU, PCIe, NVLINK
Published
24/09/2018
CHEN, Ming-Hung; CHUNG, I.-Hsin; ABALI, Bulent; CRUMLEY, Paul. Towards a Single-Host Many-GPU System. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 30., 2018, Lyon/FR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 140-147.