Deep Learning on Large-Scale Multicore Clusters
Abstract
Convolutional neural networks (CNNs) have achieved outstanding accuracy compared with conventional machine learning algorithms. Recent work has shown that large and complex models, which incur significant training cost, are needed to reach higher accuracy. To train such models efficiently on high-performance computers (HPCs), many parallelization techniques for CNNs have been developed. However, most of these techniques target GPUs, and parallelization for CPUs has not been fully investigated. This paper explores CNN training performance on large-scale multicore clusters by optimizing intra-node processing and applying inter-node parallelization techniques originally developed for multiple GPUs. Detailed experiments conducted on state-of-the-art multicore processors using the OpenMP API and the MPI framework demonstrate that Caffe-based CNNs can be accelerated with well-designed multithreaded programs. We achieved up to 1.64 times speedup in convolution operations with our devised lowering strategy compared to conventional lowering, and 772 times speedup with 864 nodes compared to one node.
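The "lowering" mentioned in the abstract refers to the standard im2col transformation, which rewrites a convolution as one large matrix multiplication so that optimized BLAS routines (and multithreading) can be applied. Below is a minimal NumPy sketch of this conventional scheme; the function names and shapes are illustrative assumptions, not the paper's devised variant.

```python
import numpy as np

def im2col(x, kh, kw):
    # x: (C, H, W) input. Each kh*kw patch across all channels becomes
    # one column, so convolution reduces to a single matrix product.
    # This is the conventional lowering the abstract compares against.
    C, H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1          # valid-convolution output size
    cols = np.empty((C * kh * kw, oh * ow))
    row = 0
    for c in range(C):
        for i in range(kh):
            for j in range(kw):
                # Shifted view of the input, flattened into one row.
                cols[row] = x[c, i:i + oh, j:j + ow].ravel()
                row += 1
    return cols

def conv2d_lowered(x, w):
    # w: (F, C, kh, kw) filters; returns feature maps of shape (F, oh, ow).
    F, C, kh, kw = w.shape
    cols = im2col(x, kh, kw)
    out = w.reshape(F, -1) @ cols            # the single large GEMM
    oh = x.shape[1] - kh + 1
    ow = x.shape[2] - kw + 1
    return out.reshape(F, oh, ow)
```

The trade-off the paper optimizes is visible here: im2col duplicates each input element up to kh*kw times, trading memory footprint and bandwidth for a GEMM that parallelizes well across cores.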
Keywords:
Parallel processing, Computational modeling, Training, Convolution, Multicore processing, Backpropagation, Data models, Convolutional Neural Networks, OpenMP, MPI, Deep Learning, Caffe, Multicore cluster
Published
24/09/2018
How to Cite
SAKIYAMA, Kazumasa; KATO, Shinpei; ISHIKAWA, Yutaka; HORI, Atsushi; MONRROY, Abraham.
Deep Learning on Large-Scale Multicore Clusters. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 30., 2018, Lyon/FR.
Anais [...].
Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 314-321.
