Accelerating Solution of Generalized Linear Models by Solving Normal Equation Using GPGPU on a Large Real-World Tall-Skinny Data Set

  • Tran Van Sang The University of Tokyo
  • Ryosuke Kobayashi The University of Tokyo
  • Rie S. Yamaguchi The University of Tokyo
  • Toshiyuki Nakata The University of Tokyo

Abstract


The amount of available data has grown rapidly in recent years, not least in the context of Industry 4.0 and the advent of Cyber Physical Systems (CPS) and IoT devices. Machine Learning and Big Data analysis are therefore being taken seriously as promising ways to cope with exponentially growing data. A widely known and widely used approach to quantifying the relationship between a dependent variable and multiple numerical predictors is the Generalized Linear Model (GLM). In this paper, we introduce an approach to accelerate the GLM fitting algorithm. Our approach consists of two steps. First, we reimplement the GLM fitting algorithm so that it applies the Normal Equation method [1] when solving the linear least squares equation. Then, we port the implemented GLM fitting to run on GPGPU. When the Normal Equation method is applied to tall-skinny data, which is typical of CPS log data, solving the resulting linear least squares equation becomes trivial, and the computational burden shifts from solving that equation to matrix multiplication, which can be parallelized in a straightforward manner. In an experiment using actual user access log data, the Normal Equation method ran 1.9 times faster on the CPU; combined with a 16.8-fold acceleration from GPGPU, this led to an overall speedup of 31.3 times.
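The Normal Equation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `solve_normal_equation`, the synthetic data, and the matrix sizes are assumptions chosen for illustration, and it covers only a plain (unweighted) least squares solve on the CPU. For a tall-skinny matrix X with n rows and p columns (n much larger than p), forming the products X^T X and X^T y dominates the cost, while the remaining p-by-p system is small.

```python
import numpy as np


def solve_normal_equation(X, y):
    """Least squares solution of X @ beta ~ y via the normal equation.

    Hypothetical helper for illustration only.
    """
    # Matrix products: cost grows as O(n * p^2); this is the part that
    # parallelizes naturally (for example on a GPU).
    gram = X.T @ X  # shape (p, p)
    rhs = X.T @ y   # shape (p,)
    # Small p-by-p linear system: cheap when p is small.
    return np.linalg.solve(gram, rhs)


# Synthetic tall-skinny data (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
n, p = 100_000, 8
X = rng.standard_normal((n, p))
beta_true = np.arange(1.0, p + 1.0)
y = X @ beta_true + 0.01 * rng.standard_normal(n)

beta_hat = solve_normal_equation(X, y)

# Reference solution from NumPy's general least squares solver.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_ref, atol=1e-6))
```

One caveat with this approach: forming X^T X squares the condition number of the problem, so the sketch is best suited to well-conditioned predictors such as the synthetic ones used here.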
Keywords: Mathematical model, Acceleration, Training, Shape, Data models, Convergence, Machine learning, Generalized Linear Model, IoT, Internet of Things, Supervised Machine Learning, Linear Regression, Least Squares equation, Normal Equation, Real-world data, Tall-Skinny data
Published
15/10/2019
SANG, Tran Van; KOBAYASHI, Ryosuke; YAMAGUCHI, Rie S.; NAKATA, Toshiyuki. Accelerating Solution of Generalized Linear Models by Solving Normal Equation Using GPGPU on a Large Real-World Tall-Skinny Data Set. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 31., 2019, Campo Grande/MS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 112-119.