A Shared Memory SMC Sampler for Decision Trees

  • Efthyvoulos Drousiotis University of Liverpool
  • Alessandro Varsi University of Liverpool
  • Paul G. Spirakis University of Liverpool
  • Simon Maskell University of Liverpool

Resumo


Modern classification problems tackled by using Decision Tree (DT) models often require demanding constraints in terms of accuracy and scalability. This is often hard to achieve due to the ever-increasing volume of data used for training and testing. Bayesian approaches to DTs using Markov Chain Monte Carlo (MCMC) methods have demonstrated great accuracy in a wide range of applications. However, the inherently sequential nature of MCMC makes it unsuitable to meet both accuracy and scaling constraints. One could run multiple MCMC chains in an embarrassingly parallel fashion. Despite the improved run-time, this approach sacrifices accuracy in exchange for strong scaling. Sequential Monte Carlo (SMC) samplers are another class of Bayesian inference methods that also have the appealing property of being parallelizable without trading off accuracy. Nevertheless, finding an effective parallelization for the SMC sampler is difficult, due to the challenges in parallelizing its bottleneck, redistribution, in such a way that the workload is equally divided across the processing elements, especially when dealing with variable-size models such as DTs. This study presents a parallel SMC sampler for DTs on Shared Memory (SM) architectures, with an $O(log_{2} N)$ parallel redistribution for variable-size samples. On an SM machine mounting 32 cores, the experimental results show that our proposed method scales up to a factor of 16 compared to its serial implementation, and provides comparable accuracy to MCMC, but 51 times faster.
Palavras-chave: Parallel Algorithms, Sequential Monte Carlo Samplers, Markov Chain Monte Carlo, Bayesian Decision Trees, Shared Memory Programming
Publicado
17/10/2023
DROUSIOTIS, Efthyvoulos; VARSI, Alessandro; SPIRAKIS, Paul G.; MASKELL, Simon. A Shared Memory SMC Sampler for Decision Trees. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 35. , 2023, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023 . p. 209-218.