Selecting efficient VM types to train deep learning models on Amazon SageMaker

  • Rafael Keller Tesser UNICAMP
  • Alvaro Marques UNICAMP
  • Edson Borin UNICAMP


The cloud has become a popular environment for running Deep Learning (DL) applications. Public cloud providers charge for the amount of time resources are actually used, at an hourly price that depends on the configuration of the chosen cloud instance. Instances are usually provided as a VM that gives access to a particular hardware configuration and may also come with a pre-configured software environment. More advanced, theoretically faster VMs are usually more expensive, but they do not necessarily provide the best performance for every application. Therefore, to choose the best instance (or VM type), users must consider the relative performance, and the resulting cost, of different VMs when running their specific target application. With this in mind, we propose a model to estimate the relative performance and cost of training deep learning applications on different VM instances. The model is built on observations derived from the performance profiles of three different DL applications executed on 12 different public cloud instances. We argue that this model is a valuable tool for cloud users looking for optimal VM types to train their deep learning applications in the cloud.
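The cost reasoning in the abstract (time used multiplied by an hourly price, with faster VMs costing more per hour) can be illustrated with a minimal sketch. This is not the model proposed in the paper. The VM names, prices, and relative speeds below are hypothetical placeholders chosen only to show the arithmetic, and the assumption that training time scales inversely with a single relative-speed factor is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    name: str               # hypothetical label, not a real instance type
    price_per_hour: float   # placeholder hourly price, in USD
    relative_speed: float   # speed relative to a baseline VM (1.0 = baseline)


def estimated_cost(vm: VMType, baseline_hours: float) -> float:
    """Estimate training cost, assuming time scales as 1 / relative_speed."""
    hours = baseline_hours / vm.relative_speed
    return hours * vm.price_per_hour


def cheapest(vms: list[VMType], baseline_hours: float) -> VMType:
    """Return the VM type with the lowest estimated training cost."""
    return min(vms, key=lambda vm: estimated_cost(vm, baseline_hours))


# Made-up candidates: all numbers are illustrative assumptions.
CANDIDATES = [
    VMType("vm_small", price_per_hour=1.00, relative_speed=1.0),
    VMType("vm_medium", price_per_hour=3.00, relative_speed=4.0),
    VMType("vm_large", price_per_hour=12.00, relative_speed=6.0),
]

if __name__ == "__main__":
    for vm in CANDIDATES:
        print(f"{vm.name}: {estimated_cost(vm, baseline_hours=10.0):.2f} USD")
    print("cheapest:", cheapest(CANDIDATES, baseline_hours=10.0).name)
```

With these placeholder numbers, neither the VM with the lowest hourly price nor the fastest VM gives the lowest total cost, which is the situation the abstract describes.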
Keywords: Deep learning, Training, Cloud computing, Costs, Computational modeling, High performance computing, Conferences, machine learning, performance prediction, cost prediction
TESSER, Rafael Keller; MARQUES, Alvaro; BORIN, Edson. Selecting efficient VM types to train deep learning models on Amazon SageMaker. In: WORKSHOP ON CLOUD COMPUTING - INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 33., 2021, Belo Horizonte. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 20-27.