Fast and Low-cost Search for Efficient Cloud Configurations for HPC Workloads

  • Vanderson M. do Rosario UNICAMP
  • Thais A. Silva Camacho UNICAMP
  • Otávio O. Napoli UNICAMP
  • Edson Borin UNICAMP


The wide variety of virtual machine types, network configurations, numbers of instances, and other configuration options in cloud computing makes finding the best configuration a hard problem. Reducing cost and resource underutilization while achieving acceptable performance can be difficult even for specialists. Thus, many approaches for finding optimal or near-optimal configurations for a given program have been proposed in the literature. Observing the performance of an application in the cloud takes time and money; therefore, most approaches aim not only to find good solutions but also to reduce the search cost. One of these approaches relies on Bayesian Optimization, which analyzes fewer configurations, reducing the search cost while still finding good solutions. Another approach found in the literature uses a technique named Paramount Iteration, which enables users to reason about the cost and performance of cloud configurations without executing the application to completion (early stopping), thereby reducing the cost of each observation. In this work, we show that both techniques can be used together to make fewer and lower-cost observations. We demonstrate that such an approach can recommend solutions that are 1.68x better on average than Random Search, with a 6x cheaper search.
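The early-stopping idea can be illustrated with a small sketch. It is not the authors' implementation: it assumes an iterative application whose iterations take roughly constant time, so that timing a few of them allows the full run to be extrapolated. The configuration names, hourly prices, and timings are hypothetical, and the search strategy that decides which configurations to probe (for instance Bayesian Optimization) is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Estimate:
    config: str
    est_runtime_s: float   # extrapolated time of a full run
    est_cost_usd: float    # extrapolated cost of a full run
    probe_cost_usd: float  # cost of the short observation itself


def estimate_from_probe(config: str, price_per_hour: float,
                        iter_times_s: List[float], total_iters: int) -> Estimate:
    """Extrapolate full-run time and cost from a few timed iterations.

    Assumption: every iteration costs about the same as the probed ones.
    """
    mean_iter_s = sum(iter_times_s) / len(iter_times_s)
    est_runtime_s = mean_iter_s * total_iters
    probe_time_s = sum(iter_times_s)
    return Estimate(
        config=config,
        est_runtime_s=est_runtime_s,
        est_cost_usd=est_runtime_s * price_per_hour / 3600.0,
        probe_cost_usd=probe_time_s * price_per_hour / 3600.0,
    )


# Hypothetical probes: two configurations, two timed iterations each,
# for an application that would run 1000 iterations to completion.
estimates = [
    estimate_from_probe("config-A", price_per_hour=1.0,
                        iter_times_s=[4.0, 4.0], total_iters=1000),
    estimate_from_probe("config-B", price_per_hour=2.0,
                        iter_times_s=[1.5, 1.5], total_iters=1000),
]

# Pick the configuration with the lowest extrapolated full-run cost.
best = min(estimates, key=lambda e: e.est_cost_usd)
search_cost = sum(e.probe_cost_usd for e in estimates)
print(best.config, round(best.est_cost_usd, 4), round(search_cost, 6))
```

In this sketch each observation pays for only two iterations instead of a thousand, which is where the per-observation saving comes from. A search method that probes fewer configurations would reduce the total search cost further.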


ROSARIO, Vanderson M. do; CAMACHO, Thais A. Silva; NAPOLI, Otávio O.; BORIN, Edson. Fast and Low-cost Search for Efficient Cloud Configurations for HPC Workloads. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 22. , 2021, Belo Horizonte. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021 . p. 144-155. DOI: