Fair scheduling in cloud infrastructures with multiple service classes

  • Giovanni Farias Universidade Federal de Campina Grande
  • Raquel Lopes Universidade Federal de Campina Grande
  • Francisco Brasileiro Universidade Federal de Campina Grande
  • Marcus Carvalho Universidade Federal da Paraíba
  • Fabio Morais Universidade Federal da Paraíba
  • João Mafra Universidade Federal de Campina Grande
  • Daniel Turull Ericsson Research

Abstract


Cloud computing providers offer multiple service classes to deal with workload heterogeneity. Classes are distinguished by their expected Quality of Service (QoS), which is defined in terms of Service Level Objectives (SLOs). A priority-based scheduling policy is commonly used to ensure that requests submitted to the different service classes achieve the desired QoS. However, the QoS delivered during periods of resource contention may be unfair to certain users. In this paper, we present an SLO-driven scheduling policy that takes into account both the SLO and the QoS currently delivered to each request when making decisions. We ran simulation experiments fed with traces from a production system to compare the SLO-driven policy with a priority-based one. In general, the SLO-driven policy delivered better service than the priority-based one.

Keywords: Cloud Computing, Quality of Service, Scheduling Policies
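
The contrast between the two policies described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' algorithm: the `Request` fields, the use of a "fraction of time running" metric as the SLO, and the slack-based ordering rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    priority: int      # assumption: lower number means a more important class
    slo_target: float  # hypothetical SLO: required fraction of time running
    delivered: float   # hypothetical current QoS: fraction of time run so far


def priority_order(requests):
    """Order strictly by class priority, ignoring the QoS delivered so far."""
    return sorted(requests, key=lambda r: r.priority)


def slo_driven_order(requests):
    """Order by slack (delivered QoS minus SLO target), most-behind first."""
    return sorted(requests, key=lambda r: r.delivered - r.slo_target)


# Made-up requests, one per hypothetical service class.
requests = [
    Request("prod-a", priority=0, slo_target=0.99, delivered=1.00),
    Request("batch-b", priority=1, slo_target=0.90, delivered=0.70),
    Request("free-c", priority=2, slo_target=0.50, delivered=0.55),
]

print([r.name for r in priority_order(requests)])
print([r.name for r in slo_driven_order(requests)])
```

In this toy input, the priority ordering serves `prod-a` first even though it already exceeds its target, while the slack-based ordering moves `batch-b`, the only request below its target, to the front.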

Published
2019-05-06
FARIAS, Giovanni; LOPES, Raquel; BRASILEIRO, Francisco; CARVALHO, Marcus; MORAIS, Fabio; MAFRA, João; TURULL, Daniel. Fair scheduling in cloud infrastructures with multiple service classes. In: BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS (SBRC), 37., 2019, Gramado. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 636-649. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2019.7392.