Response time prediction of big data applications in cloud environments

  • Túlio B. M. Pinto UFMG
  • Ana Paula Couto da Silva UFMG
  • Jussara M. Almeida UFMG

Abstract


Heterogeneous and irregular data access is typical of Big Data applications and makes scheduling hardware and software resources much more challenging. The flexibility and elasticity of cloud environments ease this difficulty by allowing on-demand resource provisioning. Nonetheless, predicting the performance (e.g., response time) of such applications becomes more complex when all these characteristics are combined. This work explores an analytical model, parameterized by logs of earlier executions, for predicting the response time of applications running on Spark, a popular platform for large-scale data processing. The model is evaluated in several scenarios and applications. The results show relative errors in response time prediction below 8% on average.

Published
2018-05-10
PINTO, Túlio B. M.; SILVA, Ana Paula Couto da; ALMEIDA, Jussara M. Previsão do tempo de resposta de aplicações de big data em ambientes de nuvem. In: BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS (SBRC), 36., 2018, Campos do Jordão. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 533-546. ISSN 2177-9384. DOI: https://doi.org/10.5753/sbrc.2018.2440.