Online Performance Modeling for NoSQL Databases using Extreme Learning Machines
Abstract
NoSQL databases have emerged as a solution for managing large amounts of data in the cloud. Mechanisms to guarantee Quality of Service can significantly benefit from performance predictability. Building an accurate predictive model to estimate DBMS performance in a cloud environment is challenging because i) workload and resource allocation change dynamically; ii) concurrency and distribution introduce nonlinearity in performance metrics; and iii) predictive models should be trained and updated online to capture unseen workloads. This paper presents an online performance modeling approach for NoSQL databases using extreme learning machines. Experimental results confirm that our performance model can accurately predict throughput under several scenarios.
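To illustrate what "trained and updated online" can mean for an extreme learning machine, the following is a minimal sketch, not the paper's implementation. It assumes an online sequential ELM updated one sample at a time by recursive least squares with a ridge-style initialization; the class name `OnlineELM`, the `ridge` parameter, the two input features (normalized client count and read ratio), and the synthetic throughput curve are all hypothetical and chosen only for illustration.

```python
import math
import random


class OnlineELM:
    """Single-hidden-layer network with fixed random hidden weights.

    Only the output weights `beta` are learned, one sample at a time,
    with a recursive least-squares update (illustrative sketch).
    """

    def __init__(self, n_inputs, n_hidden=20, ridge=1e-2, seed=0):
        rng = random.Random(seed)
        # Hidden weights and biases are drawn once and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.beta = [0.0] * n_hidden
        # P tracks the inverse of (H^T H + ridge * I); it starts at I / ridge.
        self.p = [[1.0 / ridge if i == j else 0.0 for j in range(n_hidden)]
                  for i in range(n_hidden)]

    def _hidden(self, x):
        # Sigmoid activation of each hidden unit.
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + bias)))
                for row, bias in zip(self.w, self.b)]

    def predict(self, x):
        return sum(hi * bi for hi, bi in zip(self._hidden(x), self.beta))

    def update(self, x, target):
        h = self._hidden(x)
        ph = [sum(pij * hj for pij, hj in zip(row, h)) for row in self.p]
        denom = 1.0 + sum(hi * phi for hi, phi in zip(h, ph))
        # Prediction error before this sample is absorbed.
        err = target - sum(hi * bi for hi, bi in zip(h, self.beta))
        n = len(h)
        # P <- P - (P h)(P h)^T / (1 + h^T P h); P stays symmetric.
        for i in range(n):
            for j in range(n):
                self.p[i][j] -= ph[i] * ph[j] / denom
        # beta <- beta + (P h / denom) * err, the recursive least-squares gain step.
        for i in range(n):
            self.beta[i] += ph[i] / denom * err


def synthetic_throughput(clients, read_ratio):
    # Hypothetical saturating curve, used only as toy data.
    return clients / (clients + 0.3) * (0.5 + 0.5 * read_ratio)


def mean_abs_error(model, samples):
    return sum(abs(model.predict(x) - y) for x, y in samples) / len(samples)


rng = random.Random(42)
stream = [(rng.random(), rng.random()) for _ in range(400)]
holdout = [((c, r), synthetic_throughput(c, r))
           for c, r in [(rng.random(), rng.random()) for _ in range(100)]]

model = OnlineELM(n_inputs=2)
err_before = mean_abs_error(model, holdout)
for c, r in stream:
    # Each observation is used once and then discarded.
    model.update((c, r), synthetic_throughput(c, r))
err_after = mean_abs_error(model, holdout)
```

Because each update touches only the output weights and a fixed-size matrix, the cost per observation does not grow with the number of samples seen, which is the property that makes this family of models attractive for workloads that change over time.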
Keywords:
NoSQL, Predictive Model, DBMS Performance, Cloud
References
Cooper, B. F., Silberstein, A., Tam, E., Ramakrishnan, R., and Sears, R. (2010). Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM symposium on Cloud computing, pages 143–154. ACM.
Didona, D. and Romano, P. (2014). On bootstrapping machine learning performance predictors via analytical models. arXiv preprint arXiv:1410.5102.
Duggan, J., Cetintemel, U., Papaemmanouil, O., and Upfal, E. (2011). Performance prediction for concurrent database workloads. In ACM SIGMOD, pages 337–348. ACM.
Elmore, A. J., Das, S., Agrawal, D., and El Abbadi, A. (2011). Zephyr: live migration in shared nothing databases for elastic cloud platforms. In SIGMOD ’11, pages 301–312.
Farias, V. A. E., Sousa, F. R. C., Maia, J. G. R., Gomes, J. P. P., and Machado, J. C. (2016a). Elastic provisioning for cloud databases with uncertainty management. In ACM SAC, pages 390–397.
Farias, V. A. E., Sousa, F. R. C., Maia, J. G. R., Gomes, J. P. P., and Machado, J. C. (2016b). Machine learning approach for cloud nosql databases performance modeling. In CCGrid, pages 617–620.
Ganapathi, A., Kuno, H., Dayal, U., Wiener, J. L., Fox, A., Jordan, M., and Patterson, D. (2009). Predicting multiple metrics for queries: Better decisions enabled by machine learning. In ICDE, pages 592–603. IEEE.
Gray, J., Helland, P., O’Neil, P., and Shasha, D. (1996). The dangers of replication and a solution. In ACM SIGMOD Record, volume 25, pages 173–182. ACM.
Huang, G.-B., Zhu, Q.-Y., and Siew, C.-K. (2004). Extreme learning machine: a new learning scheme of feedforward neural networks. In Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on, volume 2, pages 985–990. IEEE.
Inc, M. (2015). MongoDB. http://www.mongodb.com.
Liang, N.-Y., Huang, G.-B., Saratchandran, P., and Sundararajan, N. (2006). A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Transactions on neural networks, 17(6):1411–1423.
Mozafari, B., Curino, C., Jindal, A., and Madden, S. (2013). Performance and resource modeling in highly-concurrent oltp workloads. In ACM SIGMOD, pages 301–312. ACM.
Schmidt, W. F., Kraaijveld, M. A., and Duin, R. P. (1992). Feedforward neural networks with random weights. In Pattern Recognition, 1992. Vol. II. Conference B: Pattern Recognition Methodology and Systems, Proceedings., 11th IAPR International Conference on, pages 1–4. IEEE.
Sheikh, M. B., Minhas, U. F., Khan, O. Z., Aboulnaga, A., Poupart, P., and Taylor, D. J. (2011). A bayesian approach to online performance modeling for database appliances using gaussian models. In Proceedings of the 8th ACM international conference on Autonomic computing, pages 121–130. ACM.
Walt, S. v. d., Colbert, S. C., and Varoquaux, G. (2011). The numpy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2):22–30.
Published
02/10/2017
How to Cite
FARIAS, Victor A. E.; PINHEIRO, Pedro R. A.; SOUSA, Flávio R. C.; GOMES, João P. P.; MACHADO, Javam C. Online Performance Modeling for NoSQL Databases using Extreme Learning Machines. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 32., 2017, Uberlândia/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 276-281. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2017.174648.