Workload-aware Parameter Selection and Performance Prediction for In-memory Databases

Maria I. V. Lima; Victor A. E. de Farias; Francisco D. B. S. Praciano; Javam C. Machado

doi:10.5753/sbbd.2018.22228

Maria I. V. Lima Universidade Federal do Ceará (UFC)
Victor A. E. de Farias Universidade Federal do Ceará (UFC)
Francisco D. B. S. Praciano Universidade Federal do Ceará (UFC)
Javam C. Machado Universidade Federal do Ceará (UFC)

Abstract

In-memory databases, like disk-based ones, may expose hundreds of configurable parameters, which makes system tuning overwhelming for a database administrator. Worse, the number of parameters keeps growing over the years, and parameters can affect performance in non-intuitive ways. Models that capture this behavior can help automatic tuning mechanisms reach optimal performance. In this work, we propose a learning-based approach that selects the most meaningful parameters and generates a performance model based on both the workload and the database configuration. Experimental results confirm that our approach can create accurate performance models using only a reduced set of selected parameters.

Keywords: In-memory databases, learning-based approach, database configurations, tuning
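The abstract describes two steps: selecting the most meaningful parameters, then building a performance model from the workload and the selected parameters. The sketch below illustrates that general pipeline on synthetic data and is not the authors' method. The abstract does not name the selection technique or the learner, so correlation-based ranking and a k-nearest-neighbor regressor stand in for them here. The knob names (`knob_a` to `knob_d`), the workload feature (`write_ratio`), and the throughput formula are all hypothetical.

```python
import math
import random

# Hypothetical knob and workload feature names, for illustration only.
KNOBS = ["knob_a", "knob_b", "knob_c", "knob_d"]
WORKLOAD = ["write_ratio"]


def make_samples(n, seed=0):
    """Generate synthetic (features, throughput) observations."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        row = {name: rng.random() for name in WORKLOAD + KNOBS}
        # Assumed ground truth: only knob_a and knob_b affect throughput.
        throughput = (1000 + 800 * row["knob_a"] - 600 * row["knob_b"]
                      - 300 * row["write_ratio"] + rng.gauss(0, 20))
        samples.append((row, throughput))
    return samples


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_knobs(samples, k):
    """Keep the k knobs whose values correlate most strongly with throughput."""
    ys = [y for _, y in samples]
    scores = {name: abs(pearson([row[name] for row, _ in samples], ys))
              for name in KNOBS}
    return sorted(KNOBS, key=scores.get, reverse=True)[:k]


def knn_predict(train, query, features, k=5):
    """Predict throughput as the mean of the k nearest training samples."""
    def dist(row):
        return sum((row[f] - query[f]) ** 2 for f in features)
    nearest = sorted(train, key=lambda s: dist(s[0]))[:k]
    return sum(y for _, y in nearest) / k


def mean_abs_error(train, test, features):
    """Mean absolute prediction error of the k-NN model on held-out samples."""
    errors = [abs(knn_predict(train, row, features) - y) for row, y in test]
    return sum(errors) / len(errors)


samples = make_samples(300)
train, test = samples[:240], samples[240:]

# Step 1: select a reduced set of knobs using the training data only.
selected = select_knobs(train, k=2)

# Step 2: model performance from workload features plus the selected knobs.
features = WORKLOAD + selected
model_mae = mean_abs_error(train, test, features)

# Reference point: always predict the mean training throughput.
mean_y = sum(y for _, y in train) / len(train)
baseline_mae = sum(abs(mean_y - y) for _, y in test) / len(test)

print("selected knobs:", selected)
print("k-NN MAE:", round(model_mae, 1), "baseline MAE:", round(baseline_mae, 1))
```

Correlation ranking only detects roughly linear, single-parameter effects. A tree-based or regularized learner would likely be a better fit when parameters interact, at the cost of depending on an external library.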
