Abstract
The running time complexity is a crucial measure of the computational efficiency of a program or algorithm. Depending on its complexity class, a problem may be intractable: for sufficiently large inputs, a program that solves it consumes so many resources that executing it becomes unfeasible. As a consequence of Alan Turing’s halting problem, no program can determine the execution time of an arbitrary program and thus classify it by complexity class. Despite this limitation, an approximate running time estimate can help development teams evaluate the efficiency of the code they produce, and integrated development environments (IDEs) could display real-time efficiency indicators to programmers. Recent research has used artificial intelligence techniques to estimate complexity from code characteristics (e.g., the number of nested loops and the number of conditional tests). However, no existing datasets relate code characteristics to complexity classes considered inefficient (e.g., \(O(c^n)\) and \(O(n!)\)), which limits current research results. This research compared three machine learning approaches (i.e., Random Forest, eXtreme Gradient Boosting, and Artificial Neural Networks) on their accuracy in predicting the efficiency and complexity class of Java program code. We trained each model on a dataset that merges data from the literature with 394 programs, labeled with their complexity classes, crawled from a publicly available website. Results show that Random Forest achieved the best accuracy: 90.17% when predicting code efficiency and 89.84% when estimating complexity classes.
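The approach described above — training a classifier on hand-crafted code characteristics to predict a complexity class — can be sketched as follows. This is a minimal illustration only: the feature set, the toy training rows, and their labels are assumptions for demonstration, not the authors’ actual dataset or feature engineering.

```python
# Hypothetical sketch of feature-based complexity classification.
# Features and labels below are illustrative, not the paper's dataset.
from sklearn.ensemble import RandomForestClassifier

# Each row: [max nested-loop depth, loop count, conditional count, recursive calls]
X_train = [
    [0, 0, 1, 0],  # straight-line code with one branch
    [1, 1, 0, 0],  # a single loop
    [2, 2, 1, 0],  # doubly nested loops
    [0, 0, 1, 2],  # branching recursion
]
y_train = ["O(1)", "O(n)", "O(n^2)", "O(c^n)"]

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Classify an unseen snippet described by its extracted features.
prediction = clf.predict([[2, 2, 0, 0]])[0]
print(prediction)
```

In practice the feature vectors would be extracted automatically from Java source (e.g., by parsing the AST), and the label set would cover all complexity classes in the training corpus, including the inefficient ones such as \(O(c^n)\) and \(O(n!)\).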
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Pfitscher, R.J., Rodenbusch, G.B., Dias, A., Vieira, P., Fouto, N.M.M.D. (2023). Estimating Code Running Time Complexity with Machine Learning. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science(), vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45388-5
Online ISBN: 978-3-031-45389-2