Abstract
The running time complexity is a crucial measure of the computational efficiency of a program or algorithm. Depending on its complexity class, a problem may be intractable: for sufficiently large inputs, a program that solves it consumes so many resources that executing it becomes unfeasible. As a consequence of Alan Turing’s halting problem, no program can determine the execution time of an arbitrary program and thus classify it by complexity class. Despite this limitation, an approximate running time estimate can help development teams evaluate the efficiency of the code they produce, and integrated development environments (IDEs) could display real-time efficiency indicators to programmers. Recent research has used artificial intelligence techniques to estimate complexity from code characteristics (e.g., the number of nested loops and the number of conditional tests). However, no existing datasets relate code characteristics to complexity classes considered inefficient (e.g., \(O(c^n)\) and \(O(n!)\)), which limits current research results. This research compared three machine learning approaches (i.e., Random Forest, eXtreme Gradient Boosting, and Artificial Neural Networks) on their accuracy in predicting the efficiency and complexity class of Java program code. We trained each model on a dataset that merges data from the literature with 394 programs, labeled with their complexity classes, crawled from a publicly available website. Results show that Random Forest achieved the best accuracy: 90.17% when predicting code efficiency and 89.84% when estimating complexity classes.
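The approach described above — training a classifier on hand-crafted code characteristics to predict a complexity class — can be sketched as follows. This is a minimal illustration only: the feature set, the toy training rows, and their labels are assumptions for demonstration, not the authors’ actual dataset or feature engineering.

```python
# Hypothetical sketch of feature-based complexity classification.
# Features and labels below are illustrative, not the paper's dataset.
from sklearn.ensemble import RandomForestClassifier

# Each row: [max nested-loop depth, loop count, conditional count, recursive calls]
X_train = [
    [0, 0, 1, 0],  # straight-line code with one branch
    [1, 1, 0, 0],  # a single loop
    [2, 2, 1, 0],  # doubly nested loops
    [0, 0, 1, 2],  # branching recursion
]
y_train = ["O(1)", "O(n)", "O(n^2)", "O(c^n)"]

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Classify an unseen snippet described by its extracted features.
prediction = clf.predict([[2, 2, 0, 0]])[0]
print(prediction)
```

In practice the feature vectors would be extracted automatically from Java source (e.g., by parsing the AST), and the label set would cover all complexity classes in the training corpus, including the inefficient ones such as \(O(c^n)\) and \(O(n!)\).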
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Pfitscher, R.J., Rodenbusch, G.B., Dias, A., Vieira, P., Fouto, N.M.M.D. (2023). Estimating Code Running Time Complexity with Machine Learning. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science(), vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45388-5
Online ISBN: 978-3-031-45389-2