
Estimating Code Running Time Complexity with Machine Learning

  • Conference paper
  • In: Intelligent Systems (BRACIS 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14196)


Abstract

Running time complexity is a crucial measure of the computational efficiency of a given program or algorithm. Depending on the problem's complexity class, it may be intractable: for sufficiently large inputs, a program solving the problem consumes so many resources that executing it becomes unfeasible. As a consequence of the undecidability of Alan Turing's halting problem, it is impossible to write a program that determines the execution time of an arbitrary program and thus classifies it by complexity class. Despite this limitation, an approximate running time estimate can help development teams evaluate the efficiency of the code they produce. Furthermore, integrated development environments (IDEs) could display real-time efficiency indicators to programmers. Recent research efforts have used artificial intelligence techniques to estimate complexity from code characteristics (e.g., the number of nested loops and the number of conditional tests). However, no existing databases relate code characteristics to complexity classes considered inefficient (e.g., \(O(c^n)\) and O(n!)), which limits current research results. This research compared three machine learning approaches (Random Forest, eXtreme Gradient Boosting, and Artificial Neural Networks) regarding their accuracy in predicting the efficiency and complexity class of Java program codes. We trained each model on a dataset that merges data from the literature with 394 program codes, each labeled with its complexity class, crawled from a publicly available website. Results show that Random Forest achieved the best accuracy: 90.17% when predicting code efficiency and 89.84% when estimating complexity classes.
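As a rough illustration of the approach the abstract describes, the sketch below extracts a few structural features from code snippets and trains a scikit-learn Random Forest to map them to complexity-class labels. This is not the authors' actual pipeline: the feature set, the regex-based extraction, the toy snippets, and their labels are all assumptions chosen only to make the idea concrete.

```python
# Illustrative sketch only: features, snippets, and labels are invented
# for demonstration and do not reproduce the paper's dataset or model.
import re
from sklearn.ensemble import RandomForestClassifier

def extract_features(code: str) -> list:
    """Count simple structural features of a code snippet."""
    loops = len(re.findall(r"\b(?:for|while)\b", code))
    conditionals = len(re.findall(r"\bif\b", code))
    # Crude nesting depth estimate via brace tracking.
    depth = max_depth = 0
    for ch in code:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    # Very rough recursion heuristic: "return name(...)" pattern.
    recursion = 1 if re.search(r"\breturn\s+\w+\(", code) else 0
    return [loops, conditionals, max_depth, recursion]

# Toy training data: snippets hand-labeled with a complexity class.
snippets = [
    ("int f(int n){ int s=0; for(int i=0;i<n;i++){ s+=i; } return s; }", "O(n)"),
    ("int g(int n){ for(int i=0;i<n;i++){ for(int j=0;j<n;j++){ } } return 0; }", "O(n^2)"),
    ("int h(int n){ return 1; }", "O(1)"),
]
X = [extract_features(code) for code, _ in snippets]
y = [label for _, label in snippets]

# bootstrap=False so every tree sees all samples; with distinct feature
# vectors the forest then reproduces the training labels exactly.
clf = RandomForestClassifier(n_estimators=50, bootstrap=False, random_state=0)
clf.fit(X, y)
print(clf.predict([extract_features(snippets[0][0])])[0])  # → O(n)
```

A real pipeline along these lines would need far richer features (the paper mentions nested-loop and conditional counts among its code characteristics) and a labeled corpus such as the one the authors assembled, rather than three hand-written snippets.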


Notes

  1. https://github.com/ricardopfitscher/RuTiCo.

  2. https://www.geeksforgeeks.org/fundamentals-of-algorithms/.


Author information

Correspondence to Ricardo J. Pfitscher.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Pfitscher, R.J., Rodenbusch, G.B., Dias, A., Vieira, P., Fouto, N.M.M.D. (2023). Estimating Code Running Time Complexity with Machine Learning. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science, vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_27


  • DOI: https://doi.org/10.1007/978-3-031-45389-2_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45388-5

  • Online ISBN: 978-3-031-45389-2

  • eBook Packages: Computer Science, Computer Science (R0)
