Estimating Code Running Time Complexity with Machine Learning

Ricardo J. Pfitscher; Gabriel B. Rodenbusch; Anderson Dias; Paulo Vieira; Nuno M. M. D. Fouto

Ricardo J. Pfitscher UFSC
Gabriel B. Rodenbusch UFSC
Anderson Dias UniSociesc
Paulo Vieira UniSociesc
Nuno M. M. D. Fouto USP

Resumo

The running time complexity is a crucial measure for determining the computational efficiency of a given program or algorithm. Depending on the problem complexity class, it can be considered intractable; a program that solves this problem will consume so many resources for a sufficiently large input that it will be unfeasible to execute it. Due to Alan Turing’s halting problem, it is impossible to write a program capable of determining the execution time of a given program and, therefore, classifying it according to its complexity class. Despite this limitation, an approximate running time value can be helpful to support development teams in evaluating the efficiency of their produced code. Furthermore, software-integrated development environments (IDEs) could show real-time efficiency indicators for their programmers. Recent research efforts have made, through artificial intelligence techniques, complexity estimations based on code characteristics (e.g., number of nested loops and number of conditional tests). However, there are no databases that relate code characteristics with complexity classes considered inefficient (e.g., and O(n!)), which limits current research results. This research compared three machine learning approaches (i.e., Random Forest, eXtreme Gradient Boosting, and Artificial Neural Networks) regarding their accuracy in predicting Java program codes’ efficiency and complexity class. We train each model using a dataset that merges data from literature and 394 program codes with their respective complexity classes crawled from a publicly available website. Results show that Random Forest resulted in the best accuracy, being 90.17% accurate when predicting codes’ efficiency and 89.84% in estimating complexity classes.