On the Effectiveness of Trivial Refactorings in Predicting Non-trivial Refactorings

Authors

Pinheiro, D., Bezerra, C., and Uchôa, A.

DOI:

https://doi.org/10.5753/jserd.2024.3324

Keywords:

Refactoring, Machine Learning, Software Quality

Abstract

Refactoring is the process of restructuring source code without changing the external behavior of the software. It can bring many benefits, such as removing code with poor structural quality, avoiding or reducing technical debt, and improving maintainability, reuse, and code readability. Although there is research on how to predict refactorings, there is still a clear lack of studies that assess the impact of less complex (trivial) operations on the prediction of more complex (non-trivial) ones. In addition, the literature calls for studies that improve automated solutions for detecting and correcting refactorings. This study aims to accurately identify refactoring activity in non-trivial operations by means of trivial operations. To this end, we use supervised learning classifiers, considering the influence of trivial refactorings and evaluating performance in other data domains. We assembled 3 datasets totaling 1,291 open-source projects, extracted approximately 1.9M refactoring operations, collected 45 attributes and code metrics from each file involved in a refactoring, and used the supervised learning algorithms Decision Tree, Random Forest, Logistic Regression, Naive Bayes, and Neural Network to investigate the impact of trivial refactorings on the prediction of non-trivial refactorings. We call each experiment configuration that combines trivial and non-trivial refactorings a context. Our results indicate that: (i) Random Forest, Decision Tree, and Neural Network models performed very well when trained with code metrics to detect refactoring opportunities; however, only the first two generalized well to other refactoring data domains; (ii) separating trivial and non-trivial refactorings into different classes resulted in a more efficient model, even when tested on different datasets; (iii) balancing techniques that increase or decrease the number of samples may not be the best strategy for improving models trained on datasets composed of code metrics and configured as in our study.
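
As an illustration of what "balancing techniques that increase or decrease samples" can mean, the following is a minimal sketch of random oversampling and random undersampling. It is not the authors' implementation: the abstract does not say which balancing techniques were used, and the function name `rebalance`, the class labels, and the toy metric vectors are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def rebalance(samples, labels, mode, seed=0):
    """Randomly over- or undersample so every class has the same count.

    mode="over"  duplicates random samples up to the largest class size.
    mode="under" drops random samples down to the smallest class size.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        if len(group) >= target:
            # Keep `target` distinct samples (drawn without replacement).
            chosen = rng.sample(group, target)
        else:
            # Keep all samples and add random duplicates to reach `target`.
            extra = [rng.choice(group) for _ in range(target - len(group))]
            chosen = group + extra
        out_samples.extend(chosen)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical toy data: each sample stands in for a vector of code metrics.
X = [[i, i * 2] for i in range(10)]
y = ["trivial"] * 8 + ["non-trivial"] * 2

X_over, y_over = rebalance(X, y, "over")
X_under, y_under = rebalance(X, y, "under")
print(Counter(y_over))
print(Counter(y_under))
```

In this sketch, oversampling only repeats existing minority samples and undersampling discards majority samples, so neither adds new information to the training set. That may be one reason such techniques do not always help, in line with result (iii).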

Published

2024-04-25

How to Cite

Pinheiro, D., Bezerra, C., & Uchôa, A. (2024). On the Effectiveness of Trivial Refactorings in Predicting Non-trivial Refactorings. Journal of Software Engineering Research and Development, 12(1), 5:1 – 5:16. https://doi.org/10.5753/jserd.2024.3324

Section

Research Article