Code Smell Classification in Python: Are Small Language Models Up to the Task?

Abstract


Code quality is essential for maintainable and evolvable software systems. Traditional code smell detection tools rely on AST-based techniques and metric-driven heuristics, which, while effective, often lack interpretability and require specialized knowledge. This paper investigates the use of Small Language Models (SLMs) for classifying two widely studied code smells, Long Method and Long Parameter List, in Python codebases. Unlike Large Language Models (LLMs), SLMs offer lower latency and reduced computational cost, making them suitable for deployment in resource-constrained environments. We systematically compare the performance of SLMs with traditional Machine Learning (ML) models, Deep Learning (DL) models, and the AST-based DPy tool. We also analyze the role of two prompt engineering techniques, zero-shot and chain-of-thought, in enhancing SLM performance. Our evaluation considers precision, recall, F1-score, and processing time, using a custom event-driven dataset designed for code classification. Results show that SLMs achieve competitive accuracy with improved interpretability. Additionally, we release an annotated dataset comprising classifications from all approaches. This work provides new insights into lightweight, explainable, and practical methods for automated code quality assessment, and supports the broader adoption of SLMs in software engineering.

Keywords: Software Quality, Code Smells, Python, Small Language Models
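
To make the AST-based, metric-driven baseline mentioned in the abstract more concrete, the following is a minimal sketch of how the two smells could be flagged with Python's standard `ast` module. It is not the DPy implementation or the method evaluated in the paper: the function name `find_smells` and both thresholds are hypothetical values chosen only for illustration.

```python
import ast

# Hypothetical thresholds chosen for illustration only; they are not taken
# from the paper or from the DPy tool.
MAX_METHOD_LINES = 50
MAX_PARAMETERS = 5


def find_smells(source: str) -> list[tuple[str, str]]:
    """Return (function_name, smell_label) pairs found in the given source."""
    smells = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Length in source lines, from the `def` line to the last body line.
        length = node.end_lineno - node.lineno + 1
        if length > MAX_METHOD_LINES:
            smells.append((node.name, "Long Method"))
        # Count every declared parameter, including *args and **kwargs.
        args = node.args
        count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
        count += (args.vararg is not None) + (args.kwarg is not None)
        if count > MAX_PARAMETERS:
            smells.append((node.name, "Long Parameter List"))
    return smells


if __name__ == "__main__":
    sample = "def f(a, b, c, d, e, f, g):\n    return a\n"
    print(find_smells(sample))
```

A heuristic of this kind is fast and deterministic, but its verdict depends entirely on the chosen thresholds and it offers no explanation beyond the metric itself, which is the interpretability gap the abstract attributes to traditional tools.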

Published
2025-09-22
OLIVEIRA, Igor Soares de; CARNEIRO, Joanne; RIBAS, Jessica; PEREIRA, Juliana Alves. Code Smell Classification in Python: Are Small Language Models Up to the Task? In: BRAZILIAN SYMPOSIUM ON SOFTWARE ENGINEERING (SBES), 39., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 699-705. ISSN 2833-0633. DOI: https://doi.org/10.5753/sbes.2025.11046.