Bugs in the Shadows: Static Detection of Faulty Python Refactorings

  • Jonhnanthan Oliveira UFCG
  • Rohit Gheyi UFCG
  • Márcio Ribeiro UFAL
  • Alessandro Garcia PUC-Rio

Abstract


Python is a widely adopted programming language, valued for its simplicity and flexibility. However, its dynamic type system poses significant challenges for automated refactoring – an essential practice in software evolution aimed at improving internal code structure without changing external behavior. Understanding how type errors are introduced during refactoring is crucial, as such errors can compromise software reliability and reduce developer productivity. In this work, we propose a static analysis technique to detect type errors introduced by refactoring implementations for Python. We evaluated our technique on Rope refactoring implementations, applying them to open-source Python projects. Our analysis uncovered 29 bugs across four refactoring types from a total of 1,152 refactoring attempts. Several of these issues were also found in widely used IDEs such as PyCharm and PyDev. All reported bugs were submitted to the respective developers, and some of them were acknowledged and accepted. These results highlight the need to improve the robustness of current Python refactoring tools to ensure the correctness of automated code transformations and support reliable software maintenance.
Keywords: Refactoring, Type error, Testing, Python

Published
22/09/2025
OLIVEIRA, Jonhnanthan; GHEYI, Rohit; RIBEIRO, Márcio; GARCIA, Alessandro. Bugs in the Shadows: Static Detection of Faulty Python Refactorings. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 39., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 182-192. ISSN 2833-0633. DOI: https://doi.org/10.5753/sbes.2025.9889.