Pre-trained Code Language Models for Just-in-Time Software Defect Prediction: An Empirical Study
Abstract
Just-in-Time Software Defect Prediction (JIT-SDP) aims to identify defects before they are introduced, enabling the mitigation of risky submissions to the repository during the software development cycle. Recent approaches model JIT-SDP as a multimodal task involving code changes, commit messages, and hand-crafted commit-level features, also known as expert features. This study thoroughly compares the performance of pre-trained code language models spanning encoder-only, decoder-only, and encoder-decoder architectures, covering both open-source fine-tuned models (e.g., CodeT5+, UniXCoder) and closed, API-based large language models (e.g., GPT models, Gemini) on the JIT-SDP task. Furthermore, to the best of our knowledge, this is the first study to compare open (trainable) and closed (prompt-based) decoder-only models in this context. Our experiments show that fine-tuned small and medium-sized open models significantly outperform zero-shot prompting with closed large language models. In particular, CodeT5+ and UniXCoder surpass state-of-the-art performance in a cross-project setting. The results highlight the importance of architecture, fine-tuning strategy, and expert-feature integration for accurate JIT-SDP.
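To make the fine-tuning setup concrete, below is a minimal sketch (not the authors' code) of adapting an encoder-only pre-trained code model to JIT-SDP as binary commit classification. The model checkpoint name is real; the input fields, label convention, and hyperparameters are illustrative assumptions, and a real run would use the paper's datasets and training loop.

```python
# Hypothetical sketch: fine-tuning UniXCoder for JIT-SDP (binary classification).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "microsoft/unixcoder-base"  # encoder-only pre-trained code model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# One commit: the multimodal input pairs the commit message with the code
# change (diff); 1 = defect-inducing, 0 = clean (assumed label convention).
commit_message = "fix: handle null entry in session cache"
code_diff = (
    "- cache.get(key).close()\n"
    "+ entry = cache.get(key)\n"
    "+ if entry: entry.close()"
)
inputs = tokenizer(commit_message, code_diff, truncation=True,
                   max_length=512, return_tensors="pt")
labels = torch.tensor([1])

# Single illustrative training step; a real run iterates over a labeled
# commit dataset and adds expert features via a separate input branch.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()
```

By contrast, the zero-shot setting evaluated for closed models would format the same commit message and diff into a textual prompt for an API-based model rather than updating any weights.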
Published
September 29, 2025
How to Cite
MONTEIRO, Monique Louise; CABRAL, George Gomes; OLIVEIRA, Adriano Lorena de. Pre-trained Code Language Models for Just-in-Time Software Defect Prediction: An Empirical Study. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 347-361. ISSN 2643-6264.
