Effort Estimation in Story Points from User Stories with Large Language Models

Giseldo da Silva Neo; José Antão Beltrão Moura; Alana Viana Borges da Silva Neo; Olival de Gusmão Freitas Júnior

doi:10.5753/sbes.2025.11121

Giseldo da Silva Neo IFAL http://orcid.org/0000-0001-5574-9260
José Antão Beltrão Moura UFCG https://orcid.org/0000-0002-6393-5722
Alana Viana Borges da Silva Neo IFMS https://orcid.org/0009-0000-1910-1598
Olival de Gusmão Freitas Júnior UFAL https://orcid.org/0000-0003-4418-8386


Abstract

Effort estimation in agile projects remains a recurring challenge, especially when story points must be inferred solely from the text of user stories. Previous studies have focused mainly on machine learning approaches to predict effort, but the recent availability of Large Language Models (LLMs) offers an alternative. This paper investigates how effectively LLMs estimate story points. A BERT-derived model was fine-tuned and compared, in terms of mean absolute error (MAE), against three baselines: (i) a traditional predictive model that feeds TF-IDF vectors into a Linear Regression model, (ii) a zero-shot LLM, and (iii) a few-shot LLM. The evaluation used Deep-SE, a dataset of user stories from 16 real-world open-source agile software projects collected from Jira. The results show that the fine-tuned LLM achieved a lower MAE in most projects. The findings suggest that, despite their higher computational cost, LLMs provide lower-error effort estimates than the techniques compared.
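To make the first baseline and the evaluation metric concrete, the following is a minimal, dependency-free sketch of a TF-IDF + Linear Regression story-point predictor scored by MAE, i.e. the mean of |y_i - ŷ_i| over the test stories. It is not the authors' implementation. Assumptions: whitespace tokenization; TF-IDF weighting that mirrors scikit-learn's defaults (smoothed IDF, L2 normalization); a small ridge term `alpha` added to least squares so the normal equations stay solvable when the vocabulary is larger than the number of stories; and toy user stories (`TRAIN`, `TEST`) invented for illustration rather than taken from Deep-SE. In practice, scikit-learn's `TfidfVectorizer`, `LinearRegression` and `mean_absolute_error` would typically be used instead.

```python
import math
from collections import Counter

# Toy user stories with story points, invented for illustration (NOT Deep-SE data).
TRAIN = [
    ("As a user I want to reset my password by email", 3),
    ("As an admin I want to export all reports to CSV", 5),
    ("Fix typo on the login page", 1),
    ("Migrate the authentication service to OAuth2", 13),
    ("As a user I want to filter reports by date", 5),
]
TEST = [("As a user I want to export reports by date", 5)]


def tokenize(text):
    # Simplifying assumption: lowercase whitespace tokenization.
    return text.lower().split()


def fit_tfidf(docs):
    """Learn the vocabulary and smoothed IDF weights from training stories."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    vocab = sorted(df)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    return vocab, idf


def transform(doc, vocab, idf):
    """Raw term counts times IDF, then L2-normalized; unseen terms are ignored."""
    counts = Counter(tokenize(doc))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(X, y, alpha=1e-3):
    """Least squares with an intercept via the normal equations.

    alpha is a small ridge term (an assumption, also applied to the intercept
    for brevity) that keeps the system solvable when features outnumber stories.
    """
    xb = [row + [1.0] for row in X]  # last column models the intercept
    d = len(xb[0])
    a = [[sum(r[i] * r[j] for r in xb) + (alpha if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * yi for r, yi in zip(xb, y)) for i in range(d)]
    return solve(a, b)


def predict(w, x):
    # Weighted sum of TF-IDF features plus the intercept stored in w[-1].
    return sum(wi * xi for wi, xi in zip(w[:-1], x)) + w[-1]


def mae(y_true, y_pred):
    """Mean absolute error: (1/n) * sum(|y_i - yhat_i|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    vocab, idf = fit_tfidf([s for s, _ in TRAIN])
    w = fit_linear([transform(s, vocab, idf) for s, _ in TRAIN],
                   [p for _, p in TRAIN])
    preds = [predict(w, transform(s, vocab, idf)) for s, _ in TEST]
    print(f"test MAE: {mae([p for _, p in TEST], preds):.2f}")
```

Because `mae` only compares predicted and actual story points, the same scoring step would apply unchanged whether the predictions come from this baseline, a fine-tuned BERT-derived regressor, or a zero-shot or few-shot LLM prompt; only the fit and predict steps would differ.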

Keywords: Effort estimation, Story points, User story, Large language model

References

N. Akhila et al. 2023. Comparative study of BERT models and RoBERTa in transformer based question answering. In 2023 3rd International Conference on Intelligent Technologies (CONIT). IEEE, 1–5.

Levi Alexander and Riyanto Jayadi. 2024. Machine Learning for Story Point Estimation: Do Large Language Models Outperform Traditional Methods? Journal of Theoretical and Applied Information Technology 102, 20 (2024), 7387–7399.

Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran, Aditya Ghose, and John Grundy. 2018. Predicting Delivery Capability in Iterative Software Development. IEEE Transactions on Software Engineering 44, 6 (2018), 551–573. DOI: 10.1109/TSE.2017.2693989

Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran, Trang Pham, Aditya Ghose, and Tim Menzies. 2019. A Deep Learning Model for Estimating Story Points. IEEE Transactions on Software Engineering 45, 7 (2019), 637–656. DOI: 10.1109/TSE.2018.2792473 arXiv:1609.00489

Mike Cohn. 2005. Agile Estimating and Planning.

M Fu and C Tantithamthavorn. 2023. GPT2SP: A Transformer-Based Agile Story Point Estimation Approach. IEEE Transactions on Software Engineering 49, 02 (2023), 611–625. DOI: 10.1109/TSE.2022.3158252

Haithem Kassem, Khaled Mahar, and Amani A. Saad. 2023. Story Point Estimation Using Issue Reports With Deep Attention Neural Network. E-Informatica Software Engineering Journal 17, 1 (2023), 1–15. DOI: 10.37190/e-Inf230104

William B. Langdon, Javier Dolado, Federica Sarro, and Mark Harman. 2016. Exact Mean Absolute Error of Baseline Predictor, MARP0. Information and Software Technology 73 (2016), 16–18. DOI: 10.1016/j.infsof.2016.01.003

Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and Jianfeng Gao. 2025. Large Language Models: A Survey. arXiv (2025). arXiv:2503.23037

Giseldo da Silva Neo, Antão B. Moura, Alana Viana Borges da Silva Neo, and Evandro de Barros Costa. 2025. A Predictive Model for Story Points leveraging features like readability and sentiment from User Story description. ITS2025 - Intelligent Tutoring Systems (2025).

Giseldo da Silva Neo, José Antão Beltrão Moura, Hyggo Almeida, Alana Viana Borges da Silva Neo, and Olival de Gusmão Freitas. 2024. User Story Tutor (UST) to Support Agile Software Developers. International Conference on Computer Supported Education, CSEDU - Proceedings 2 (2024), 51–62. DOI: 10.5220/0012619200003693 arXiv:2406.16259

Giseldo da Silva Neo, Alana Viana Borges da Silva Neo, Kleber Jose Araújo Galvão Filho, José Antão Beltrão Moura, and Olival de Gusmão Freitas Junior. 2024. NeoDataset: um conjunto de dados com user stories e story points. Revista dos Mestrados Profissionais 133, 2 (2024), 194–211.

Bodem Niharika and Shivali Chopra. 2024. Story Point Estimation Using Machine Learning for Agile Projects. SSRN Electronic Journal (2024). DOI: 10.2139/ssrn.4485276

Simone Porru, Alessandro Murgia, Serge Demeyer, Michele Marchesi, and Roberto Tonelli. 2016. Estimating story points from issue reports. ACM International Conference Proceeding Series (2016). DOI: 10.1145/2972958.2972959

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. (2019).

Sebastian Raschka. 2024. Build a Large Language Model (From Scratch). Simon and Schuster.

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. (2019), 2–6. arXiv:1910.01108

E. G. Santana, Gabriel Benjamin, Melissa Araujo, Harrison Santos, David Freitas, Eduardo Almeida, Paulo Anselmo da M. S. Neto, Jiawei Li, Jina Chun, and Iftekhar Ahmed. 2025. Which Prompting Technique Should I Use? An Empirical Investigation of Prompting Techniques for Software Engineering Tasks. (2025). arXiv:2506.05614

Federica Sarro, Alessio Petrozziello, and Mark Harman. 2016. Multi-objective software effort estimation. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). 619–630. DOI: 10.1145/2884781.2884830

Ezequiel Scott and Dietmar Pfahl. 2018. Using developers’ features to estimate story points. ACM International Conference Proceeding Series 106 (2018), 106–110. DOI: 10.1145/3202710.3203160

Martin Shepperd and Steve MacDonell. 2012. Evaluating prediction systems in software project estimation. Information and Software Technology 54, 8 (2012), 820–827. DOI: 10.1016/j.infsof.2011.12.008

Krishnamoorthy Srinivasan and Douglas Fisher. 2005. Machine Learning Approaches to Estimating Software Development Effort. Machine Learning Applications In Software Engineering 21, 2 (2005), 52–63.

Jeff Sutherland. 2014. A arte de fazer o dobro do trabalho na metade do tempo (1 ed.).

Ritesh Tamrakar and Magne Jørgensen. 2012. Does the use of Fibonacci numbers in planning poker affect effort estimates? IET Seminar Digest 2012, 1 (2012), 228–232. DOI: 10.1049/ic.2012.0030

Vali Tawosi, Rebecca Moussa, and Federica Sarro. 2022. Agile Effort Estimation: Have We Solved the Problem Yet? Insights From A Replication Study. IEEE Transactions on Software Engineering (2022), 1–19. DOI: 10.1109/TSE.2022.3228739 arXiv:2201.05401

Victor Uc-Cetina. 2023. Recent Advances in Software Effort Estimation using Machine Learning. (2023), 1–10. arXiv:2303.03482

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. CoRR abs/1706.03762 (2017).

Raul Sidnei Wazlawick. 2009. Metodologia de pesquisa para ciência da computação. Vol. 2. Elsevier Rio de Janeiro.

Burcu Yalçıner, Kıvanç Dinçer, Adil Gürsel Karaçor, and Mehmet Önder Efe. 2024. Enhancing Agile Story Point Estimation: Integrating Deep Learning, Machine Learning, and Natural Language Processing with SBERT and Gradient Boosted Trees. Applied Sciences (Switzerland) 14, 16 (2024). DOI: 10.3390/app14167305
