Story Point Effort Estimation from User Stories with Large Language Models
Abstract
Effort estimation in agile projects remains a recurring challenge, especially when story points must be inferred solely from the text of user stories. Previous studies have focused mainly on machine learning approaches to predict effort, but the recent availability of Large Language Models (LLMs) offers an alternative. This paper investigates how effectively LLMs estimate story points. A BERT-derived model was fine-tuned and compared, in terms of mean absolute error (MAE), against three baselines: (i) a traditional predictive model based on TF-IDF vectors coupled with a Linear Regression model, (ii) a zero-shot LLM, and (iii) a few-shot LLM. The evaluation used Deep-SE, a dataset of user stories from real agile software development projects, extracted from Jira and covering 16 different open-source projects. The results show that the fine-tuned LLM achieved a lower MAE in most projects. The findings suggest that, despite their higher computational cost, LLMs are an alternative with lower error for effort estimation than the compared techniques.
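The per-project MAE comparison described above can be illustrated with a minimal sketch. It assumes that predictions from each model are already available. The project names, model labels, story point values, and helper functions (`mae_per_project`, `best_model_per_project`) are hypothetical and chosen only for illustration; they are not taken from Deep-SE or from the study's implementation.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between true and predicted story points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical ground-truth story points for two projects (illustrative only).
actual = {
    "project_a": [1, 2, 3, 5, 8],
    "project_b": [2, 3, 5, 8, 13],
}

# Hypothetical predictions from two of the compared approaches (illustrative only).
predictions = {
    "tfidf_linear_regression": {
        "project_a": [2, 2, 5, 3, 5],
        "project_b": [3, 3, 3, 5, 8],
    },
    "fine_tuned_model": {
        "project_a": [1, 3, 3, 5, 5],
        "project_b": [2, 2, 5, 8, 8],
    },
}


def mae_per_project(actual, predictions):
    """Return {model: {project: MAE}} for every model and project."""
    return {
        model: {
            project: mean_absolute_error(actual[project], preds[project])
            for project in actual
        }
        for model, preds in predictions.items()
    }


def best_model_per_project(scores):
    """Return {project: model with the lowest MAE on that project}."""
    projects = next(iter(scores.values())).keys()
    return {
        project: min(scores, key=lambda model: scores[model][project])
        for project in projects
    }


scores = mae_per_project(actual, predictions)
for model, by_project in scores.items():
    print(model, by_project)
print(best_model_per_project(scores))
```

Reporting the MAE separately for each project, rather than pooling all stories, matches the way the results are stated ("in most projects") and keeps projects with different story point scales from dominating the comparison.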
