Software Engineering Repositories: Expanding the PROMISE Database

Márcia Lima; Victor Valle; Estevão Costa; Fylype Lira; Bruno Gadelha

Márcia Lima, Universidade Federal do Amazonas
Victor Valle, Universidade Federal do Amazonas
Estevão Costa, Universidade Federal do Amazonas
Fylype Lira, Universidade Federal do Amazonas
Bruno Gadelha, Universidade Federal do Amazonas

Abstract

Defining and classifying software requirements are critical tasks for determining a system's functionality and overall architecture. Several studies therefore aim to automate requirements classification with machine learning algorithms. The feasibility of such studies, however, depends on a public dataset that is adequate in both the quantity and the quality of its sample requirements. PROMISE is a requirements dataset widely used for this task, but its number of requirements is considered low for practical machine learning applications. This research presents an expansion of the PROMISE corpus. New software requirements were incorporated, and the resulting dataset was evaluated with well-known machine learning algorithms. We observed some improvement in the performance of these algorithms in identifying some types of software requirements.

Keywords: software repositories, requirements classification, machine learning
