Validation and construction of a lexical dictionary to assist sentiment analysis in software project repositories
Abstract
Sentiment analysis infers the polarity of words that may express emotions. The accuracy of this classification is essential to the reliability of the results. For this reason, this article investigates, validates, and builds a lexical dictionary for the Software Engineering context, using the words, emoticons, and idiomatic expressions from the SentiStrength-SE tool. An online experiment, in which 48 participants from the Computing area answered 559 questions, was conducted to validate agreement on the lexical terms in the dictionary. After data collection, the terms were consolidated and validated on a Stack Overflow dataset to measure the accuracy, precision, recall, and F1-score of the new dictionary. The new lexical dictionary achieves 79% accuracy and precision and 78% recall and F1-score, with a smaller polarity interval than the original dictionary.
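The abstract combines two technical steps: scoring text with a polarity lexicon, and measuring the resulting labels with accuracy, precision, recall, and F1-score. The Python sketch below illustrates both under stated assumptions. The toy lexicon `LEXICON`, the functions `classify` and `evaluate`, and the example texts are hypothetical and do not reproduce the SentiStrength-SE dictionary or the authors' validated values. The dual scoring follows the general SentiStrength idea of reporting a positive strength from 1 to 5 and a negative strength from -1 to -5; collapsing the two into one label by their sum is an assumption. Precision, recall, and F1 are macro-averaged over the three classes, which the abstract does not specify.

```python
# Hypothetical toy lexicon: term -> polarity score. This is NOT the
# SentiStrength-SE dictionary; values are illustrative only.
LEXICON = {"great": 3, "thanks": 2, "love": 3, "bug": -2, "crash": -3, "hate": -4}


def classify(text: str, lexicon: dict[str, int]) -> str:
    """Dual-scale scoring: strongest positive and strongest negative term."""
    tokens = text.lower().split()
    scores = [lexicon[t] for t in tokens if t in lexicon]
    pos = max([s for s in scores if s > 0], default=1)   # 1 = no positive sentiment
    neg = min([s for s in scores if s < 0], default=-1)  # -1 = no negative sentiment
    total = pos + neg  # assumption: sum decides the overall label
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def evaluate(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    labels = sorted(set(gold) | set(pred))
    pairs = list(zip(gold, pred))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical usage with made-up texts and gold labels.
texts = ["great work thanks", "this crash is a bug", "see the docs", "love it but crash"]
gold = ["positive", "negative", "neutral", "negative"]
pred = [classify(t, LEXICON) for t in texts]
print(pred)
print(evaluate(gold, pred))
```

In this toy run, "love it but crash" scores 3 + (-3) = 0 and is labelled neutral, illustrating how the range of term scores (the polarity interval) directly affects which texts cancel out and therefore the reported metrics.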
Keywords:
Software Engineering, software maintenance, sentiment analysis, lexicon dictionary, polarity, validation
Published
2020-10-19
How to Cite
MENEZES, Hiolanda; BOECHAT, Gláucya; MOTA JR, Joselito; MACHADO, Ivan. Validation and construction of a lexical dictionary to assist sentiment analysis in software project repositories. In: WORKSHOP ON SOFTWARE VISUALIZATION, EVOLUTION AND MAINTENANCE (VEM), 8., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 41-48. DOI: https://doi.org/10.5753/vem.2020.14527.
