Exploring the Potential and Feasibility of Open-Source LLMs in Sentiment Analysis

  • Breno Braga Neves PUC-Rio
  • Theo Sousa PUC-Rio
  • Daniel Coutinho PUC-Rio
  • Alessandro Garcia PUC-Rio
  • Juliana Alves Pereira PUC-Rio

Abstract


Sentiment analysis tools are commonly used in software engineering (SE) to understand developer communication in collaborative environments such as GitHub. As state-of-the-art tools can underperform, newer tools based on large language models (LLMs) are being adopted, though they can be computationally expensive. This study evaluates three open-source models: Llama3, Gemma, and Mistral. Using a dataset of GitHub discussions, it examines how these models perform and how prompt engineering influences their results. Findings show that these open-source LLMs offer performance comparable to state-of-the-art tools, making them viable, cost-effective alternatives. The study also assesses the advantages and limitations of different prompting strategies.

References

Ain, Q. T., Ali, M., Riaz, A., Noureen, A., Kamran, M., Hayat, B., and Rehman, A. (2017). Sentiment analysis using deep learning techniques: a review. International Journal of Advanced Computer Science and Applications, 8(6).

Barbosa, C., Uchôa, A., Coutinho, D., Assunção, W. K., Oliveira, A., Garcia, A., Fonseca, B., Rabelo, M., Coelho, J. E., Carvalho, E., et al. (2023). Beyond the code: Investigating the effects of pull request conversations on design decay. In 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 1–12. IEEE.

Barbosa, C., Uchôa, A., Coutinho, D., Falcão, F., Brito, H., Amaral, G., Soares, V., Garcia, A., Fonseca, B., Ribeiro, M., et al. (2020). Revealing the social aspects of design decay: A retrospective study of pull requests. In Proceedings of the XXXIV Brazilian Symposium on Software Engineering, pages 364–373.

Braga, B. (2024). Complementary material. [link]. Accessed: September 2024.

Brown, T. B., Mann, B., Ryder, N., et al. (2020). Language models are few-shot learners.

Coutinho, D., Cito, L., Lima, M. V., Arantes, B., Pereira, J. A., Arriel, J., Godinho, J., Martins, V., Libório, P., Leite, L., Garcia, A., Assunção, W. K. G., Steinmacher, I., Baffa, A., and Fonseca, B. (2024). "Looks good to me ;-)": Assessing sentiment analysis tools for pull request discussions. In Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering (EASE 2024), page 11, Salerno, Italy. ACM.

Graziotin, D., Wang, X., and Abrahamsson, P. (2014). Happy software developers solve problems better: psychological measurements in empirical software engineering. PeerJ, 2:e289.

Graziotin, D., Wang, X., and Abrahamsson, P. (2015). How do you feel, developer? an explanatory theory of the impact of affects on programming performance. PeerJ Computer Science, 1:e18.

Gururangan, S., Marasovic, A., Swayamdipta, S., Lo, K., Beltagy, I., Downey, D., and Smith, N. A. (2020). Don’t stop pretraining: Adapt language models to domains and tasks. arXiv preprint arXiv:2004.10964.

Hasan, M. A., Das, S., Anjum, A., Alam, F., Anjum, A., Sarker, A., and Noori, S. R. H. (2024). Zero- and few-shot prompting with llms: A comparative study with fine-tuned models for bangla sentiment analysis. arXiv preprint arXiv:2308.10783v2.

Herrmann, M. and Klünder, J. (2021). From textual to verbal communication: Towards applying sentiment analysis to a software project meeting. Leibniz University Hannover.

Hou, G. and Lian, Q. (2024). Benchmarking of commercial large language models: Chatgpt, mistral, and llama. Shanghai Quangong AI Lab. DOI: 10.21203/rs.3.rs-4376810/v1.

Jiang, A. Q., Sablayrolles, A., Mensch, A., et al. (2024). The model arena for cross-lingual sentiment analysis: A comparative study in the era of large language models. arXiv preprint arXiv:2406.19358v1.

Kaplan, J., McCandlish, S., Henighan, T., et al. (2020). Scaling laws for neural language models.

Mo, K., Liu, W., Xu, X., Yu, C., Zou, Y., and Xia, F. (2024). Fine-tuning gemma-7b for enhanced sentiment analysis of financial news headlines. arXiv preprint arXiv:2406.13626.

Niimi, J. (2024). Dynamic sentiment analysis with local large language models using majority voting: A study on factors affecting restaurant evaluation. arXiv preprint arXiv:2407.13069.

Ramesh, K., Sitaram, S., and Choudhury, M. (2023). Fairness in language models beyond english: Gaps and challenges. arXiv preprint arXiv:2302.12578.

Siino, M. (2024). Transmistral at semeval-2024 task 10: Using mistral 7b for emotion discovery and reasoning its flip in conversation. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 298–304. Association for Computational Linguistics.

Touvron, H., Lavril, T., Izacard, G., et al. (2023a). Llama: Open and efficient foundation language models.

Touvron, H., Martin, L., Stone, K., et al. (2023b). Large language models performance comparison of emotion and sentiment classification. arXiv preprint arXiv:2407.04050v1.

Tsay, J., Dabbish, L., and Herbsleb, J. (2014). Influence of social and technical factors for evaluating contribution in github. In Proceedings of the 36th International Conference on Software Engineering (ICSE), pages 356–366. ACM.

Vorakitphan, V., Basic, M., and Meline, G. L. (2024). Deep content understanding toward entity and aspect target sentiment analysis on foundation models. Proceedings of the 41st International Conference on Machine Learning.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E. H., Le, Q. V., and Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.

Xing, F. (2024). Designing heterogeneous llm agents for financial sentiment analysis. arXiv preprint arXiv:2401.05799.

Yu, Y., Wang, H., Filkov, V., Devanbu, P., and Vasilescu, B. (2015). Wait for it: Determinants of pull request evaluation latency on github. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories (MSR), pages 367–371. IEEE.

Zhan, T., Shi, C., Shi, Y., Li, H., and Lin, Y. (2024). Optimization techniques for sentiment analysis based on llm (gpt-3). arXiv preprint arXiv:2405.09770.

Zhang, W., Deng, Y., Liu, B., Pan, S. J., and Bing, L. (2023a). Sentiment analysis in the era of large language models: A reality check. arXiv preprint arXiv:2305.15005.

Zhang, X., Li, S., Hauer, B., Shi, N., and Kondrak, G. (2023b). Don’t trust chatgpt when your question is not in english: A study of multilingual abilities and types of llms. arXiv preprint arXiv:2305.16339.
Published
2024-09-30
NEVES, Breno Braga; SOUSA, Theo; COUTINHO, Daniel; GARCIA, Alessandro; PEREIRA, Juliana Alves. Exploring the Potential and Feasibility of Open-Source LLMs in Sentiment Analysis. In: SOFTWARE ENGINEERING UNDERGRADUATE RESEARCH COMPETITION - BRAZILIAN CONFERENCE ON SOFTWARE: THEORY AND PRACTICE (CBSOFT), 15., 2024, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 89-98. DOI: https://doi.org/10.5753/cbsoft_estendido.2024.4106.