B2T: A Dataset of Tweets in Portuguese Language about Brazilian Banks

  • Gabriel K. Kakimoto, Universidade Estadual de Campinas (UNICAMP)
  • Seyed J. Haddadi, Universidade Estadual de Campinas (UNICAMP)
  • Patrick M. Araújo, Universidade Estadual de Campinas (UNICAMP)
  • Fillipe S. Silva, Universidade Estadual de Campinas (UNICAMP)
  • Julio C. dos Reis, Universidade Estadual de Campinas (UNICAMP), http://orcid.org/0000-0002-9545-2098
  • Marcelo da Silva Reis, Universidade Estadual de Campinas (UNICAMP)

Abstract


Sentiment analysis models have numerous applications, including evaluating business performance through comments and reviews. This capability helps businesses understand how the public perceives their products and services and identify areas for improvement. However, a significant limitation in developing such models for the Portuguese language is the lack of labeled datasets, which restricts effective model training. This article addresses the issue by collecting 375,912 comments from Twitter/X about Brazilian banks, a focus chosen because the public uses their services so widely. Of these, 1,096 comments are currently labeled as Positive, Neutral, or Negative. We present results from fine-tuning sentiment analysis models on this dataset. We found that it holds great potential for providing insights into customer perceptions and market trends within the banking sector. By leveraging this dataset, businesses can gain a valuable understanding of their market position and areas for service improvement.
Keywords: Sentiment Analysis, Twitter/X, Brazilian Banks

Published
14 October 2024
KAKIMOTO, Gabriel K.; HADDADI, Seyed J.; ARAÚJO, Patrick M.; SILVA, Fillipe S.; DOS REIS, Julio C.; REIS, Marcelo da Silva. B2T: A Dataset of Tweets in Portuguese Language about Brazilian Banks. In: DATASET SHOWCASE WORKSHOP (DSW), 6., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 1-11. DOI: https://doi.org/10.5753/dsw.2024.243980.