FinBERT-PT-BR: Sentiment Analysis of Texts in Portuguese from the Financial Market

  • Lucas L. Santos (USP)
  • Reinaldo A. C. Bianchi (FEI University Center)
  • Anna H. R. Costa (USP)

Abstract


This article contributes a sentiment analysis model for Portuguese-language financial news based on the BERT neural network architecture. The model was trained in two stages: language modeling on 1.4 million texts, followed by sentiment modeling on 500 labeled texts. It outperformed current state-of-the-art models on several metrics and can be used to build sentiment indices and investment strategies and to analyze macroeconomic data. The study demonstrates the potential of natural language processing and transformers for quantitative finance.
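The abstract notes that the model's predictions can be used to build sentiment indices. The sketch below shows one way per-article predictions could be aggregated into a daily index. It is illustrative only and makes several assumptions: the classifier assigns each article one of three labels (positive, negative, neutral), the index uses a common net-sentiment formula, (positive - negative) / total, rather than the authors' own construction, and the function name `daily_sentiment_index` is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Assumed label set; the abstract does not specify the model's classes.
LABELS = ("positive", "negative", "neutral")


def daily_sentiment_index(predictions):
    """Aggregate per-article sentiment labels into a daily net-sentiment score.

    predictions: iterable of (day, label) pairs, e.g. the output of a
    fine-tuned classifier applied to each news article (hypothetical input).
    Returns a dict {day: (n_positive - n_negative) / n_total}, ordered by day,
    with every value in [-1, 1].
    """
    # One counter per day, initialised to zero for every known label.
    counts = defaultdict(lambda: {label: 0 for label in LABELS})
    for day, label in predictions:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        counts[day][label] += 1

    index = {}
    for day in sorted(counts):
        c = counts[day]
        total = sum(c.values())  # at least 1, since the day was observed
        index[day] = (c["positive"] - c["negative"]) / total
    return index


if __name__ == "__main__":
    sample = [
        (date(2023, 1, 2), "positive"),
        (date(2023, 1, 2), "positive"),
        (date(2023, 1, 2), "negative"),
        (date(2023, 1, 2), "neutral"),
        (date(2023, 1, 3), "negative"),
    ]
    print(daily_sentiment_index(sample))
```

With the sample above, 2 January scores (2 - 1) / 4 = 0.25 and 3 January scores -1.0. Neutral articles count toward the total, so days with mostly neutral coverage stay close to zero. This is one design choice among several; a variant that excludes neutral articles from the denominator would react more sharply to small numbers of opinionated articles.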

Published: 2023-08-06
SANTOS, Lucas L.; BIANCHI, Reinaldo A. C.; COSTA, Anna H. R. FinBERT-PT-BR: Sentiment Analysis of Texts in Portuguese from the Financial Market. In: BRAZILIAN WORKSHOP ON ARTIFICIAL INTELLIGENCE IN FINANCE (BWAIF), 2., 2023, João Pessoa/PB. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 144-155. DOI: https://doi.org/10.5753/bwaif.2023.231151.