Automatic Lexicon Expansion for Sentiment Analysis of Twitter in the Brazilian Financial Market Domain
Abstract
This article explores the creation of specialized lexicons, focusing on building a Portuguese-language glossary for the Brazilian Financial Market (MFB). The methodology consists of a sequence of steps that enrich a set of seed words; the resulting lexicon is then used to analyze the sentiment of tweets and news articles related to the MFB domain. Using the lexical approach alone, the method achieved an F1-score of 71.5% on tweets and 67.9% on news. Furthermore, a mixed approach that combines the lexicon with a support vector machine (SVM) classifier achieved an F1-score of 77.4% on tweets.
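The abstract does not detail the enrichment steps, so what follows is only a minimal sketch of one common way to expand seed words and apply the result. It is not the authors' pipeline. It assumes nearest-neighbour expansion in a word-embedding space: each vocabulary word whose most similar seed exceeds a cosine-similarity threshold inherits that seed's polarity. A text is then labeled by summing the polarities of its tokens. The three-dimensional vectors, the 0.9 threshold, the seed words, and the helper names (`expand_lexicon`, `lexicon_score`, `classify`) are hypothetical. A real setup would load pretrained Portuguese embeddings instead.

```python
import math

# Hypothetical toy embeddings (made-up 3-d vectors for illustration only);
# a real pipeline would load pretrained Portuguese word vectors instead.
EMBEDDINGS = {
    "alta": [0.9, 0.1, 0.0],
    "valorização": [0.8, 0.2, 0.1],
    "lucro": [0.85, 0.05, 0.1],
    "queda": [-0.9, 0.1, 0.0],
    "prejuízo": [-0.8, 0.15, 0.05],
    "desvalorização": [-0.85, 0.2, 0.0],
    "mercado": [0.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, embeddings, threshold=0.9):
    """Copy each seed's polarity to unseen words whose most similar seed
    has cosine similarity above `threshold` (assumed expansion rule)."""
    lexicon = dict(seeds)
    for word, vec in embeddings.items():
        if word in lexicon:
            continue
        best_seed, best_sim = None, threshold
        for seed in seeds:
            if seed in embeddings:
                sim = cosine(vec, embeddings[seed])
                if sim > best_sim:
                    best_seed, best_sim = seed, sim
        if best_seed is not None:
            lexicon[word] = seeds[best_seed]
    return lexicon


def lexicon_score(text, lexicon):
    """Sum token polarities; unknown tokens count as 0 (whitespace tokenization)."""
    return sum(lexicon.get(tok, 0) for tok in text.lower().split())


def classify(text, lexicon):
    """Lexical approach: the sign of the summed score decides the label."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


seeds = {"alta": 1, "queda": -1}  # hypothetical seed words with polarities
lexicon = expand_lexicon(seeds, EMBEDDINGS)
print(classify("Ibovespa fecha em alta com lucro dos bancos", lexicon))  # → positive
print(classify("Bolsa registra queda e prejuízo", lexicon))  # → negative
```

For the mixed approach, one plausible design is to append `lexicon_score` as an extra feature to the bag-of-words vector given to the SVM. This is an assumption, not a detail stated in the abstract. Under this design, the classifier learns how much weight the lexicon signal deserves. Predictions from either route would then be compared against labeled tweets or news using the F1-score.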
