Assessing Regression-Based Sentiment Analysis Techniques in Financial Texts

  • Taynan Ferreira Universidade de São Paulo
  • Francisco Paiva Universidade de São Paulo
  • Roberto da Silva Universidade de São Paulo
  • Angel de Paula Universidade de São Paulo
  • Anna Costa Universidade de São Paulo
  • Carlos Cugnasca Universidade de São Paulo

Abstract


Sentiment analysis (SA) is growing in importance because of the enormous amount of opinionated textual data available today. Most research has investigated different models, feature representations, and hyperparameters in SA classification tasks. However, few studies have evaluated the impact of these choices on regression SA tasks. In this paper, we conduct such an assessment on a financial-domain data set by investigating different feature representations and hyperparameters in two important models: Support Vector Regression (SVR) and Convolutional Neural Networks (CNN). We conclude by presenting the most relevant feature representations and hyperparameters and how they affect outcomes on a regression SA task.

Keywords: Machine Learning, Text and Web Mining, Natural Language Processing, Deep Learning
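To make the kind of assessment described in the abstract concrete, the following is a minimal sketch of a search over feature representations and SVR hyperparameters for a regression SA task. It is not the authors' experimental setup: the use of scikit-learn, the TF-IDF representation, the grid values, and the toy headlines with their scores are all assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Toy, made-up headlines with sentiment scores in [-1, 1] (illustrative only,
# not taken from the data set used in the paper).
texts = [
    "Company profits rise sharply after strong quarter",
    "Shares plunge as company issues profit warning",
    "Bank reports record earnings and raises dividend",
    "Retailer cuts jobs after weak holiday sales",
    "Stock climbs on upbeat revenue forecast",
    "Oil firm posts heavy losses and slashes dividend",
    "Tech group beats expectations with strong sales",
    "Airline shares fall after weak revenue forecast",
]
scores = [0.7, -0.8, 0.8, -0.6, 0.5, -0.7, 0.6, -0.5]

# Feature representation (TF-IDF) followed by the regression model (SVR).
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svr", SVR()),
])

# Hypothetical grid: one feature-representation choice and two SVR hyperparameters.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [0.1, 1.0, 10.0],
}

# 2-fold cross-validation, scored by (negated) mean squared error.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="neg_mean_squared_error")
search.fit(texts, scores)

best_params = search.best_params_
predictions = search.predict(["Profits rise sharply"])
print(best_params, predictions)
```

With a data set this small the selected configuration is not meaningful; the sketch only shows how feature-representation and model hyperparameters can be varied together and compared under a single regression metric.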

Published
15 October 2019
FERREIRA, Taynan; PAIVA, Francisco; SILVA, Roberto da; PAULA, Angel de; COSTA, Anna; CUGNASCA, Carlos. Assessing Regression-Based Sentiment Analysis Techniques in Financial Texts. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 16., 2019, Salvador. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 729-740. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2019.9329.