A comparison of parameter selection measures for sensor learning from financial news events
Abstract
The popularization of web platforms has driven a significant increase in the publication of financial news and reports in digital media. In this context, a multidisciplinary research area called “learning to sense” (or sensor learning) has recently received attention. Unlike traditional machine learning methods, sensor learning aims to obtain a time series that indicates the activity of a particular topic over time. A sensor is represented by a set of parameters learned from a historical dataset of news events. The sensor generates time series as news events are processed, and these time series are used in decision support systems. This paper presents an overview of sensor learning for financial news. We compare six parameter selection measures for sensor learning, with the distinguishing feature of considering an unsupervised scenario. The general idea is to use the concept of k-recurrent events, i.e., news events that are similar and occur together in different periods of up-trends and down-trends of a financial time series. Thus, if a specific event (extracted from news) occurred at least k times in the past, always associated with up-trends, then such news is labeled as positive. Analogously, an event that occurred at least k times in the past, always associated with down-trends, is labeled as negative. Experimental results on real data provide evidence that the approach investigated in this work is a promising alternative for sensor learning from financial news events, especially in contexts where no domain experts or external information are available to label a training set.
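The k-recurrent labeling rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `label_k_recurrent`, the input format, and the sample data are hypothetical, and it assumes that similar news events have already been grouped under a shared identifier and that each occurrence has already been matched to an up-trend or down-trend of the financial time series.

```python
from collections import defaultdict


def label_k_recurrent(occurrences, k):
    """Label events by the k-recurrent rule (illustrative sketch).

    occurrences: iterable of (event_id, trend) pairs, where trend is
    "up" or "down" (hypothetical input format; event_id stands for a
    group of similar news events).
    Returns a dict mapping event_id to "positive" or "negative".
    Events that do not satisfy the rule are left unlabeled.
    """
    counts = defaultdict(lambda: {"up": 0, "down": 0})
    for event_id, trend in occurrences:
        counts[event_id][trend] += 1

    labels = {}
    for event_id, c in counts.items():
        # At least k occurrences, always associated with up-trends.
        if c["up"] >= k and c["down"] == 0:
            labels[event_id] = "positive"
        # At least k occurrences, always associated with down-trends.
        elif c["down"] >= k and c["up"] == 0:
            labels[event_id] = "negative"
    return labels


# Made-up history used only to exercise the rule.
history = [
    ("rate_cut", "up"), ("rate_cut", "up"), ("rate_cut", "up"),
    ("strike", "down"), ("strike", "down"),
    ("merger", "up"), ("merger", "down"), ("merger", "up"),
]
labels = label_k_recurrent(history, k=2)
```

In this sketch, an event seen under both trend directions (such as `merger`) receives no label, which reflects the "always associated" condition in the text; a larger k makes the rule stricter and labels fewer events.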