Commodities trend link prediction on heterogeneous information networks
Abstract
An event can be defined as an action, or a series of actions, with a specific theme, time, and place. Event analysis for knowledge extraction from news and social media has been explored in recent years, but few studies use event data to enrich predictive models. Agribusiness events in particular have multiple components that a successful prediction model must consider. For example, commodity price trends can be predicted through time-series analysis of prices alone, but events that capture knowledge about external factors can also be used when training the predictive model. In this paper, we present a method for integrating events into trend prediction tasks. First, we model events and time-series information as heterogeneous information networks (HINs), which represent multiple components directly through multi-type nodes and edges. Second, we learn features from the HIN through network embedding methods, i.e., each network node is mapped to a dense feature vector. In particular, we propose a network embedding method that propagates the semantics of pre-trained neural language models to a heterogeneous information network, and we evaluate its performance on a trend link prediction task. We show that our proposed language-model-based embedding propagation is competitive with state-of-the-art network embedding algorithms. Moreover, our proposal performs network embedding incrementally, so new events can be inserted into the same semantic space without rebuilding the entire network embedding.
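The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that event nodes already carry text embeddings (toy 2-dimensional vectors stand in for language-model outputs), that every other node receives the average of its neighbors' vectors, and that a trend link is predicted by cosine similarity. The node names, the `propagate_embeddings` and `predict_trend` helpers, and the toy vectors are all hypothetical.

```python
import numpy as np

def propagate_embeddings(edges, fixed, dim, iterations=50):
    """Average neighbor embeddings repeatedly; nodes in `fixed` keep their vectors."""
    nodes = sorted({n for edge in edges for n in edge} | set(fixed))
    neighbors = {n: [] for n in nodes}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    # Event nodes start from their (assumed) language-model vectors, others from zero.
    emb = {n: np.array(fixed[n], dtype=float) if n in fixed else np.zeros(dim)
           for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            if n in fixed or not neighbors[n]:
                updated[n] = emb[n]
            else:
                updated[n] = np.mean([emb[m] for m in neighbors[n]], axis=0)
        emb = updated
    return emb

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict_trend(event_vec, emb, trend_nodes=("trend:up", "trend:down")):
    """Link the event to the trend node whose embedding is most similar."""
    return max(trend_nodes, key=lambda t: cosine(event_vec, emb[t]))

# Toy heterogeneous network: event nodes, week nodes, and trend nodes.
event_vectors = {
    "event:1": [1.0, 0.1], "event:2": [0.9, 0.2],  # hypothetical text embeddings
    "event:3": [0.1, 1.0], "event:4": [0.2, 0.9],
}
edges = [
    ("event:1", "week:1"), ("event:2", "week:1"), ("week:1", "trend:up"),
    ("event:3", "week:2"), ("event:4", "week:2"), ("week:2", "trend:down"),
]
emb = propagate_embeddings(edges, event_vectors, dim=2)

# A new event only needs its own text vector to be scored against the trend nodes.
new_event = np.array([0.8, 0.3])
prediction = predict_trend(new_event, emb)
print(prediction)
```

Because the new event is scored directly from its text vector, it sits in the same space as the propagated nodes without re-running the propagation. This mirrors the incremental property claimed in the abstract, in a very simplified form.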