TRENCHANT: TRENd PrediCtion on Heterogeneous informAtion NeTworks

Authors

  • P. do Carmo, University of São Paulo - São Carlos
  • I. J. Reis Filho, State University of Minas Gerais - Frutal / University of São Paulo - São Carlos
  • R. Marcacini, University of São Paulo - São Carlos

DOI:

https://doi.org/10.5753/jidm.2022.2546

Keywords:

agribusiness, event analysis, heterogeneous networks, network embedding, text mining

Abstract

An event can be defined as an action or a series of actions with a specific theme, time, and place. Event analysis tasks for knowledge extraction from news and social media have recently been explored. In agribusiness in particular, a successful prediction model must account for multiple components. For example, price trend prediction for commodities can be performed through a time-series analysis of prices alone, but events that represent external factors can also be considered when training predictive models. This paper presents a method for integrating agribusiness news events into trend prediction tasks. First, we propose to model events and time-series information through heterogeneous information networks (HIN), which allow multiple components to be modeled directly through multi-type nodes and edges. Second, we learn features from the HIN through network embedding methods, i.e., each network node is mapped to a dense feature vector. In particular, we propose a network embedding method that propagates the semantics of pre-trained language models to a heterogeneous information network, and we evaluate its performance in trend prediction for agribusiness commodity prices. Finally, we propose a second method that leverages the HIN architecture to fine-tune a pre-trained language model before propagation. We show that our proposed language-based embedding propagation models are competitive with state-of-the-art network embedding algorithms. Moreover, our proposal performs network embedding incrementally, allowing new events to be inserted in the same semantic space without rebuilding the entire network embedding.
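
The idea of propagating language-model semantics over a heterogeneous network can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `propagate`, the node labels, and the two-dimensional vectors standing in for language-model embeddings of event texts are all hypothetical, and the update rule (each non-event node takes the mean of its neighbours' vectors) is a simple assumption chosen for illustration.

```python
from collections import defaultdict


def propagate(edges, seed, dim, iters=20):
    """Hypothetical sketch of embedding propagation on a heterogeneous network.

    edges: iterable of (node, node) pairs, treated as undirected.
    seed:  dict mapping event nodes to fixed vectors (stand-ins for
           embeddings produced by a pre-trained language model).
    Every node without a seed vector is repeatedly set to the mean of
    its neighbours' current vectors; seed vectors are never changed.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    # Non-seed nodes start at the zero vector.
    emb = {n: list(seed.get(n, [0.0] * dim)) for n in nbrs}

    for _ in range(iters):
        new = {}
        for n in nbrs:
            if n in seed:
                new[n] = list(seed[n])  # event semantics stay fixed
            else:
                k = len(nbrs[n])
                new[n] = [sum(emb[m][d] for m in nbrs[n]) / k for d in range(dim)]
        emb = new
    return emb


# Toy network: two news events linked to a time node and a trend node.
seed = {"event:1": [1.0, 0.0], "event:2": [0.0, 1.0]}
edges = [
    ("event:1", "week:10"),
    ("event:2", "week:10"),
    ("event:1", "trend:up"),
]
emb = propagate(edges, seed, dim=2)
print(emb["week:10"], emb["trend:up"])
```

Because event vectors come directly from the language model in this sketch, a new event would already lie in the same vector space as the existing ones, which is the intuition behind the incremental property described in the abstract.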

Published

2023-01-17

How to Cite

do Carmo, P., Reis Filho, I. J., & Marcacini, R. (2023). TRENCHANT: TRENd PrediCtion on Heterogeneous informAtion NeTworks. Journal of Information and Data Management, 13(6). https://doi.org/10.5753/jidm.2022.2546

Section

KDMiLe 2021