Embedding Propagation over Heterogeneous Information Networks

Paulo Viviurka do Carmo; Ricardo Marcacini

doi:10.5753/webmedia_estendido.2023.233762

Paulo Viviurka do Carmo HTWK Leipzig http://orcid.org/0000-0002-8550-4368
Ricardo Marcacini USP https://orcid.org/0000-0002-2309-3487

Abstract

Heterogeneous Information Networks (HINs) play a crucial role in modeling and analyzing multimedia systems and heterogeneous data, because they represent entities of different types and the relationships among them within complex data structures. However, using HINs in machine learning tasks is challenging: it requires either network-specific models or a vector space representation of the network. This paper proposes an embedding propagation method for HINs that contain textual data. The method uses language models such as BERT to compute contextual embeddings for the textual objects and then propagates these embeddings to the non-textual objects in the network, so that the resulting representations combine the network's topological information with the semantic information of the text. Because all objects are embedded in a unified latent space, entities and relationships can be compared directly, which makes it easier to combine machine learning techniques with different modeling approaches in multimedia and heterogeneous data domains. In experimental evaluations on different datasets from three application domains, the method achieves competitive performance. These results highlight the potential of HINs for intelligent analysis and information retrieval in multimedia systems and heterogeneous data contexts.
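
The propagation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple neighbor-averaging rule, an undirected toy network, and hand-written 2-dimensional vectors standing in for the text embeddings that a language model such as BERT would produce. The function name `propagate_embeddings` and all node indices are hypothetical.

```python
import numpy as np


def propagate_embeddings(edges, text_embeddings, num_nodes, dim, iterations=50):
    """Spread text-node embeddings to non-text nodes by neighbor averaging.

    Assumption: text nodes are clamped to their initial embeddings and every
    other node repeatedly takes the mean of its neighbors' embeddings.
    """
    # Build an adjacency list from the undirected edge list.
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # Non-text nodes start at zero; text nodes start at their given vectors.
    emb = np.zeros((num_nodes, dim))
    for node, vec in text_embeddings.items():
        emb[node] = vec

    for _ in range(iterations):
        updated = emb.copy()
        for node in range(num_nodes):
            # Skip clamped text nodes and isolated nodes.
            if node in text_embeddings or not neighbors[node]:
                continue
            updated[node] = emb[neighbors[node]].mean(axis=0)
        emb = updated
    return emb


# Toy network (hypothetical): nodes 0 and 1 are textual objects,
# nodes 2 and 3 are non-textual objects linked to them.
edges = [(0, 2), (1, 2), (0, 3)]
text_embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # stand-ins for BERT vectors

embeddings = propagate_embeddings(edges, text_embeddings, num_nodes=4, dim=2)
print(embeddings)
```

In this toy setting, node 2 ends up halfway between the two textual objects it is linked to, and node 3 inherits the embedding of its single textual neighbor, so all four nodes can be compared in the same vector space.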

Keywords: embedding propagation, network embedding, heterogeneous information networks
