On the Use of Early Fusion Operators on Heterogeneous Graph Neural Networks for One-Class Learning

  • Marcos Paulo Silva Gôlo USP
  • Marcelo Isaias De Moraes USP
  • Rudinei Goularte USP
  • Ricardo Marcondes Marcacini USP


Multimodal data fusion generates robust, unified representations by combining supplementary and complementary information from different modalities, such as audio, image, and text. Data fusion strategies have been explored for decades, from simple concatenation of the modalities’ features to vector fusion operators (sum, average, subtraction, multiplication, etc.) applied between feature vectors in the latent space of each modality. However, existing studies do not investigate multimodal fusion operators for heterogeneous graphs, which model real-world data through a structure that captures the different relations between different node types. These representations suit important multimedia-related tasks, such as classification, recommendation, summarization, web sensing, and content-based retrieval. This paper presents a Graph Neural Network (GNN) method for heterogeneous graphs that explores different types of early fusion operators to deal with multiple modalities. Moreover, we evaluate the proposal’s performance with different early fusion operators under one-class learning, a popular learning approach for real-world applications. A statistical analysis of the experimental results shows that early fusion operators improve the F1-score of GNNs on heterogeneous graphs. In particular, the subtraction, multiplication, and minimum operators outperform the other operators. Thus, we argue that using early fusion operators in heterogeneous graph neural networks improves performance and is a competitive alternative to the commonly used concatenation technique and to costly manual approaches for combining different modalities.
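As a rough illustration of the element-wise operators named in the abstract, the sketch below fuses two modality feature vectors of equal dimension, with concatenation included for comparison. The function name `early_fusion`, the operator keys, and the toy vectors are assumptions made for this example and are not taken from the paper. How the fused vectors are attached to the nodes of the heterogeneous graph is not shown.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]

# Element-wise operators mentioned in the abstract (hypothetical key names).
ELEMENTWISE_OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "sum": lambda a, b: a + b,
    "average": lambda a, b: (a + b) / 2.0,
    "subtraction": lambda a, b: a - b,
    "multiplication": lambda a, b: a * b,
    "minimum": lambda a, b: min(a, b),
}


def early_fusion(x: Sequence[float], y: Sequence[float], operator: str) -> Vector:
    """Fuse two modality vectors into one vector before any model sees them."""
    if operator == "concat":
        # Concatenation keeps every feature, so the output dimension is len(x) + len(y).
        return list(x) + list(y)
    if operator not in ELEMENTWISE_OPERATORS:
        raise ValueError(f"unknown fusion operator: {operator}")
    if len(x) != len(y):
        # Element-wise operators only make sense for vectors of the same dimension.
        raise ValueError("element-wise fusion requires vectors of equal length")
    op = ELEMENTWISE_OPERATORS[operator]
    return [op(a, b) for a, b in zip(x, y)]


# Toy vectors standing in for two modality embeddings (illustrative values only).
text_features = [1.0, 2.0]
image_features = [3.0, 0.5]

fused_min = early_fusion(text_features, image_features, "minimum")
fused_concat = early_fusion(text_features, image_features, "concat")
```

Unlike concatenation, the element-wise operators keep the fused vector at the original dimension, which is one reason they can be a cheaper alternative when many modalities are combined.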

Keywords: Early Fusion, One-Class Learning, Heterogeneous Graphs


GÔLO, Marcos Paulo Silva; DE MORAES, Marcelo Isaias; GOULARTE, Rudinei; MARCACINI, Ricardo Marcondes. On the Use of Early Fusion Operators on Heterogeneous Graph Neural Networks for One-Class Learning. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 29., 2023, Ribeirão Preto/SP. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 128–136.
