ABSTRACT
Multimodal data fusion generates robust, unified representations by combining supplementary and complementary information from different modalities, such as audio, image, and text. Data fusion strategies have been explored for decades, from simple concatenation of the modalities' features to vector fusion operators (sum, average, subtraction, multiplication, etc.) applied between feature vectors in the latent space of each modality. However, existing studies do not investigate multimodal fusion operators for heterogeneous graphs, which model real-world data through a structure that captures the different relations between different node types. These representations suit important multimedia-related tasks, such as classification, recommendation, summarization, web sensing, and content-based retrieval. This paper presents a Graph Neural Network (GNN) method for heterogeneous graphs that explores different types of early fusion operators to deal with multiple modalities. We evaluate the proposal's performance with different early fusion operators under one-class learning, a popular learning approach for real-world applications. A statistical analysis of the experimental results shows that early fusion operators improve the F1-score of GNNs on heterogeneous graphs. The subtraction, multiplication, and minimum operators outperform the other operators. Thus, we argue that our proposal of early fusion operators in heterogeneous graph neural networks improves performance and is a competitive alternative to the widely used concatenation technique and to costly hand-crafted approaches to combining different modalities.
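As a rough illustration of the element-wise operators named in the abstract, the sketch below fuses two toy feature vectors of equal length. It is not the paper's implementation: the function names (`fuse`, `concatenate`), the operator table, and the example values are hypothetical, and it assumes both modalities have already been projected to the same dimensionality.

```python
from typing import Callable, Dict, List

Vector = List[float]

# Element-wise operators mentioned in the abstract (assumed definitions).
FUSION_OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "sum": lambda a, b: a + b,
    "average": lambda a, b: (a + b) / 2.0,
    "subtraction": lambda a, b: a - b,
    "multiplication": lambda a, b: a * b,
    "minimum": lambda a, b: min(a, b),
}


def fuse(x: Vector, y: Vector, operator: str) -> Vector:
    """Fuse two same-length feature vectors with an element-wise operator."""
    if len(x) != len(y):
        raise ValueError("operator-based fusion needs vectors of equal length")
    op = FUSION_OPERATORS[operator]
    return [op(a, b) for a, b in zip(x, y)]


def concatenate(x: Vector, y: Vector) -> Vector:
    """Concatenation baseline: the output dimension is len(x) + len(y)."""
    return list(x) + list(y)


# Toy node features for two modalities (made-up values).
text_features = [0.5, 2.0, -1.0]
image_features = [1.5, 1.0, 3.0]

for name in FUSION_OPERATORS:
    print(name, fuse(text_features, image_features, name))
print("concatenation", concatenate(text_features, image_features))
```

Unlike concatenation, every operator here keeps the fused vector at the original dimensionality, so the input size of a downstream GNN layer would not grow with the number of modalities.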