On the Use of Early Fusion Operators on Heterogeneous Graph Neural Networks for One-Class Learning

Authors:
Marcos Paulo Silva Gôlo, Institute of Mathematics and Computer Sciences, University of São Paulo (USP), Brazil. ORCID: 0000-0002-9093-8195
Marcelo Isaias De Moraes, Institute of Mathematics and Computer Sciences, University of São Paulo (USP), Brazil. ORCID: 0000-0002-7831-2165
Rudinei Goularte, Institute of Mathematics and Computer Sciences, University of São Paulo (USP), Brazil. ORCID: 0000-0003-1531-1576
Ricardo Marcondes Marcacini, Institute of Mathematics and Computer Sciences, University of São Paulo (USP), Brazil. ORCID: 0000-0002-2309-3487

WebMedia '23: Proceedings of the 29th Brazilian Symposium on Multimedia and the Web, October 2023, Pages 128–136. https://doi.org/10.1145/3617023.3617041

Published: 23 October 2023

ABSTRACT

Multimodal data fusion generates robust, unified representations by combining supplementary and complementary information from different modalities, such as audio, image, and text. Data fusion strategies have been explored for decades, from simple concatenation of the modalities' features to vector fusion operators (sum, average, subtraction, multiplication, etc.) applied between feature vectors in each modality's latent space. However, existing studies do not investigate multimodal fusion operators for heterogeneous graphs, which model real-world data through a structure that captures the different relations between different node types. These representations suit important multimedia-related tasks, such as classification, recommendation, summarization, web sensing, and content-based retrieval. This paper presents a Graph Neural Network (GNN) method for heterogeneous graphs that explores different types of early fusion operators to deal with multiple modalities. Moreover, we evaluated the proposal's performance with different early fusion operators under one-class learning, a popular learning approach for real-world applications. A statistical analysis of the experimental results shows that early fusion operators improve the F1-score of GNNs on heterogeneous graphs. The subtraction, multiplication, and minimum operators outperform the other operators. Thus, we argue that our proposal of early fusion operators in heterogeneous graph neural networks leads to improved performance and is a competitive alternative to the widely used concatenation technique and to costly manual approaches to combining different modalities.
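The operators named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function and variable names (fuse, FUSION_OPERATORS, text_emb, image_emb) are hypothetical, and it assumes the two modality embeddings have already been projected to the same dimensionality, which element-wise operators require. Concatenation is included only as the baseline the abstract compares against.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def _elementwise(op: Callable[[float, float], float],
                 a: Sequence[float], b: Sequence[float]) -> Vector:
    """Apply a binary operator position by position to two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("element-wise fusion needs vectors of equal length")
    return [op(x, y) for x, y in zip(a, b)]


# Hypothetical registry of the operators mentioned in the abstract.
FUSION_OPERATORS: Dict[str, Callable[[Sequence[float], Sequence[float]], Vector]] = {
    "sum": lambda a, b: _elementwise(lambda x, y: x + y, a, b),
    "average": lambda a, b: _elementwise(lambda x, y: (x + y) / 2.0, a, b),
    "subtraction": lambda a, b: _elementwise(lambda x, y: x - y, a, b),
    "multiplication": lambda a, b: _elementwise(lambda x, y: x * y, a, b),
    "minimum": lambda a, b: _elementwise(min, a, b),
    # Baseline: concatenation doubles the dimensionality instead of preserving it.
    "concatenation": lambda a, b: list(a) + list(b),
}


def fuse(a: Sequence[float], b: Sequence[float], operator: str) -> Vector:
    """Fuse two modality embeddings into a single feature vector."""
    if operator not in FUSION_OPERATORS:
        raise ValueError(f"unknown fusion operator: {operator!r}")
    return FUSION_OPERATORS[operator](a, b)


# Toy embeddings for one node (illustrative values only).
text_emb = [0.5, -1.0, 2.0]
image_emb = [1.5, 3.0, -2.0]

for name in FUSION_OPERATORS:
    print(name, fuse(text_emb, image_emb, name))
```

In an early fusion setting, a vector fused this way would serve as a node's input feature before any GNN layer is applied. Note that the element-wise operators keep the embedding size fixed, whereas concatenation grows it with each added modality.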

Index Terms

1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published in

WebMedia '23: Proceedings of the 29th Brazilian Symposium on Multimedia and the Web
October 2023
285 pages
ISBN: 9798400709081
DOI: 10.1145/3617023

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Early Fusion
Heterogeneous Graphs
One-Class Learning
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 270 of 873 submissions, 31%