Multimodal intent classification with incomplete modalities using text embedding propagation

Victor Machado Gonzaga (USP); Nils Murrugarra-Llerena (Snap Research); Ricardo Marcacini (USP)

Abstract

Determining the author's intent in a social media post is a challenging multimodal task that requires identifying complex relationships between the post's image and text. For example, the image may depict an object, person, product, or company, while the text is an ironic message about the image content. Conversely, the text may be a news headline, while the image is a provocation, meme, or satire about the news. Existing approaches classify intent by combining both modalities. However, some posts lack textual annotations. We therefore investigate a graph-based approach that propagates text embeddings from complete multimodal posts to incomplete ones. This paper presents a text embedding propagation method that transfers embeddings from BERT neural language models to image-only posts (i.e., posts with an incomplete modality), considering the topology of a graph constructed from the visual and textual modalities available during the training step. With this inference approach, our method provides competitive results when the textual modality is available at different completeness levels, even compared to reference methods that require complete modalities.

Keywords: social networks, multimodal learning, network embedding
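
The propagation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the graph construction or the propagation rule, so the sketch assumes a k-nearest-neighbour graph built from image features only (a simplification, since the paper builds its graph from both modalities) and plain iterative neighbour averaging in which posts that have text keep their original embedding. The function names `knn_adjacency` and `propagate_text_embeddings`, the parameters `k` and `n_iter`, and the random vectors standing in for image features and BERT embeddings are all hypothetical.

```python
import numpy as np


def knn_adjacency(features, k):
    """Symmetric k-nearest-neighbour adjacency from cosine similarity.

    Assumption: the graph is built from image features alone.
    """
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a post is never its own neighbour
    n = sim.shape[0]
    nearest = np.argsort(-sim, axis=1)[:, :k]  # k most similar posts per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def propagate_text_embeddings(adj, text_emb, has_text, n_iter=50):
    """Fill missing text embeddings by repeated neighbour averaging.

    Posts with text are clamped to their original embedding after every
    step; image-only posts start at zero and absorb neighbour embeddings.
    """
    emb = np.where(has_text[:, None], text_emb, 0.0)
    degree = adj.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1.0  # isolated nodes keep a zero embedding
    for _ in range(n_iter):
        emb = adj @ emb / degree            # average over graph neighbours
        emb[has_text] = text_emb[has_text]  # clamp the known embeddings
    return emb


# Toy data: random vectors stand in for image features and BERT embeddings.
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(6, 4))
text_emb = rng.normal(size=(6, 3))
has_text = np.array([True, True, True, True, False, False])

adj = knn_adjacency(image_feats, k=2)
filled = propagate_text_embeddings(adj, text_emb, has_text)

# Three-post chain 0 - 1 - 2 where only the middle post lacks text.
chain = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
chain_emb = np.array([[0.0, 0.0], [9.0, 9.0], [2.0, 2.0]])
chain_filled = propagate_text_embeddings(
    chain, chain_emb, np.array([True, False, True])
)
```

In the chain example the middle post ends up with the mean of its two neighbours' embeddings, and its placeholder value is ignored. The filled embeddings could then be combined with image features and passed to any intent classifier.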
