TF-MVSA: Multimodal Video Sentiment Analysis Tool using Transfer Learning

Victor Akihito Kamada Tomita (USP); Ricardo Marcondes Marcacini (USP)

DOI: https://doi.org/10.5753/webmedia_estendido.2023.235544

Abstract

Existing methods for sentiment analysis in videos rely on extensive training on large labeled datasets, which makes them expensive and impractical for real-world applications. The challenge grows when labeled data is needed for each of several modalities. To address these limitations, we propose a transfer learning method and a computational tool that leverage pre-trained models for each modality and use consensus among modalities to automatically annotate video segments. Our tool implements neural networks with attention mechanisms that learn the relative importance of each modality during training. Experimental results show that our tool outperforms unimodal methods and remains competitive with multimodal approaches, even when no labeled data is available for the new videos being analyzed. Moreover, the tool is publicly available and can serve as a competitive baseline for similar multimodal sentiment analysis methods.
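The abstract does not specify how modality consensus or the attention layer is implemented, so the following is only a minimal sketch under stated assumptions, not the TF-MVSA code. It assumes three modalities (text, audio, visual), each already labeled by a pre-trained unimodal classifier; it models "consensus" as a majority vote (at least two modalities must agree, otherwise the segment stays unlabeled); and it models modality attention as dot-product scoring against a query vector followed by a softmax. The function names `consensus_label` and `attention_fuse`, the label strings, and the fixed query vector are all hypothetical.

```python
from collections import Counter
import math
from typing import Dict, List, Optional, Sequence, Tuple


def consensus_label(predictions: Dict[str, str], min_agree: int = 2) -> Optional[str]:
    """Pseudo-label a video segment by majority vote across modalities.

    Assumption (not from the paper): "modality consensus" means at least
    `min_agree` unimodal classifiers predict the same label; otherwise the
    segment is left unlabeled (None) and would be excluded from training.
    """
    label, votes = Counter(predictions.values()).most_common(1)[0]
    return label if votes >= min_agree else None


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    embeddings: Dict[str, List[float]], query: List[float]
) -> Tuple[List[float], Dict[str, float]]:
    """Dot-product attention over per-modality embeddings.

    Each modality embedding is scored against the query vector, the scores are
    softmax-normalized into one weight per modality, and the fused vector is
    the weighted sum of the embeddings. In a trained network the query would be
    a learned parameter; here it is passed in explicitly for illustration.
    """
    names = list(embeddings)
    scores = [sum(q * x for q, x in zip(query, embeddings[n])) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * embeddings[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    # Hypothetical unimodal predictions for two segments.
    segments = [
        {"text": "positive", "audio": "positive", "visual": "neutral"},
        {"text": "negative", "audio": "neutral", "visual": "positive"},
    ]
    print([consensus_label(s) for s in segments])  # ['positive', None]

    # Hypothetical 2-D embeddings; the query favors the first dimension.
    fused, weights = attention_fuse(
        {"text": [1.0, 0.0], "audio": [0.0, 1.0], "visual": [0.0, 0.0]},
        query=[2.0, 0.0],
    )
    print({name: round(w, 3) for name, w in weights.items()})
    # {'text': 0.787, 'audio': 0.107, 'visual': 0.107}
```

Under these assumptions, majority voting trades coverage for reliability: segments where the modalities disagree are dropped rather than given a noisy label. The softmax weights also make each modality's contribution to the fused representation directly inspectable.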

Keywords: video sentiment analysis, transfer learning, multimodal learning
