Video Summarization using Text Subjectivity Classification

  • Leonardo Moraes USP
  • Ricardo Marcondes Marcacini USP
  • Rudinei Goularte USP

Abstract

Video summarization has attracted researchers’ attention because it provides a compact and informative version of a video, helping users and systems save effort when searching for and understanding content of interest. Current techniques employ different strategies to select which video segments should be included in the final summary. The challenge is to process the multimodal data present in the video, looking for relevance clues (such as redundant or complementary information) that support this selection decision. A recent strategy is to use subjectivity detection: the presence or absence of subjectivity can be exploited as a relevance clue, helping to bring video summaries closer to the final user’s expectations. However, despite this potential, there is a gap in how to capture subjectivity information from videos. This paper investigates video summarization through subjectivity classification of video transcripts. This approach requires dealing with challenges that are important in video summarization tasks, such as detecting subjectivity in different languages and across multiple domains. We propose a multilingual machine learning model trained to perform subjectivity classification in multiple domains. An experimental evaluation on different benchmark datasets indicates that our multilingual, multi-domain method achieves competitive results, even when compared with language-specific models. Furthermore, such a model can provide subjectivity as a content selection criterion in the video summarization task, filtering out segments that are not relevant to a video domain of interest.
Keywords: video summarization, subjectivity classification, sentiment analysis, BERT, NLP
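
To illustrate the idea of using subjectivity as a content selection criterion, the following is a minimal sketch and not the authors' implementation. It assumes transcript segments with timestamps and a scoring function that returns a subjectivity value in [0, 1]. The names `Segment`, `toy_subjectivity_score`, `select_segments`, the cue-word list, and the 0.5 threshold are all hypothetical; in practice the toy scorer would be replaced by a trained multilingual classifier such as the one the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # transcript text of the segment


# Hypothetical cue words (English and Portuguese), used only so that the
# example runs without a trained model. This is NOT the paper's classifier.
SUBJECTIVE_CUES = {"think", "believe", "amazing", "terrible", "acho", "incrível"}


def toy_subjectivity_score(text: str) -> float:
    """Return 1.0 if the text contains a cue word, otherwise 0.0."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return 1.0 if any(w in SUBJECTIVE_CUES for w in words) else 0.0


def select_segments(
    segments: List[Segment],
    score_fn: Callable[[str], float],
    keep_subjective: bool,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Segment]:
    """Keep segments whose subjectivity label matches the domain's needs."""
    return [
        s for s in segments
        if (score_fn(s.text) >= threshold) == keep_subjective
    ]


segments = [
    Segment(0.0, 5.0, "The phone weighs 180 grams."),
    Segment(5.0, 9.0, "I think this phone is amazing!"),
    Segment(9.0, 12.0, "Acho o filme incrível."),
]

# Example: a domain where objective content is preferred for the summary.
objective_only = select_segments(segments, toy_subjectivity_score, keep_subjective=False)
for seg in objective_only:
    print(f"{seg.start:.1f}-{seg.end:.1f}: {seg.text}")
```

Whether subjective or objective segments are kept is a per-domain choice, which is why it is exposed as the `keep_subjective` parameter in this sketch.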

Published
7 November 2022
MORAES, Leonardo; MARCACINI, Ricardo Marcondes; GOULARTE, Rudinei. Video Summarization using Text Subjectivity Classification. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 28., 2022, Curitiba. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 141-149.