Visual Foundation Model-Based Classification of Characters in Narrative Media

  • Yan Martins B. Gurevitz Cunha PUC-Rio
  • Daniel de Sousa Moraes PUC-Rio
  • Antonio José Grandson Busson PUC-Rio
  • Julio Cesar Duarte IME
  • Sérgio Colcher PUC-Rio

Abstract

The field of narratology has long worked on ways to classify characters according to their narrative role and importance; meanwhile, studies on character design have observed unclear patterns that correlate specific visual features with those narrative classifications. It is therefore worth considering whether modern image classification methods can not only achieve satisfactory performance on this task, but also help us learn about the patterns that shape it. Classifying characters has a wide range of practical uses, such as recommender systems based on specific visual–narrative correlations, automatic annotation for media preservation, adaptive interactive storytelling based on how a character is perceived, and educational tools that show how those patterns change across cultures. In this work, we construct a dataset tailored to assessing both performance and visual–narrative correlations within a specific medium, conduct an anonymous survey to evaluate human performance and tendencies, and compare the results with those of a traditional CNN classifier and a Foundation Model–augmented approach. Both models outperform the survey participants, with the latter achieving 76.93% accuracy, 18.11% higher than human performance, while also providing insights into the visual cues most closely tied to the given narrative roles.

Keywords: Image Classification, Deep Learning, Anime Characters, Narratology, Foundation Model

Published
10/11/2025
CUNHA, Yan Martins B. Gurevitz; MORAES, Daniel de Sousa; BUSSON, Antonio José Grandson; DUARTE, Julio Cesar; COLCHER, Sérgio. Visual Foundation Model-Based Classification of Characters in Narrative Media. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 31., 2025, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 473-481. DOI: https://doi.org/10.5753/webmedia.2025.16139.
