Interactive Storytelling with Gaze-Responsive Subtitles
Abstract
This paper describes an eye-tracking framework for offline analysis of, and real-time interaction with, gaze-responsive subtitled media. The eventual goal is to introduce and evaluate gaze-responsive subtitles, which allow the video to pause while the viewer reads the subtitles. Two initial modes of interaction are proposed: look-to-read, which pauses the video for as long as gaze is detected over the subtitles, and look-to-release, which pauses the video until gaze falls on the subtitles. To avoid disrupting perception of the media content, an additional ambient soundtrack matched to the general content of the video is proposed. This is a potentially far-reaching change, as it would require a novel approach to film direction: just as Audio Description is now included in most modern films, ambient sound would be needed to fill the brief temporal gaps that arise while the viewer's visual attention is directed toward the subtitles. At the same time, the eye-tracking framework supports quantitative analysis of attention to audiovisual content, complementing the qualitative evaluation on which most subtitling standardization is based.
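To make the two interaction modes concrete, the sketch below shows per-gaze-sample pause logic in Python. It is a minimal illustration under stated assumptions, not the paper's implementation: the Rect subtitle region, the next_pause_state function, and the raw bounding-box gaze test are hypothetical names introduced here for clarity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Rect:
    """Axis-aligned subtitle region in screen coordinates (hypothetical layout)."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h

def next_pause_state(mode: str,
                     gaze: Optional[Tuple[float, float]],
                     subtitle_area: Rect,
                     paused: bool) -> bool:
    """Return the playback pause state after one gaze sample.

    'look-to-read':    video is paused exactly while gaze rests on the subtitles.
    'look-to-release': video stays paused (e.g., from subtitle onset) until gaze
                       reaches the subtitles, then resumes.
    """
    on_subtitles = gaze is not None and subtitle_area.contains(*gaze)
    if mode == "look-to-read":
        return on_subtitles
    if mode == "look-to-release":
        return paused and not on_subtitles
    return paused

# Example: a short stream of gaze samples with a subtitle band at the bottom.
if __name__ == "__main__":
    band = Rect(x=0, y=900, w=1920, h=180)
    paused = True  # look-to-release: assume the player paused at subtitle onset
    for sample in [(960, 400), (940, 950), (930, 960), (900, 420)]:
        paused = next_pause_state("look-to-release", sample, band, paused)
        print(sample, "paused" if paused else "playing")
```

A real player would presumably add dwell-time hysteresis, and likely reading detection rather than a raw bounding-box test, so that noisy gaze samples near the subtitle boundary do not cause rapid pause/resume flicker.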