DOI: 10.1145/3470482.3479617
short-paper

An Approach for Automatic Description of Characters for Blind People

Published: 05 November 2021

ABSTRACT

Audio Description (AD), or Video Description, is a vital accessibility resource in the lives of blind and visually impaired people. Automating this task is not easy, since it involves several problems, such as describing the scenario, actions, emotions, and characters. This paper presents an approach to automatically describe characters in a video or image by combining Deep Learning (DL), face detection, facial expression detection techniques, and audio synthesizers. Our proposal uses the detection tools, applies DL models to the analyzed data, and generates an audio description. To assess the feasibility of our proposal, we developed a proof of concept of the solution and performed computational experiments to evaluate it.
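The last stage of the pipeline described above, turning detection results into text for a speech synthesizer, might look like the following minimal sketch. It is an illustration under stated assumptions, not the implementation used in the paper: the `CharacterObservation` type, its fields, the `age_group` attribute, and the sentence templates are all hypothetical, and the face detection, DL inference, and audio synthesis stages are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CharacterObservation:
    """Hypothetical per-face output of the detection and DL stages."""
    expression: str                   # assumed label from a facial-expression model
    age_group: Optional[str] = None   # assumed optional attribute, e.g. "adult"


def describe_character(obs: CharacterObservation) -> str:
    """Build one sentence describing a single detected character."""
    subject = f"{obs.age_group} person" if obs.age_group else "person"
    # Choose the indefinite article from the first letter of the subject.
    article = "An" if subject[0].lower() in "aeiou" else "A"
    return f"{article} {subject} who looks {obs.expression}."


def describe_scene(observations: List[CharacterObservation]) -> str:
    """Combine per-character sentences into text for a speech synthesizer."""
    if not observations:
        return "No characters detected."
    count = len(observations)
    header = ("1 character detected." if count == 1
              else f"{count} characters detected.")
    return " ".join([header] + [describe_character(o) for o in observations])


if __name__ == "__main__":
    scene = [CharacterObservation("happy", "adult"), CharacterObservation("sad")]
    # The resulting string would then be handed to a text-to-speech service.
    print(describe_scene(scene))
```

Keeping text generation separate from detection and synthesis means each stage can be swapped out or tested on its own.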


Published in

WebMedia '21: Proceedings of the Brazilian Symposium on Multimedia and the Web
November 2021
271 pages
ISBN: 9781450386098
DOI: 10.1145/3470482

Copyright © 2021 ACM

© 2021 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• short-paper
• Research
• Refereed limited

Acceptance Rates

WebMedia '21 Paper Acceptance Rate: 24 of 75 submissions, 32%
Overall Acceptance Rate: 270 of 873 submissions, 31%
