short-paper

An Approach for Automatic Description of Characters for Blind People

Authors:
Itamar Rocha Filho (UFPB, Brasil)
Felipe Honorato (UFPB, Brasil)
J. Wallace Lucena (UFPB, Brasil)
J. Pedro Teixeira (UFPB, Brasil)
Tiago Maritan (UFPB, Brasil)

WebMedia '21: Proceedings of the Brazilian Symposium on Multimedia and the Web, November 2021, Pages 53–56. https://doi.org/10.1145/3470482.3479617

Published: 05 November 2021

ABSTRACT

Audio Description (AD), also called Video Description, is a vital accessibility resource in the lives of blind and visually impaired people. Automating this task is difficult because it involves several subproblems, such as describing the scenario, actions, emotions, and characters. This paper presents an approach to automatically describe characters in a video or image by combining Deep Learning (DL), face detection, facial expression detection, and audio synthesizers. Our proposal runs the detection tools, applies DL models to the detected data, and generates an audio description. To evaluate the feasibility of the proposal, we developed a proof of concept and ran computational experiments on it.
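
The final step of such a pipeline, turning per-face predictions into a sentence that an audio synthesizer can read aloud, might look roughly like the sketch below. It is an illustration only, not the authors' implementation: the `FaceObservation` fields, their label values, and the sentence wording are all assumptions, and both the detection models and the speech synthesis call are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaceObservation:
    """Hypothetical output of the detection stage for one face."""
    age_group: str   # assumed label, e.g. "young adult" or "elderly"
    gender: str      # assumed label, e.g. "woman" or "man"
    expression: str  # assumed label, e.g. "happy" or "angry"


def with_article(phrase: str) -> str:
    """Prefix a phrase with 'a' or 'an' based on its first letter."""
    article = "an" if phrase[:1].lower() in "aeiou" else "a"
    return f"{article} {phrase}"


def describe_character(face: FaceObservation) -> str:
    """Turn one set of predicted attributes into a short phrase."""
    person = with_article(f"{face.age_group} {face.gender}")
    mood = with_article(f"{face.expression} expression")
    return f"{person} with {mood}"


def describe_scene(faces: List[FaceObservation]) -> str:
    """Compose the text that would be handed to a speech synthesizer."""
    if not faces:
        return "No characters are visible."
    phrases = [describe_character(face) for face in faces]
    if len(phrases) == 1:
        return f"There is one character: {phrases[0]}."
    joined = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return f"There are {len(phrases)} characters: {joined}."
```

In a full system, the string returned by `describe_scene` would be passed to a text-to-speech service to produce the audio track; keeping text generation separate from detection and synthesis makes each stage easy to swap or test on its own.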


Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Scene understanding
Human-centered computing → Accessibility → Accessibility systems and tools

Published in
WebMedia '21: Proceedings of the Brazilian Symposium on Multimedia and the Web
November 2021
271 pages
ISBN: 9781450386098
DOI: 10.1145/3470482
General Chairs: Adriano César Machado Pereira (UFMG), Leonardo Chaves Dutra da Rocha (UFSJ)
Copyright © 2021 ACM
© 2021 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags
accessibility
audio description
blind people
deep learning
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
WebMedia '21 Paper Acceptance Rate: 24 of 75 submissions, 32%
Overall Acceptance Rate: 270 of 873 submissions, 31%