DOI: 10.1145/3638067.3638105
Research Article · Honorable Mention

Imagery contents descriptions for People with visual impairments

Published: 24 January 2024

ABSTRACT

Image descriptions are crucial in assisting individuals without eyesight by providing verbal representations of visual content. While manual and Artificial Intelligence (AI)-generated descriptions exist, automatic description generators have not fully met the needs of visually impaired people. In this study, we examined the problems related to image descriptions reported in the existing literature using the Snowballing technique. Through this method, we identified thirteen issues, including ethical concerns surrounding physical appearance, gender and identity, race, and disability. Furthermore, we identified five reasons why sighted individuals often fail to provide descriptions for visual content, highlighting the need for accessibility campaigns that raise awareness of the social significance of descriptive sentences. We conducted interviews with eight low-vision volunteers, in which we analyzed the characteristics of descriptive sentences for 25 indoor images and gathered participants' expectations regarding image descriptions. As a result, we propose a set of Good Practices for writing descriptive sentences, aimed at helping automatic tools and sighted individuals generate more satisfactory, higher-quality image descriptions. We hope our results will emphasize the societal importance of imagery descriptions and inspire the community to pursue further interdisciplinary research to address the issues identified in our study.
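The abstract notes that sighted individuals often publish images without any description. As a minimal, illustrative sketch of how that gap surfaces on the web (not taken from the paper; it assumes the third-party beautifulsoup4 package, and the sample markup is invented), the following Python snippet checks whether each image in an HTML fragment carries a usable alt text, the attribute that screen readers announce:

```python
# Hypothetical example: detect images that lack a usable text alternative.
# Requires the third-party package beautifulsoup4 (pip install beautifulsoup4).
from bs4 import BeautifulSoup

# Invented sample markup for illustration only.
html = """
<img src="team.jpg" alt="Five colleagues seated around a meeting table">
<img src="chart.png" alt="">
<img src="logo.png">
"""

soup = BeautifulSoup(html, "html.parser")
for img in soup.find_all("img"):
    alt = img.get("alt")  # None when the attribute is absent entirely
    if alt is None:
        print(f'{img["src"]}: missing alt attribute (a screen reader may fall back to the file name)')
    elif not alt.strip():
        print(f'{img["src"]}: empty alt (treated as decorative and skipped by screen readers)')
    else:
        print(f'{img["src"]}: described as "{alt}"')
```

Both the missing and the empty case leave a screen-reader user without a description, which is the failure mode the study's Good Practices are meant to reduce.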


Published in:
IHC '23: Proceedings of the XXII Brazilian Symposium on Human Factors in Computing Systems
October 2023, 791 pages
ISBN: 9798400717154
DOI: 10.1145/3638067
Copyright © 2023 ACM
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 331 of 973 submissions, 34%
