Accessibility Evaluation of LLM-Generated Android Code in React Native

Abstract

Introduction: The use of LLMs to build user interfaces raises questions about their ability to incorporate accessibility. Objective: This study evaluated the accessibility of React Native code generated by ChatGPT 4o mini, DeepSeek V3, and Gemini 2.0 Flash, comparing different prompt types and prompt languages. Methodology: A total of 252 sample screens were analyzed with Accessibility Scanner, which found 1,159 accessibility errors. Results: Few-shot prompts outperformed zero-shot prompts, while the prompt language had no significant effect. These results highlight the limitations of LLMs in generating accessible screens and underscore the need to study methodologies that account for the developer's role when using these tools for inclusive design.

Keywords: Large Language Models, Mobile Accessibility, React Native, Prompt Engineering

Published
08/09/2025
VENTURA, Carlos David; RABELO, Daniel Mesquita Feijó; DARIN, Ticianne; VIANA, Windson. Accessibility Evaluation of LLM-Generated Android Code in React Native. In: SIMPÓSIO BRASILEIRO SOBRE FATORES HUMANOS EM SISTEMAS COMPUTACIONAIS (IHC), 24., 2025, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 1301-1322. DOI: https://doi.org/10.5753/ihc.2025.10833.