Vision Scan Insight: An Intelligent Assistant Using Deep Neural Networks for Low-Vision Users in Supermarkets

Paulo S. Cabral; Josenildo C. da Silva; João B. Diniz; Daniel L. Gomes Jr.

doi:10.5753/sbcas_estendido.2025.7108

Paulo S. Cabral IFMA
Josenildo C. da Silva IFMA
João B. Diniz IFMA
Daniel L. Gomes Jr. IFMA

Abstract

This paper proposes an intelligent tool to assist people who are blind or have low vision while shopping in a supermarket. The solution uses deep convolutional neural networks (CNNs) to detect objects and is integrated into an application that provides real-time audio feedback. The system integrates three YOLO variants (v5, v8, and v9), retrained via full fine-tuning and restricted transfer learning on three product datasets (Food, mAP₅₀₋₉₅ = 0.683; No-Fridge, mAP₅₀₋₉₅ = 0.697; Groceries, mAP₅₀₋₉₅ = 0.916).
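The mAP₅₀₋₉₅ scores above follow the usual COCO-style convention: average precision is computed at ten intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05, and the results are averaged. The sketch below illustrates only that convention; it is not the authors' evaluation code. The function names and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds 0.50, 0.55, ..., 0.95 used by the 50-95 convention.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, true_box):
    """Thresholds at which a predicted box counts as a match (hypothetical helper)."""
    overlap = iou(pred_box, true_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


def mean_over_thresholds(ap_per_threshold):
    """Average per-threshold AP values (dict keyed by threshold) into one score."""
    return sum(ap_per_threshold[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)
```

Because the stricter thresholds require very tight localization, a detector can score well at IoU 0.50 and still obtain a noticeably lower mAP₅₀₋₉₅.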
