Vision Scan Insight: An Intelligent Assistant Using Deep Neural Networks for Low Vision Users in Supermarkets
Abstract
This article proposes an intelligent tool to assist people with blindness or low vision while shopping in a supermarket. The solution uses deep convolutional neural networks (CNNs) to detect objects and is integrated into an application that provides real-time auditory feedback. The system integrates three YOLO variants (v5, v8, and v9), each retrained with both full fine-tuning and restricted transfer learning on three product datasets. The reported mAP50−95 scores are 0.683 on Food, 0.697 on No-Fridge, and 0.916 on Groceries.
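To make the mAP50−95 figures easier to read, the following minimal sketch illustrates the idea behind the metric: a predicted box is compared with a ground-truth box by intersection over union (IoU), and the comparison is repeated at the ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The function names, the `(x1, y1, x2, y2)` box format, and the example boxes are assumptions made for illustration. This is not the evaluation code used in the article, and it omits the per-class precision-recall integration that a full mAP computation requires.

```python
# Illustrative sketch only: IoU and the threshold sweep behind mAP50-95.
# Box format (x1, y1, x2, y2) and all names here are assumptions.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, truth_box):
    """Thresholds at which the prediction would count as a correct detection."""
    score = iou(pred_box, truth_box)
    return [t for t in IOU_THRESHOLDS if score >= t]


if __name__ == "__main__":
    truth = (0, 0, 100, 100)   # hypothetical ground-truth product box
    pred = (10, 0, 110, 100)   # hypothetical prediction, shifted 10 px right
    print(f"IoU = {iou(pred, truth):.3f}")
    print("Counts as a match at thresholds:", matched_thresholds(pred, truth))
```

A detection that is only roughly aligned passes the lower thresholds but fails the stricter ones, which is why a mAP50−95 score rewards tight localization more than a score at a single 0.50 threshold does.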
