NAO-Read: Empowering the Humanoid Robot NAO to Recognize Texts in Objects in Natural Scenes

  • Diego Alves da Silva UPE
  • Aline Geovanna Soares UPE
  • Antonio Lundgren UPE
  • Estanislau Lima UPE
  • Byron Leite Dantas Bezerra UPE


Robotics is a field of research that has undergone several changes in recent years. Robots are currently used in many applications, such as bomb deactivation and mobile robotic manipulation. Most robots today, however, are programmed to follow a predefined path, which is sufficient only when the robot works in a fixed, known environment. Many tasks, by contrast, require autonomous robots, and NAO humanoid robots have thus become an active research platform within the robotics community. In this article, we present a vision system that connects to the NAO robot, allowing it to detect and recognize visible text on objects in images of natural scenes and to use that knowledge to interpret the content of a given scene. The proposed vision system is based on deep learning methods, was designed for NAO robots, and consists of five stages: 1) image capture; 2) object detection and classification with the YOLOv3 algorithm; 3) selection of the objects of interest; 4) text detection and recognition, based on the OctShuffleMLT approach; and 5) synthesis of the recognized text. These models were chosen for their strong results on the COCO dataset, for objects, and on ICDAR 2015, for text; the images in these datasets are very similar to those captured by the NAO robot. Experimental results show that the text detection and recognition rates on images captured in the wild by the NAO robot's camera are similar to those reported for the same models pre-trained on natural-scene datasets.
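The five-stage pipeline described above can be sketched as a simple composition of stage functions. This is a minimal, hypothetical sketch, not the authors' implementation: `capture`, `detect`, `read_text`, and `speak` are placeholder callables that a real system would back with the NAO camera, a YOLOv3 detector, the OctShuffleMLT model, and a speech synthesizer. The stage 3 rule (keep detections from a chosen set of classes with confidence of at least 0.5) is also an assumption, since the abstract does not state how objects of interest are selected.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical types: an "image" is a list of pixel rows and a box is
# (x, y, width, height) in pixels. Real code would use NumPy arrays.
Image = List[List[int]]
Box = Tuple[int, int, int, int]


@dataclass
class Detection:
    label: str         # class name, e.g. a COCO category such as "book"
    confidence: float  # detector score in [0, 1]
    box: Box           # (x, y, width, height)


def select_objects_of_interest(detections: Sequence[Detection],
                               labels_of_interest: Set[str],
                               min_confidence: float = 0.5) -> List[Detection]:
    """Stage 3 (assumed rule): keep confident detections of chosen classes."""
    return [d for d in detections
            if d.label in labels_of_interest and d.confidence >= min_confidence]


def crop(image: Image, box: Box) -> Image:
    """Cut the bounding box out of the image, clipped to the image borders."""
    x, y, w, h = box
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = max(x + w, 0), max(y + h, 0)
    return [row[x0:x1] for row in image[y0:y1]]


def run_pipeline(capture: Callable[[], Image],
                 detect: Callable[[Image], List[Detection]],
                 read_text: Callable[[Image], List[str]],
                 speak: Callable[[str], None],
                 labels_of_interest: Set[str]) -> List[str]:
    image = capture()                      # 1) image capture
    detections = detect(image)             # 2) object detection (YOLOv3 in the paper)
    selected = select_objects_of_interest(detections, labels_of_interest)  # 3)
    words: List[str] = []
    for det in selected:                   # 4) text spotting on each selected crop
        words.extend(read_text(crop(image, det.box)))
    if words:
        speak(" ".join(words))             # 5) synthesis of the recognized text
    return words


if __name__ == "__main__":
    frame = [[0] * 8 for _ in range(6)]    # dummy 8x6 grayscale frame
    fake = [Detection("book", 0.9, (1, 1, 4, 3)),
            Detection("person", 0.95, (0, 0, 8, 6))]
    spoken: List[str] = []
    result = run_pipeline(capture=lambda: frame,
                          detect=lambda img: fake,
                          read_text=lambda c: [f"text{len(c[0])}x{len(c)}"],
                          speak=spoken.append,
                          labels_of_interest={"book"})
    print(result, spoken)
```

Passing each stage in as a callable keeps the models swappable and lets the pipeline logic be exercised with stubs, without the robot or the trained networks.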

How to Cite

DA SILVA, Diego Alves; SOARES, Aline Geovanna; LUNDGREN, Antonio; LIMA, Estanislau; BEZERRA, Byron Leite Dantas. NAO-Read: Empowering the Humanoid Robot NAO to Recognize Texts in Objects in Natural Scenes. In: UNDERGRADUATE WORK WORKSHOP (WORKSHOP DE TRABALHOS DA GRADUAÇÃO) - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 33., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 151-154. DOI: