Application of Deep Learning Techniques to Depth Images for Person Tracking and Detection

  • Velton Cardoso Pires Universidade Federal do Rio Grande do Norte
  • Eduardo Silva Palmeira Universidade Estadual de Santa Cruz
  • Felipe Antunes dos Santos Public University of Navarre


Using neural networks for image processing and for tracking individuals or objects is currently a very popular subject, with applications to many real-world problems. However, such approaches usually require good-quality images with distinct features to support object detection, which poses challenges in environments with little or no illumination. In this work, we present a comparative study of the performance of leading convolutional neural networks for detecting and tracking individuals in depth color images generated by infrared sensors. We also aim to demonstrate the usability of the YOLO (You Only Look Once) architecture as an alternative for identifying objects in images produced by sensors that do not depend on illumination. Experimental results show that the approach using YOLO Tiny improves accuracy by approximately 9% and processes around 8 times more frames per second (FPS).
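The abstract does not state how raw infrared depth frames are turned into the "depth color images" fed to the networks. The sketch below shows one possible conversion. It assumes 16-bit depth values in millimetres, zero for invalid pixels, a hypothetical 500 to 4500 mm working range, and a jet-like colormap. None of these choices come from the paper, and the function name `colorize_depth` is hypothetical.

```python
import numpy as np


def colorize_depth(depth_mm, near_mm=500, far_mm=4500):
    """Map a single-channel depth frame to an 8-bit, 3-channel image.

    Hypothetical helper. Assumptions (not from the paper): depth is in
    millimetres, 0 marks invalid pixels, and a jet-like colormap is used.
    """
    depth = np.asarray(depth_mm).astype(np.float32)
    valid = depth > 0
    # Normalise the working range [near_mm, far_mm] to [0, 1]; clip outliers.
    t = np.clip((depth - near_mm) / float(far_mm - near_mm), 0.0, 1.0)
    # Piecewise-linear jet approximation: blue (near) -> green -> red (far).
    r = np.clip(1.5 - np.abs(4.0 * t - 3.0), 0.0, 1.0)
    g = np.clip(1.5 - np.abs(4.0 * t - 2.0), 0.0, 1.0)
    b = np.clip(1.5 - np.abs(4.0 * t - 1.0), 0.0, 1.0)
    rgb = np.stack([r, g, b], axis=-1)
    rgb = np.round(rgb * 255.0).astype(np.uint8)
    # Invalid (zero-depth) pixels become black so they carry no colour cue.
    rgb[~valid] = 0
    return rgb


if __name__ == "__main__":
    frame = np.array([[0, 500], [2500, 4500]], dtype=np.uint16)
    print(colorize_depth(frame).shape)
```

Encoding depth as three channels lets a detector pretrained on RGB data, such as a YOLO variant, accept the frame without architectural changes. The working range and colormap are free choices in this sketch and would need to match whatever the actual pipeline used.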

Keywords: Person Detection, Neural Networks, Image Processing


PIRES, Velton Cardoso; PALMEIRA, Eduardo Silva; DOS SANTOS, Felipe Antunes. Application of Deep Learning Techniques to Depth Images for Person Tracking and Detection. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 20., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 272-284. ISSN 2763-9061. DOI: