Modular Multi-Face Tracking Geared Toward Face Recognition in Surveillance Videos

  • Cássio B. Nascimento UFPR
  • Adolfo J. Neto EBTS Empresa Brasileira de Tecnologias e Sistemas Ltda
  • Luciano Silva UFPR


Face recognition achieves high accuracy under controlled conditions; however, these results usually do not carry over to video surveillance scenarios. To facilitate the use of face recognition in video surveillance, face selection can be employed as an intermediate step. This dissertation presents a study of face selection in which we rework a multi-face tracking pipeline and, with few changes, increase its tracking and reconnection capabilities. Through experimentation with different face detection models, random parameter search, and a simpler face quality measure, we achieved an increase of 10.1% in Multiple Object Tracking Precision (MOTP) and of 9% in the IDF1 metric. All experiments were conducted on a public multi-face tracking dataset, which we also expanded through manual video annotations.
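For readers unfamiliar with the two reported metrics, the sketch below shows how they are commonly computed. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names and inputs are hypothetical, and MOTP is assumed to follow the overlap convention (mean IoU over matched boxes, higher is better), whereas the original CLEAR MOT definition averages a distance instead.

```python
from typing import Sequence


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score: 2*IDTP / (2*IDTP + IDFP + IDFN).

    idtp, idfp, idfn are the identity true positives, false positives,
    and false negatives obtained from a global identity matching
    between predicted and ground-truth trajectories.
    """
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def motp(match_overlaps: Sequence[float]) -> float:
    """Mean localization score over all matched detection pairs.

    Assumption: each entry is the IoU between a matched predicted box
    and its ground-truth box, accumulated over all frames.
    """
    if not match_overlaps:
        return 0.0
    return sum(match_overlaps) / len(match_overlaps)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(idf1(idtp=80, idfp=10, idfn=30))
    print(motp([0.5, 0.75, 1.0]))
```

IDF1 rewards keeping the same identity on a face across the whole video, which is why it is sensitive to the reconnection capability mentioned above, while MOTP only measures how tightly matched boxes fit the ground truth.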


How to Cite

NASCIMENTO, Cássio B.; J. NETO, Adolfo; SILVA, Luciano. Modular Multi-Face Tracking Geared Toward Face Recognition in Surveillance Videos. In: WORKSHOP DE TESES E DISSERTAÇÕES - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 36., 2023, Rio Grande/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 21-27. DOI: