A Novel Human-Machine Hybrid Framework for Person Re-Identification from Full Frame Videos

Felix Olivier Sumari Huayta; Esteban Gonzalez Clúa; Joris Guérin

doi:10.5753/sibgrapi.est.2021.20013

Felix Olivier Sumari Huayta UFF
Esteban Gonzalez Clúa UFF
Joris Guérin Université de Toulouse

Abstract

With the widespread adoption of automation for city security, person re-identification (Re-ID) has been extensively studied. In this dissertation, we argue that the current way of studying person re-identification, i.e. re-identifying a person within already detected and pre-cropped images of people, is not sufficient to implement practical security applications, where the inputs to the system are the full frames of the video streams. To support this claim, we introduce the Full Frame Person Re-ID setting (FF-PRID) and define specific metrics to evaluate FF-PRID implementations. To improve robustness, we also formalize the hybrid human-machine collaboration framework, which is inherent to any Re-ID security application. To demonstrate the importance of considering the FF-PRID setting, we build an experiment showing that combining a good people detection network with a good Re-ID model does not necessarily produce good results for the final application. This exposes a failure of the current formulation to assess the quality of a Re-ID model and justifies the use of different metrics. We hope that this work will motivate the research community to consider the full problem in order to develop algorithms that are better suited to real-world scenarios.
