A survey on computer vision tools for action recognition, crowd surveillance and suspect retrieval

  • Teófilo de Campos, University of Surrey

Abstract


This paper briefly surveys computer vision tools that can be applied to surveillance videos for crowd monitoring, with the aim of detecting anomalous events and retrieving suspects. The focus is on methods for action recognition and event detection.

Published
28/07/2014
How to Cite

DE CAMPOS, Teófilo. A survey on computer vision tools for action recognition, crowd surveillance and suspect retrieval. In: SEMINÁRIO INTEGRADO DE SOFTWARE E HARDWARE (SEMISH), 41., 2014, Brasília. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2014. p. 120-129. ISSN 2595-6205.