A survey on computer vision tools for action recognition, crowd surveillance and suspect retrieval

Teófilo de Campos

University of Surrey

References

[Almajai et al. 2010] Almajai, I., Kittler, J., de Campos, T., Christmas, W., Yan, F., Windridge, D., and Khan, A. (2010). Ball event recognition using HMM for automatic tennis annotation. In Proceedings of Intl. Conf. on Image Processing (ICIP).

[Atmosukarto et al. 2012] Atmosukarto, I., Ghanem, B., and Ahuja, N. (2012). Trajectory-based Fisher kernel representation for action recognition in videos. In 21st International Conference on Pattern Recognition (ICPR), pages 3333–3336.

[Chatfield et al. 2011] Chatfield, K., Lempitsky, V., Vedaldi, A., and Zisserman, A. (2011). The devil is in the details: an evaluation of recent feature encoding methods. In British Machine Vision Conference.

[Chatfield et al. 2014] Chatfield, K., Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. Technical report, University of Oxford. Archived in arXiv 1405.3531.

[Cho and Kang 2014] Cho, S.-H. and Kang, H.-B. (2014). Abnormal behavior detection using hybrid agents in crowded scenes. Pattern Recognition Letters, 44:64–70.

[Chong et al. 2014] Chong, X., Liu, W., Huang, P., and Badler, N. I. (2014). Hierarchical crowd analysis and anomaly detection. Journal of Visual Languages & Computing. http://dx.doi.org/10.1016/j.jvlc.2013.12.002.

[Courty et al. 2014] Courty, N., Allain, P., Creusot, C., and Corpetti, T. (2014). Using the AGORASET dataset: assessing for the quality of crowd video analysis methods. Pattern Recognition Letters.

[Dalal and Triggs 2005] Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proc IEEE Conf on Computer Vision and Pattern Recognition, San Diego CA, June 20-25.

[de Campos et al. 2011] de Campos, T., Barnard, M., Mikolajczyk, K., Kittler, J., Yan, F., Christmas, W., and Windridge, D. (2011). An evaluation of bags-of-words and spatiotemporal shapes for action recognition. In IEEE Workshop on Applications of Computer Vision (WACV), Kona, Hawaii.

[Endres et al. 2011] Endres, D., Neumann, H., Kolesnik, M., and Giese, M. (2011). Hooligan detection: the effects of saliency and expert knowledge. In 4th International Conference on Imaging for Crime Detection and Prevention (ICDP), pages 1–6. IET.

[Fan et al. 2009] Fan, Q., Bobbitt, R., Zhai, Y., Yanagawa, A., Pankanti, S., and Hampapur, A. (2009). Recognition of repetitive sequential human activity. In Proc of the IEEE Conf on Computer Vision and Pattern Recognition, pages 943–950.

[Fan et al. 2013] Fan, Q., Gabbur, P., and Pankanti, S. (2013). Relative attributes for large-scale abandoned object detection. In Proc 14th Int Conf on Computer Vision, Australia, pages 2736–2743.

[Felzenszwalb et al. 2009] Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and Ramanan, D. (2009). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[Feris et al. 2014] Feris, R., Bobbitt, R., Brown, L., and Pankanti, S. (2014). Attribute-based people search: Lessons learnt from a practical surveillance system. In Proceedings of International Conference on Multimedia Retrieval (ICMR). ACM.

[Gorelick et al. 2007] Gorelick, L., Blank, M., Shechtman, E., Irani, M., and Basri, R. (2007). Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12):2247–2253.

[Idrees et al. 2014] Idrees, H., Warner, N., and Shah, M. (2014). Tracking in dense crowds using prominence and neighborhood motion concurrence. Image and Vision Computing, 32(1):14–26.

[Ikizler and Forsyth 2007] Ikizler, N. and Forsyth, D. (2007). Searching video for complex activities with finite state models. In Proc of the IEEE Conf on Computer Vision and Pattern Recognition.

[Jacques-Jr et al. 2010] Jacques-Jr, J. C. S., Musse, S. R., and Jung, C. R. (2010). Crowd analysis using computer vision techniques. IEEE Signal Processing Magazine, 27(5):66–77.

[Ke et al. 2013] Ke, S.-R., Thuc, H. L. U., Lee, Y.-J., Hwang, J.-N., Yoo, J.-H., and Choi, K.-H. (2013). A review on video-based human activity recognition. Computers, 2(2):88–131.

[Ke et al. 2009] Ke, Y., Sukthankar, R., and Hebert, M. (2009). Event detection in crowded videos. In Proc 12th Int Conf on Computer Vision, Kyoto, Japan, Sept 27 - Oct 4.

[Kittler et al. 2014] Kittler, J., Christmas, W., de Campos, T., Windridge, D., Yan, F., Illingworth, J., and Osman, M. (2014). Domain anomaly detection in machine perception: A system architecture and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5):845–859. http://dx.doi.org/10.1109/TPAMI.2013.209.

[Kläser et al. 2008] Kläser, A., Marszałek, M., and Schmid, C. (2008). A spatio-temporal descriptor based on 3D-gradients. In British Machine Vision Conference, pages 995–1004.

[Kläser et al. 2010] Kläser, A., Marszałek, M., Schmid, C., and Zisserman, A. (2010). Human focused action localization in video. In International Workshop on Sign, Gesture, Activity, in conjunction with ECCV. Best paper award winner.

[Kosinski et al. 2014] Kosinski, M., Bachrach, Y., Kohli, P., Stillwell, D., and Graepel, T. (2014). Manifestations of user personality in website choice and behaviour on online social networks. Machine Learning, 95(3):357–380.

[Laptev et al. 2008] Laptev, I., Marszałek, M., Schmid, C., and Rozenfeld, B. (2008). Learning realistic human actions from movies. In Proc of the IEEE Conf on Computer Vision and Pattern Recognition, pages 1–8.

[Leach et al. 2014] Leach, M. J., Sparks, E., and Robertson, N. M. (2014). Contextual anomaly detection in crowded surveillance scenes. Pattern Recognition Letters, 44:71–79. Pattern Recognition and Crowd Analysis.

[McAuley and Leskovec 2012] McAuley, J. J. and Leskovec, J. (2012). Image labeling on a network: using social-network metadata for image classification. In Proc European Conf on Computer Vision.

[McAuley and Leskovec 2013] McAuley, J. J. and Leskovec, J. (2013). From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In World Wide Web, pages 897–908.

[Oneata et al. 2014] Oneata, D., Verbeek, J., and Schmid, C. (2014). Efficient action localization with approximately normalized Fisher vectors. In Proc of the IEEE Conf on Computer Vision and Pattern Recognition, Columbus, OH, United States.

[Poppe 2010] Poppe, R. (2010). A survey on vision-based human action recognition. Image and Vision Computing, 28(6):976–990.

[Ramanan et al. 2007] Ramanan, D., Forsyth, D. A., and Zisserman, A. (2007). Tracking people by learning their appearance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1):65–81.

[Rodriguez et al. 2011] Rodriguez, M., Sivic, J., Laptev, I., and Audibert, J.-Y. (2011). Data-driven crowd analysis in videos. In Proc 13th Int Conf on Computer Vision, Barcelona, Spain.

[Sadanand and Corso 2012] Sadanand, S. and Corso, J. J. (2012). Action bank: A high-level representation of activity in video. In Proc of the IEEE Conf on Computer Vision and Pattern Recognition, pages 1234–1241.

[Sánchez et al. 2013] Sánchez, J., Perronnin, F., Mensink, T., and Verbeek, J. (2013). Image classification with the Fisher vector: Theory and practice. International Journal of Computer Vision, 105(3):222–245.

[Shotton et al. 2011] Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., and Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In Proc of the IEEE Conf on Computer Vision and Pattern Recognition.

[Siddiquie et al. 2011] Siddiquie, B., Feris, R. S., and Davis, L. S. (2011). Image ranking and retrieval based on multi-attribute queries. In Proc of the IEEE Conf on Computer Vision and Pattern Recognition, pages 801–808.

[Sukhbaatar and Fergus 2014] Sukhbaatar, S. and Fergus, R. (2014). Learning from noisy labels with deep neural networks. arXiv preprint arXiv:1406.2080.

[Thida et al. 2013] Thida, M., Yong, Y. L., Climent-Pérez, P., Eng, H.-L., and Remagnino, P. (2013). A literature review on video analytics of crowded scenes. In Intelligent Multimedia Surveillance, pages 17–36. Springer.

[Tian et al. 2008] Tian, Y. L., Feris, R. S., and Hampapur, A. (2008). Real-time detection of abandoned and removed objects in complex environments. In VS.

[Wang et al. 2013] Wang, H., Kläser, A., Schmid, C., and Liu, C.-L. (2013). Dense trajectories and motion boundary descriptors for action recognition. International Journal of Computer Vision, 103(1):60–79.

[Wang et al. 2009] Wang, H., Ullah, M. M., Kläser, A., Laptev, I., and Schmid, C. (2009). Evaluation of local spatio-temporal features for action recognition. In Proc 20th British Machine Vision Conf, London, Sept 7-10.

[Yan et al. 2012] Yan, F., Kittler, J., Mikolajczyk, K., and Windridge, D. (2012). Automatic annotation of court games with structured output learning. In 21st International Conference on Pattern Recognition (ICPR), pages 3577–3580. IEEE.