FASTensor: A tensor framework for spatiotemporal description

Virgínia F. Mota; Jefersson A. dos Santos; Arnaldo de A. Araújo (UFMG)

DOI: https://doi.org/10.5753/sibgrapi.est.2019.8298

Abstract

Spatiotemporal description is a research field with applications in areas such as video indexing, surveillance, and human-computer interfaces. Big Data problems involving large databases are now being addressed with Deep Learning tools; however, there is still room for improvement in handcrafted spatiotemporal description. Moreover, some problems involve small data, for which data augmentation and similar techniques are not applicable. The main contribution of this Ph.D. thesis is a framework for spatiotemporal representation based on orientation tensors that enables dimensionality reduction and invariance. This multipurpose framework is called Features As Spatiotemporal Tensors (FASTensor). We evaluate it in three applications: human action recognition, video pornography classification, and cancer cell classification. The last of these is itself a contribution of this work, since we introduce a new dataset, the Melanoma Cancer Cell (MCC) dataset. It is a small dataset that cannot be artificially augmented, owing to the difficulty of extracting the data and to the nature of the cell motion. The results were competitive, and the framework is fast and simple to implement. Finally, our results on the MCC dataset can be used in the analysis of other cancer cell treatments.
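The abstract does not spell out how an orientation tensor turns a variable-length sequence of features into a fixed-size representation. The sketch below assumes a generic orientation-tensor construction: each per-frame feature vector v contributes its outer product v v^T, the accumulated matrix is normalized by its Frobenius norm, and the upper triangle of the resulting symmetric matrix serves as the descriptor. The function names `orientation_tensor` and `tensor_descriptor` and the toy input are hypothetical; this is an illustration under those assumptions, not FASTensor's published implementation.

```python
import math
from typing import List, Sequence


def orientation_tensor(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sum the outer products v v^T of per-frame feature vectors and
    normalize the result by its Frobenius norm.

    Assumption: a generic orientation-tensor construction used only to
    illustrate the idea; it is not FASTensor's published code.
    """
    if not frames:
        raise ValueError("need at least one frame vector")
    n = len(frames[0])
    tensor = [[0.0] * n for _ in range(n)]
    for v in frames:
        if len(v) != n:
            raise ValueError("all frame vectors must have the same length")
        # Accumulate the outer product v v^T (symmetric, insensitive to the sign of v).
        for i in range(n):
            for j in range(n):
                tensor[i][j] += v[i] * v[j]
    # Frobenius normalization removes the dependence on overall magnitude
    # (e.g., on how many frames were accumulated).
    norm = math.sqrt(sum(x * x for row in tensor for x in row))
    if norm > 0.0:
        tensor = [[x / norm for x in row] for row in tensor]
    return tensor


def tensor_descriptor(tensor: List[List[float]]) -> List[float]:
    """Keep only the upper triangle of the symmetric tensor: n(n+1)/2 values."""
    n = len(tensor)
    return [tensor[i][j] for i in range(n) for j in range(i, n)]


if __name__ == "__main__":
    # Hypothetical toy input: 4 frames, each described by a 3-bin histogram.
    frames = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
    descriptor = tensor_descriptor(orientation_tensor(frames))
    print(len(descriptor))  # 6 = 3 * (3 + 1) / 2, regardless of the number of frames
```

In this sketch the descriptor length depends only on the feature dimension n, not on the number of frames, and the outer product makes the result unchanged if a vector's sign is flipped or the frames are reordered. This is one possible reading of the "dimensionality reduction and invariance" mentioned above; the thesis may define these properties more specifically.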

