Video Segmentation Learning Using Cascade Residual Convolutional Neural Network

  • Daniel Felipe S. Santos, Sao Paulo State University
  • Rafael Gonçalves Pires, Sao Paulo State University
  • Danilo Colombo, Cenpes-Petrobras
  • Joao Papa, Sao Paulo State University


Video segmentation consists of selecting, frame by frame, the meaningful regions that correspond to moving foreground objects. Applications include traffic monitoring, human tracking, action recognition, efficient video surveillance, and anomaly detection. These applications frequently face challenges such as abrupt changes in weather conditions, illumination variations, shadows, subtle dynamic background motion, and camouflage effects. In this work, we address these challenges by proposing a novel deep learning video segmentation approach that incorporates residual information into the foreground detection learning process. The main goal is to provide a method capable of generating accurate foreground detections from grayscale video. Experiments conducted on the Change Detection 2014 dataset and on PetrobrasROUTES, a private dataset from Petrobras, support the effectiveness of the proposed approach compared with several state-of-the-art video segmentation techniques, yielding overall F-measures of 0.9535 and 0.9636 on Change Detection 2014 and PetrobrasROUTES, respectively. These results place the proposed technique among the top three state-of-the-art video segmentation methods, while it uses approximately seven times fewer parameters than the top-ranked method.
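The F-measure is the harmonic mean of precision and recall over foreground pixels, F = 2PR / (P + R). The following minimal Python sketch, assuming binary masks stored as nested lists (1 = foreground, 0 = background), shows one way to compute it for a sequence of predicted and ground-truth frames. The function names are hypothetical, and counts are simply pooled over all frames; the abstract does not describe how the paper aggregates scores across videos or categories, so its "overall" figures may be computed differently.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # 2-D binary mask: 1 = foreground pixel, 0 = background


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives for one frame."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("predicted and ground-truth masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # foreground correctly detected
            elif p:
                fp += 1  # background wrongly marked as foreground
            elif t:
                fn += 1  # foreground missed
    return tp, fp, fn


def f_measure(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Pixel-wise F-measure with counts pooled over every frame in the sequence."""
    if len(preds) != len(truths):
        raise ValueError("need one ground-truth mask per predicted mask")
    tp = fp = fn = 0
    for pred, truth in zip(preds, truths):
        a, b, c = confusion_counts(pred, truth)
        tp, fp, fn = tp + a, fp + b, fn + c
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: two 2x3 frames (4 true positives, 1 false positive, 1 false negative).
pred_frames = [[[1, 1, 0], [0, 0, 0]], [[0, 1, 1], [0, 0, 1]]]
true_frames = [[[1, 0, 0], [0, 0, 0]], [[0, 1, 1], [0, 1, 1]]]
print(round(f_measure(pred_frames, true_frames), 4))  # prints 0.8
```

Pooling counts before computing precision and recall keeps frames with little foreground from dominating the score; averaging per-frame or per-video F-measures instead is an equally common choice and can give different numbers.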

Keywords: Video Segmentation, Deep Learning, Foreground Object Detection, Residual Map


How to Cite

SANTOS, Daniel Felipe S.; PIRES, Rafael Gonçalves; COLOMBO, Danilo; PAPA, Joao. Video Segmentation Learning Using Cascade Residual Convolutional Neural Network. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 32., 2019, Rio de Janeiro. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. DOI: