On Modeling Context from Objects with a Long Short-Term Memory for Indoor Scene Recognition

Camila L. Silva; Anisio Lacerda; Erickson R. Nascimento

doi:10.5753/sibgrapi.2019.9792

Camila L. Silva, Federal University of Minas Gerais
Anisio Lacerda, Federal University of Minas Gerais
Erickson R. Nascimento, Federal University of Minas Gerais

Abstract

Recognizing indoor scenes is still regarded as an open challenge in the field of Computer Vision. Indoor scenes can be well represented by the objects that compose them, which vary in angle and appearance and are often partially occluded. Although Convolutional Neural Networks perform remarkably well on image-related problems, the top performances on indoor scenes come from approaches that model the intricate relationships among objects. Because Recurrent Neural Networks were designed to model structure in a sequence, we propose representing an image as a sequence of object-level information and feeding it to a bidirectional Long Short-Term Memory network trained for scene classification. We adopt a Many-to-Many training approach in which each sequence element outputs a scene prediction, allowing us to use every prediction to boost recognition. Our method outperforms RNN-based approaches on MIT67, an entirely indoor dataset, and also improves on the most successful methods through an ensemble of classifiers.
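
The Many-to-Many setup produces one scene prediction per sequence element. The abstract does not state how these predictions are combined, so the following is only a minimal sketch of one plausible rule: average the per-element class probabilities and pick the most likely class. The function names, the averaging rule, and the example scores are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_predictions(step_scores: Sequence[Sequence[float]]) -> int:
    """Average per-element class probabilities and return the winning class index.

    Assumption: simple averaging; the paper may combine predictions differently.
    """
    if not step_scores:
        raise ValueError("need at least one sequence element")
    probs = [softmax(s) for s in step_scores]
    num_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    return max(range(num_classes), key=mean.__getitem__)


# Hypothetical per-element scores: 3 object-level elements, 3 scene classes.
scores = [
    [2.0, 0.5, 0.1],  # element 1 favours class 0
    [0.2, 1.0, 0.9],  # element 2 weakly favours class 1
    [1.5, 0.3, 0.2],  # element 3 favours class 0
]
predicted = aggregate_predictions(scores)
print(predicted)
```

In this toy case two of the three elements agree on class 0, so the averaged distribution favours it even though one element disagrees, which illustrates how per-element outputs could reinforce the image-level decision.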

Keywords: Indoor Scene Recognition, Recurrent Neural Networks
