Semantic Description of Objects in Images Based on Prototype Theory

Omar Vidal Pino (UFMG); Erickson R. Nascimento (UFMG); Mario F. M. Campos (UFMG)

DOI: https://doi.org/10.5753/sibgrapi.est.2020.12994

Abstract

This research aims to build a model for the semantic description of objects based on visual features extracted from images. We introduce a novel semantic description approach inspired by Prototype Theory. Drawing on the way humans represent categories, we propose a Computational Prototype Model (CPM) that encodes and stores the central semantic meaning of an object category in images: the semantic prototype. The CPM represents and constructs the semantic prototypes of object categories using Convolutional Neural Networks (CNNs). The proposed Prototype-based Description Model uses the CPM to describe an object by highlighting its most distinctive features within its category. Our Global Semantic Descriptor based on Prototypes (GSDP) uses the constructed semantic prototypes to build discriminative, low-dimensional, and semantically interpretable signatures that encode an object's semantic information. It relies on the proposed Prototypical Similarity Layer (PS-Layer) to retrieve the category prototype, following the principle of prototype-based categorization. Experiments on several datasets show that: i) the proposed CPM successfully simulates the internal semantic structure of categories; ii) the proposed semantic distance metric can be interpreted as an object's typicality score within a category; iii) our prototype-based semantic classification method can improve both the performance and the interpretability of CNN classification models; and iv) our semantic descriptor encoding significantly outperforms other state-of-the-art global image encodings in clustering and classification tasks.
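The general idea of prototype-based categorization and typicality summarized above can be illustrated with a small sketch. It is not the CPM, PS-Layer, or GSDP as defined in this work. Purely for illustration, it assumes that a category prototype is the per-dimension mean of the CNN feature vectors of that category, that the semantic distance is Euclidean, and that typicality is 1 / (1 + distance). The functions build_prototypes, semantic_distance, categorize, and typicality are hypothetical names introduced here.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def build_prototypes(features: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Hypothetical prototype: per-dimension mean of a category's feature vectors."""
    prototypes: Dict[str, List[float]] = {}
    for category, vectors in features.items():
        dim = len(vectors[0])
        prototypes[category] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return prototypes


def semantic_distance(x: Vector, prototype: Vector) -> float:
    """Assumed stand-in metric: Euclidean distance between an object and a prototype."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, prototype)))


def categorize(x: Vector, prototypes: Dict[str, List[float]]) -> str:
    """Prototype-based categorization: return the category with the closest prototype."""
    return min(prototypes, key=lambda c: semantic_distance(x, prototypes[c]))


def typicality(x: Vector, prototype: Vector) -> float:
    """Assumed mapping of distance to (0, 1]: 1.0 at the prototype, lower for atypical members."""
    return 1.0 / (1.0 + semantic_distance(x, prototype))


if __name__ == "__main__":
    # Toy 2-D "CNN features" for two categories (made-up illustrative values).
    protos = build_prototypes({
        "cat": [[1.0, 0.0], [3.0, 0.0]],
        "dog": [[0.0, 4.0], [0.0, 6.0]],
    })
    print(protos)                               # {'cat': [2.0, 0.0], 'dog': [0.0, 5.0]}
    print(categorize([2.5, 0.5], protos))       # cat
    print(typicality([3.0, 0.0], protos["cat"]))  # 0.5
```

In this toy setting, an object lying exactly on its category prototype gets typicality 1.0, and the score decays as the object moves away from it. This mirrors, in a simplified form, the reading of a semantic distance as a typicality score within a category.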
