Transfer learning of ImageNet Object Classification Challenge features to image aesthetic binary classification

  • Bruno Tinen (USP)
  • Jun Okamoto Junior (USP)

Abstract


The aesthetic classification of photographs is the problem of separating aesthetically pleasing images from unpleasing ones using algorithms that describe and evaluate both emotional and technical factors. Since the mass adoption of deep convolutional neural network (DCNN) models for image classification problems, different DCNN architectures have been developed because of their overall better performance, pushing the state of the art in image classification further. This paper evaluates how architectures and features that were primarily developed for the ImageNet Object Classification Challenge perform when analyzed under the aesthetic scope. A high-level transfer learning model composed of a DCNN feature-extraction layer and a top layer that behaves as a linear SVM is proposed, and seven different DCNN architectures are trained with it. Scenarios with transfer learning only and with fine tuning are evaluated, and a model using the ResNet-Inception V2 architecture is proposed, which achieves better results than the current state of the art under the experimental conditions used.
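
The pipeline described above (a pretrained DCNN used as a fixed feature extractor, topped by a single linear unit trained with a hinge loss so that it behaves as a linear SVM) can be sketched as follows. This is a minimal illustration only, assuming a TensorFlow/Keras implementation with the Inception-ResNet-v2 backbone (tf.keras.applications.InceptionResNetV2, Keras's name for the ResNet-Inception V2 architecture mentioned above); the input size, regularization weight, and optimizer settings are illustrative assumptions, not values taken from the paper.

    # Sketch: frozen ImageNet backbone + hinge-loss linear top layer (behaves as a linear SVM).
    import tensorflow as tf

    def build_model(image_size=299, l2_weight=5e-4):  # hypothetical hyperparameters
        # Pretrained Inception-ResNet-v2 features (ImageNet weights), classifier head removed.
        base = tf.keras.applications.InceptionResNetV2(
            include_top=False, weights="imagenet",
            input_shape=(image_size, image_size, 3))
        base.trainable = False  # pure transfer learning; unfreeze for the fine-tuning scenario

        # Images are assumed already preprocessed with
        # tf.keras.applications.inception_resnet_v2.preprocess_input.
        inputs = tf.keras.Input(shape=(image_size, image_size, 3))
        x = base(inputs, training=False)
        x = tf.keras.layers.GlobalAveragePooling2D()(x)
        # Single linear unit with an L2 penalty trained under hinge loss:
        # equivalent to fitting a linear SVM on the DCNN features.
        score = tf.keras.layers.Dense(
            1, activation=None,
            kernel_regularizer=tf.keras.regularizers.l2(l2_weight))(x)

        model = tf.keras.Model(inputs, score)
        model.compile(
            optimizer=tf.keras.optimizers.Adam(1e-3),
            loss=tf.keras.losses.Hinge(),  # accepts 0/1 labels, converted internally to -1/+1
            metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)])  # positive score = "pleasing"
        return model

For the fine-tuning scenario mentioned in the abstract, the backbone would subsequently be unfrozen (base.trainable = True) and the whole network retrained at a lower learning rate.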

Keywords: Machine Learning, Artificial Neural Networks, Deep Learning, Computer Vision

Published
15/10/2019
How to Cite

TINEN, Bruno; OKAMOTO JUNIOR, Jun. Transfer learning of ImageNet Object Classification Challenge features to image aesthetic binary classification. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 16., 2019, Salvador. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 130-141. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2019.9278.