A Survey of Transfer Learning for Convolutional Neural Networks

Ricardo Ribani; Mauricio Marengoni

doi:10.5753/sibgrapi.2019.9773

Ricardo Ribani, Universidade Presbiteriana Mackenzie
Mauricio Marengoni, Universidade Presbiteriana Mackenzie

Abstract

In this tutorial, we show the advantages of using transfer learning in real-world problems. Transfer learning is an emerging topic that may drive the success of machine learning in research and industry. The lack of data for specific tasks is one of the main reasons to use transfer learning, since collecting and labeling data can be expensive and time-consuming. Recent privacy concerns also make it difficult to use real data from users. Transfer learning also speeds up prototyping, because new models can start from models pre-trained on other datasets, whereas training from scratch on millions of images can take days or weeks and requires expensive GPUs. We explain transfer learning, covering its types and when and how to transfer knowledge. The tutorial also includes a practical demonstration of different use cases, comparing results and explaining the advantages of using it or not using it.

Keywords: Transfer Learning, Convolutional Neural Networks, Deep Learning
