# Milestones and New Frontiers in Deep Learning

### Abstract

Only very recently have deep learning models become the basis for most ongoing research in several fundamental computing areas, such as computer vision, information extraction, data generation, and data understanding. The field remains a black box for many people, particularly in its more mathematical aspects. Users of high-level packages often struggle to understand the reasoning behind the many building blocks of deep learning, such as convolutional layers, batch normalization, and activation functions.
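To make the building blocks named above less opaque, the sketch below implements a 1-D convolution, a ReLU activation, and batch normalization in plain Python, with no framework. It is a minimal illustration of what these layers compute, not how any particular library implements them (frameworks use the learned scale/shift parameters and running statistics that this inference-style sketch omits; the function names and the example signal are ours).

```python
import math

def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation, as deep learning frameworks define it)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def relu(x):
    """Rectified linear unit: max(0, v) element-wise."""
    return [max(0.0, v) for v in x]

def batch_norm(x, eps=1e-5):
    """Normalize to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# A tiny "layer stack": convolve, activate, normalize.
signal = [1.0, 3.0, 2.0, 5.0, 4.0]
feat = relu(conv1d(signal, [-1.0, 1.0]))  # [-1, 1] kernel responds to rising edges
out = batch_norm(feat)
```

Each step is a few lines of arithmetic; the apparent complexity of deep learning libraries comes from composing many such blocks and differentiating through them, not from the blocks themselves.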

**Keywords:** Mathematical model, Machine learning, Data models, Machine learning algorithms, Task analysis, Numerical models, Computational modeling

### References

K. He, X. Zhang, S. Ren, J. Sun, "Deep residual learning for image recognition", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.

M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, J. Dean, "Google's multilingual neural machine translation system: Enabling zero-shot translation", Transactions of the Association for Computational Linguistics, vol. 5, pp. 339-351, 2017.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, D. Hassabis, "Mastering the game of go with deep neural networks and tree search", Nature, vol. 529, no. 7587, pp. 484, 2016.

J. M. F. Fernandez, T. Mahlmann, "The dota 2 bot competition", IEEE Transactions on Games, 2018.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, L. Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge", International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211-252, 2015.

M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, X. Zheng, "Tensorflow: A system for large-scale machine learning", Proceedings of the 12th Symposium on Operating Systems Design and Implementation, pp. 265-283, 2016.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, "Automatic differentiation in PyTorch", NIPS Autodiff Workshop, 2017.

The Theano Development Team, "Theano: A Python framework for fast computation of mathematical expressions", arXiv e-prints, vol. abs/1605.02688, 2016.

F. Seide, A. Agarwal, "CNTK: Microsoft's open-source deep-learning toolkit", Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2135-2135, 2016.

T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, Z. Zhang, MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems, 2015.

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, T. Darrell, "Caffe: Convolutional architecture for fast feature embedding", Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675-678, 2014.

F. Chollet et al., Keras, 2015, [online] Available: https://keras.io.

M. Reynolds, G. Barth-Maron, F. Besse, D. de Las Casas, A. Fidjeland, T. Green, F. Viola, Open sourcing Sonnet - a new library for constructing neural networks, 2017, [online] Available: https://deepmind.com/blog/open-sourcing-sonnet/.

Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, "Gradient-based learning applied to document recognition", Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.

D. Ardila, A. P. Kiraly, S. Bharadwaj, B. Choi, J. J. Reicher, L. Peng, S. Shetty, "End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography", Nature Medicine, pp. 1-25, 2019.

L. Hu, D. Bell, S. Antani, Z. Xue, K. Yu, M. P. Horning, M. Schiffman, "An observational study of deep learning and automated evaluation of cervical images for cancer screening", Journal of the National Cancer Institute, 2019.

L. Bottou, "Large-scale machine learning with stochastic gradient descent", Proceedings of COMPSTAT., pp. 177-186, 2010.

D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014.

M. Andrychowicz, M. Denil, S. Gómez, M. W. Hoffman, D. Pfau, T. Schaul, N. de Freitas, "Learning to learn by gradient descent by gradient descent", Advances in Neural Information Processing Systems. Curran Associates Inc., pp. 3981-3989, 2016.

T. He, Z. Zhang, H. Zhang, Z. Zhang, J. Xie, M. Li, "Bag of tricks for image classification with convolutional neural networks", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 558-567, 2019.

Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A. Y. Ng, "On optimization methods for deep learning", Proceedings of the International Conference on Machine Learning, pp. 265-272, 2011.

S. Ruder, An overview of gradient descent optimization algorithms, 2016.

S. Sra, S. Nowozin, S. J. Wright, Optimization for machine learning, 2012.

I. Sutskever, J. Martens, G. Dahl, G. Hinton, "On the importance of initialization and momentum in deep learning", Proceedings of the International Conference on Machine Learning, pp. 1139-1147, 2013.

M. C. Mukkamala, M. Hein, "Variants of rmsprop and adagrad with logarithmic regret bounds", Proceedings of the International Conference on Machine Learning, pp. 2545-2553, 2017.

Z. Zhang, "Improved adam optimizer for deep neural networks", Proceedings of the IEEE/ACM International Symposium on Quality of Service, pp. 1-2, 2018.

C. M. Bishop, Pattern recognition and machine learning, Springer, 2006.

G. Cybenko, "Approximation by superpositions of a sigmoidal function", Mathematics of Control Signals and Systems, vol. 2, no. 4, pp. 303-314, 1989.

D. E. Rumelhart, G. E. Hinton, R. J. Williams, "Learning representations by back-propagating errors", Nature, vol. 323, pp. 533-536, 1986.

I. Goodfellow, Y. Bengio, A. Courville, Deep learning, MIT press, 2016.

A. Krizhevsky, I. Sutskever, G. E. Hinton, "Imagenet classification with deep convolutional neural networks" in Advances in Neural Information Processing Systems, MIT Press, pp. 1097-1105, 2012.

J. Sanders, E. Kandrot, CUDA by example: an introduction to general-purpose GPU programming, 2010.

Y. LeCun, Y. Bengio, G. Hinton, "Deep learning", Nature, vol. 521, no. 7553, pp. 436-444, 2015.

V. Nair, G. E. Hinton, "Rectified linear units improve restricted boltzmann machines", Proceedings of the International Conference on Machine Learning, pp. 807-814, 2010.

D. H. Hubel, T. N. Wiesel, "Receptive fields and functional architecture of monkey striate cortex", The Journal of Physiology, vol. 195, no. 1, pp. 215-243, 1968.

K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position", Biological Cybernetics, vol. 36, no. 4, pp. 193-202, 1980.

V. Dumoulin, F. Visin, A guide to convolution arithmetic for deep learning, 2016.

P. Isola, J.-Y. Zhu, T. Zhou, A. A. Efros, "Image-to-image translation with conditional adversarial networks", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125-1134, 2017.

J. Long, E. Shelhamer, T. Darrell, "Fully convolutional networks for semantic segmentation", Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431-3440, 2015.

A. L. Maas, A. Y. Hannun, A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models", Proceedings of the International Conference on Machine Learning, pp. 1-6, 2013.

K. He, X. Zhang, S. Ren, J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification", Proceedings of the IEEE International Conference on Computer Vision, pp. 1026-1034, 2015.

D.-A. Clevert, T. Unterthiner, S. Hochreiter, Fast and accurate deep network learning by exponential linear units (elus), 2015.

S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting", The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.

T. Raiko, H. Valpola, Y. LeCun, "Deep learning made easier by linear transformations in perceptrons", Artificial Intelligence and Statistics, pp. 924-932, 2012.

S. C. Wong, A. Gatt, V. Stamatescu, M. D. McDonnell, "Understanding data augmentation for classification: when to warp?", Proceedings of the International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1-6, 2016.

T. Ko, V. Peddinti, D. Povey, S. Khudanpur, "Audio augmentation for speech recognition", Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015.

M. Fadaee, A. Bisazza, C. Monz, Data augmentation for low-resource neural machine translation, 2017.

A. Dal Pozzolo, O. Caelen, R. A. Johnson, G. Bontempi, "Calibrating probability with undersampling for unbalanced classification", Proceedings of the IEEE Symposium Series on Computational Intelligence, pp. 159-166, 2015.

Y. Cui, M. Jia, T.-Y. Lin, Y. Song, S. Belongie, "Class-balanced loss based on effective number of samples", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9268-9277, 2019.

K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014.

T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, "Focal loss for dense object detection", Proceedings of the IEEE International Conference on Computer Vision, pp. 2980-2988, 2017.

O. Ronneberger, P. Fischer, T. Brox, "U-net: Convolutional networks for biomedical image segmentation", Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241, 2015.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, Y. Bengio, "Generative adversarial nets", Advances in Neural Information Processing Systems, pp. 2672-2680, 2014.

J. Redmon, S. Divvala, R. Girshick, A. Farhadi, "You only look once: Unified real-time object detection", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788, 2016.

J.-Y. Zhu, T. Park, P. Isola, A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks", Proceedings of the IEEE International Conference on Computer Vision, pp. 2223-2232, 2017.

S. Ren, K. He, R. Girshick, J. Sun, "Faster R-CNN: Towards realtime object detection with region proposal networks", Advances in neural information processing systems, pp. 91-99, 2015.

D. Ribli, A. Horváth, Z. Unger, P. Pollner, I. Csabai, "Detecting and classifying lesions in mammograms with deep learning", Scientific Reports, vol. 8, no. 1, pp. 4165, 2018.

L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, "Deeplab: Semantic image segmentation with deep convolutional nets atrous convolution and fully connected CRFs", IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834-848, 2017.

N. Wu, J. Phang, J. Park, Y. Shen, Z. Huang, M. Zorin, K. J. Geras, Deep neural networks improve radiologists' performance in breast cancer screening, 2019.

A. Dubrovina, P. Kisilev, B. Ginsburg, S. Hashoul, R. Kimmel, "Computational mammography using deep neural networks", Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, vol. 6, no. 3, pp. 243-247, 2018.

C. Dong, C. C. Loy, K. He, X. Tang, "Image super-resolution using deep convolutional networks", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295-307, 2015.

K. Zhang, W. Zuo, Y. Chen, D. Meng, L. Zhang, "Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising", IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142-3155, 2017.

M. Zhu, S. Gupta, To prune or not to prune: exploring the efficacy of pruning for model compression, 2017.

J. Frankle, M. Carbin, The lottery ticket hypothesis: Finding sparse trainable neural networks, 2018.

Y. LeCun, J. S. Denker, S. A. Solla, "Optimal brain damage", Advances in Neural Information Processing Systems, pp. 598-605, 1990.

C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, K. Murphy, "Progressive neural architecture search", Proceedings of the European Conference on Computer Vision (ECCV), pp. 19-34, 2018.

J. Snoek, H. Larochelle, R. P. Adams, "Practical bayesian optimization of machine learning algorithms", Advances in Neural Information Processing Systems, pp. 2951-2959, 2012.

*In*: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 32., 2019, Rio de Janeiro.

**Anais** [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. DOI: https://doi.org/10.5753/sibgrapi.2019.9772.