A case study on overfitting in multiclass classifiers using Convolutional Neural Networks
Abstract
Convolutional Neural Networks (CNNs) have been highly successful in computer vision tasks such as image recognition, classification, and object segmentation. Training this type of network generally requires large volumes of data, commonly high-resolution images, and the adjustment of a large number of parameters. Poor control over the learning process can lead to several problems. One of them is overfitting, in which the network loses its ability to generalize and makes incorrect predictions on new data. Another common problem is slow convergence, which depends on how the network is parameterized (for example, the number of filters per layer and the number of convolutional layers); careful tuning is important to avoid excessive computational cost. Understanding the origins of these problems and how to prevent them is essential for a successful design. In this paper, we analyze these problems by designing a multiclass classifier for ten image categories from the Caltech-256 dataset and evaluating it on accuracy, precision, recall, and loss. The experiments used Python 3.6 with the TensorFlow and Keras libraries on an RTX 2060 GPU.
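As an illustration of the four evaluation metrics named above, the following is a minimal sketch in plain Python, not the code used in the paper. The function names (`confusion_matrix`, `macro_metrics`, `cross_entropy`) are hypothetical, and macro averaging of precision and recall across classes is an assumption, since the abstract does not state how the per-class values are combined.

```python
import math

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def macro_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision and recall (averaging is an assumption)."""
    m = confusion_matrix(y_true, y_pred, num_classes)
    correct = sum(m[c][c] for c in range(num_classes))
    precisions, recalls = [], []
    for c in range(num_classes):
        tp = m[c][c]
        predicted = sum(m[r][c] for r in range(num_classes))  # column sum
        actual = sum(m[c])                                    # row sum
        # A class that is never predicted (or never present) scores 0 here.
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return {
        "accuracy": correct / len(y_true),
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
    }

def cross_entropy(y_true, probs, eps=1e-12):
    """Mean categorical cross-entropy; probs[i] is the predicted distribution for sample i."""
    total = sum(math.log(max(p[t], eps)) for t, p in zip(y_true, probs))
    return -total / len(y_true)

if __name__ == "__main__":
    # Toy labels for three classes; the paper's setting would use ten.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    print(macro_metrics(y_true, y_pred, num_classes=3))
    uniform = [[1 / 3] * 3 for _ in y_true]
    print(cross_entropy(y_true, uniform))
```

Computing these values on both the training set and a held-out set is one common way to observe overfitting: training loss keeps falling while held-out loss stalls or rises.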