A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels

  • Emeson Pereira, UFRPE
  • Gustavo Carneiro, University of Adelaide
  • Filipe R. Cordeiro, UFRPE

Abstract


Label noise is common in large real-world datasets, and its presence harms the training of deep neural networks. Although several works have focused on training strategies to address this problem, few studies evaluate the impact of data augmentation as a design choice for training deep neural networks. In this work, we analyse model robustness under different data augmentation strategies and how much each improves training in the presence of noisy labels. We evaluate state-of-the-art and classical data augmentation strategies with different levels of synthetic noise on the MNIST, CIFAR-10, and CIFAR-100 datasets, and on the real-world dataset Clothing1M, using accuracy as the evaluation metric. Results show that an appropriate choice of data augmentation can drastically improve model robustness to label noise, yielding a relative increase of up to 177.84% in best test accuracy over the baseline without augmentation, and an absolute increase of up to 6% when combined with the state-of-the-art DivideMix training strategy.
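The abstract relies on two ideas that are easy to mix up: training labels corrupted with synthetic noise at a chosen rate, and accuracy gains reported either relative to the baseline (up to 177.84%) or in absolute terms (up to 6%). The Python sketch below illustrates both under stated assumptions. It injects symmetric (uniformly random) label noise, which is one common way to build synthetic noise but is not confirmed by the abstract as the paper's exact protocol, and it computes relative and absolute accuracy gains. All function names are hypothetical and are not taken from the authors' code.

```python
import random
from typing import List, Sequence


def add_symmetric_label_noise(labels: Sequence[int], num_classes: int,
                              noise_rate: float, seed: int = 0) -> List[int]:
    """Flip a fraction `noise_rate` of labels to a different, uniformly chosen class.

    Assumption: symmetric noise in which a corrupted label never equals the
    original one; the paper's exact noise model may differ.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    noisy = list(labels)
    n_noisy = round(noise_rate * len(noisy))
    # Pick exactly n_noisy distinct sample indices to corrupt.
    for i in rng.sample(range(len(noisy)), n_noisy):
        # Choose uniformly among the other num_classes - 1 classes.
        choices = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(choices)
    return noisy


def accuracy(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of predictions that match the (clean) test targets."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


def relative_gain(acc_aug: float, acc_base: float) -> float:
    """Relative improvement in percent: 100 * (acc_aug - acc_base) / acc_base."""
    return 100.0 * (acc_aug - acc_base) / acc_base


def absolute_gain(acc_aug: float, acc_base: float) -> float:
    """Absolute improvement in percentage points: 100 * (acc_aug - acc_base)."""
    return 100.0 * (acc_aug - acc_base)


if __name__ == "__main__":
    clean = [i % 10 for i in range(1000)]  # toy labels for a 10-class problem
    noisy = add_symmetric_label_noise(clean, num_classes=10, noise_rate=0.4, seed=1)
    print(sum(c != n for c, n in zip(clean, noisy)))  # 400 labels corrupted
    # Hypothetical accuracies for illustration only, not results from the paper:
    print(relative_gain(0.5, 0.25), absolute_gain(0.5, 0.25))  # 100.0 25.0
```

The hypothetical numbers show why the two kinds of gain diverge: when the baseline is weak (for instance, a model trained without augmentation under heavy noise), a moderate absolute improvement can translate into a very large relative one.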
Keywords: Training, Deep learning, Graphics, Analytical models, Neural networks, Robustness, Data models, Label noise, Classification
Published
24/10/2022
PEREIRA, Emeson; CARNEIRO, Gustavo; CORDEIRO, Filipe R. A Study on the Impact of Data Augmentation for Training Convolutional Neural Networks in the Presence of Noisy Labels. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 35., 2022, Natal/RN. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022.