Data Augmentation Guidelines for Cross-Dataset Transfer Learning and Pseudo Labeling

Fernando Pereira dos Santos; Gabriela Salvador Thumé; Moacir Antonelli Ponti

Fernando Pereira dos Santos USP
Gabriela Salvador Thumé USP
Moacir Antonelli Ponti USP

Abstract

Convolutional Neural Networks require large amounts of labeled data to be trained. A widely used practical approach to improve their performance is to augment the training set by generating compatible data. Standard data augmentation for images includes conventional techniques such as rotation, shift, and flip. In this paper, we go beyond such methods by studying alternative augmentation procedures for cross-dataset scenarios, in which a source dataset is used for training and a target dataset is used for testing. Through an extensive analysis considering different paradigms, saturation, and combination procedures, we provide guidelines for using augmentation methods in transfer learning scenarios. As a novel approach for self-supervised learning, we also propose using data augmentation techniques as pseudo labels during training. Our techniques prove to be robust alternatives across different transfer learning domains, and they also benefit self-supervised learning scenarios.

Keywords: Training, Graphics, Transfer learning, Labeling, Convolutional neural networks, Standards, Guidelines, Deep learning, Data augmentation