A Survey on Deep Learning with Noisy Labels: How to train your model when you cannot trust on the annotations?

Filipe R. Cordeiro; Gustavo Carneiro

Filipe R. Cordeiro UFRPE
Gustavo Carneiro University of Adelaide

Abstract

Noisy labels are common in data sets collected automatically from the internet, labeled by non-specialist annotators, or even labeled by specialists on challenging tasks, such as those in the medical field. Although deep learning models have shown significant improvements in different domains, an open issue is their tendency to memorize noisy labels during training, which reduces their generalization ability. Because deep learning models depend on correctly labeled data sets and label correctness is difficult to guarantee, it is crucial to account for noisy labels when training them. Several approaches have been proposed in the literature to improve the training of deep learning models in the presence of noisy labels. This paper surveys the main techniques in the literature, which we classify into the following groups: robust losses, sample weighting, sample selection, meta-learning, and combined approaches. We also present the commonly used experimental setups, data sets, and results of state-of-the-art models.

Keywords: noisy labels, deep learning, survey