Impact of Pre-training Datasets on Human Activity Recognition with Contrastive Predictive Coding
Abstract
Self-Supervised Learning (SSL) techniques have been successfully employed to learn useful representations for various data modalities without labels. These techniques use a pretext task to train the backbone of a deep-learning model on unlabeled data and then leverage the pre-trained backbone to train a downstream model with a few labeled samples. In this context, Contrastive Predictive Coding (CPC) is an SSL technique that has demonstrated promising results in several tasks, including human activity recognition (HAR). In this work, we explore how data variety in backbone pre-training affects CPC models for HAR, and how much pre-training benefits the final model. We evaluated the impact of data variety using fifteen combinations of four distinct HAR datasets and found significant performance variability depending on the pre-training datasets, with the F1-score varying by 9.6 to 13 percentage points across the different target datasets. We also found that including the target dataset in the pre-training process generally improved performance and that pre-training with all four datasets produced a high-quality backbone, yielding downstream models that perform near the best on all target datasets. These findings emphasize the importance of selecting pre-training datasets aligned with the downstream task domain. Additionally, we demonstrated that CPC pre-training significantly benefits downstream model performance when data is limited: F1-scores obtained with just 5% of the data were comparable to those obtained with 100%, indicating that CPC effectively captures essential features of the problem domain.
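The count of fifteen pre-training configurations matches the number of non-empty subsets of four datasets (2^4 - 1 = 15). A minimal sketch of that enumeration follows; the dataset names are placeholders, since the abstract does not name the four datasets, and the helper `pretraining_combinations` is hypothetical rather than taken from the paper.

```python
from itertools import combinations

# Placeholder names (assumption): the abstract does not name the four datasets.
DATASETS = ["dataset_a", "dataset_b", "dataset_c", "dataset_d"]


def pretraining_combinations(datasets):
    """Return every non-empty subset of `datasets`, smallest subsets first."""
    subsets = []
    for size in range(1, len(datasets) + 1):
        subsets.extend(combinations(datasets, size))
    return subsets


combos = pretraining_combinations(DATASETS)
print(len(combos))  # 15 = 2**4 - 1

# Subsets that include a given target dataset in pre-training.
with_target = [c for c in combos if "dataset_a" in c]
print(len(with_target))  # 8 = 2**3
```

Under this reading, each target dataset appears in eight of the fifteen pre-training subsets and is absent from the other seven, which is the split behind the comparison of pre-training with and without the target dataset.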
Published
17/11/2024
How to Cite
SILVA, Betania E. R. da; NAPOLI, Otávio O.; DELGADO, J. V.; ROCHA, Anderson R.; BOCCATO, Levy; BORIN, Edson. Impact of Pre-training Datasets on Human Activity Recognition with Contrastive Predictive Coding. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 306-320. ISSN 2643-6264.