An Instance Level Analysis of Classification Difficulty for Unlabeled Data
Abstract
Instance hardness measures allow us to assess and understand why some observations in a dataset are difficult to classify. With this information, one can curate and cleanse the training dataset to improve data quality. However, these measures require labeled data, which limits their use at the deployment stage, when data is unlabeled. This paper investigates whether it is possible to identify observations that will be hard to classify regardless of their label. To this end, two approaches are tested. The first adapts known instance hardness measures to the unlabeled scenario. The second learns regression meta-models to estimate the instance hardness of new observations. In the experiments, both approaches were better at identifying instances lying in borderline regions of the dataset, which pose greater difficulty when the label is unknown.
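The abstract does not say which hardness measures or meta-models were used, so the following is only a minimal sketch under stated assumptions. It uses the k-Disagreeing Neighbors (kDN) measure, the share of an instance's k nearest neighbours whose label differs from its own, as the labeled hardness measure. For the first approach it assumes a hypothetical label-free proxy (`label_free_hardness`: one minus the majority-class share among an unlabeled point's labeled neighbours). For the second it assumes a hypothetical k-nearest-neighbour regressor (`meta_model_predict`) as the meta-model. All function names and both of these choices are illustrative, not the authors' method.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(X: Sequence[Point], q: Point, k: int, skip: int = -1) -> List[int]:
    """Indices of the k points in X closest to q (Euclidean distance).
    Ties are broken by index; the point at position `skip` is ignored."""
    order = sorted((math.dist(x, q), i) for i, x in enumerate(X) if i != skip)
    return [i for _, i in order[:k]]


def kdn(X: Sequence[Point], y: Sequence[int], k: int = 5) -> List[float]:
    """k-Disagreeing Neighbors for labeled data: for each instance, the
    fraction of its k nearest neighbours (itself excluded) with another label."""
    return [
        sum(y[j] != y[i] for j in nearest(X, x, k, skip=i)) / k
        for i, x in enumerate(X)
    ]


def label_free_hardness(X: Sequence[Point], y: Sequence[int],
                        q: Point, k: int = 5) -> float:
    """HYPOTHETICAL label-free adaptation: 1 minus the majority-class share
    among the k labeled neighbours of q. It needs no label for q itself and
    is high when q lies in a region where classes overlap."""
    counts = Counter(y[j] for j in nearest(X, q, k))
    return 1.0 - max(counts.values()) / k


def meta_model_predict(X: Sequence[Point], hardness: Sequence[float],
                       Q: Sequence[Point], k: int = 5) -> List[float]:
    """HYPOTHETICAL regression meta-model: k-NN regression that predicts the
    hardness of each new point as the mean hardness of its k nearest
    training instances."""
    return [sum(hardness[j] for j in nearest(X, q, k)) / k for q in Q]


# Toy 1-D data: class 0 at 0..2, class 1 at 3..5; points 2 and 3 are borderline.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
y = [0, 0, 0, 1, 1, 1]
h = kdn(X, y, k=2)                                   # [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
pred = meta_model_predict(X, h, [(2.5,), (-1.0,)], k=2)  # [0.5, 0.0]
proxy = label_free_hardness(X, y, (2.5,), k=2)       # 0.5
```

In this toy example both estimates flag the unlabeled point at 2.5, which sits between the two classes, as hard, while the point at -1.0, deep inside class 0, is rated easy. This mirrors the abstract's observation that borderline regions are where unlabeled difficulty is most visible.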
Published
17/11/2024
How to Cite
UEDA, Patricia S. M.; RIVOLLI, Adriano; LORENA, Ana Carolina. An Instance Level Analysis of Classification Difficulty for Unlabeled Data. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 141-155. ISSN 2643-6264.