Improving Automated Species Identification in Herbaria Using Contrastive Learning
Abstract
Herbaria document plant specimens by preserving them as dried samples mounted on cardboard and accompanied by metadata. These specimens are essential for botanical research, but they must be accurately identified, and identification remains a major bottleneck because it is manual, error-prone, and dependent on specialists. Recent initiatives to mitigate this problem have used images available in virtual repositories to build and publish datasets for training machine learning models that assist in species identification. However, the performance of these models has been unsatisfactory for many botanical families because of high inter-species similarity, large intra-species variability, and, in particular, long-tailed distributions in which many species are represented by only a few samples. Our work shows how Contrastive Learning (CL) methods improve the automated identification of species in herbaria. We evaluated three well-known CL frameworks on a dataset of Piperaceae specimens, a botanical family that is inherently difficult to identify, and compared their results to classical learning approaches. Our experiments achieved an 18-percentage-point improvement over the baseline method. We also showed that CL methods can achieve strong performance in herbarium specimen classification even when labeled training data is limited, making them a valuable tool to support experts in identification.
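For readers unfamiliar with contrastive learning, the sketch below illustrates the general idea with one common objective, the NT-Xent loss popularized by SimCLR. The abstract does not name the three frameworks evaluated, so this is only an illustrative assumption and not the authors' implementation; the function names, the toy 2-D embeddings, and the temperature value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """Mean NT-Xent loss over a batch of paired embeddings.

    views_a[i] and views_b[i] are assumed to be embeddings of two
    augmented views of the same specimen image (a positive pair).
    Every other embedding in the batch acts as a negative.
    """
    z = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every embedding except the anchor itself.
        logits = [
            cosine(z[i], z[k]) / temperature
            for k in range(2 * n)
            if k != i
        ]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Negative log of the softmax probability assigned to the positive.
        total += math.log(sum(math.exp(l) for l in logits)) - pos_logit
    return total / (2 * n)


# Toy 2-D embeddings (hypothetical values, not data from the paper).
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.1], [0.1, 1.0]]      # each view close to its anchor
mismatched = [[0.1, 1.0], [1.0, 0.1]]   # views swapped between images

loss_matched = nt_xent_loss(anchors, matched)
loss_mismatched = nt_xent_loss(anchors, mismatched)
```

In this toy setting the loss is lower when the two views of each image lie close together in embedding space, which is the property a contrastive objective rewards during training.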
Published
29/09/2025
How to Cite
VIEIRA, Alisson da Silva; BERTOLINI, Diego; OLIVEIRA, Luiz S.; SCHWERZ, André Luis. Improving Automated Species Identification in Herbaria Using Contrastive Learning. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 349-363. ISSN 2643-6264.
