Improving Automated Species Identification in Herbaria Using Contrastive Learning

Alisson da Silva Vieira; Diego Bertolini; Luiz S. Oliveira; André Luis Schwerz

Alisson da Silva Vieira UTFPR
Diego Bertolini UTFPR
Luiz S. Oliveira UFPR
André Luis Schwerz UTFPR

Abstract

Herbaria document plant specimens by preserving them as dried samples mounted on cardboard with metadata. These specimens are essential for botanical research but require accurate identification, a process that remains a major bottleneck because it is manual, error-prone, and dependent on specialists. Recent initiatives to mitigate this problem have used images available in virtual repositories to build datasets for training machine learning models that assist in species identification. However, the performance of these models has not been satisfactory for many botanical families due to factors such as high inter-species similarity, large intra-species variability, and, in particular, long-tailed distributions in which many species are represented by only a few samples. Our work shows how Contrastive Learning (CL) methods improve the automated identification of species in herbaria. We evaluated three well-known CL frameworks on a dataset of specimens from Piperaceae, a botanical family that is inherently difficult to identify, and compared their results to classical learning approaches. Our experiments achieved an 18-percentage-point improvement over the baseline method. We also showed that CL methods can achieve strong performance in herbarium specimen classification even when labeled training data are limited, making them a valuable tool to support experts in species identification.