Self-Supervised Few-Shot Pill Image Recognition

Luan Sousa Cordeiro; Saulo A. F. de Oliveira; Francisco Nivando Bezerra

Luan Sousa Cordeiro IFCE
Saulo A. F. de Oliveira IFCE
Francisco Nivando Bezerra IFCE

Abstract

Pill recognition is a critical task in healthcare, with direct implications for patient safety. Yet building accurate recognition systems remains challenging, especially when only a few labeled examples are available per pill category. Real-world pill images exhibit high visual variability due to lighting, wear, and noise. These conditions give rise to hard negatives (distinct pills that look alike) and hard positives (instances of the same pill that look different). To tackle these challenges, we propose SSL4PILL, a two-stage framework that leverages self-supervised learning for few-shot pill recognition. First, we train an Augmented Multiscale Deep InfoMax (AMDIM) model with self-supervision on the mini-ImageNet dataset, allowing the model to learn rich semantic representations from unlabeled images. Then, we fine-tune this pre-trained model on a pill dataset with a prototypical network (ProtoNet) under a 1-shot 5-way setting. This approach promotes generalization to novel pill categories within a fine-grained domain while minimizing annotation costs. Our model consistently outperforms prior few-shot baselines, achieving 83.72% accuracy on the NLM PIR dataset and 60.78% on CURE. These results demonstrate the effectiveness of self-supervised learning for generalizing to novel pill categories and highlight its potential for real-world healthcare settings.
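To make the 1-shot 5-way setting concrete, the following is a minimal sketch of how a prototypical network classifies one episode. It is an illustration under stated assumptions, not the SSL4PILL implementation: the embeddings are synthetic random vectors standing in for the output of a pre-trained encoder, and the names `prototypes`, `classify`, `centers`, and the embedding size of 16 are hypothetical choices made for this example.

```python
import numpy as np

def prototypes(support_emb, support_labels, n_way):
    """Average the support embeddings of each class into one prototype per class."""
    return np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(n_way)]
    )

def classify(query_emb, protos):
    """Assign each query to the nearest prototype (squared Euclidean distance)."""
    diff = query_emb[:, None, :] - protos[None, :, :]
    dist = (diff ** 2).sum(axis=-1)            # shape: (n_query, n_way)
    logits = -dist                             # closer prototype -> larger logit
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)     # softmax over classes
    return probs.argmax(axis=1), probs

# Synthetic 1-shot 5-way episode. In practice the embeddings would come from
# an encoder; here they are random points around assumed class centers.
rng = np.random.default_rng(0)
n_way, k_shot, n_query_per_class, dim = 5, 1, 3, 16
centers = 5.0 * rng.normal(size=(n_way, dim))

support_labels = np.repeat(np.arange(n_way), k_shot)
support_emb = centers[support_labels] + 0.1 * rng.normal(size=(n_way * k_shot, dim))

query_labels = np.repeat(np.arange(n_way), n_query_per_class)
query_emb = centers[query_labels] + 0.1 * rng.normal(size=(len(query_labels), dim))

protos = prototypes(support_emb, support_labels, n_way)
pred, probs = classify(query_emb, protos)
accuracy = (pred == query_labels).mean()
print(f"episode accuracy: {accuracy:.2f}")
```

With one shot per class, each prototype is simply the single support embedding, so the quality of the encoder's representation largely determines episode accuracy.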