ZEDD-G: ZEro-Shot Diffusion-Based Data Generation for Improving Classification Tasks
Abstract
Diffusion models are a promising means of generating synthetic data to mitigate data scarcity in computer vision. However, their practical application presents challenges, including the need for computationally expensive retraining and the complex, manual effort required to design precise textual instructions. This work addresses these limitations by introducing ZEDD-G, a novel zero-shot, prompt-free framework for synthetic data augmentation designed to enhance downstream classification tasks. Our methodology establishes a fully automated pipeline that begins with unsupervised clustering of visually similar images, followed by a multi-image latent guidance mechanism that combines visual prompts with direct manipulation of latent features to generate diverse and controlled variations. Evaluated on a demanding classification benchmark comprising natural and medical images, ZEDD-G demonstrates a substantial impact on performance. For a ResNet-50 model trained from scratch, our method boosts accuracy by an average of more than 30 percentage points across diverse datasets; for a pre-trained model, it provides consistent gains of around 2.6 percentage points. These results are competitive with state-of-the-art fine-tuning-based methods, yet are achieved without any model retraining. ZEDD-G is thus an efficient method for generating high-quality synthetic data that improves classification tasks. Our implementation is publicly available at https://github.com/Gardiy/ZEDDG.
