Reducing the need for bounding box annotations in Object Detection using Image Classification data

Leonardo Blanger; Nina S. T. Hirata; Xiaoyi Jiang

Leonardo Blanger USP
Nina S. T. Hirata USP
Xiaoyi Jiang University of Münster

Abstract

We address the problem of training object detection models with significantly fewer bounding-box-annotated images. To do so, we take advantage of cheaper and more abundant image classification data. Our proposal consists of automatically generating artificial detection samples from images that carry classification labels only, with no need for expensive detection-level supervision. We also detail a pretraining strategy in which detection architectures are first initialized on these synthesized samples and then finetuned on real detection data, and we show experimentally that this consistently leads to more data-efficient models. With the proposed approach, we were able to use classification data alone to improve results on the harder and more supervision-hungry object detection problem. For face, bird, and car detection, we achieve results equivalent to those of the full-data scenario using only a small fraction of the original detection data.

Keywords: Training, Graphics, Annotations, Object detection, Birds, Data models, Proposals, sample synthesis, deep learning, pretraining
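
The abstract does not say how the artificial detection samples are built. As a rough illustration only, the sketch below assumes one plausible scheme: each classification image is treated as a tightly framed object, pasted at a random position on a blank canvas, and the paste rectangle is recorded as its bounding box. The function name `synthesize_detection_sample`, the grayscale list-of-rows image format, and the blank background are all hypothetical choices made for this example, not the authors' procedure.

```python
import random


def synthesize_detection_sample(crops, labels, canvas_size=(64, 64), seed=None):
    """Paste classification crops onto a blank canvas and record their boxes.

    Assumptions (not from the paper): images are grayscale lists of rows,
    each crop shows a single tightly framed object, and the background is
    plain black. Later crops may overwrite earlier ones if they overlap.
    """
    rng = random.Random(seed)
    canvas_h, canvas_w = canvas_size
    canvas = [[0] * canvas_w for _ in range(canvas_h)]
    boxes = []
    for crop, label in zip(crops, labels):
        h, w = len(crop), len(crop[0])
        if h > canvas_h or w > canvas_w:
            continue  # skip crops that do not fit on the canvas
        # Pick a top-left corner so the whole crop stays inside the canvas.
        top = rng.randint(0, canvas_h - h)
        left = rng.randint(0, canvas_w - w)
        for r in range(h):
            canvas[top + r][left:left + w] = crop[r]
        # Box format (x_min, y_min, x_max, y_max, class_label), max exclusive.
        boxes.append((left, top, left + w, top + h, label))
    return canvas, boxes


# Two toy "classification images" with class ids 0 and 1.
crops = [[[255] * 12 for _ in range(8)], [[128] * 5 for _ in range(5)]]
image, boxes = synthesize_detection_sample(crops, [0, 1], seed=0)
```

Samples produced this way come with box labels for free, which is what would allow a detector to be pretrained on them before finetuning on a small set of real annotated images.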