A First Look at Dataset Bias in License Plate Recognition

Rayson Laroca; Marcelo Santos; Valter Estevam; Eduardo Luz; David Menotti

Rayson Laroca (UFPR)
Marcelo Santos (UFPR)
Valter Estevam (UFPR / IFPR)
Eduardo Luz (UFOP)
David Menotti (UFPR)

Abstract

Public datasets have played a key role in advancing the state of the art in License Plate Recognition (LPR). Although dataset bias is recognized as a severe problem in the computer vision community, it has been largely overlooked in the LPR literature. LPR models are usually trained and evaluated separately on each dataset. In this scenario, they have often proven robust on the dataset they were trained on but have shown limited performance on unseen ones. This work therefore investigates the dataset bias problem in the LPR context. We performed experiments on eight datasets, four collected in Brazil and four in mainland China, and observed that each dataset has a unique, identifiable “signature”: a lightweight classification model predicts the source dataset of a license plate (LP) image with more than 95% accuracy. In our discussion, we draw attention to the fact that most LPR models are probably exploiting such signatures to improve the results achieved on each dataset at the cost of generalization capability. These results emphasize the importance of evaluating LPR models in cross-dataset setups, as they provide a better indication of generalization (and hence real-world performance) than within-dataset ones.
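
The source-dataset prediction test described above can be illustrated with a toy sketch. It is not the authors' model or data: the feature vectors are synthetic, the per-dataset offset standing in for a "signature" is an assumption, and a nearest-centroid classifier replaces the lightweight classification model. All function names and parameters (`make_synthetic_sets`, `signature_scale`, and so on) are hypothetical.

```python
import random


def make_synthetic_sets(n_sets=4, n_per_set=200, dim=16,
                        signature_scale=1.0, seed=0):
    """Build synthetic feature vectors (NOT real LP images).

    Each hypothetical 'dataset' adds its own fixed offset, scaled by
    signature_scale, to otherwise identically distributed noise.
    """
    rng = random.Random(seed)
    samples = []
    for label in range(n_sets):
        offset = [signature_scale * rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(n_per_set):
            features = [o + rng.gauss(0.0, 1.0) for o in offset]
            samples.append((features, label))
    rng.shuffle(samples)
    return samples


def fit_centroids(train):
    """Compute the mean feature vector of each source dataset."""
    sums, counts = {}, {}
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, features):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(features, centroids[y])),
    )


def source_accuracy(samples, train_frac=0.7):
    """Train on the first part of the samples, report accuracy on the rest."""
    split = int(len(samples) * train_frac)
    train, test = samples[:split], samples[split:]
    centroids = fit_centroids(train)
    hits = sum(predict(centroids, x) == y for x, y in test)
    return hits / len(test)


if __name__ == "__main__":
    with_sig = source_accuracy(make_synthetic_sets(signature_scale=1.0))
    without_sig = source_accuracy(make_synthetic_sets(signature_scale=0.0))
    print(f"with signature:    {with_sig:.3f}")
    print(f"without signature: {without_sig:.3f} (chance is 0.250)")
```

In this sketch, accuracy well above chance indicates that the source can be recovered from the samples alone, while setting `signature_scale` to zero acts as a control in which no dataset-specific cue exists.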

Keywords: Graphics, Computer vision, Costs, Computational modeling, Predictive models, License plate recognition