Iris-CV: Classifying Iris Flowers Is Not as Easy as You Thought

Itamar de Paiva Rocha Filho; João Pedro Vasconcelos Teixeira; João Wallace Lucena Lins; Felipe Honorato de Sousa; Ana Clara Chaves Sousa; Manuel Ferreira Junior; Thaís Ramos; Cecília Silva; Thaís Gaudencio do Rêgo; Yuri de Almeida Malheiros; Telmo Silva Filho

Itamar de Paiva Rocha Filho UFPB
João Pedro Vasconcelos Teixeira UFPB
João Wallace Lucena Lins UFPB
Felipe Honorato de Sousa UFPB
Ana Clara Chaves Sousa UFPB
Manuel Ferreira Junior UFPB
Thaís Ramos UFRN
Cecília Silva UFPE
Thaís Gaudencio do Rêgo UFPB
Yuri de Almeida Malheiros UFPB
Telmo Silva Filho UFPB

Abstract

The iris flower dataset is a ubiquitous benchmark in the machine learning literature. With its 150 instances, four continuous features, and three balanced classes, one of which is linearly separable from the other two, iris is generally considered an easy problem. Hence, researchers usually turn to other datasets when they need more challenging benchmarks. A similar situation occurs with widely explored computer vision datasets such as MNIST and ImageNet: state-of-the-art models essentially solve these problems, motivating the search for more challenging tasks. Therefore, this paper introduces a new computer vision toy dataset featuring iris flowers. The pictures were taken by users of a nature photography application, so they include noisy background information. Additionally, certain desirable properties are not guaranteed, such as a single, similarly sized object at the center of each picture, which makes the task more challenging. Our benchmark results show that the dataset can be challenging for traditional machine learning algorithms applied without any pre-processing, while state-of-the-art deep learning architectures achieve around 82% accuracy. Further effort will therefore be necessary to bring this accuracy closer to what has been accomplished for MNIST and ImageNet.

Keywords: Computer vision, Dataset, Machine learning