Soybean Weeds Segmentation Using VT-Net: A Convolutional-Transformer Model

Lucas Silva; Paulo Drews; Rodrigo de Bem

Lucas Silva FURG
Paulo Drews FURG
Rodrigo de Bem FURG

Abstract

The use of machine learning and computer vision in agriculture has grown significantly in recent years, enabling greater precision and efficiency in several processes. In this context, this work develops a neural network for segmenting weeds in images of soybean crops. We propose a new hybrid model that combines convolutional neural networks (CNNs) with a vision transformer (ViT), an architecture built on the self-attention mechanism. We also extend the well-known DeepWeeds dataset with segmentation labels, mitigating the lack of publicly available training data. We compare our hybrid model with state-of-the-art Transformer-based segmentation networks, such as BEiT and Mask2Former. Our approach obtains results equivalent to theirs while employing fewer layers. To the best of our knowledge, this is the first work to use a hybrid convolutional model with a pure ViT backbone for the segmentation of soybean weeds.