Development of an Artificial Intelligence-Aided Software for Annotating Image Datasets

Paulo Victor de Magalhães Rozatto; Luiz Maurílio da Silva Maciel; Marcelo Bernardes Vieira; Saulo Moraes Villela

doi:10.5753/sbsi.2025.246365

Paulo Victor de Magalhães Rozatto UFJF http://orcid.org/0000-0003-0223-7207
Luiz Maurílio da Silva Maciel UFJF https://orcid.org/0000-0001-9193-2302
Marcelo Bernardes Vieira UFJF https://orcid.org/0000-0003-3356-6679
Saulo Moraes Villela UFJF https://orcid.org/0000-0001-5958-4766

Abstract

Context: Deep learning is a highly successful class of artificial intelligence (AI) methods with a wide variety of applications. To perform well, deep learning models require large amounts of high-quality annotated data.

Problem: Data annotation is a time-consuming, laborious task that demands significant human labor, which makes it expensive.

Solution: This work aims to reduce the time required to annotate image datasets by building an easy-to-use software tool that offers semi-automated annotation powered by an AI model.

IS Theory: The work is grounded in Socio-Technical Theory, since we developed and evaluated the tool with the goal of making it acceptable and useful to its users.

Method: We developed a web-based tool that uses HQ-SAM, a deep neural network for image segmentation based on Vision Transformers, to generate polygon annotations from the user's prompts. Although HQ-SAM generalizes well zero-shot, we fine-tuned it on the Bean Leaf Dataset to evaluate how well the network adapts to a specific task.

Summary of results: The fine-tuned model achieved higher accuracy than the pre-trained one. We tested the tool with 20 participants, all from the computer vision and graphics fields. We asked them to annotate the same two images both manually and with AI assistance, and we recorded the annotation times. Finally, the participants filled out a usability questionnaire about their experience. We measured a median speedup of 1.5× for AI-aided annotation over manual annotation, and the participants' answers about the tool's ease of use and usefulness were highly positive.

Contribution: We expect the proposed system to substantially reduce the human effort required to annotate image datasets, leading to faster annotation. This is a valuable contribution to the computer vision and AI communities, as it speeds up dataset creation.
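
The abstract does not say how an HQ-SAM segmentation mask becomes a polygon annotation. One plausible pipeline is to trace the mask's outer contour with border following and then simplify that contour with the Douglas–Peucker algorithm. This is an assumption, not the authors' stated method; it is suggested only by the reference list, which includes Suzuki et al. (1985) and Douglas and Peucker (1973). The sketch below covers only the simplification step, in pure Python. It assumes the contour has already been extracted, for example by OpenCV's `cv2.findContours`. The function names and the `epsilon` tolerance are hypothetical.

```python
import math
from typing import List, Tuple

# A 2-D vertex (x, y) in image coordinates.
Point = Tuple[float, float]


def _distance_to_chord(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b (or to a if a == b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # |cross(b - a, p - a)| / |b - a|
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def douglas_peucker(points: List[Point], epsilon: float) -> List[Point]:
    """Simplify an open polyline, always keeping its two endpoints."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior vertex farthest from the chord first-last.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_chord(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        # Every interior vertex is within tolerance: the chord suffices.
        return [first, last]
    # Otherwise keep the farthest vertex and recurse on both sides of it.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # drop the split vertex duplicated in both halves


def simplify_closed_contour(contour: List[Point], epsilon: float) -> List[Point]:
    """Simplify a closed contour (no repeated end vertex) into polygon vertices."""
    if len(contour) < 4:
        return list(contour)
    start = contour[0]
    # Split the loop at the vertex farthest from the start so both halves are open polylines.
    far = max(range(len(contour)), key=lambda i: math.dist(start, contour[i]))
    first_half = douglas_peucker(contour[: far + 1], epsilon)
    second_half = douglas_peucker(contour[far:] + [start], epsilon)
    # Each half ends where the other begins; drop those shared endpoints.
    return first_half[:-1] + second_half[:-1]


# Hypothetical example: a densely sampled 10 x 10 square boundary (40 points).
square = (
    [(x, 0) for x in range(0, 10)]
    + [(10, y) for y in range(0, 10)]
    + [(x, 10) for x in range(10, 0, -1)]
    + [(0, y) for y in range(10, 0, -1)]
)
print(simplify_closed_contour(square, epsilon=0.5))  # [(0, 0), (10, 0), (10, 10), (0, 10)]
```

OpenCV's `cv2.approxPolyDP` implements the same Douglas–Peucker simplification. In practice, `epsilon` trades the number of polygon vertices against boundary fidelity. The abstract does not state how the tool sets it.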

Keywords: Deep learning, Web application, Semi-automated annotation, Image datasets, Vision Transformers
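
The abstract reports a median speedup of 1.5× for AI-aided over manual annotation but does not define the statistic. One common reading is assumed here: each participant's speedup is their manual annotation time divided by their AI-aided time, and the median is taken over participants. The sketch below computes that. The function names and the timing values in the example are hypothetical and are not the study's data.

```python
from statistics import median
from typing import List, Sequence


def per_participant_speedups(manual_s: Sequence[float], ai_s: Sequence[float]) -> List[float]:
    """Speedup of each participant: manual time divided by AI-aided time."""
    if len(manual_s) != len(ai_s) or not manual_s:
        raise ValueError("need one manual and one AI-aided time per participant")
    return [m / a for m, a in zip(manual_s, ai_s)]


def median_speedup(manual_s: Sequence[float], ai_s: Sequence[float]) -> float:
    """Median of the per-participant speedups (assumed definition)."""
    return median(per_participant_speedups(manual_s, ai_s))


# Hypothetical timings in seconds for four participants (not the study's data).
manual = [100.0, 90.0, 120.0, 80.0]
ai_aided = [50.0, 60.0, 100.0, 40.0]
print(per_participant_speedups(manual, ai_aided))  # [2.0, 1.5, 1.2, 2.0]
print(median_speedup(manual, ai_aided))            # 1.75
```

The median of per-participant ratios (1.75 here) generally differs from the ratio of median times (95 / 55 ≈ 1.73). Which definition is used therefore matters when comparing reported speedups.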

References

H Abbas and G Katina. 2023. Socio-Technical Theory. Trist & Bamforth 1, 2 (2023), 01–16.

Roobaea Alroobaea and Pam J Mayhew. 2014. How many participants are really enough for usability studies? In 2014 Science and Information Conference. IEEE, 48–56.

Renata Araújo and Rita Suzana. 2017. Grand research challenges in information systems in Brazil 2016–2026. Clodis Boscarioli, Renata Araujo, and Rita Suzana (Eds.). Brazilian Computer Society.

Prachya Boonsri and Yachai Limpiyakorn. 2023. Semi-Automated Image Annotation for Cannabis Seed Gender Detection Model. In 2023 IEEE 3rd International Conference on Software Engineering and Artificial Intelligence (SEAI). IEEE, 189–193.

Lluis Castrejon, Kaustav Kundu, Raquel Urtasun, and Sanja Fidler. 2017. Annotating object instances with a Polygon-RNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5230–5238.

Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C Berg, and Alexander Kirillov. 2021. Boundary IoU: Improving object-centric image segmentation evaluation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 15334–15342.

Karla Gabriele Florentino da Silva, Paulo Victor de Magalhães Rozatto, et al. 2025. Bean leaf image dataset annotated with leaf dimensions, segmentation masks, and camera calibration. Data in Brief (2025), 111328.

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations.

David H Douglas and Thomas K Peucker. 1973. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization 10, 2 (1973), 112–122.

Ionuţ Fîciu, Radu Stîlpeanu, Anca Petre, Carmen Pătraşcu, Mihai Ciuc, et al. 2018. Automatic Annotation of Object Instances by Region-Based Recurrent Neural Networks. In 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP). IEEE, 287–291.

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. 2961–2969.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.

Lei Ke et al. 2023. Segment Anything in High Quality. In Advances in Neural Information Processing Systems (NeurIPS).

Alexander Kirillov et al. 2023. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 4015–4026.

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.

Rensis Likert. 1932. A technique for the measurement of attitudes. Archives of Psychology (1932).

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V. Springer, 740–755.

Shervin Minaee, Yuri Boykov, et al. 2021. Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 7 (2021), 3523–3542.

Kenneth A Philbrick et al. 2019. RIL-Contour: a medical imaging dataset annotation tool for and with deep learning. Journal of Digital Imaging 32 (2019), 571–581.

Krittaphat Pugdeethosapol, Morgan Bishop, Dennis Bowen, and Qinru Qiu. 2020. Automatic Image Labeling with Click Supervision on Aerial Images. In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.

Xuebin Qin, Shida He, Zichen Zhang, Masood Dehghan, and Martin Jagersand. 2018. ByLabel: A boundary based semi-automatic image annotation tool. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 1804–1813.

Bhavani Sambaturu, Ashutosh Gupta, CV Jawahar, and Chetan Arora. 2023. ScribbleNet: Efficient interactive annotation of urban city scenes for semantic segmentation. Pattern Recognition 133 (2023), 109011.

Melvyn L Smith, Lyndon N Smith, and Mark F Hansen. 2021. The quiet revolution in machine vision – a state-of-the-art survey paper, including historical review, perspectives, and future directions. Computers in Industry 130 (2021), 103472.

Michael Sony and Subhash Naik. 2020. Industry 4.0 integration with sociotechnical systems theory: A systematic review and proposed theoretical model. Technology in Society 61 (2020), 101248.

Satoshi Suzuki et al. 1985. Topological structural analysis of digitized binary images by border following. Computer Vision, Graphics, and Image Processing 30, 1 (1985), 32–46.

Cihan Topal and Cuneyt Akinlar. 2012. Edge drawing: a combined real-time edge and segment detector. Journal of Visual Communication and Image Representation 23, 6 (2012), 862–872.

Antonio Torralba, Bryan C Russell, and Jenny Yuen. 2010. LabelMe: Online image annotation and applications. Proc. IEEE 98, 8 (2010), 1467–1484.

Ashish Vaswani, Noam Shazeer, et al. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).

Xue Ying. 2019. An overview of overfitting and its solutions. In Journal of Physics: Conference Series, Vol. 1168. IOP Publishing, 022022.

Jiaxin Yu, Florian Wellmann, et al. 2023. Superpixel segmentations for thin sections: Evaluation of methods to enable the generation of machine learning training data sets. Computers & Geosciences 170 (2023), 105232.

Zhong-Qiu Zhao, Peng Zheng, et al. 2019. Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems 30, 11 (2019), 3212–3232.
