Comparison of CNN and Transformer Architectures for Automated Diabetic Retinopathy Screening

  • Danilo Leite UTAD
  • Roberto Mendes UFPB
  • Arthur Custódio UFPB
  • Alline Veloso UFPB
  • Sabrina Ferraz UFPB
  • Mateus Ramalho Unipê
  • José Câmara UTAD
  • Ronei Moraes UFPB

Abstract

Objective: to compare CNNs and Transformers for automated diabetic retinopathy (DR) screening on the BRSET dataset. Method: binary classification task on 16,266 images; standardized pipeline, data augmentation, stratified 5-fold cross-validation, and an independent test set; metrics: accuracy, precision, sensitivity, F1, and Kappa. Results: ConvNeXtV2 achieved the best balance between sensitivity and precision (accuracy = 0.981; F1 = 0.848; Kappa = 0.838), outperforming EfficientNetV2M and MaxViT; SwinV2 performed worst. Conclusion: ConvNeXtV2 showed the most consistent performance, suggesting that the choice of architecture should take into account the nature of the lesions and the spatial representation in order to maximize sensitivity.
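The five metrics named in the abstract can all be derived from the four cells of a binary confusion matrix. The sketch below is illustrative only and is not the authors' evaluation code: the function name `screening_metrics` and the example counts are hypothetical, and the counts are not taken from the paper's results.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, sensitivity, F1 and Cohen's kappa
    from binary confusion-matrix counts (positive class = DR present)."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / n
    # Precision: fraction of predicted positives that are true positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Sensitivity (recall): fraction of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance, given the marginal totals.
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_neg = ((tn + fn) / n) * ((tn + fp) / n)
    p_expected = p_pos + p_neg
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected < 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced screening set (not from the paper).
    for name, value in screening_metrics(tp=40, fp=10, fn=10, tn=940).items():
        print(f"{name}: {value:.3f}")
```

With imbalanced classes, as is typical in screening, accuracy can stay high while F1 and Kappa drop, which is one reason to report all five metrics together.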

Published
01/06/2026
LEITE, Danilo; MENDES, Roberto; CUSTÓDIO, Arthur; VELOSO, Alline; FERRAZ, Sabrina; RAMALHO, Mateus; CÂMARA, José; MORAES, Ronei. Comparação de arquiteturas CNNs e transformers na triagem automatizada de retinopatia diabética. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 26., 2026, Ouro Preto/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 824-833. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2026.21542.