Glaucoma Diagnosis in Fundus Images from a Portable Ophthalmoscope Using a Transformer-Based Ensemble
Abstract
This work explores Transformer-based models for glaucoma detection in fundus images acquired with a portable ophthalmoscope, addressing the growing need for accessible diagnosis in resource-limited regions. Glaucoma is a chronic eye disease that can lead to irreversible blindness if not diagnosed early, which makes automated methods to support screening essential. The study uses a dataset of 2,000 fundus images, captured under practical conditions and with reduced quality, to train and evaluate the models. The results show that Transformer models such as SwinV2, BEiT, DeiT, and ViT achieve performance competitive with previous approaches based on convolutional networks, highlighting their potential for glaucoma classification in low-resolution fundus images. Combining the models' predictions in an ensemble yielded a mean accuracy of 93.25%, demonstrating the effectiveness of the proposed approach.
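The page does not include the authors' code. As a rough illustration of the ensemble idea described in the abstract, the sketch below averages the softmax outputs of several fine-tuned Transformer classifiers (soft voting). The checkpoint paths, the soft-voting rule, and the label ordering are assumptions for illustration, not the authors' artifacts; it assumes the Hugging Face `transformers` library and PyTorch.

```python
# Minimal sketch of soft-voting over Transformer image classifiers.
# Checkpoint paths below are hypothetical placeholders, not the paper's models.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

CHECKPOINTS = [
    "checkpoints/swinv2-glaucoma",  # hypothetical fine-tuned SwinV2
    "checkpoints/beit-glaucoma",    # hypothetical fine-tuned BEiT
    "checkpoints/deit-glaucoma",    # hypothetical fine-tuned DeiT
    "checkpoints/vit-glaucoma",     # hypothetical fine-tuned ViT
]

def ensemble_predict(image_path: str) -> int:
    """Average softmax probabilities across models and return the argmax class."""
    image = Image.open(image_path).convert("RGB")
    probs = []
    for ckpt in CHECKPOINTS:
        processor = AutoImageProcessor.from_pretrained(ckpt)
        model = AutoModelForImageClassification.from_pretrained(ckpt).eval()
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        probs.append(torch.softmax(logits, dim=-1))
    mean_probs = torch.stack(probs).mean(dim=0)
    # Assumed label order: 0 = normal, 1 = glaucoma.
    return int(mean_probs.argmax(dim=-1))

if __name__ == "__main__":
    print(ensemble_predict("example_fundus.jpg"))
```

A weighted average or majority vote over hard labels would be drop-in alternatives to the plain mean used here.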
Published
June 9, 2025
How to Cite
COSTA, Rodrigo Otávio C.; PIMENTEL, Patrik Oliveira; PESSOA, Alexandre Cesar P.; BRAZ JÚNIOR, Geraldo; ALMEIDA, João Dallyson S. Diagnóstico de Glaucoma em Retinografias de Oftalmoscópio Portátil Utilizando Ensemble Baseado em Transformers. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 25., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 449-460. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2025.7270.