Glaucoma Diagnosis in Retinal Images from Portable Ophthalmoscope Using Transformer-Based Ensembles

  • Rodrigo Otávio C. Costa UFMA
  • Patrik Oliveira Pimentel UFMA
  • Alexandre Cesar P. Pessoa UFMA
  • Geraldo Braz Júnior UFMA
  • João Dallyson S. Almeida UFMA

Abstract


This paper explores the use of Transformer-based models for detecting glaucoma in retinograms captured with a portable ophthalmoscope, addressing the growing need for accessible diagnostics in regions with limited resources. Glaucoma is a chronic eye disease that can lead to irreversible blindness if not diagnosed early, so automatic methodologies that support screening are essential. The research uses a dataset of 2,000 fundus images, captured under practical conditions and with reduced quality, to train and evaluate the models. The results show that Transformer models such as SwinV2, BEiT, DeiT, and ViT perform competitively with previous approaches based on convolutional networks, highlighting their potential for classifying glaucoma from low-resolution retinograms. Combining the models' predictions in an ensemble yielded an average accuracy of 93.25%, demonstrating the effectiveness of the proposed approach.
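
The sketch below illustrates the kind of ensemble the abstract describes: averaging the class probabilities of several Transformer backbones (ViT, DeiT, BEiT, SwinV2) via the Hugging Face Transformers library (Wolf et al., 2020). The checkpoint names are public ImageNet-pretrained placeholders rather than the authors' fine-tuned weights, the two-class head is an assumption, and plain probability averaging is one possible combination rule, not necessarily the paper's exact scheme.

```python
# Minimal soft-voting ensemble sketch (assumptions noted above): the
# classification heads are randomly initialized here and would still need
# fine-tuning on the 2,000-image portable-ophthalmoscope dataset.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

CHECKPOINTS = [
    "google/vit-base-patch16-224",               # ViT
    "facebook/deit-base-patch16-224",            # DeiT
    "microsoft/beit-base-patch16-224",           # BEiT
    "microsoft/swinv2-tiny-patch4-window8-256",  # SwinV2
]

def ensemble_predict(image: Image.Image) -> int:
    """Average each backbone's softmax output and return the argmax class
    (0 = normal, 1 = glaucoma, by convention of this sketch)."""
    probs = []
    for ckpt in CHECKPOINTS:
        processor = AutoImageProcessor.from_pretrained(ckpt)
        model = AutoModelForImageClassification.from_pretrained(
            ckpt, num_labels=2, ignore_mismatched_sizes=True
        )
        model.eval()
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            probs.append(model(**inputs).logits.softmax(dim=-1))
    return int(torch.stack(probs).mean(dim=0).argmax(dim=-1).item())

# Usage (the file name is hypothetical):
# pred = ensemble_predict(Image.open("fundus_example.png").convert("RGB"))
```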

References

Angara, S. and Kim, J. (2024). Deep ensemble learning for classification of glaucoma from smartphone fundus images. In 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS), pages 412–417. IEEE.

Bao, H., Dong, L., Piao, S., and Wei, F. (2021). BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254.

Bragança, C. P., Torres, J. M., Soares, C. P. d. A., and Macedo, L. O. (2022). Detection of glaucoma on fundus images using deep learning on a new image set obtained with a smartphone and handheld ophthalmoscope. Healthcare, 10:2345. MDPI.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE.

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

Gulshan, V., Peng, L., Coram, M., Stumpe, M. C., Wu, D., Narayanaswamy, A., Venugopalan, S., Widner, K., Madams, T., Cuadros, J., et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 316(22):2402–2410.

Jonas, J. B., Aung, T., Bourne, R. R., Bron, A. M., Ritch, R., and Panda-Jonas, S. (2017). Glaucoma. The Lancet, 390(10108):2183–2193.

Kyari, F., Nolan, W., and Gilbert, C. (2016). Ophthalmologists' practice patterns and challenges in achieving optimal management for glaucoma in Nigeria: results from a nationwide survey. BMJ Open, 6(10):e012230.

Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al. (2022). Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12009–12019.

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022.

Madhu, K., John, S. M., Joseph, A., and Abraham, B. (2024). Glaucoma diagnosis from smartphone captured fundus images using deep learning. In 2024 11th International Conference on Advances in Computing and Communications (ICACC), pages 1–6. IEEE.

Moreira, J. M. M., de Almeida, J. D. S., Junior, G. B., and de Paiva, A. C. (2021). Detecção de glaucoma usando redes em cápsula. In Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS), pages 188–199. SBC.

Mrad, Y., Elloumi, Y., Akil, M., and Bedoui, M. H. (2022). A fast and accurate method for glaucoma screening from smartphone-captured fundus images. IRBM, 43(4):279–289.

World Health Organization (2019). World report on vision. World Health Organization, Geneva.

Resnikoff, S., Lansingh, V. C., Washburn, L., Felch, W., Gauthier, T.-M., Taylor, H. R., Eckert, K., Parke, D., and Wiedemann, P. (2020). Estimated number of ophthalmologists worldwide (international council of ophthalmology update): will we meet the needs? British Journal of Ophthalmology, 104(4):588–592.

Sarhan, A., Rokne, J., and Alhajj, R. (2019). Glaucoma detection using image processing techniques: A literature review. Computerized Medical Imaging and Graphics, 78:101657.

Soofi, A. A. et al. (2023). Exploring deep learning techniques for glaucoma detection: a comprehensive review. arXiv preprint arXiv:2311.01425.

Takahashi, S., Sakaguchi, Y., Kouno, N., Takasawa, K., Ishizu, K., Akagi, Y., Aoyama, R., Teraya, N., Bolatkan, A., Shinkai, N., et al. (2024). Comparison of vision transformers and convolutional neural networks in medical image analysis: a systematic review. Journal of Medical Systems, 48(1):84.

Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and Jégou, H. (2021). Training data-efficient image transformers & distillation through attention. In International conference on machine learning, pages 10347–10357. PMLR.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.

Wilimitis, D. and Walsh, C. G. (2023). Practical considerations and applied examples of cross-validation for model development and evaluation in health care: tutorial. JMIR AI, 2:e49023.

Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Scao, T. L., Gugger, S., Drame, M., Lhoest, Q., and Rush, A. M. (2020). Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

Wu, B., Xu, C., Dai, X., Wan, A., Zhang, P., Yan, Z., Tomizuka, M., Gonzalez, J., Keutzer, K., and Vajda, P. (2020). Visual transformers: Token-based image representation and processing for computer vision.
Published
2025-06-09
COSTA, Rodrigo Otávio C.; PIMENTEL, Patrik Oliveira; PESSOA, Alexandre Cesar P.; BRAZ JÚNIOR, Geraldo; ALMEIDA, João Dallyson S. Glaucoma Diagnosis in Retinal Images from Portable Ophthalmoscope Using Transformer-Based Ensembles. In: BRAZILIAN SYMPOSIUM ON COMPUTING APPLIED TO HEALTH (SBCAS), 25., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 449-460. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2025.7270.
