Glaucoma Diagnosis in Fundus Images from a Portable Ophthalmoscope Using a Transformer-Based Ensemble
Abstract
This work explores Transformer-based models for glaucoma detection in fundus images acquired with a portable ophthalmoscope, addressing the growing need for accessible diagnosis in resource-limited regions. Glaucoma is a chronic eye disease that can lead to irreversible blindness if not diagnosed early, which makes automated methods to support screening essential. The study uses a dataset of 2,000 fundus images, captured under practical conditions and with reduced quality, to train and evaluate the models. The results show that Transformer models such as SwinV2, BEiT, DeiT, and ViT achieve performance competitive with previous approaches based on convolutional networks, highlighting their potential for glaucoma classification in low-resolution fundus images. Combining the models' predictions in an ensemble yielded a mean accuracy of 93.25%, demonstrating the effectiveness of the proposed approach.
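The page does not include the authors' code. As a rough illustration of the ensemble idea described in the abstract, the sketch below averages the softmax outputs of several fine-tuned Transformer classifiers (soft voting). The checkpoint paths, the soft-voting rule, and the label ordering are assumptions for illustration, not the authors' artifacts; it assumes the Hugging Face `transformers` library and PyTorch.

```python
# Minimal sketch of soft-voting over Transformer image classifiers.
# Checkpoint paths below are hypothetical placeholders, not the paper's models.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

CHECKPOINTS = [
    "checkpoints/swinv2-glaucoma",  # hypothetical fine-tuned SwinV2
    "checkpoints/beit-glaucoma",    # hypothetical fine-tuned BEiT
    "checkpoints/deit-glaucoma",    # hypothetical fine-tuned DeiT
    "checkpoints/vit-glaucoma",     # hypothetical fine-tuned ViT
]

def ensemble_predict(image_path: str) -> int:
    """Average softmax probabilities across models and return the argmax class."""
    image = Image.open(image_path).convert("RGB")
    probs = []
    for ckpt in CHECKPOINTS:
        processor = AutoImageProcessor.from_pretrained(ckpt)
        model = AutoModelForImageClassification.from_pretrained(ckpt).eval()
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        probs.append(torch.softmax(logits, dim=-1))
    mean_probs = torch.stack(probs).mean(dim=0)
    # Assumed label order: 0 = normal, 1 = glaucoma.
    return int(mean_probs.argmax(dim=-1))

if __name__ == "__main__":
    print(ensemble_predict("example_fundus.jpg"))
```

A weighted average or majority vote over hard labels would be drop-in alternatives to the plain mean used here.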
Published
June 9, 2025
How to Cite
COSTA, Rodrigo Otávio C.; PIMENTEL, Patrik Oliveira; PESSOA, Alexandre Cesar P.; BRAZ JÚNIOR, Geraldo; ALMEIDA, João Dallyson S. Diagnóstico de Glaucoma em Retinografias de Oftalmoscópio Portátil Utilizando Ensemble Baseado em Transformers. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 25., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 449-460. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2025.7270.