Automatic method for generating medical reports from retinography images using a Transformer

  • Eduardo F. P. Dutra (UFMA)
  • Victor H. B. de Lemos (UFMA)
  • João D. S. Almeida (UFMA)
  • Anselmo C. de Paiva (UFMA)

Abstract

The number of people affected by retinal diseases is expected to grow significantly over the coming decades. Traditional diagnosis of these pathologies relies on visual analysis of retinal structures, a time-consuming process that requires specialized training. An automatic system that supports specialists in this diagnosis is therefore useful. This work presents an automatic method for medical report generation that uses a convolutional neural network to extract features from the image, combined with a Transformer network that suggests an initial medical report. The proposed method improves BLEU by 30% over the best Image Captioning method on the DeepEyeNet dataset, which covers 265 different retinal diseases.
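To make the architecture described in the abstract concrete, the sketch below pairs a convolutional feature extractor with a Transformer decoder, the standard encoder-decoder pattern of Transformer-based image captioning (Vaswani et al., 2017; Liu et al., 2021). It is a minimal PyTorch illustration under assumptions, not the authors' implementation: the ResNet-50 backbone, the 512-dimensional hidden size, the layer counts, and the `RetinalReportGenerator` name are all placeholders.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class RetinalReportGenerator(nn.Module):
    """CNN encoder + Transformer decoder for retinal report generation.

    Illustrative sketch only: backbone, sizes, and names are assumptions,
    not the architecture published in the paper.
    """

    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=4, max_len=60):
        super().__init__()
        # CNN feature extractor: ResNet-50 without its pooling/classification
        # head (swap weights=None for an ImageNet-pretrained checkpoint).
        backbone = models.resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Linear(2048, d_model)  # map CNN channels to d_model

        # Token and position embeddings for the (partial) report text.
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)

        # Transformer decoder cross-attends to the grid of image features.
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tokens):
        # Encode: (B, 3, 224, 224) -> (B, 2048, 7, 7) -> (B, 49, d_model).
        feats = self.cnn(images).flatten(2).transpose(1, 2)
        memory = self.proj(feats)

        # Embed report tokens and add learned positions.
        pos = torch.arange(tokens.size(1), device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)

        # Causal mask: each position attends only to earlier words.
        mask = nn.Transformer.generate_square_subsequent_mask(
            tokens.size(1)).to(tokens.device)
        h = self.decoder(x, memory, tgt_mask=mask)
        return self.out(h)  # (B, T, vocab_size) logits


model = RetinalReportGenerator(vocab_size=5000)
images = torch.randn(2, 3, 224, 224)      # dummy retinography batch
tokens = torch.randint(0, 5000, (2, 20))  # partial report token ids
logits = model(images, tokens)            # -> torch.Size([2, 20, 5000])
```

Training such a model would minimize cross-entropy against specialist-written reports, and generation would proceed autoregressively from a start token; the generated text can then be scored against the reference reports with BLEU (Papineni et al., 2002), the metric used in the comparison above.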

References

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255.

He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385.

Hendrick, A. M., Gibson, M. V., and Kylstra, J. A. (2015). Diabetic retinopathy. Primary Care: Clinics in Office Practice.

Herdade, S., Kappeler, A., Boakye, K., and Soares, J. (2020). Image captioning: Transforming objects into words. arXiv preprint arXiv:1906.05963.

Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K. Q. (2018). Densely connected convolutional networks. arXiv preprint arXiv:1608.06993.

Huang, J.-H., Wu, T.-W., Yang, C.-H. H., Shi, Z., Lin, I.-H., Tegner, J., and Worring, M. (2022). Non-local attention improves description generation for retinal images. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 1606–1615.

Huang, J.-H., Wu, T.-W., Yang, C.-H. H., and Worring, M. (2021a). Deep context-encoding network for retinal image captioning. In 2021 IEEE International Conference on Image Processing (ICIP), pages 3762–3766.

Huang, J.-H., Yang, C.-H. H., Liu, F., Tian, M., Liu, Y.-C., Wu, T.-W., Lin, I., Wang, K., Morikawa, H., Chang, H., et al. (2021b). DeepOpht: Medical report generation for retinal images via deep models and visual explanation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2442–2452.

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Lavie, A. and Denkowski, M. J. (2009). The METEOR metric for automatic evaluation of machine translation. Machine Translation, 23(2–3):105–115.

Li, G., Zhu, L., Liu, P., and Yang, Y. (2019). Entangled transformer for image captioning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8928–8937.

Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.

Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer.

Liu, W., Chen, S., Guo, L., Zhu, X., and Liu, J. (2021). CPTR: Full transformer network for image captioning. arXiv preprint arXiv:2101.10804.

Monasse, P. (2019). Extraction of the Level Lines of a Bilinear Image. Image Processing On Line, 9:205–219. DOI: 10.5201/ipol.2019.269.

World Health Organization (2019). World report on vision.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311–318, USA. Association for Computational Linguistics.

Pavlopoulos, J., Kougia, V., Androutsopoulos, I., and Papamichail, D. (2022). Diagnostic captioning: a survey. Knowledge and Information Systems, 64(7):1691–1722.

Shaik, N. S. and Cherukuri, T. K. (2024). Gated contextual transformer network for multi-modal retinal image clinical description generation. Image and Vision Computing, page 104946.

Shin, H.-C., Roberts, K., Lu, L., Demner-Fushman, D., Yao, J., and Summers, R. M. (2016). Learning to read chest X-rays: Recurrent neural cascade model for automated image annotation.

Simonyan, K. and Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

Steinmetz, J. D., Bourne, R. R., Briant, P. S., Flaxman, S. R., Taylor, H. R., Jonas, J. B., Abdoli, A. A., Abrha, W. A., Abualhasan, A., Abu-Gharbieh, E. G., et al. (2021). Causes of blindness and vision impairment in 2020 and trends over 30 years, and prevalence of avoidable blindness in relation to VISION 2020: the Right to Sight: an analysis for the Global Burden of Disease Study. The Lancet Global Health, 9(2):e144–e160.

Tan, M. and Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 6105–6114. PMLR.

Tan, M. and Le, Q. V. (2021). EfficientNetV2: Smaller models and faster training. arXiv preprint arXiv:2104.00298.

Umbelino, C. C. and Ávila, M. P. (2023). As condições de saúde ocular no Brasil. São Paulo: Conselho Brasileiro de Oftalmologia.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

Vinyals, O., Toshev, A., Bengio, S., and Erhan, D. (2015). Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3156–3164.

Xie, S., Girshick, R., Dollár, P., Tu, Z., and He, K. (2017). Aggregated residual transformations for deep neural networks. arXiv preprint arXiv:1611.05431.

Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., and Bengio, Y. (2016). Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044.

Zhang, Z., Xie, Y., Xing, F., McGough, M., and Yang, L. (2017). MDNet: A semantically and visually interpretable medical image diagnosis network.

Published
25/06/2024
DUTRA, Eduardo F. P.; LEMOS, Victor H. B. de; ALMEIDA, João D. S.; PAIVA, Anselmo C. de. Método automático para geração de laudos médicos em imagens de retinografia utilizando Transformer. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 24., 2024, Goiânia/GO. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 507-518. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2024.2757.
