Generating X-ray Reports Using Global Attention

  • Felipe André Zeiser UNISINOS
  • Cristiano André da Costa UNISINOS
  • Gabriel de Oliveira Ramos UNISINOS
  • Henrique C. Bohn UNISINOS
  • Ismael Santos UNISINOS
  • Bruna Donida Grupo Hospitalar Conceição
  • Ana Paula de Oliveira Brun Grupo Hospitalar Conceição
  • Nathália Zarichta Grupo Hospitalar Conceição


The use of images for diagnosis, treatment, and decision-making in health care is frequent. A large part of the radiologist's work is interpreting images and producing potentially diagnostic reports. However, radiologists carry high workloads, and their task is operator-dependent, which makes it prone to errors under non-ideal conditions. During the COVID-19 pandemic, healthcare systems were overwhelmed, and this pressure extended to the X-ray analysis process. The automatic generation of reports can therefore help reduce the workload of radiologists and support the diagnosis and treatment of patients with suspected COVID-19. In this article, we propose generating suggestions for chest radiography reports and evaluate two architectures based on: (i) long short-term memory (LSTM) and (ii) LSTM with global attention. The most representative features of the X-ray images are extracted by an encoder based on a DenseNet121 network pre-trained on the ChestX-ray14 dataset. Experimental results on a private set of 6,650 images and reports indicate that the LSTM model with global attention yields the best result, with a BLEU-1 of 0.693, BLEU-2 of 0.496, BLEU-3 of 0.400, and BLEU-4 of 0.345. The quantitative and qualitative results demonstrate that our method can effectively suggest high-quality radiological findings and show the potential of our methodology as a tool to assist radiologists in chest X-ray analysis.
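The global attention step that distinguishes the second architecture can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the Luong et al. (2015) formulation with the simple dot-product score, treats each spatial position of the DenseNet121 feature map as one image region (for a 224×224 input, DenseNet121's final feature map is 7×7×1024, i.e. 49 region vectors), and assumes the decoder state has the same dimension as the region vectors. The function names and the projection matrix `W_c` are hypothetical. At each decoding step, the LSTM state is scored against every region, the scores are normalized with a softmax, and the weighted sum of regions forms a context vector that is combined with the state before the next word is predicted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(decoder_state: Vector, regions: List[Vector]) -> Tuple[Vector, Vector]:
    """Attend over *all* image regions (hence 'global') with a dot-product score.

    decoder_state: LSTM hidden state h_t (length d).
    regions: encoder feature vectors, one per spatial position (each length d).
    Returns (context vector c_t, attention weights a_t).
    """
    scores = [dot(decoder_state, r) for r in regions]  # score(h_t, r_s) = h_t . r_s
    weights = softmax(scores)                          # a_t(s); sums to 1
    d = len(decoder_state)
    # c_t = sum_s a_t(s) * r_s, computed one dimension at a time
    context = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(d)]
    return context, weights


def attentional_state(context: Vector, decoder_state: Vector, W_c: List[Vector]) -> Vector:
    """h~_t = tanh(W_c [c_t; h_t]); W_c is a hypothetical learned matrix (rows of length 2d)."""
    concat = context + decoder_state  # list concatenation gives [c_t; h_t]
    return [math.tanh(dot(row, concat)) for row in W_c]


if __name__ == "__main__":
    # Toy example: 3 regions with 2-dimensional features instead of 49 x 1024.
    h_t = [1.0, 0.0]
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    c_t, a_t = global_attention(h_t, regions)
    W_c = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]  # hypothetical weights
    print("weights:", a_t)
    print("attentional state:", attentional_state(c_t, h_t, W_c))
```

Because the weights are recomputed at every step, the decoder can focus on different regions of the radiograph as it generates different parts of a report, whereas an LSTM without attention typically conditions on a single fixed image vector. Other Luong score functions ("general", "concat") add learned projections and would also remove the equal-dimension assumption made here.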


ZEISER, Felipe André; COSTA, Cristiano André da; RAMOS, Gabriel de Oliveira; BOHN, Henrique C.; SANTOS, Ismael; DONIDA, Bruna; BRUN, Ana Paula de Oliveira; ZARICHTA, Nathália. Generating X-ray Reports Using Global Attention. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 19. , 2022, Campinas/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 809-818. ISSN 2763-9061. DOI:

