LLM-Driven Chest X-Ray Report Generation With a Modular, Reduced-Size Architecture

  • Talles Viana Vargas UNICAMP
  • Helio Pedrini UNICAMP
  • André Santanchè UNICAMP

Abstract


Large Language Models (LLMs) have been widely employed in text processing tasks. In computer vision, they have been applied to generating captions and text from natural images, as well as to Visual Question Answering (VQA) systems. In medical imaging, several studies use text generation to propose automated diagnoses from X-rays, magnetic resonance imaging scans, computed tomography scans, and other modalities. Few initiatives apply LLMs to medical text generation, and those that do rely on models with tens of billions of parameters, which makes them computationally expensive. This work addresses this gap by evaluating the use of frozen pre-trained models (CXAS U-Net and BioGPT) for chest X-ray report generation. We adapt the modular BLIP-2 architecture, in which only a cross-modal alignment module must be trained to generate text from images. We achieve competitive scores on Clinical Efficacy (CE) metrics compared to some state-of-the-art (SOTA) methods, while obtaining lower scores on Natural Language Generation (NLG) metrics. Our findings suggest that NLG metrics may not serve as suitable proxies for evaluating models in the chest X-ray report generation task.
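The gap between NLG and CE metrics mentioned above can be illustrated with a minimal sketch. It is not the evaluation pipeline used in this work: the keyword table `FINDINGS`, the helper names, and the two example reports are all hypothetical, the labeler is a toy phrase matcher that ignores negation, and clipped unigram precision stands in for a full BLEU score. The point is only that two reports can share few words while stating the same findings.

```python
from collections import Counter

# Hypothetical finding vocabulary for illustration only; real CE metrics
# rely on a trained report labeler, not on keyword matching.
FINDINGS = {
    "cardiomegaly": ("cardiomegaly", "enlarged cardiac silhouette"),
    "effusion": ("effusion",),
    "pneumothorax": ("pneumothorax",),
}


def tokenize(text):
    """Lowercase and split on whitespace after dropping periods and commas."""
    return text.lower().replace(".", " ").replace(",", " ").split()


def unigram_precision(candidate, reference):
    """Clipped unigram precision (the 1-gram term of BLEU, no brevity penalty)."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[token]) for token, count in cand.items())
    return overlap / total


def extract_findings(text):
    """Toy labeler: set of findings whose phrases appear (negation is ignored)."""
    lowered = text.lower()
    return {
        name
        for name, phrases in FINDINGS.items()
        if any(phrase in lowered for phrase in phrases)
    }


def finding_f1(candidate, reference):
    """F1 between the finding sets extracted from the two reports."""
    pred = extract_findings(candidate)
    gold = extract_findings(reference)
    if not pred and not gold:
        return 1.0
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Two made-up reports with the same findings but different wording.
reference = "Enlarged cardiac silhouette. Small left pleural effusion is present."
candidate = "Cardiomegaly with a left-sided effusion."

print(f"unigram precision: {unigram_precision(candidate, reference):.2f}")  # 0.20
print(f"finding F1:        {finding_f1(candidate, reference):.2f}")         # 1.00
```

In this toy case the candidate shares only one of its five tokens with the reference, yet both reports mention the same two findings, so the word-overlap score is low while the finding-level score is perfect.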
Published
17/11/2024
VARGAS, Talles Viana; PEDRINI, Helio; SANTANCHÈ, André. LLM-Driven Chest X-Ray Report Generation With a Modular, Reduced-Size Architecture. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 13., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 199-211. ISSN 2643-6264.