A Multimodal Intelligent Tutor for Formative Feedback on Chest Radiographs Using Vision-Language Models

Ivan Ferreira Martins (UFG); Mathias Cesar Assis (UFG); Nádia Félix Felipe da Silva (UFG); Sergio Teixeira de Carvalho (UFG); Luciana de Oliveira Berretta (UFG)

DOI: https://doi.org/10.5753/sbcas.2026.21506

Abstract

Radiology training requires developing perceptual skills for locating abnormalities and cognitive skills for interpreting medical images, both traditionally acquired under the direct supervision of specialists. This work presents a multimodal intelligent tutor that provides automated formative feedback on the interpretation of chest radiographs. The system integrates the Qwen2-VL-7B-Instruct vision-language model, spatial metrics based on Intersection over Union (IoU), and natural language processing techniques for the semantic analysis of radiological findings. The evaluation used simulated interactions with images from the VinBigData Chest X-ray dataset, in which controlled perturbations of the bounding boxes reproduce common patterns of diagnostic error. The results indicate that the system can distinguish between localization errors and semantic divergences, demonstrating the computational feasibility of the proposed architecture as a support tool for radiology training.
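As a rough illustration of the spatial metric mentioned in the abstract, the sketch below computes IoU between a reference bounding box and a perturbed "student" box. The box format `(x_min, y_min, x_max, y_max)`, the helpers `shift_box` and `localization_feedback`, and the 0.5 threshold are all assumptions made for this example; the abstract does not specify how the authors perturb boxes or which threshold they use.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shift_box(box: Box, dx: float, dy: float) -> Box:
    """Hypothetical perturbation: translate a box to simulate a localization error."""
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def localization_feedback(student: Box, reference: Box,
                          threshold: float = 0.5) -> Tuple[str, float]:
    """Label a student box against the reference; the threshold is an assumed value."""
    score = iou(student, reference)
    label = "localization ok" if score >= threshold else "localization error"
    return label, score


reference = (100.0, 100.0, 300.0, 300.0)
near_miss = shift_box(reference, 20.0, 0.0)   # small horizontal shift
far_miss = shift_box(reference, 250.0, 0.0)   # shifted past the reference box

print(localization_feedback(near_miss, reference))
print(localization_feedback(far_miss, reference))
```

A small shift keeps most of the overlap and passes the assumed threshold, while a shift larger than the box width yields an IoU of zero and is flagged as a localization error. A semantic check on the finding label would be a separate step and is not shown here.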
