The more the merrier: the use of verbose metadata description in the multimodal classification of skin lesions
Abstract
Computer-aided diagnosis (CAD) systems for skin cancer often rely on structured metadata with little semantic depth. This work investigates replacing numerical encodings with detailed, anamnesis-like textual descriptions in the multimodal classification of skin lesions, using Large Language Models (LLMs) as semantic transcribers that convert structured metadata into verbose text. The attributes of the PAD-UFES-20 and ISIC-2019 datasets were converted into natural language, encoded with SBERT, and combined with CNNs through the MetaBlock-SE architecture. Patient-wise 5-fold cross-validation shows that LLM-based models achieve accuracy and AUC that are competitive with, or statistically superior to, the baseline. The results indicate that verbose metadata descriptions are a flexible and semantically rich alternative for multimodal skin lesion classification, especially when metadata are limited or semantically uninformative.
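The two data-handling ideas in the abstract can be illustrated with a minimal sketch: turning a structured metadata record into an anamnesis-like description, and assigning samples to folds patient-wise. This is not the authors' pipeline. The rule-based template below only stands in for the LLM transcription step, and the field names (`age`, `gender`, `region`, `itch`, `bleed`), the function names, and the round-robin fold assignment are all assumptions made for illustration. The SBERT encoding and the MetaBlock-SE fusion are not shown.

```python
# Hypothetical symptom fields and the sentences used when they are True / False.
SYMPTOMS = {
    "itch": ("The lesion itches.", "The lesion does not itch."),
    "bleed": ("The lesion bleeds.", "The lesion does not bleed."),
}


def describe_lesion(record):
    """Turn a structured metadata record into an anamnesis-like description.

    Rule-based stand-in for the LLM transcription step (an assumption for
    illustration). Missing fields are skipped rather than imputed.
    """
    parts = []
    age = record.get("age")
    gender = record.get("gender")
    if age is not None and gender:
        parts.append(f"The patient is a {age}-year-old {gender.lower()}.")
    elif age is not None:
        parts.append(f"The patient is {age} years old.")

    region = record.get("region")
    if region:
        parts.append(f"The lesion is located on the {region.lower()}.")

    for key, (if_true, if_false) in SYMPTOMS.items():
        value = record.get(key)
        if value is True:
            parts.append(if_true)
        elif value is False:
            parts.append(if_false)

    return " ".join(parts) if parts else "No clinical information is available."


def patient_wise_folds(patient_ids, n_folds=5):
    """Assign each sample to a fold so that no patient spans two folds.

    Patients (not samples) are distributed round-robin over the folds; this
    simple scheme is an assumption, not the split used in the paper.
    """
    unique_patients = sorted(set(patient_ids))
    fold_of_patient = {pid: i % n_folds for i, pid in enumerate(unique_patients)}
    return [fold_of_patient[pid] for pid in patient_ids]


if __name__ == "__main__":
    record = {"age": 55, "gender": "FEMALE", "region": "FACE",
              "itch": True, "bleed": False}
    print(describe_lesion(record))
    print(patient_wise_folds(["p1", "p2", "p1", "p3"], n_folds=2))
```

In a setup like the one the abstract describes, the resulting sentence would be passed to a sentence encoder instead of feeding the raw numeric codes to the classifier. Grouping by patient keeps images of the same patient out of both the training and validation folds.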
