A Comparative Analysis of CNN Models and Vision Transformers for Skin Lesion Classification
Abstract
Skin cancer is one of the most common malignancies in Brazil, making accurate diagnosis essential. This study evaluates five convolutional neural networks (CNNs) and two Vision Transformers (ViTs) on the binary classification of dermoscopic images (nevus vs. melanoma) from four public datasets characterized by class imbalance. Using a unified fine-tuning protocol and hold-out splits, we observed that ViTs generally outperformed CNNs in F1-score and ROC-AUC. These results suggest that attention-based models may provide more balanced performance under class imbalance.
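To illustrate why F1-score and ROC-AUC are more informative than plain accuracy when classes are imbalanced, the following minimal sketch computes both metrics in pure Python on a toy nevus-vs-melanoma example. The labels, scores, the 0.5 threshold, and the helper names `f1_score` and `roc_auc` are illustrative assumptions, not data or code from the study.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """ROC-AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy set: 8 nevi (label 0) and 2 melanomas (label 1).
y_true = [0] * 8 + [1] * 2
scores = [0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.60, 0.25, 0.70, 0.45]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
always_nevus = [0] * len(y_true)  # trivial majority-class baseline

model_f1 = f1_score(y_true, y_pred)
model_auc = roc_auc(y_true, scores)
baseline_f1 = f1_score(y_true, always_nevus)

print(f"accuracy={accuracy:.2f} F1={model_f1:.2f} ROC-AUC={model_auc:.4f}")
print(f"majority baseline F1={baseline_f1:.2f}")
```

In this toy setting, a classifier that always predicts nevus reaches the same 80% accuracy as the scored model but an F1-score of 0, which is the kind of gap that makes F1 and ROC-AUC preferable for comparing models on imbalanced data.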
References
Araújo, R. L., de S. Luz, D., de Lima, B. V., Marques, J. V. M., de M. S. Veras, R., de C. Filho, A. O., Araújo, F. H. D., and e Silva, R. R. V. (2024). Quantifying the effects of segmentation in image classification for melanoma recognition. In Anais do XXI Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2024), pages 400–411. SBC.
Chowdhury, T. A., Wagner, E., Motzki, P., and Lehser, M. (2025). Enhanced transfer learning algorithm with zero-shot components for dermatological diagnosis using the ham10000 dataset. In Proceedings of SPIE Medical Imaging, volume 13292, page 132920G. SPIE.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale.
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., and Girshick, R. (2022). Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep residual learning for image recognition. CoRR, abs/1512.03385.
Islam, M. S. and Panta, S. (2024). Skin cancer images classification using transfer learning techniques. CoRR, abs/2406.12954.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). Imagenet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90.
Santos, M. d. O., Lima, F. C. d. S. d., Martins, L. F. L., Oliveira, J. F. P., Almeida, L. M. d., and Cancela, M. d. C. (2023). Estimativa de incidência de câncer no brasil, 2023-2025. Revista Brasileira de Cancerologia, 69:e–213700.
Simonyan, K. and Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2015). Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567.
Tan, M. and Le, Q. V. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. CoRR, abs/1905.11946.
Published
12/11/2025
How to Cite
AGUIRRE, Yasmin C.; OLIVEIRA, Ashiley Bianca S. de; BARROS, Rodrigo C.; KUPSSINSKÜ, Lucas S. A Comparative Analysis of CNN Models and Vision Transformers for Skin Lesion Classification. In: ESCOLA REGIONAL DE APRENDIZADO DE MÁQUINA E INTELIGÊNCIA ARTIFICIAL DA REGIÃO SUL (ERAMIA-RS), 1., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 248-251. DOI: https://doi.org/10.5753/eramiars.2025.16623.