Psoriasis Detection Using Computer Vision: A Comparative Approach Between CNNs and Vision Transformers

Natanael Lucena; Fabio S. da Silva; Ricardo Rios

doi:10.5753/semish.2025.7332

Natanael Lucena UEA
Fabio S. da Silva UEA
Ricardo Rios UEA

Abstract

This paper compares the performance of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) on the multiclass classification of images containing lesions of psoriasis and of diseases that resemble it. Models pre-trained on ImageNet were adapted to a task-specific dataset. Both families achieved high predictive metrics, but the ViTs stood out by delivering superior performance with smaller models. The Dual Attention Vision Transformer-Base (DaViT-B) obtained the best results, with an F1-score of 96.4%, and is recommended as the most efficient architecture for automated psoriasis detection. The paper reinforces the potential of ViTs for medical image classification tasks.
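As a point of reference for the reported metric, the sketch below shows one common way an F1-score is computed in a multiclass setting: per-class F1 averaged without weighting (macro averaging). This is an assumption for illustration only; the abstract does not state which averaging scheme was used, and the class names and labels here are hypothetical, not data from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted but is wrong
            fn[t] += 1   # class t was missed

    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic mean
        # of precision and recall for class c.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical three-class example (assumed labels, not from the paper).
y_true = ["psoriasis", "psoriasis", "other_a", "other_a", "other_b", "other_b"]
y_pred = ["psoriasis", "other_a", "other_a", "other_a", "other_b", "other_b"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Macro averaging gives every class the same weight regardless of how many images it has, so a weak score on a rare class is not hidden by strong scores on common ones.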
