Vim-Med: a Vision Mamba-based Model for Pathology Classification in X-Ray Images
Abstract
Medical diagnosis needs better support for identifying rare diseases and for analyzing imbalanced image data. This work presents Vim-Med, an adaptation of the Vision Mamba (Vim) architecture for pathology classification in X-ray images. The model was evaluated against other Mamba-based models and Transformer architectures. On the Chest X-Ray dataset, Vim-Med achieved the best F1-score (0.888). On the NIH CXR8 dataset, it stood out in handling rare classes (Macro-F1 of 0.192). Vim-Med also achieved the highest inference speed (125 FPS) and reduced training time by more than 50%. These results indicate that Vim-Med is an efficient model for classifying pathologies in X-ray images.
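The abstract uses Macro-F1 as evidence of rare-class performance on NIH CXR8. Macro-F1 is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a frequent one, whereas accuracy is dominated by the majority class. The minimal Python sketch below illustrates this with hypothetical toy labels (not data from the paper) and assumes single-label, multi-class predictions for simplicity. It is not the authors' evaluation code, and all function names are illustrative.

```python
from typing import Sequence


def per_class_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> list[float]:
    """Return the one-vs-rest F1 score of each class."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 if the class is absent from both truth and prediction
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Unweighted mean of per-class F1: every class counts equally, however rare."""
    return sum(per_class_f1(y_true, y_pred, num_classes)) / num_classes


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correct predictions; dominated by the most frequent class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy labels: class 0 is common (18 samples), classes 1 and 2 are rare (1 sample each).
    y_true = [0] * 18 + [1, 2]
    # A degenerate model that always predicts the majority class.
    y_majority = [0] * 20
    # A model that additionally recovers the single class-1 sample.
    y_better = [0] * 18 + [1, 0]

    for name, y_pred in [("majority", y_majority), ("better", y_better)]:
        print(f"{name}: acc={accuracy(y_true, y_pred):.2f}, "
              f"macro-F1={macro_f1(y_true, y_pred, 3):.3f}")
```

With these toy labels, the majority-class predictor reaches 0.90 accuracy but a Macro-F1 of about 0.316; recovering one rare sample raises accuracy only to 0.95, while Macro-F1 roughly doubles to about 0.658. This is why a Macro-F1 gain on an imbalanced dataset can reflect better handling of rare classes even when accuracy barely moves.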
Published
12 November 2025
How to Cite
PITTHAN, Gregory J.; CORDOVA, Lucas B. V.; SCHEIN, Tatiana T.; SILVA, Eduardo L.; DUTRA, Gustavo A.; ALMEIDA, Gustavo P.; BRIÃO, Stephanie L.; DREWS-JR, Paulo L. J. Vim-Med: a Vision Mamba-based Model for Pathology Classification in X-Ray Images. In: ESCOLA REGIONAL DE APRENDIZADO DE MÁQUINA E INTELIGÊNCIA ARTIFICIAL DA REGIÃO SUL (ERAMIA-RS), 1., 2025, Porto Alegre/RS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 396-399. DOI: https://doi.org/10.5753/eramiars.2025.16757.