Vision Transformers with Dynamic Patches for Histological Slide Analysis

  • Vinícius Henrique Giovanini (PUC Minas)
  • Alexei Manso Correa Machado (PUC Minas / UFMG)

Abstract


This work presents an approach to improve Vision Transformers (ViT) by feeding them dynamically extracted patches. Experiments were conducted with different model variants, performing fine-tuning and exploring multiple extraction strategies to identify the issues most relevant to patch extraction. The proposed approach was compared to the traditional ViT in a study using the Cell Recognition and Inspection Center (CRIC) dataset, composed of Pap smear images. The results show that fine-tuning the ViT-Small model with Grid patch extraction achieved an accuracy of 0.81, while the best dynamic approach reached 0.78, a gap attributed to excessive overlap among the extracted patches.
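For readers unfamiliar with the two tokenization schemes being compared, the sketch below contrasts the standard non-overlapping grid patch extraction of ViT with a content-driven dynamic extractor whose windows may overlap. The dynamic variant shown here (candidate windows ranked by local intensity variance, a 4-pixel stride, and 196 selected patches) is an illustrative assumption, not the paper's actual extraction strategy.

import torch
import torch.nn.functional as F


def grid_patches(img: torch.Tensor, patch: int = 16) -> torch.Tensor:
    # Standard ViT tokenization: non-overlapping patches on a fixed grid.
    # (C, H, W) -> (num_patches, C * patch * patch)
    return F.unfold(img.unsqueeze(0), kernel_size=patch, stride=patch).squeeze(0).T


def dynamic_patches(img: torch.Tensor, patch: int = 16,
                    stride: int = 4, num_patches: int = 196) -> torch.Tensor:
    # Hypothetical dynamic extractor: slide a dense window over the image,
    # score each window by its intensity variance (a crude "content" proxy),
    # and keep the top-scoring windows. Because the stride is smaller than
    # the patch size, the selected patches can overlap heavily, which is the
    # effect the abstract points to as limiting the dynamic models.
    windows = F.unfold(img.unsqueeze(0), kernel_size=patch, stride=stride)  # (1, C*p*p, L)
    scores = windows.var(dim=1).squeeze(0)                                  # (L,)
    top = scores.topk(min(num_patches, scores.numel())).indices
    return windows.squeeze(0).T[top]                                        # (num_patches, C*p*p)


if __name__ == "__main__":
    x = torch.rand(3, 224, 224)      # stand-in for a 224x224 RGB Pap smear crop
    print(grid_patches(x).shape)     # torch.Size([196, 768])
    print(dynamic_patches(x).shape)  # torch.Size([196, 768])

Either set of patches can then be linearly projected and fed to the same transformer encoder, which is what allows the two extraction strategies to be compared under an otherwise identical model.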

Published: 2025-06-09
GIOVANINI, Vinícius Henrique; MACHADO, Alexei Manso Correa. Vision Transformers with Dynamic Patches for Histological Slide Analysis. In: BRAZILIAN SYMPOSIUM ON COMPUTING APPLIED TO HEALTH (SBCAS), 25., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 188-199. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2025.6972.
