Vision Transformers with Dynamic Patches for the Analysis of Histological Slides
Abstract
This work presents an approach to enhance Vision Transformers (ViT) through dynamic patch extraction. Experiments were conducted with different model variants, performing fine-tuning and exploring multiple strategies to identify the most relevant regions from which to extract patches. The proposed modifications were compared against the standard ViT, applying all approaches to the Center for Recognition and Inspection of Cells (CRIC) dataset, composed of Pap smear images. The results showed that fine-tuning a ViT-Small model with grid-based patch extraction achieved an accuracy of 0.81, whereas the best dynamic approach reached 0.78, a gap attributable to excessive overlap among the dynamically selected patches.
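The two extraction schemes compared above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the function names, the center-based selection of dynamic patches, and the edge padding are illustrative assumptions. It shows the standard non-overlapping ViT grid tokenization and how patches centered on nearby points of interest produce overlapping, partly redundant tokens.

```python
import numpy as np

def grid_patches(image, patch=16):
    """Split an H x W x C image into non-overlapping patch x patch tiles,
    each flattened into one token, as in the standard ViT."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must be divisible by patch size"
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)        # (nh, nw, patch, patch, c)
    return tiles.reshape(-1, patch * patch * c)   # one flat token per tile

def dynamic_patches(image, centers, patch=16):
    """Extract patches centered on arbitrary points of interest (e.g. cell
    locations); nearby centers yield overlapping, redundant tokens."""
    half = patch // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="edge")
    return np.stack([
        padded[y:y + patch, x:x + patch].reshape(-1)  # (y, x) is the patch center
        for y, x in centers
    ])

img = np.random.rand(224, 224, 3)
print(grid_patches(img).shape)                         # (196, 768): 14x14 tokens
print(dynamic_patches(img, [(10, 10), (12, 12)]).shape)  # (2, 768): heavy overlap
```

With centers only two pixels apart, the two dynamic tokens share most of their pixels, which illustrates the redundancy the abstract identifies as the cause of the lower accuracy.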
Published
09/06/2025
How to Cite
GIOVANINI, Vinícius Henrique; MACHADO, Alexei Manso Correa. Vision Transformers com Patches Dinâmicos para a Análise de Lâminas Histológicas. In: SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 25., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 188-199. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2025.6972.
