Stacking Ensemble of CNNs and Vision Transformers for Classification of Abnormalities in Endoscopic Images

  • Pedro da S. Viana (UFCA)
  • Luana B. da Cruz (UFCA)
  • João O. B. Diniz (IFMA)
  • Nelson C. Sandes (UFCA)

Abstract


The large volume of endoscopic examinations overwhelms the specialists who must identify gastrointestinal abnormalities. To streamline this screening process, this work proposes a binary classification method (normal versus abnormal) using the Kvasir V1 image dataset. The method comprises region-of-interest extraction, specular highlight handling, data augmentation, and a stacking ensemble that integrates Convolutional Neural Networks and Vision Transformers. The final model achieved 98.12% accuracy, 98.15% precision, 98.12% sensitivity, 98.23% specificity, and a 98.13% F1-score, indicating its potential as a robust support tool for medical diagnosis.
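The five metrics reported in the abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, assuming "abnormal" is treated as the positive class; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumption: the positive class is "abnormal"; tp, fp, tn, fn are the
    true-positive, false-positive, true-negative, false-negative counts.
    """
    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)   # also called recall
    specificity = safe_div(tn, tn + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

When a paper reports a single value per metric across both classes, it may be averaging per-class scores; the sketch above covers only the single-positive-class case.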

Published
2026-06-01
VIANA, Pedro da S.; CRUZ, Luana B. da; DINIZ, João O. B.; SANDES, Nelson C. Stacking Ensemble of CNNs and Vision Transformers for Classification of Abnormalities in Endoscopic Images. In: BRAZILIAN SYMPOSIUM ON COMPUTING APPLIED TO HEALTH (SBCAS), 26., 2026, Ouro Preto/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 954-965. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2026.21592.
