Risk-Aware Robust Learning: Reducing Clinical Risk under Label Noise in Medical Image Classification
Abstract
Noisy labels are a pervasive challenge in medical image classification, where annotation errors arise from inter-observer variability and diagnostic ambiguity. Although several noise-robust learning methods have been proposed, their evaluation relies predominantly on accuracy-oriented metrics, overlooking the clinical implications of asymmetric error costs. In medical diagnosis, a false negative (missed disease) carries substantially more severe consequences than a false positive (false alarm), as delayed treatment can directly impact patient outcomes. In this work, we investigate whether noise-robust training methods preserve clinical safety under label noise. We conduct a systematic risk-aware evaluation of state-of-the-art noise-robust methods (Co-teaching, DivideMix, UNICON, and a GMM-based filtering approach) on binarized DermaMNIST and PathMNIST datasets with clean labels and with label noise rates of 20% and 40%. Beyond balanced accuracy, we adopt a cost-sensitive Global Risk formulation that explicitly penalizes false negatives. Our analysis reveals that the robustness of state-of-the-art methods does not guarantee clinical safety. Furthermore, we demonstrate that integrating cost-sensitive optimization into noise-robust training significantly reduces clinical risk while maintaining model utility. These findings indicate that noise-robust learning must be evaluated through a clinical risk lens, and that combining robust training with cost-sensitive optimization can meaningfully reduce risk in noisy-label medical imaging scenarios.
References
Araf, I., Idri, A., and Chairi, I. (2024). Cost-sensitive learning for imbalanced medical data: a review. Artificial Intelligence Review, 57(4):80.
Arpit, D., Jastrzebski, S., Ballas, N., Krueger, D., Bengio, E., Kanwal, M. S., Maharaj, T., Fischer, A., Courville, A., Bengio, Y., and Lacoste-Julien, S. (2017). A closer look at memorization in deep networks. In International Conference on Machine Learning (ICML), pages 233–242.
Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., and Raffel, C. A. (2019). Mixmatch: A holistic approach to semi-supervised learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 32.
Carneiro, G. (2024). Machine Learning with Noisy Labels: Definitions, Theory, Techniques and Solutions. Elsevier.
Chan, H.-P., Samala, R. K., Hadjiiski, L. M., and Zhou, C. (2020). Deep learning in medical image analysis. Deep learning in medical image analysis: challenges and applications, pages 3–21.
Collell, G., Prelec, D., and Patil, K. R. (2018). A simple plug-in bagging ensemble based on threshold-moving for imbalanced data. In IEEE International Conference on Big Data, pages 2390–2399.
Cordeiro, F. R. and Carneiro, G. (2025). Anne: Adaptive nearest neighbours and eigenvector-based sample selection for robust learning with noisy labels. Pattern Recognition, 159:111132.
Cordeiro, F. R., Sachdeva, R., Belagiannis, V., Reid, I., and Carneiro, G. (2023). Longremix: Robust learning with high confidence samples in a noisy label environment. Pattern Recognition, 133:109013.
Elkan, C. (2001). The foundations of cost-sensitive learning. In International Joint Conference on Artificial Intelligence (IJCAI), pages 973–978.
Frénay, B. and Verleysen, M. (2014). Classification in the presence of label noise: A survey. IEEE Transactions on Neural Networks and Learning Systems, 25(5):845–869.
Haimerl, M. and Reich, C. (2025). Risk-based evaluation of machine learning-based classification methods used for medical devices. BMC Medical Informatics and Decision Making, 25(1):126.
Han, B., Yao, Q., Yu, X., Niu, G., Xu, M., Hu, W., Tsang, I., and Sugiyama, M. (2018). Co-teaching: Robust training of deep neural networks with extremely noisy labels. In Advances in Neural Information Processing Systems (NeurIPS), volume 31.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778.
Jiang, L., Zhou, Z., Leung, T., Li, L.-J., and Fei-Fei, L. (2018). Mentornet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In International Conference on Machine Learning (ICML), pages 2304–2313.
Karim, N., Rizve, M. N., Rahnavard, N., Mian, A., and Shah, M. (2022). Unicon: Combating label noise through uniform selection and contrastive learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9676–9686.
Khan, S. H., Hayat, M., Bennamoun, M., Sohel, F. A., and Togneri, R. (2018). Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 29(8):3573–3587.
Li, J., Socher, R., and Hoi, S. C. (2020). Dividemix: Learning with noisy labels as semi-supervised learning. In International Conference on Learning Representations (ICLR).
Ling, C. X. and Sheng, V. S. (2010). Cost-sensitive learning and the class imbalance problem. Encyclopedia of Machine Learning, pages 231–235.
Scholz, R. et al. (2024). Imbalance-aware loss functions improve medical image classification. In Proceedings of Machine Learning Research.
Song, H., Kim, M., Park, D., Shin, Y., and Lee, J.-G. (2022). Learning from noisy labels with deep neural networks: A survey. IEEE Transactions on Neural Networks and Learning Systems, 34(11):8135–8153.
Yang, J., Shi, R., Wei, D., Liu, Z., Wang, L., Zhou, Y., Zhou, S., Bian, C., Li, L., Wang, X., et al. (2021). Medmnist: A lightweight automl benchmark for medical image analysis. [link]. Accessed: February 13, 2025.
Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., and Ni, B. (2023). Medmnist v2 – a large-scale lightweight benchmark for 2d and 3d biomedical image classification. Scientific Data, 10(1):41.
Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3):107–115.
Published
2026-06-01
How to Cite
PEREIRA, Maycon R. S.; CORDEIRO, Filipe R. Risk-Aware Robust Learning: Reducing Clinical Risk under Label Noise in Medical Image Classification. In: BRAZILIAN SYMPOSIUM ON COMPUTING APPLIED TO HEALTH (SBCAS), 26., 2026, Ouro Preto/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 325-336. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2026.20827.
