On the performance of uncertainty estimation methods for deep-learning based image classification models

  • Luís Felipe P. Cattelan UFSC
  • Danilo Silva UFSC


Previous work has shown that modern neural networks tend to be overconfident; thus, for deep learning models to be trusted and adopted in critical applications, reliable uncertainty estimation (UE) is essential. However, many questions remain open regarding how to compare UE methods fairly. This work focuses on the task of selective classification and proposes a methodology in which the predictions of the underlying model are kept fixed and only the UE method is allowed to vary. Experiments are performed for convolutional neural networks using Deep Ensembles and Monte Carlo Dropout. Surprisingly, our results show that the conventional softmax response can outperform most other UE methods for a large part of the risk-coverage curve.
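
To make the evaluation setting concrete, the following is a minimal sketch of a risk-coverage comparison in which the model's predictions are held fixed and only the confidence score changes. It is not the authors' code: it assumes NumPy, uses synthetic logits in place of a trained network, and all function names (`risk_coverage_curve`, `softmax_response`, `negative_entropy`) are hypothetical. Negative entropy stands in here for "another UE method" purely for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_response(probs):
    # Confidence = maximum softmax probability.
    return probs.max(axis=1)

def negative_entropy(probs):
    # Confidence = minus the predictive entropy (higher means more confident).
    return np.sum(probs * np.log(probs + 1e-12), axis=1)

def risk_coverage_curve(confidence, correct):
    # Accept samples from most to least confident; at each coverage level,
    # risk is the error rate among the accepted samples.
    order = np.argsort(-confidence, kind="stable")
    errors = 1.0 - correct[order].astype(float)
    accepted = np.arange(1, len(errors) + 1)
    coverage = accepted / len(errors)
    risk = np.cumsum(errors) / accepted
    return coverage, risk

# Synthetic stand-in for a trained classifier (assumption, not real data).
rng = np.random.default_rng(0)
n, k = 1000, 10
labels = rng.integers(0, k, size=n)
logits = rng.normal(size=(n, k))
logits[np.arange(n), labels] += 2.0  # make the true class more likely

probs = softmax(logits)
predictions = probs.argmax(axis=1)   # fixed for every UE method
correct = predictions == labels

for name, score in [("softmax response", softmax_response),
                    ("negative entropy", negative_entropy)]:
    coverage, risk = risk_coverage_curve(score(probs), correct)
    idx = int(0.8 * n) - 1
    print(f"{name}: risk at coverage {coverage[idx]:.2f} = {risk[idx]:.3f}")
```

Because `predictions` is computed once and shared, both curves end at the same risk at full coverage (one minus the model's accuracy); any difference at lower coverage comes only from how each score ranks the samples.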


CATTELAN, Luís Felipe P.; SILVA, Danilo. On the performance of uncertainty estimation methods for deep-learning based image classification models. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 19., 2022, Campinas/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 532-543. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2022.227603.