Case Study of Deep Learning Methods for Depth Estimation in Indoor Ground Robotics

Authors

  • Fábio Leandro Vizzotto Centro de Pesquisa e Desenvolvimento em Telecomunicações (CPQD)
  • Marcos D'Addio de Moura Centro de Pesquisa e Desenvolvimento em Telecomunicações (CPQD)
  • Vinicius Carbonezi de Souza Centro de Pesquisa e Desenvolvimento em Telecomunicações (CPQD) https://orcid.org/0009-0006-1820-2825
  • Cides Semprebom Bezerra Centro de Pesquisa e Desenvolvimento em Telecomunicações (CPQD)
  • Guilherme Ribeiro Sales Centro de Pesquisa e Desenvolvimento em Telecomunicações (CPQD)
  • Valentino Corso Centro de Pesquisa e Desenvolvimento em Telecomunicações (CPQD)
  • Luiz Eduardo Pita Mercês Almeida Centro de Pesquisa e Desenvolvimento em Telecomunicações (CPQD) https://orcid.org/0009-0009-6823-1570
  • Douglas Henrique Siqueira Abreu Pontifícia Universidade Católica de Campinas (PUCC) https://orcid.org/0009-0005-4739-5980

DOI:

https://doi.org/10.22456/2175-2745.143443

Keywords:

Depth estimation, Ground robotics, Deep learning, Case study

Abstract

Depth estimation is the computer vision task of assigning a distance from the camera to each pixel in an image. This paper focuses on monocular metric depth estimation in videos, i.e., inferring distances in metric units from a single RGB camera. Among its applications, robotic systems and environmental mapping stand out as practical areas that can make extensive use of these techniques. As a case study for indoor robotics, experiments were run on the ICL ground robot dataset, whose video footage was rendered in a graphics simulation. Self-supervised and supervised deep learning methods from the literature were compared in terms of their results and their data-acquisition requirements, and this is the first work to present a depth estimation benchmark for the chosen dataset.
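As context for the benchmark comparison described above, the sketch below shows the metrics most commonly used to score metric depth predictions against ground truth (absolute relative error, RMSE, and the δ < 1.25 accuracy threshold). This is a minimal illustration, not code from the paper; the function name and the NumPy-array interface are assumptions.

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Standard monocular depth metrics, computed on valid (gt > 0) pixels.

    pred, gt: arrays of predicted / ground-truth depth in metres.
    """
    mask = gt > eps
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)          # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))          # root mean squared error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)                     # fraction within 25% of gt
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}

# Toy usage: a prediction that overestimates every depth by 10%
gt = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = gt * 1.1
print(depth_metrics(pred, gt))
```

A prediction off by a constant 10% scale, as in the toy example, yields an absolute relative error of 0.1 and a δ < 1.25 accuracy of 1.0, which is why scale-aware (metric) evaluation is stricter than relative-depth evaluation.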

References

LUO, X. et al. Consistent video depth estimation. ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH), ACM, v. 39, n. 4, 2020.

GEIGER, A.; LENZ, P.; URTASUN, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: IEEE. 2012 IEEE Conference on Computer Vision and Pattern Recognition. [S.l.], 2012. p. 3354–3361.

SILBERMAN, N. et al. Indoor segmentation and support inference from RGBD images. In: SPRINGER. Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part V 12. [S.l.], 2012. p. 746–760.

SPENCER, J. et al. The monocular depth estimation challenge. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. [S.l.: s.n.], 2023. p. 623–632.

RANFTL, R. et al. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE, v. 44, n. 3, p. 1623–1637, 2020.

YANG, L. et al. Depth anything: Unleashing the power of large-scale unlabeled data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S.l.: s.n.], 2024. p. 10371–10381.

BHAT, S. F. et al. Zoedepth: Zero-shot transfer by combining relative and metric depth. arXiv preprint arXiv:2302.12288, 2023.

GODARD, C. et al. Digging into self-supervised monocular depth estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. [S.l.: s.n.], 2019. p. 3828–3838.

ZHAO, C. et al. Monocular depth estimation based on deep learning: An overview. Science China Technological Sciences, Springer Science and Business Media LLC, v. 63, n. 9, p. 1612–1627, Jun. 2020. ISSN 1869-1900. Available at: ⟨http://dx.doi.org/10.1007/s11431-020-1582-8⟩.

WU, C.-Y. et al. Toward practical monocular indoor depth estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S.l.: s.n.], 2022. p. 3814–3824.

SAEEDI, S. et al. Characterizing visual localization and mapping datasets. In: IEEE. 2019 International Conference on Robotics and Automation (ICRA). [S.l.], 2019. p. 6699–6705.

LIU, Z. et al. Swin transformer v2: Scaling up capacity and resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S.l.: s.n.], 2022. p. 12009–12019.

LOSHCHILOV, I.; HUTTER, F. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.

SMITH, L. N.; TOPIN, N. Super-convergence: Very fast training of neural networks using large learning rates. In: SPIE. Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications. [S.l.], 2019. v. 11006, p. 369–386.

RONNEBERGER, O.; FISCHER, P.; BROX, T. U-net: Convolutional networks for biomedical image segmentation. In: SPRINGER. Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. [S.l.], 2015. p. 234–241.

HE, K. et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. [S.l.: s.n.], 2016. p. 770–778.

KINGMA, D. P.; BA, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Published

2025-02-20

How to Cite

Leandro Vizzotto, F., D’Addio de Moura, M., Carbonezi de Souza, V., Semprebom Bezerra, C., Ribeiro Sales, G., Corso, V., Eduardo Pita Mercês Almeida, L., & Henrique Siqueira Abreu, D. (2025). Case Study of Deep Learning Methods for Depth Estimation in Indoor Ground Robotics. Revista De Informática Teórica E Aplicada, 32(1), 166–172. https://doi.org/10.22456/2175-2745.143443

Issue

Section

WVC2024
