Case Study of Deep Learning Methods for Depth Estimation in Indoor Ground Robotics
DOI: https://doi.org/10.22456/2175-2745.143443
Keywords: Depth estimation, Ground robotics, Deep learning, Case study
Abstract
Depth estimation is the computer vision task of assigning a distance between the camera and each pixel in an image. This paper focuses on monocular metric depth estimation in videos, which infers distances in metric units from a single RGB camera. Among its applications, robotic systems and environmental mapping stand out as practical areas that can make extensive use of these techniques. As a case study for indoor robotics, experiments were run on the ICL ground robot dataset, composed of video footage rendered in graphic simulation. The comparison considers both the results and the data acquisition requirements of different deep learning models, covering self-supervised and supervised methods available in the literature; to our knowledge, this is the first work to present a depth estimation benchmark for the chosen dataset.
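The abstract does not list the evaluation metrics used in the benchmark, but monocular depth estimation comparisons conventionally report absolute relative error (AbsRel), root mean squared error (RMSE), and the threshold accuracy δ < 1.25. A minimal sketch of these standard metrics, assuming depth maps given as NumPy arrays in metres:

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Standard monocular depth evaluation metrics.

    pred, gt: predicted and ground-truth depth in metres;
    only pixels with valid (positive) ground truth are evaluated.
    """
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    mask = gt > eps                      # ignore invalid / missing depth pixels
    pred, gt = pred[mask], gt[mask]

    abs_rel = np.mean(np.abs(pred - gt) / gt)   # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))   # root mean squared error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)              # fraction of pixels with δ < 1.25
    return {"AbsRel": abs_rel, "RMSE": rmse, "delta1": delta1}
```

For example, a prediction of [2.0, 4.0] m against ground truth [2.0, 2.0] m gives AbsRel = 0.5 and δ < 1.25 satisfied on half the pixels. Note that δ is scale-sensitive, which is why metric (rather than relative) depth models are required for tasks such as indoor robot navigation.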
License
Copyright (c) 2025 Vinicius Carbonezi de Souza, Fábio Leandro Vizzotto, Marcos D'Addio de Moura, Cides Semprebom Bezerra, Guilherme Ribeiro Sales, Valentino Corso, Luiz Eduardo Pita Mercês Almeida, Douglas Henrique Siqueira Abreu

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
I authorize the editors to publish my article, if accepted, in electronic media in accordance with the rules of the Public Knowledge Project.