Case Study of Deep Learning Methods for Depth Estimation in Indoor Ground Robotics
DOI: https://doi.org/10.22456/2175-2745.143443
Keywords: Depth estimation, Ground robotics, Deep learning, Case study
Abstract
Depth estimation is the computer vision task of assigning a distance between the camera and each pixel in an image. This paper focuses on monocular metric depth estimation in videos, which infers distances in metric units from a single RGB camera. Among its applications, robotic systems and environmental mapping stand out as practical areas that can make extensive use of these techniques. As a case study for indoor robotics, experiments were run on the ICL ground robot dataset, composed of video footage rendered in graphic simulation. The comparison considers both the results and the data acquisition requirements of different deep learning models, covering self-supervised and supervised methods available in the literature; to our knowledge, this is the first work to present a depth estimation benchmark for the chosen dataset.
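The abstract does not list the evaluation metrics used in the benchmark, but monocular depth estimation comparisons conventionally report absolute relative error (AbsRel), root mean squared error (RMSE), and the threshold accuracy δ < 1.25. A minimal sketch of these standard metrics, assuming depth maps given as NumPy arrays in metres:

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Standard monocular depth evaluation metrics.

    pred, gt: predicted and ground-truth depth in metres;
    only pixels with valid (positive) ground truth are evaluated.
    """
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    mask = gt > eps                      # ignore invalid / missing depth pixels
    pred, gt = pred[mask], gt[mask]

    abs_rel = np.mean(np.abs(pred - gt) / gt)   # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))   # root mean squared error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)              # fraction of pixels with δ < 1.25
    return {"AbsRel": abs_rel, "RMSE": rmse, "delta1": delta1}
```

For example, a prediction of [2.0, 4.0] m against ground truth [2.0, 2.0] m gives AbsRel = 0.5 and δ < 1.25 satisfied on half the pixels. Note that δ is scale-sensitive, which is why metric (rather than relative) depth models are required for tasks such as indoor robot navigation.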
License
Copyright (c) 2025 Vinicius Carbonezi de Souza, Fábio Leandro Vizzotto, Marcos D'Addio de Moura, Cides Semprebom Bezerra, Guilherme Ribeiro Sales, Valentino Corso, Luiz Eduardo Pita Mercês Almeida, Douglas Henrique Siqueira Abreu

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
I authorize the editors to publish my article, if accepted, in electronic media in accordance with the rules of the Public Knowledge Project.