Comparing U-Net based architectures in monocular depth estimation

  • Antônio Carlos Durães da Silva IFES
  • Kelly Assis de Souza Gazolli IFES


Monocular depth estimation is a computer vision problem with diverse applications, ranging from augmented reality to surgical procedures. Given the similarity between the segmentation and monocular depth estimation tasks, and the good performance of U-Net and its variants in segmentation, this study compares the performance of U-Net and UNet++ variants, each adopting a different network as its encoder, together with the TransUnet architecture, in monocular depth estimation. The results on the NYU Depth V2 dataset show that U-Net using Mix Transformer (MiT-B2) as its encoder outperforms all other evaluated approaches.
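The abstract does not state which metrics support the ranking. As an assumption for illustration only, the sketch below computes three measures commonly reported on NYU Depth V2: mean absolute relative error (AbsRel), root mean squared error (RMSE) and the threshold accuracy δ < 1.25. The function name `depth_metrics` and the convention of skipping pixels whose ground-truth depth is zero (missing sensor readings) are hypothetical choices, not the authors' evaluation code.

```python
import math


def depth_metrics(pred, gt, eps=1e-6):
    """Hypothetical helper: AbsRel, RMSE and delta<1.25 accuracy.

    pred, gt: flat sequences of depths (e.g. in metres).
    Assumption: pixels with gt <= eps are invalid and skipped;
    predictions are clamped to eps to avoid division by zero.
    """
    pairs = [(max(p, eps), g) for p, g in zip(pred, gt) if g > eps]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    # Mean of |pred - gt| / gt over valid pixels
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Square root of the mean squared error
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels where max(pred/gt, gt/pred) < 1.25
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


# Usage: the last pixel has gt = 0 and is ignored.
print(depth_metrics([1.1, 2.0, 2.0, 3.0], [1.0, 2.0, 4.0, 0.0]))
```

Lower AbsRel and RMSE and higher δ1 indicate a better model, so ranking architectures in this way would mean computing these values over the test split for each one.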

Keywords: Monocular depth estimation, U-Net, UNet++, TransUnet


SILVA, Antônio Carlos Durães da; GAZOLLI, Kelly Assis de Souza. Comparing U-Net based architectures in monocular depth estimation. In: WORKSHOP DE VISÃO COMPUTACIONAL (WVC), 18., 2023, São Bernardo do Campo/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 48-53. DOI: