A Learning-Based Framework for Depth Perception using Dense Light Fields

  • Anderson Priebe Ferrugem UFPel
  • Bruno Zatt UFPel
  • Luciano Volcan Agostini UFPel


The rapid development of optical sensor technology has been accompanied by a growing demand for visual measurement systems in emerging areas that need to interpret the real three-dimensional physical world, such as self-driving cars, mobile robotics, Advanced Driver Assistance Systems (ADAS), and 3D medical imaging diagnostics. To model the physical world, these systems must unify visual information with depth measurements. Light Field cameras have the potential to be used in such systems as a versatile hypersensor. Since Light Fields represent the scene's visual information from multiple viewpoints, depth information can be calculated through trigonometric operations. This paper proposes a learning-based framework that unifies scene depth with visual information obtained from Light Fields. The proposed framework is composed of four main modules. The two deep learning modules consist of (i) depth map estimation using a siamese convolutional neural network and (ii) instance segmentation employing a region-based convolutional neural network. The other two modules apply linear transformations: (iii) a module that applies matrix transformations with the camera's intrinsic parameters to generate a new depth map of absolute distances and (iv) a module that returns the distance of selected objects. For the depth map estimation module, this framework proposes a siamese neural network called EPINET-FAST, which generates depth maps in less than half the time of the original EPINET. A case study is presented using Dense Light Fields captured by a Lytro Illum camera (plenoptic 1.0). The case study characterizes the processing time of each module, allowing researchers to isolate bottlenecks and propose future changes toward real-time processing.
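To make the role of modules (iii) and (iv) concrete, the sketch below shows the two linear steps under a simplified pinhole/stereo model: converting a disparity map to absolute depth via the camera's focal length and baseline, then reporting a per-object distance as the median depth inside an instance segmentation mask. The function names and parameter values are illustrative assumptions, not the paper's implementation; the actual framework uses the full intrinsic parameters of the Lytro Illum camera.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    # Pinhole/stereo relation: z = f * B / d. The eps guard avoids
    # division by zero where the estimated disparity vanishes.
    return focal_length_px * baseline_m / np.maximum(disparity, eps)

def object_distance(depth_map, instance_mask):
    # Per-object distance as the median depth over the segmentation
    # mask, which is robust to outlier pixels at object boundaries.
    return float(np.median(depth_map[instance_mask]))

# Toy example: a uniform 2-pixel disparity map and a centered object mask.
disp = np.full((4, 4), 2.0)
depth = disparity_to_depth(disp, focal_length_px=500.0, baseline_m=0.01)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
print(object_distance(depth, mask))  # 500 * 0.01 / 2 = 2.5 m
```

The median (rather than the mean) is one reasonable choice for step (iv), since depth estimates near object edges are typically noisy.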
Keywords: light fields, artificial neural networks, computer vision, instance segmentation


A. Bajpayee, A. H. Techet, and H. Singh. 2018. Real-Time Light Field Processing for Autonomous Robotics. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 4218–4225. https://doi.org/10.1109/IROS.2018.8594477

Daniel Bolya, Chong Zhou, Fanyi Xiao, and Yong Jae Lee. 2019. YOLACT: Real-Time Instance Segmentation. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 9156–9165. https://doi.org/10.1109/ICCV.2019.00925

Hao Chen, Kunyang Sun, Zhi Tian, Chunhua Shen, Yongming Huang, and Youliang Yan. 2020. BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8570–8578. https://doi.org/10.1109/CVPR42600.2020.00860

Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. 2019. Hybrid Task Cascade for Instance Segmentation. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 4969–4978. https://doi.org/10.1109/CVPR.2019.00511

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. 2016. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. 2017. Deformable Convolutional Networks. In 2017 IEEE International Conference on Computer Vision (ICCV). 764–773. https://doi.org/10.1109/ICCV.2017.89

Bert De Brabandere, Davy Neven, and Luc Van Gool. 2017. Semantic Instance Segmentation for Autonomous Driving. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 478–480. https://doi.org/10.1109/CVPRW.2017.66

M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. 2015. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision 111, 1 (Jan. 2015), 98–136.

Naiyu Gao, Yanhu Shan, Yupei Wang, Xin Zhao, and Kaiqi Huang. 2021. SSAP: Single-Shot Instance Segmentation With Affinity Pyramid. IEEE Transactions on Circuits and Systems for Video Technology 31, 2 (2021), 661–673. https://doi.org/10.1109/TCSVT.2020.2985420

Ross Girshick. 2015. Fast R-CNN. In 2015 IEEE International Conference on Computer Vision (ICCV). 1440–1448. https://doi.org/10.1109/ICCV.2015.169

Bichuan Guo, Jiangtao Wen, and Yuxing Han. 2020. Deep Material Recognition in Light-Fields via Disentanglement of Spatial and Angular Information. In Computer Vision – ECCV 2020, Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer International Publishing, Cham, 664–679.

Agrim Gupta, Piotr Dollár, and Ross Girshick. 2019. LVIS: A Dataset for Large Vocabulary Instance Segmentation. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5351–5359. https://doi.org/10.1109/CVPR.2019.00550

Abdul Mueed Hafiz and Ghulam Mohiuddin Bhat. 2020. A survey on instance segmentation: state of the art. International Journal of Multimedia Information Retrieval 9, 3 (Jul. 2020), 171–189. https://doi.org/10.1007/s13735-020-00195-x

Christopher Hahne, Amar Aggoun, Vladan Velisavljevic, Susanne Fiebig, and Matthias Pesch. 2016. Refocusing distance of a standard plenoptic camera. Opt. Express 24, 19 (Sep 2016), 21521–21540. https://doi.org/10.1364/OE.24.021521

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV). 2980–2988. https://doi.org/10.1109/ICCV.2017.322

Stefan Heber, Wei Yu, and Thomas Pock. 2017. Neural EPI-Volume Networks for Shape from Light Field. In 2017 IEEE International Conference on Computer Vision (ICCV). 2271–2279. https://doi.org/10.1109/ICCV.2017.247

Katrin Honauer, Ole Johannsen, Daniel Kondermann, and Bastian Goldluecke. 2017. A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields. In Computer Vision – ACCV 2016, Shang-Hong Lai, Vincent Lepetit, Ko Nishino, and Yoichi Sato (Eds.). Springer International Publishing, Cham, 19–34.

Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. 2017. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1647–1655. https://doi.org/10.1109/CVPR.2017.179

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In Computer Vision – ECCV 2014, David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars (Eds.). Springer International Publishing, Cham, 740–755.

Shu Liu, Jiaya Jia, Sanja Fidler, and Raquel Urtasun. 2017. SGN: Sequential Grouping Networks for Instance Segmentation. In 2017 IEEE International Conference on Computer Vision (ICCV). 3516–3524. https://doi.org/10.1109/ICCV.2017.378

Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. 2018. Path Aggregation Network for Instance Segmentation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8759–8768. https://doi.org/10.1109/CVPR.2018.00913

Wenjie Luo, Alexander G. Schwing, and Raquel Urtasun. 2016. Efficient Deep Learning for Stereo Matching. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5695–5703. https://doi.org/10.1109/CVPR.2016.614

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2017. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 6 (2017), 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031

Jinglei Shi, Xiaoran Jiang, and Christine Guillemot. 2019. A Framework for Learning Depth From a Flexible Subset of Dense and Sparse Light Field Views. IEEE Transactions on Image Processing 28, 12 (2019), 5867–5880. https://doi.org/10.1109/TIP.2019.2923323

Changha Shin, Hae-Gon Jeon, Youngjin Yoon, In So Kweon, and Seon Joo Kim. 2018. EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4748–4757. https://doi.org/10.1109/CVPR.2018.00499

Richard Szeliski. 2022. Computer Vision: Algorithms and Applications (2nd ed.). Springer Nature Switzerland AG, Cham, Switzerland. https://doi.org/10.1007/978-3-030-34372-9

Di Tian, Yi Han, Biyao Wang, Tian Guan, Hengzhi Gu, and Wei Wei. 2021. Review of object instance segmentation based on deep learning. Journal of Electronic Imaging 31, 4 (2021), 1–18. https://doi.org/10.1117/1.JEI.31.4.041205

Jorge Vargas, Suleiman Alsweiss, Onur Toker, Rahul Razdan, and Joshua Santos. 2021. An Overview of Autonomous Vehicles Sensors and Their Vulnerability to Weather Conditions. Sensors 21, 16 (2021). https://doi.org/10.3390/s21165397

Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, and Lei Li. 2021. SOLO: A Simple Framework for Instance Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021), 1–1. https://doi.org/10.1109/TPAMI.2021.3111116

Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. 2019. Detectron2. https://github.com/facebookresearch/detectron2.

Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Xuebo Liu, Ding Liang, Chunhua Shen, and Ping Luo. 2020. PolarMask: Single Shot Instance Segmentation With Polar Representation. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12190–12199. https://doi.org/10.1109/CVPR42600.2020.01221

Shizhe Zang, Ming Ding, David Smith, Paul Tyler, Thierry Rakotoarivelo, and Mohamed Ali Kaafar. 2019. The Impact of Adverse Weather Conditions on Autonomous Vehicles: How Rain, Snow, Fog, and Hail Affect the Performance of a Self-Driving Car. IEEE Vehicular Technology Magazine 14, 2 (2019), 103–111. https://doi.org/10.1109/MVT.2019.2892497

Jure Žbontar and Yann LeCun. 2016. Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches. J. Mach. Learn. Res. 17, 1 (Jan. 2016), 2287–2318.

Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. 2017. Scene Parsing through ADE20K Dataset. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5122–5130. https://doi.org/10.1109/CVPR.2017.544

Zheming Zhou, Xiaotong Chen, and Odest Chadwicke Jenkins. 2020. LIT: Light-Field Inference of Transparency for Refractive Object Localization. IEEE Robotics and Automation Letters 5, 3 (2020), 4548–4555. https://doi.org/10.1109/LRA.2020.3001499
How to Cite

FERRUGEM, Anderson Priebe; ZATT, Bruno; AGOSTINI, Luciano Volcan. A Learning-Based Framework for Depth Perception using Dense Light Fields. In: SIMPÓSIO BRASILEIRO DE SISTEMAS MULTIMÍDIA E WEB (WEBMEDIA), 28., 2022, Curitiba. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 159-167.