A Learning-Based Framework for Depth Perception using Dense Light Fields
Abstract
The rapid development of optical sensor technology has been accompanied by a growing demand for visual measurement systems in emerging areas that need to interpret the real three-dimensional physical world, such as self-driving cars, mobile robotics, Advanced Driver Assistance Systems (ADAS), and 3D medical diagnostic imaging. To model the physical world, these systems must unify visual information with depth measurements. Light Field cameras have the potential to serve in such systems as a versatile hypersensor. Since Light Fields represent the scene’s visual information from multiple viewpoints, depth information can be calculated through trigonometric operations. This paper proposes a learning-based framework that unifies scene depth with visual information obtained from Light Fields. The proposed framework is composed of four main modules. The two deep learning modules are (i) depth map estimation using a siamese convolutional neural network and (ii) instance segmentation using a region-based convolutional neural network. The other two modules apply linear transformations: (iii) a module that applies matrix transformations with the camera intrinsic parameters to generate a new depth map of absolute distances and (iv) a module that returns the distance of the selected objects. For the depth map estimation module, the framework proposes a siamese neural network called EPINET-FAST, which generates depth maps in less than half the time of the original EPINET. A case study is presented using Dense Light Fields captured by a Lytro Illum camera (plenoptic 1.0). The case study illustrates the processing time of each module, allowing researchers to isolate critical points and propose future changes toward processing that can be applied in real time.
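As a rough illustration of modules (iii) and (iv), the sketch below converts a disparity map into absolute distances and then reads out one distance per segmented object. It is not the paper's transformation: it assumes the simple pinhole relation Z = f · b / d, and the names `disparity_to_depth`, `object_distance`, `focal_px`, and `baseline_m` are hypothetical. A plenoptic camera such as the Lytro Illum would need its own calibrated mapping.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Map disparity (pixels) to metric depth with Z = f * b / d.

    Assumption: a pinhole model with focal length `focal_px` (pixels) and
    baseline `baseline_m` (metres) between adjacent views. Pixels whose
    disparity is not positive are marked as infinitely far.
    """
    d = np.asarray(disparity, dtype=np.float64)
    depth = np.full(d.shape, np.inf)
    valid = d > eps
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth


def object_distance(depth, mask):
    """Median depth over the finite pixels of a boolean instance mask."""
    values = depth[mask & np.isfinite(depth)]
    return float(np.median(values)) if values.size else float("nan")


if __name__ == "__main__":
    # Toy 2x2 disparity map and a mask selecting the top row only.
    disparity = np.array([[2.0, 4.0], [0.0, 8.0]])
    depth = disparity_to_depth(disparity, focal_px=800.0, baseline_m=0.01)
    mask = np.array([[True, True], [False, False]])
    print(object_distance(depth, mask))
```

The median is used here only because it tolerates a few mislabeled mask pixels; a mean or a percentile would be equally valid choices for this illustration.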
Keywords:
light fields, artificial neural networks, computer vision, instance segmentation
References
A. Bajpayee, A. H. Techet, and H. Singh. 2018. Real-Time Light Field Processing for Autonomous Robotics. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 4218–4225. https://doi.org/10.1109/IROS.2018.8594477
Daniel Bolya, Chong Zhou, Fanyi Xiao, and Yong Jae Lee. 2019. YOLACT: Real-Time Instance Segmentation. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 9156–9165. https://doi.org/10.1109/ICCV.2019.00925
Hao Chen, Kunyang Sun, Zhi Tian, Chunhua Shen, Yongming Huang, and Youliang Yan. 2020. BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8570–8578. https://doi.org/10.1109/CVPR42600.2020.00860
Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, and Dahua Lin. 2019. Hybrid Task Cascade for Instance Segmentation. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 4969–4978. https://doi.org/10.1109/CVPR.2019.00511
Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. 2016. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. 2017. Deformable Convolutional Networks. In 2017 IEEE International Conference on Computer Vision (ICCV). 764–773. https://doi.org/10.1109/ICCV.2017.89
Bert De Brabandere, Davy Neven, and Luc Van Gool. 2017. Semantic Instance Segmentation for Autonomous Driving. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 478–480. https://doi.org/10.1109/CVPRW.2017.66
M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. 2015. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision 111, 1 (Jan. 2015), 98–136.
Naiyu Gao, Yanhu Shan, Yupei Wang, Xin Zhao, and Kaiqi Huang. 2021. SSAP: Single-Shot Instance Segmentation With Affinity Pyramid. IEEE Transactions on Circuits and Systems for Video Technology 31, 2 (2021), 661–673. https://doi.org/10.1109/TCSVT.2020.2985420
Ross Girshick. 2015. Fast R-CNN. In 2015 IEEE International Conference on Computer Vision (ICCV). 1440–1448. https://doi.org/10.1109/ICCV.2015.169
Bichuan Guo, Jiangtao Wen, and Yuxing Han. 2020. Deep Material Recognition in Light-Fields via Disentanglement of Spatial and Angular Information. In Computer Vision – ECCV 2020, Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer International Publishing, Cham, 664–679.
Agrim Gupta, Piotr Dollár, and Ross Girshick. 2019. LVIS: A Dataset for Large Vocabulary Instance Segmentation. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5351–5359. https://doi.org/10.1109/CVPR.2019.00550
Abdul Mueed Hafiz and Ghulam Mohiuddin Bhat. 2020. A survey on instance segmentation: state of the art. International Journal of Multimedia Information Retrieval 9, 3 (jul 2020), 171–189. https://doi.org/10.1007/s13735-020-00195-x
Christopher Hahne, Amar Aggoun, Vladan Velisavljevic, Susanne Fiebig, and Matthias Pesch. 2016. Refocusing distance of a standard plenoptic camera. Opt. Express 24, 19 (Sep 2016), 21521–21540. https://doi.org/10.1364/OE.24.021521
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV). 2980–2988. https://doi.org/10.1109/ICCV.2017.322
Stefan Heber, Wei Yu, and Thomas Pock. 2017. Neural EPI-Volume Networks for Shape from Light Field. In 2017 IEEE International Conference on Computer Vision (ICCV). 2271–2279. https://doi.org/10.1109/ICCV.2017.247
Katrin Honauer, Ole Johannsen, Daniel Kondermann, and Bastian Goldluecke. 2017. A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields. In Computer Vision – ACCV 2016, Shang-Hong Lai, Vincent Lepetit, Ko Nishino, and Yoichi Sato (Eds.). Springer International Publishing, Cham, 19–34.
Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. 2017. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1647–1655. https://doi.org/10.1109/CVPR.2017.179
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In Computer Vision – ECCV 2014, David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars (Eds.). Springer International Publishing, Cham, 740–755.
Shu Liu, Jiaya Jia, Sanja Fidler, and Raquel Urtasun. 2017. SGN: Sequential Grouping Networks for Instance Segmentation. In 2017 IEEE International Conference on Computer Vision (ICCV). 3516–3524. https://doi.org/10.1109/ICCV.2017.378
Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. 2018. Path Aggregation Network for Instance Segmentation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8759–8768. https://doi.org/10.1109/CVPR.2018.00913
Wenjie Luo, Alexander G. Schwing, and Raquel Urtasun. 2016. Efficient Deep Learning for Stereo Matching. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5695–5703. https://doi.org/10.1109/CVPR.2016.614
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2017. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 6 (2017), 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
Jinglei Shi, Xiaoran Jiang, and Christine Guillemot. 2019. A Framework for Learning Depth From a Flexible Subset of Dense and Sparse Light Field Views. IEEE Transactions on Image Processing 28, 12 (2019), 5867–5880. https://doi.org/10.1109/TIP.2019.2923323
Changha Shin, Hae-Gon Jeon, Youngjin Yoon, In So Kweon, and Seon Joo Kim. 2018. EPINET: A Fully-Convolutional Neural Network Using Epipolar Geometry for Depth from Light Field Images. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4748–4757. https://doi.org/10.1109/CVPR.2018.00499
Richard Szeliski. 2022. Computer Vision (2nd ed.). Springer Nature Switzerland AG, Cham, Switzerland. https://doi.org/10.1007/978-3-030-34372-9
Di Tian, Yi Han, Biyao Wang, Tian Guan, Hengzhi Gu, and Wei Wei. 2021. Review of object instance segmentation based on deep learning. Journal of Electronic Imaging 31, 4 (2021), 1–18. https://doi.org/10.1117/1.JEI.31.4.041205
Jorge Vargas, Suleiman Alsweiss, Onur Toker, Rahul Razdan, and Joshua Santos. 2021. An Overview of Autonomous Vehicles Sensors and Their Vulnerability to Weather Conditions. Sensors 21, 16 (2021). https://doi.org/10.3390/s21165397
Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, and Lei Li. 2021. SOLO: A Simple Framework for Instance Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021), 1–1. https://doi.org/10.1109/TPAMI.2021.3111116
Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. 2019. Detectron2. https://github.com/facebookresearch/detectron2.
Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Xuebo Liu, Ding Liang, Chunhua Shen, and Ping Luo. 2020. PolarMask: Single Shot Instance Segmentation With Polar Representation. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12190–12199. https://doi.org/10.1109/CVPR42600.2020.01221
Shizhe Zang, Ming Ding, David Smith, Paul Tyler, Thierry Rakotoarivelo, and Mohamed Ali Kaafar. 2019. The Impact of Adverse Weather Conditions on Autonomous Vehicles: How Rain, Snow, Fog, and Hail Affect the Performance of a Self-Driving Car. IEEE Vehicular Technology Magazine 14, 2 (2019), 103–111. https://doi.org/10.1109/MVT.2019.2892497
Jure Žbontar and Yann LeCun. 2016. Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches. J. Mach. Learn. Res. 17, 1 (Jan. 2016), 2287–2318.
Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. 2017. Scene Parsing through ADE20K Dataset. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5122–5130. https://doi.org/10.1109/CVPR.2017.544
Zheming Zhou, Xiaotong Chen, and Odest Chadwicke Jenkins. 2020. LIT: Light-Field Inference of Transparency for Refractive Object Localization. IEEE Robotics and Automation Letters 5, 3 (2020), 4548–4555. https://doi.org/10.1109/LRA.2020.3001499
Published
7 November 2022
How to Cite
FERRUGEM, Anderson Priebe; ZATT, Bruno; AGOSTINI, Luciano Volcan. A Learning-Based Framework for Depth Perception using Dense Light Fields. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 28., 2022, Curitiba. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 159-167.