Computers & Graphics

Volume 102, February 2022, Pages 257-268

Special Section on SIBGRAPI 2021
Spatially and color consistent environment lighting estimation using deep neural networks for mixed reality

https://doi.org/10.1016/j.cag.2021.08.007

Highlights

  • Automatic end-to-end method to estimate the environment lighting in XR applications.

  • A CNN architecture that learns a latent-space of the environment lighting.

  • A methodology to generate egocentric mixed-reality-views from HDR panoramas.

  • Real-time lighting estimation that does not make assumptions about the XR scene.

Abstract

Consistent mixed reality (XR) environments require an adequate composition of real and virtual illumination in real time. Estimating the lighting of a real scenario is still a challenge. Due to the ill-posed nature of the problem, classical inverse-rendering techniques tackle it only for simple lighting setups. Such assumptions, however, do not meet the demands of state-of-the-art computer graphics and XR applications. While many recent works use machine learning techniques to estimate the environment light and the scene’s materials, most of them depend on known geometry or other prior knowledge. This paper presents a CNN-based model to estimate complex lighting for mixed reality environments with no prior information about the scene. We model the environment illumination using spherical harmonics (SH) lighting, which can efficiently represent area lighting. We propose a new CNN architecture that takes an RGB image as input and estimates the environment lighting in real time. Unlike previous CNN-based lighting estimation methods, we use a highly optimized deep neural network architecture, with a reduced number of parameters, that can learn highly complex lighting scenarios from real-world high-dynamic-range (HDR) environment images. Our experiments show that the CNN architecture can predict the environment lighting with an average mean squared error (MSE) of 7.85 × 10⁻⁴ when comparing SH lighting coefficients. We validate our model in a variety of mixed reality scenarios and present qualitative results comparing relightings of real-world scenes.

Introduction

Consistent environment lighting is a crucial component of real-time mixed reality applications. The divergence between real and virtual object lighting is a significant factor in immersion loss and in a perceived reduction of graphical quality [1]. Plausible mixed reality lighting can be accomplished by acquiring the lighting of the real environment and adapting the virtual environment with matching lighting properties [2]. Most lighting recovery approaches have relied on intrusive tools to measure the environment lighting, requiring considerable user effort and scenario preparation. Consequently, these solutions have limited applicability in XR systems based on real-time visualization. An alternative to real-data measurements is to estimate the lighting indirectly from the available environment information. Despite recent advances in computer vision and inverse rendering [3], estimating the environment lighting without specialized equipment and under strict time constraints remains a challenging problem [4]. This work aims to recognize the user’s environment lighting through a model that learns the scene’s inherent characteristics regarding lighting and illumination, thereby estimating environment lighting capable of generating plausible XR environments. A challenging aspect of the problem resides in the fact that lighting estimation is ill-posed, yielding no solution or multiple solutions for a given input [5].

We use machine learning techniques and a specialized dataset to overcome the complex aspects of lighting estimation, learning from information readily available in mixed reality applications: an RGB image of the environment taken from an egocentric point of view.

We advance state-of-the-art lighting estimation methods by predicting the real-world environment lighting with a convolutional neural network that works in the wild, without assumptions about the scene’s geometry or special measurement devices. Our method works in a variety of environments, including indoor and outdoor scenes, and does not require any user intervention in the scene. Our custom-designed CNN architecture learns a latent-space representation of the environment lighting, allowing an efficient representation of the scene illumination. This representation is used to estimate the environment lighting encoded in a spherical harmonics basis. We also present a framework to create a mixed-reality-view, an image that mimics the user’s egocentric view in an XR environment.
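To make this idea concrete, below is a minimal sketch of such a latent-space model in PyTorch. The class name `SHLightingNet`, layer sizes, and latent dimension are illustrative assumptions for this sketch, not the paper’s exact architecture.

```python
# Hypothetical sketch: a compact CNN mapping an RGB mixed-reality-view to
# 3 x 9 spherical harmonics (SH) lighting coefficients. Layer sizes and
# names are assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn

class SHLightingNet(nn.Module):
    def __init__(self, latent_dim=128, sh_bands=3):
        super().__init__()
        n_coeffs = sh_bands ** 2  # 9 coefficients for 3 SH bands
        # Small convolutional encoder that compresses the input image
        # into a latent representation of the environment lighting.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.to_latent = nn.Linear(64, latent_dim)
        # Regressor from latent space to 9 SH coefficients per RGB channel.
        self.head = nn.Linear(latent_dim, 3 * n_coeffs)

    def forward(self, x):
        z = self.encoder(x).flatten(1)
        z = torch.relu(self.to_latent(z))
        return self.head(z).view(-1, 3, 9)  # (batch, channel, coefficient)

# Training would minimize the MSE between predicted and ground-truth SH
# coefficients extracted from HDR panoramas:
#   loss = nn.functional.mse_loss(model(images), sh_targets)
```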

Fig. 1 illustrates examples where virtual objects are illuminated by our method. The composition of real and virtual objects yields a plausible and realistic XR environment.

The main contributions of our work are:

  • Automatic end-to-end method to estimate the environment lighting in XR applications.

  • A CNN architecture that learns a latent-space of the environment lighting.

  • A methodology to generate egocentric mixed-reality-views from HDR panoramas.

  • Real-time lighting estimation that does not make assumptions about the XR scene.

The lighting estimation model developed in this work can be employed in most XR applications, increasing user immersion by providing lighting consistency. Its applicability is not restricted to mixed reality; other applications also benefit from it, including real-time video and photo editing with consistent illumination, real-time relighting of pictures, and inverse lighting design [6].

Section snippets

Related work

Many related works address the lighting estimation task based on different assumptions or strategies. In the following subsections, we group them into categories and compare them with our proposed solution. In addition, we highlight the limitations and restrictions of prior works concerning XR applications where appropriate.

A CNN method for environment lighting estimation based on spherical harmonics functions

Our goal is to recognize the real-world environment lighting and transfer this lighting information to virtual environments, allowing a more convincing lighting composition for XR experiences. We explore spherical harmonics functions to encode the environment lighting into a compact and expressive representation. This strategy can represent smooth, arbitrary area lighting and is not limited to a few point or directional light sources [51].
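As an illustration of this encoding, the sketch below projects an equirectangular HDR panorama onto the first nine real SH basis functions per color channel. The function names, axis conventions, and normalization are assumptions for this example and may differ from the paper’s implementation.

```python
# A minimal sketch, assuming an equirectangular HDR panorama stored as a
# float32 array of shape (H, W, 3).
import numpy as np

def sh_basis(x, y, z):
    """First 9 real spherical harmonics evaluated at unit directions."""
    return np.stack([
        0.282095 * np.ones_like(x),          # Y_0^0
        0.488603 * y,                        # Y_1^-1
        0.488603 * z,                        # Y_1^0
        0.488603 * x,                        # Y_1^1
        1.092548 * x * y,                    # Y_2^-2
        1.092548 * y * z,                    # Y_2^-1
        0.315392 * (3.0 * z**2 - 1.0),       # Y_2^0
        1.092548 * x * z,                    # Y_2^1
        0.546274 * (x**2 - y**2),            # Y_2^2
    ], axis=-1)

def project_sh(panorama):
    """Project an HDR panorama onto 9 SH coefficients per color channel."""
    H, W, _ = panorama.shape
    theta = (np.arange(H) + 0.5) / H * np.pi           # polar angle
    phi = (np.arange(W) + 0.5) / W * 2.0 * np.pi       # azimuth
    phi, theta = np.meshgrid(phi, theta)
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    basis = sh_basis(x, y, z)                          # (H, W, 9)
    # Solid angle of each equirectangular pixel: sin(theta) dtheta dphi.
    domega = np.sin(theta) * (np.pi / H) * (2.0 * np.pi / W)
    # Integrate radiance * basis * solid angle -> (3, 9) coefficients.
    return np.einsum('hwc,hwk,hw->ck', panorama, basis, domega)
```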

Our model is based on a convolutional neural network

Learning from HDR panoramas

In this section, we describe the complete pipeline that processes an input HDR environment panorama into mixed-reality-views and the corresponding environment lighting. The mixed-reality-view (MRV) is a low-dynamic-range (LDR) color image similar to a photograph taken by a camera located in the HMD, capturing an egocentric view of the user’s environment. Spherical harmonics coefficients encode an area light model that represents the environment lighting. These data are used for training our
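The snippet below sketches one plausible way to render such an egocentric LDR view from an equirectangular HDR panorama, assuming a simple pinhole camera and Reinhard tone mapping. The function name `panorama_to_mrv` and the camera conventions are hypothetical, and the compositing of virtual content described in the paper is omitted here.

```python
# A minimal sketch under the assumption of a pinhole camera looking along
# +x (y right, z up) with a given horizontal field of view.
import numpy as np

def panorama_to_mrv(panorama, out_h=256, out_w=256, fov_deg=90.0):
    """Render an egocentric LDR view (MRV) from an HDR equirectangular panorama."""
    H, W, _ = panorama.shape
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2.0)  # focal length in pixels
    u = np.arange(out_w) - out_w / 2.0 + 0.5
    v = np.arange(out_h) - out_h / 2.0 + 0.5
    u, v = np.meshgrid(u, v)
    # Unit ray directions through each output pixel.
    d = np.stack([np.full_like(u, f), u, -v], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    # Spherical coordinates -> panorama pixel coordinates.
    theta = np.arccos(np.clip(d[..., 2], -1.0, 1.0))      # polar angle
    phi = np.arctan2(d[..., 1], d[..., 0]) % (2 * np.pi)  # azimuth
    px = np.clip((phi / (2 * np.pi) * W).astype(int), 0, W - 1)
    py = np.clip((theta / np.pi * H).astype(int), 0, H - 1)
    hdr_view = panorama[py, px]
    # Simple Reinhard tone mapping + gamma to obtain an LDR image.
    ldr = (hdr_view / (1.0 + hdr_view)) ** (1.0 / 2.2)
    return (ldr * 255).astype(np.uint8)
```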

Results, experiments and performance

In this section, we show the results of our method and discuss the XR applications that are made possible by our lighting estimation method.

Conclusions

In this work, we introduced a new real-time environment lighting model that computes plausible environment lighting estimates for XR applications directly from mixed-reality-views, with no prior constraints. Unlike previous approaches, we neither rely on constraints on the scene geometry and lighting settings nor require the use of light probes.

The environment lighting produced is encoded as 3 × 9 spherical harmonic coefficients (9 for each color channel) predicted by a new deep neural
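For context, here is a minimal sketch of how such a 3 × 9 set of SH coefficients can shade a diffuse surface, following Ramamoorthi and Hanrahan’s irradiance formula. The function `irradiance` is hypothetical, and the coefficient ordering is assumed to match the SH projection sketch above.

```python
# A minimal sketch: diffuse shading from (3, 9) SH lighting coefficients.
import numpy as np

# Band-wise convolution weights of the clamped-cosine kernel
# (A_0 = pi, A_1 = 2*pi/3, A_2 = pi/4), repeated per coefficient.
A_HAT = np.array([np.pi,
                  2*np.pi/3, 2*np.pi/3, 2*np.pi/3,
                  np.pi/4, np.pi/4, np.pi/4, np.pi/4, np.pi/4])

def irradiance(sh_coeffs, normal):
    """RGB irradiance at a surface normal from (3, 9) SH lighting."""
    x, y, z = normal / np.linalg.norm(normal)
    basis = np.array([
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z**2 - 1),
        1.092548 * x * z, 0.546274 * (x**2 - y**2),
    ])
    return sh_coeffs @ (A_HAT * basis)   # shape (3,): one value per channel

# Usage for a Lambertian surface:
#   rgb = albedo / np.pi * irradiance(predicted_sh, surface_normal)
```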

CRediT authorship contribution statement

Bruno Augusto Dorta Marques: Conceptualization, Methodology, Software, Investigation, Writing. Esteban Walter Gonzalez Clua: Conceptualization, Supervision. Anselmo Antunes Montenegro: Investigation, Writing. Cristina Nader Vasconcelos: Conceptualization, Supervision.

Declaration of Competing Interest

One or more of the authors of this paper have disclosed potential or pertinent conflicts of interest, which may include receipt of payment, either direct or indirect, institutional support, or association with an entity in the biomedical field which may be perceived to have potential conflict of interest with this work. For full disclosure statements refer to https://doi.org/10.1016/j.cag.2021.08.007. This research was supported by CAPES, NVIDIA, CNPq and FAPERJ.

Acknowledgments

This research has been supported by the following Brazilian research agencies: CAPES, CNPq and FAPERJ. We would also like to thank NVIDIA Corp., USA, for providing GPUs and funding this work.

References (65)

  • Ramamoorthi R., et al. A signal-processing framework for inverse rendering

  • Debevec P. Rendering synthetic objects into real scenes: Bridging traditional and image-based graphics with global illumination and high dynamic range photography

  • Walton D.R., et al. Dynamic HDR environment capture for mixed reality

  • Kán P., et al. Differential irradiance caching for fast high-quality light transport between virtual and real worlds

  • Calian D.A., et al. The shading probe: Fast appearance acquisition for mobile AR

  • Aittala M. Inverse lighting and photorealistic rendering for augmented reality. Vis Comput (2010)

  • LeGendre C., et al. DeepLight: Learning illumination for unconstrained mobile mixed reality

  • Mandl D., et al. Learning lightprobes for mixed reality illumination

  • Weber H., et al. Learning to estimate indoor lighting from 3D objects

  • Whelan T., et al. ElasticFusion: Real-time dense SLAM and light source estimation. Int J Robot Res (2016)

  • Meilland M., et al. 3D high dynamic range dense visual SLAM and its application to real-time object re-lighting

  • Gruber L., et al. Efficient and robust radiance transfer for probeless photorealistic augmented reality

  • Gruber L., et al. Real-time photometric registration from arbitrary geometry

  • Zhang E., et al. Emptying, refurnishing, and relighting indoor spaces. ACM Trans Graph (Proc SIGGRAPH Asia 2016) (2016)

  • Maier R., et al. Intrinsic3D: High-quality 3D reconstruction by joint appearance and geometry optimization with spatially-varying lighting

  • Zollhöfer M., et al. State of the art on monocular 3D face reconstruction, tracking, and applications

  • Blanz V., et al. A morphable model for the synthesis of 3D faces

  • Shahlaei D., et al. Realistic inverse lighting from a single 2D image of a face, taken under unknown and complex lighting

  • Conde M.H., et al. Efficient and robust inverse lighting of a single face image using compressive sensing

  • Shahlaei D., et al. Lighting design for portraits with a virtual light stage

  • Egger B., et al. Occlusion-aware 3D morphable models and an illumination prior for face image analysis. Int J Comput Vis (2018)

  • Sun T., et al. Single image portrait relighting. ACM Trans Graph (2019)