ABSTRACT
Research on the Fish Tank Virtual Reality (FTVR) technique commonly relies on dedicated sensors (e.g., infrared cameras and LEDs mounted on glasses) to estimate the user’s eye position. However, estimating the face position with an ordinary RGB camera is becoming increasingly accessible. In this work, we explore publicly available facial-feature detection software to bring the FTVR technique to everyday 3D applications on consumer notebooks, without requiring extra devices. We introduce the Parallax Engine, a solution that can be easily added to any Unity application. It supports two parallax-related visualization options: 1) a monoscopic FTVR mode (FishTank), which locks the virtual camera of the 3D environment to the laptop’s screen, and 2) a 2D parallax mode (Parallax2DoF), which allows horizontal and vertical displacement of the 3D scene camera. For facial-feature detection, the Parallax Engine uses a standardized interface that can receive input from different methods and currently supports three options: Google’s MediaPipe, dlib, and PoseNet. We evaluated the proposed solution with five users performing tasks under different combinations of visualization and facial-feature detection options, aiming to understand how suitable it is for end users. Despite some detection failures from dlib, results showed good overall acceptance of both the FishTank and Parallax2DoF visualization options.
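The abstract describes a standardized interface that accepts a face position from any detector (MediaPipe, dlib, or PoseNet) and drives the Parallax2DoF mode by shifting the virtual camera horizontally and vertically. A minimal sketch of that idea follows; this is not the authors' code, and all names here (`FacePosition`, `parallax_2dof_offset`, `max_shift`) are illustrative assumptions, not the Parallax Engine's actual API.

```python
# Hedged sketch of a detector-agnostic face-position interface driving a
# 2DoF parallax camera offset. Any face detector that can report a
# normalized face position could feed this mapping.

from dataclasses import dataclass


@dataclass
class FacePosition:
    """Normalized image coordinates: (0, 0) = top-left, (1, 1) = bottom-right."""
    x: float
    y: float


def parallax_2dof_offset(face: FacePosition, max_shift: float = 0.1):
    """Map a normalized face position to a horizontal/vertical camera
    displacement (in scene units). The camera moves opposite to the head
    so the scene appears to stay fixed behind the screen, producing the
    motion-parallax effect."""
    dx = -(face.x - 0.5) * 2.0 * max_shift  # head right -> camera left
    dy = (face.y - 0.5) * 2.0 * max_shift   # head down -> camera down
    return dx, dy
```

In a Unity integration, values like `dx, dy` would be applied to the scene camera's local position each frame; the FishTank mode would instead keep the camera locked to the screen and adjust the projection.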
Parallax Engine: Head Controlled Motion Parallax Using Notebooks’ RGB Camera