Methodology and Implementation of an Architecture for Egocentric Manual Interactivity in Monocular Augmented Reality
Abstract
Investment in Augmented Reality (AR) has grown considerably in recent years, driven by the increasing use of AR in areas such as education, training, games, and medicine. In addition, advances in hardware have enabled devices that, a few years ago, were unthinkable. A popular example is the Microsoft HoloLens 2, which allows users to interact with an AR experience using their own hands. However, a disadvantage of this device is its high cost, owing to its many sensors. This project therefore proposes an AR architecture that uses only a monocular RGB camera as a sensor, allowing the user to interact with an AR experience through hand gestures similar to those of the Microsoft HoloLens 2 architecture, so that a virtual object can be manipulated in the same way a real object would be. The results obtained are promising: the detection of hand interaction with the virtual object worked in approximately 80% of the tests carried out, with the object following the path defined by the hand's movement.
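The interaction described above, in which a tracked hand grabs and drags a virtual object, can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: it assumes that a monocular RGB hand-pose estimator already supplies 3D fingertip positions each frame, and the pinch threshold, the bounding-sphere radius, and all function names are hypothetical choices made here for clarity.

```python
import math


def pinch_detected(thumb_tip, index_tip, threshold=0.03):
    """Assume a pinch gesture when the thumb and index fingertips are
    closer than `threshold` (in the tracker's metric units)."""
    return math.dist(thumb_tip, index_tip) < threshold


def grab_point(thumb_tip, index_tip):
    """Midpoint between the two fingertips, used as the grab anchor."""
    return tuple((a + b) / 2 for a, b in zip(thumb_tip, index_tip))


class VirtualObject:
    """Minimal virtual object with a bounding sphere for interaction tests."""

    def __init__(self, center, radius):
        self.center = list(center)
        self.radius = radius

    def contains(self, point):
        # The hand interacts with the object when the grab point
        # falls inside the object's bounding sphere.
        return math.dist(self.center, point) <= self.radius

    def move_to(self, point):
        self.center = list(point)


def update(obj, thumb_tip, index_tip):
    """One frame of the interaction loop: if the user pinches inside the
    object's bounding sphere, the object follows the grab point, so the
    virtual object traces the path defined by the hand's movement."""
    if pinch_detected(thumb_tip, index_tip):
        p = grab_point(thumb_tip, index_tip)
        if obj.contains(p):
            obj.move_to(p)
    return obj
```

Running `update` once per camera frame makes the object follow the pinching hand; releasing the pinch (fingertips further apart than the threshold) drops it in place.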