RAM-VO: A Recurrent Attentional Model for Visual Odometry

  • Iury Cleveston, Unicamp
  • Esther L. Colombini, Unicamp

Abstract


Determining the agent's pose is fundamental for developing autonomous vehicles. Visual Odometry (VO) algorithms estimate the egomotion using only the visual differences between consecutive input frames. Most recent VO methods rely heavily on deep-learning techniques based on convolutional neural networks (CNNs), which are costly to apply to large images. Moreover, more data does not necessarily imply a better prediction, and the network may have to filter out useless information. In this context, we incrementally formulate a lightweight model called RAM-VO to perform visual odometry regressions on large monocular images. Our model extends the Recurrent Attention Model (RAM), a distinctive architecture that implements a hard attentional mechanism guided by reinforcement learning to select only the essential input information. Our methodology modifies the RAM and improves the visual and temporal representation of information, generating the intermediate RAM-R and RAM-RC architectures. We also include optical flow as contextual information for initializing the RL agent and implement the Proximal Policy Optimization (PPO) algorithm to learn a robust policy. The experimental results indicate that RAM-VO can perform regressions with six degrees of freedom using approximately 3 million parameters. Additionally, experiments on the KITTI dataset confirm that RAM-VO produces competitive results using only 5.7% of the input image.
Keywords: Vision in robotics and automation, Self-localization, mapping and navigation
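The hard attentional mechanism inherited from RAM observes only small multi-scale patches (glimpses) of each frame rather than the full image, which is what lets the model cover large inputs with few parameters. A minimal NumPy sketch of such a glimpse sensor follows; the function name, scale count, and pooling scheme are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def extract_glimpse(image, center, patch_size=16, num_scales=3):
    """Extract multi-scale square patches around `center` (hard attention).

    Each successive scale doubles the patch side and is then average-pooled
    back to patch_size x patch_size, so the sensor captures fine detail at
    the fixation point and coarse context around it.
    """
    cy, cx = center
    glimpse = []
    for s in range(num_scales):
        factor = 2 ** s
        side = patch_size * factor
        half = side // 2
        # Pad so patches near the border stay full-sized.
        padded = np.pad(image, half, mode="edge")
        # In padded coordinates the center shifts by `half`, so the
        # window [cy:cy+side, cx:cx+side] is centered on (cy, cx).
        patch = padded[cy:cy + side, cx:cx + side]
        # Average-pool back to the base resolution.
        patch = patch.reshape(patch_size, factor,
                              patch_size, factor).mean(axis=(1, 3))
        glimpse.append(patch)
    return np.stack(glimpse)  # shape: (num_scales, patch_size, patch_size)
```

In a RAM-style loop, the RL policy emits the next fixation `center` at every step, and the stacked glimpse is the only visual input the recurrent core receives.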

References

Correia, A. d. S. and Colombini, E. L. (2021). Attention, please! A survey of neural attention models in deep learning. arXiv preprint arXiv:2103.16775.

Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. INT J ROBOT RES, 32(11):1231–1237.

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep learning. Adaptive computation and machine learning. The MIT Press, Cambridge, Massachusetts.

Konda, K. and Memisevic, R. (2015). Learning Visual Odometry with a Convolutional Network. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications (VISAPP), pages 486–490, Berlin, Germany. SCITEPRESS - Science and Technology Publications.

Mnih, V., Heess, N., Graves, A., and Kavukcuoglu, K. (2014). Recurrent models of visual attention. arXiv preprint arXiv:1406.6247.

Muller, P. and Savakis, A. (2017). Flowdometry: An Optical Flow and Deep Learning Based Approach to Visual Odometry. In 2017 IEEE WACV, pages 624–631, Santa Rosa, CA, USA. IEEE.

Mur-Artal, R., Montiel, J. M. M., and Tardos, J. D. (2015). Orb-slam: a versatile and accurate monocular slam system. IEEE transactions on robotics, 31(5):1147–1163.

Peretroukhin, V. and Kelly, J. (2018). DPC-Net: Deep Pose Correction for Visual Localization. IEEE Robotics and Automation Letters, 3(3):2424–2431.

Scaramuzza, D. and Fraundorfer, F. (2011). Visual Odometry [Tutorial]. IEEE Robotics & Automation Magazine, 18(4):80–92.

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.

Tang, J., Folkesson, J., and Jensfelt, P. (2018). Geometric Correspondence Network for Camera Motion Estimation. IEEE Robotics and Automation Letters, 3(2):1010–1017.

Wang, S., Clark, R., Wen, H., and Trigoni, N. (2017). DeepVO: Towards End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks. 2017 IEEE ICRA, pages 2043–2050.

Wang, S., Clark, R., Wen, H., and Trigoni, N. (2018). End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks. INT J ROBOT RES, 37(4-5):513–542.

Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256.

Yin, X., Wang, X., Du, X., and Chen, Q. (2017). Scale Recovery for Monocular Visual Odometry Using Depth Estimated with Deep Convolutional Neural Fields. In 2017 IEEE ICCV, pages 5871–5879, Venice. IEEE.

Zhao, C., Sun, L., Purkait, P., Duckett, T., and Stolkin, R. (2018). Learning monocular visual odometry with dense 3D mapping from dense 3D flow. arXiv preprint arXiv:1803.02286.
Published
14/10/2021
How to Cite

CLEVESTON, Iury; COLOMBINI, Esther L.. RAM-VO: A Recurrent Attentional Model for Visual Odometry. In: CONCURSO DE TESES E DISSERTAÇÕES EM ROBÓTICA - CTDR (MESTRADO) - SIMPÓSIO BRASILEIRO DE ROBÓTICA E SIMPÓSIO LATINO-AMERICANO DE ROBÓTICA (SBR/LARS), 9. , 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021 . p. 46-57. DOI: https://doi.org/10.5753/wtdr_ctdr.2021.18684.