Image Processing Techniques to Improve Deep 6DoF Detection in RGB Images

Heitor Felix (UFPE); Francisco Simões (IFPE); Kelvin Cunha (UFPE); Veronica Teichrieb (UFPE)

doi:10.5753/svr_estendido.2019.8457

Abstract

Six-degrees-of-freedom (6DoF) object detection is highly relevant in computer vision because of its applications in several areas, such as augmented reality and robotics. Despite the improved results brought by deep learning techniques, detecting both textured and non-textured objects remains a challenge. The objective of this work was to improve the 6DoF detection of non-textured objects with a Convolutional Neural Network (CNN) approach by preprocessing the images used to train the network. We surveyed the state of the art in CNN-based 6DoF object detection and also searched for image filters that could act as enhancement factors for detection. Finally, we selected a CNN-based detection technique and adapted it to take single-channel (grayscale) images as input instead of the three-channel (RGB) images of the original proposal, aiming to increase its robustness while reducing the complexity of the input images. The technique was also tested with two different preprocessing filters that enhance the objects' contours in the single-channel images: a "pencil effect" filter and a filter based on local binary patterns (LBP). This study made it possible to evaluate how each filter affects the CNN's detection performance. Using either plain or filtered single-channel images, the adapted technique still could not surpass the results of the original technique with three-channel (RGB) images, although it indicated paths for improvement. As expected, the pencil filter also proved more robust than the LBP filter.
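The abstract does not state how the selected CNN was changed to accept a single channel. One common option, shown here only as a sketch, is to build the single-channel first layer by summing the pretrained RGB kernels over their input-channel axis: the new layer then responds to a grayscale image exactly as the original layer responds to that image copied into all three channels. The helper names `conv2d_valid` and `rgb_kernels_to_gray` are hypothetical, the convolution is a naive NumPy cross-correlation rather than a framework layer, and nothing here is claimed to be the authors' procedure.

```python
import numpy as np


def conv2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive 'valid' cross-correlation, as computed by a CNN convolution layer.

    x: input of shape (C_in, H, W); w: kernels of shape (C_out, C_in, kH, kW).
    Returns an array of shape (C_out, H - kH + 1, W - kW + 1).
    """
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    oh, ow = h - kh + 1, wd - kw + 1
    out = np.zeros((c_out, oh, ow))
    for i in range(kh):
        for j in range(kw):
            patch = x[:, i:i + oh, j:j + ow]  # (C_in, oh, ow)
            # Weight and sum the input channels for all output channels at once.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def rgb_kernels_to_gray(w_rgb: np.ndarray) -> np.ndarray:
    """Hypothetical helper: collapse (C_out, 3, kH, kW) kernels to (C_out, 1, kH, kW)."""
    return w_rgb.sum(axis=1, keepdims=True)


# Check: a gray image replicated into 3 channels through the RGB kernels gives the
# same response as the 1-channel image through the summed kernels.
rng = np.random.default_rng(0)
w_rgb = rng.normal(size=(4, 3, 3, 3))  # 4 filters, 3 input channels, 3x3 kernels
gray = rng.random((1, 8, 8))           # one-channel 8x8 image
rgb_like = np.repeat(gray, 3, axis=0)  # (3, 8, 8)
print(np.allclose(conv2d_valid(rgb_like, w_rgb),
                  conv2d_valid(gray, rgb_kernels_to_gray(w_rgb))))  # True
```

Summing, rather than averaging, is what makes the two responses identical; averaging the kernels would scale every first-layer response by 1/3.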
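The abstract names a "pencil effect" filter but does not define it. A widely used way to produce a pencil-sketch look is the colour-dodge blend of a grayscale image with a blurred copy of its inverse, which turns flat regions white and leaves dark strokes along intensity edges. The sketch below implements that recipe in NumPy under several assumptions: BT.601 luma weights for the grayscale conversion, a box blur standing in for the more usual Gaussian blur, and an arbitrarily chosen default blur radius of 5. The function names are hypothetical and this is not presented as the filter used in the paper.

```python
import numpy as np


def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
    """(H, W, 3) RGB array -> (H, W) float gray image using ITU-R BT.601 luma weights."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def box_blur(img: np.ndarray, radius: int) -> np.ndarray:
    """Mean filter over a (2r+1)x(2r+1) window, replicating border pixels."""
    k = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def pencil_sketch(gray: np.ndarray, radius: int = 5) -> np.ndarray:
    """Colour-dodge pencil effect: gray / (1 - blur(inverted gray)), on a 0-255 scale."""
    gray = np.asarray(gray, dtype=np.float64)
    blurred = box_blur(255.0 - gray, radius)
    denom = 255.0 - blurred
    # Where denom is 0 the dodge saturates to white; elsewhere divide normally.
    sketch = np.where(denom > 0, gray * 255.0 / np.maximum(denom, 1e-6), 255.0)
    return np.clip(sketch, 0, 255).astype(np.uint8)
```

For example, on an image whose left half is black and right half is gray (value 200), `pencil_sketch(img, radius=2)` returns white everywhere except a dark band, `radius` pixels wide, on the black side of the boundary: the contour the filter is meant to emphasise. A full pipeline would call `pencil_sketch(rgb_to_gray(rgb))`.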
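The second filter is described only as "based on local binary patterns (LBP)". The original 3x3 LBP operator (Ojala et al.) compares each pixel with its eight neighbours, writes a 1 for every neighbour at least as bright as the centre, and reads the eight bits as a code from 0 to 255, so the code image can be fed to a network as a single channel. The sketch below assumes that basic operator, a clockwise bit order starting at the top-left neighbour, and edge replication at the image border. The paper may use a different radius, neighbour count, or a uniform or rotation-invariant variant, and the function name `lbp_3x3` is hypothetical.

```python
import numpy as np

# Neighbour offsets (dy, dx), clockwise from the top-left; list index = bit position.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_3x3(gray: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour LBP code (0-255) for every pixel of a 2-D gray image."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")  # border pixels compare against copies of themselves
    codes = np.zeros((h, w), dtype=np.int32)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes += (neighbour >= gray).astype(np.int32) << bit
    return codes.astype(np.uint8)


patch = np.array([[0, 0, 0],
                  [0, 5, 9],
                  [0, 0, 0]])
print(lbp_3x3(patch)[1, 1])  # 8: only the right-hand neighbour (bit 3) is >= 5
```

Because each code is obtained by thresholding against the centre pixel, it depends only on the local ordering of intensities, so it is unchanged by any strictly increasing gray-level transformation such as a global brightness or contrast change.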
