From Robustness to Efficiency: Deformation-Aware and Efficient Local Feature Extraction for Images
Abstract
Just as animals rely on visual perception and geometric understanding to navigate the 3D world, modern computers emulate this ability through Simultaneous Localization and Mapping (SLAM), image-based 3D reconstruction, and visual place recognition, all of which rely on image features to obtain correspondences. However, most feature extraction methods handle only affine transformations, ignoring the non-rigid deformations that are ubiquitous in the real world. This work investigates deformation-aware local features, leveraging RGB-D images (RGB denoting the visible Red, Green, and Blue channels, and D the image depth) to compute geodesic distances on the observed surface; we then generalize the concept to RGB-only images via learned representations. We also introduce a novel RGB-D dataset with non-rigid deformations for real-world benchmarking, on which experiments show significant improvements in foundational vision tasks such as matching and registration when adopting our proposed strategies. Finally, we present an efficient local feature extractor that balances accuracy with reduced computational cost, expanding visual perception to mobile computers.
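To make the central idea concrete, the sketch below approximates geodesic distances over the surface induced by a depth map, running Dijkstra shortest paths on the pixel grid graph with edges weighted by the 3D Euclidean length between back-projected points. This is only an illustrative approximation under assumed pinhole intrinsics, not the thesis implementation (the GeoBit line of work computes geodesics with the heat method of Crane et al.); the function names and parameter values here are hypothetical.

```python
# Minimal sketch: approximate geodesic distances on the surface induced by a
# depth map via Dijkstra on the 4-connected pixel grid, with edge weights set
# to the 3D distance between back-projected neighbors. Illustrative only.
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import dijkstra

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth map (meters) to one 3D point per pixel."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

def geodesic_from_keypoint(depth, fx, fy, cx, cy, kp_row, kp_col):
    """Approximate geodesic distance from one keypoint to every pixel."""
    pts = backproject(depth, fx, fy, cx, cy)
    h, w = depth.shape
    idx = np.arange(h * w).reshape(h, w)
    rows, cols, dists = [], [], []
    # Right and down neighbors; weights are Euclidean lengths in 3D.
    for dr, dc in [(0, 1), (1, 0)]:
        a = idx[: h - dr, : w - dc].ravel()
        b = idx[dr:, dc:].ravel()
        d = np.linalg.norm(
            pts[: h - dr, : w - dc].reshape(-1, 3)
            - pts[dr:, dc:].reshape(-1, 3),
            axis=1,
        )
        rows.append(a); cols.append(b); dists.append(d)
    g = coo_matrix(
        (np.concatenate(dists), (np.concatenate(rows), np.concatenate(cols))),
        shape=(h * w, h * w),
    )
    # Undirected shortest paths from the keypoint's pixel to all others.
    source = kp_row * w + kp_col
    return dijkstra(g, directed=False, indices=source).reshape(h, w)

# Example: a synthetic "bent" surface; distances follow the surface,
# not the flat image plane.
depth = 1.0 + 0.2 * np.abs(np.linspace(-1, 1, 64))[None, :] * np.ones((64, 1))
geo = geodesic_from_keypoint(depth, fx=60.0, fy=60.0, cx=32.0, cy=32.0,
                             kp_row=32, kp_col=32)
print(geo[32, 40])  # geodesic distance (meters) to a nearby pixel
```

Because path lengths are measured on the back-projected surface rather than in the image plane, they remain stable when the surface bends without stretching; this is the property a deformation-aware descriptor exploits when sampling its support region around a keypoint.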
Published
September 30, 2025
How to Cite
POTJE, Guilherme; MARTINS, Renato; NASCIMENTO, Erickson R. From Robustness to Efficiency: Deformation-Aware and Efficient Local Feature Extraction for Images. In: WORKSHOP DE TESES E DISSERTAÇÕES - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 38., 2025, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 22-28.
