Unsupervised Dual-Layer Aggregation for Feature Fusion on Image Retrieval Tasks
Abstract
The revolutionary advances in image representation have led to impressive progress in many image understanding-related tasks, primarily supported by Convolutional Neural Networks (CNNs) and, more recently, by Transformer models. Despite such advances, assessing the similarity among images for retrieval in unsupervised scenarios remains a challenging task, mostly grounded in traditional pairwise measures, such as the Euclidean distance. The scenario is even more challenging when different visual features are available, requiring the selection and fusion of features without any label information. In this paper, we propose an Unsupervised Dual-Layer Aggregation (UDLA) method, based on contextual similarity approaches, for selecting and fusing CNN- and Transformer-based visual features trained through transfer learning. In the first layer, the selected features are fused in pairs with a focus on precision. A subset of pairs is then selected for a second-layer aggregation focused on recall. An experimental evaluation conducted on different public datasets showed the effectiveness of the proposed approach, which achieved results significantly superior to those of the best isolated feature and also superior to a recent fusion approach used as a baseline.
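The abstract describes the method only at a high level. As an illustration of the dual-layer idea, the Python sketch below organizes pairwise fusion in a first layer, scores each fused pair with an unsupervised effectiveness estimate, and re-fuses the best pairs in a second layer. This is not the authors' algorithm: the reciprocal-rank fusion rule, the reciprocal-neighborhood quality score, and all function names (rank_matrix, fuse_pair, authority_score, udla) are assumptions introduced here in place of the paper's contextual similarity measures.

    # Illustrative sketch only; assumes >= 2 feature sets over the same images.
    import numpy as np
    from itertools import combinations

    def rank_matrix(features):
        """Euclidean-distance ranked lists for one feature set (n x n)."""
        d = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
        return np.argsort(d, axis=1)

    def fuse_pair(ranks_a, ranks_b):
        """First layer: fuse two ranked lists via reciprocal-rank scores."""
        n = ranks_a.shape[0]
        score = np.zeros((n, n))
        for r in (ranks_a, ranks_b):
            pos = np.empty_like(r)
            rows = np.arange(n)[:, None]
            pos[rows, r] = np.arange(n)[None, :]   # position of each item per query
            score += 1.0 / (1.0 + pos)             # reciprocal-rank contribution
        return np.argsort(-score, axis=1)

    def authority_score(ranks, k=10):
        """Hypothetical unsupervised effectiveness estimate: fraction of
        reciprocal top-k neighbors (denser neighborhoods ~ higher precision)."""
        topk = ranks[:, :k]
        n = ranks.shape[0]
        recip = sum((i in topk[j]) for i in range(n) for j in topk[i])
        return recip / (n * k)

    def udla(feature_sets, top_pairs=2, k=10):
        """Two-layer aggregation: fuse all pairs, keep the best-scoring
        pairs, then fuse the survivors into the final ranking."""
        base_ranks = [rank_matrix(f) for f in feature_sets]
        fused = {(i, j): fuse_pair(base_ranks[i], base_ranks[j])
                 for i, j in combinations(range(len(base_ranks)), 2)}
        best = sorted(fused, key=lambda p: authority_score(fused[p], k),
                      reverse=True)[:top_pairs]
        final = fused[best[0]]
        for p in best[1:]:
            final = fuse_pair(final, fused[p])     # second-layer aggregation
        return final

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        feats = [rng.normal(size=(50, 8)) for _ in range(3)]  # 3 toy feature sets
        print(udla(feats, top_pairs=2, k=5)[:, :5])           # top-5 per query

The two layers play different roles in the sketch, mirroring the abstract: the pair score filters for precision before the second fusion broadens the ranking for recall.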
Keywords:
Visualization, Image retrieval, Transfer learning, Estimation, Euclidean distance, Image representation, Transformers, Convolutional neural networks, Optimization
Published
30/09/2024
How to Cite
MORENO, Ademir; PEDRONETTE, Daniel Carlos Guimarães. Unsupervised Dual-Layer Aggregation for Feature Fusion on Image Retrieval Tasks. In: CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 37., 2024, Manaus/AM. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024.