Similarity Grouping by Influence: Exploring Result Diversification in Similarity Group-by Operators


The group-by operator groups tuples that share the same values in specified attributes and then extracts summaries from each group. However, much of the data stored by modern applications is best queried by similarity rather than by equality, which raises questions such as "How can we obtain groups such that each one contains the k most similar tuples?" and "How can we include diversity in the results?". In this paper, we present a binary grouping operator focused on diversified similarity comparisons that is able to answer such questions. We define the operator algebraically and show that it enables grouping operations over complex attributes, such as multidimensional data. We provide an algorithm, called Similarity Grouping by Influence (SGIa), to implement the binary operator. An experimental evaluation performed on real data shows that SGIa is able to meet real application needs in a timely manner with significant results.
Keywords: Group-by operator, Similarity Search, Result Diversification


OLIVEIRA, Willian D.; LAUTON, Anna J. C.; TRAINA JR., Caetano; SANTOS, Lucio F. D. Similarity Grouping by Influence: Exploring Result Diversification in Similarity Group-by Operators. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 38., 2023, Belo Horizonte/MG.