An empirical assessment of quality metrics for diversified similarity searching
Keywords: Metric spaces, Diversified similarity searching, Result diversification, Similarity searching
A diversified similarity search retrieves elements that are simultaneously similar to a query object and akin to the different collections within the explored data. While several methods in information retrieval, data clustering, and similarity searching have tackled the problem of adding diversity to result sets, the experimental comparison of their performance is still an open issue, mainly because the quality metrics are “borrowed” from those different research areas and carry the biases of those areas with them. In this manuscript, we investigate a series of such metrics and experimentally discuss their trends and limitations. We conclude that diversity is better addressed by a set of measures than by a single quality index, and we introduce the concept of the Diversity Features Model (DFM), which combines the viewpoints of biased metrics into a multidimensional representation. Experimental evaluations indicate that (i) DFM enables the comparison of different result diversification algorithms under multiple criteria, and (ii) the most suitable searching methods for a particular dataset can be identified by combining DFM with ranking aggregation and parallel coordinates maps.
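To make the idea of comparing algorithms over several metrics at once more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each diversification algorithm is described by a vector of metric scores, that higher is better for every metric, and that the per-metric rankings are merged with a simple Borda count. The abstract does not say which ranking aggregation rule is used, so Borda is a stand-in. The algorithm names and all numbers are hypothetical and made up for illustration only.

```python
from typing import Dict, List

# Hypothetical, made-up scores (NOT results from the paper).
# Each row is one diversification algorithm; each column is one
# quality metric, with higher values assumed to be better.
features: Dict[str, List[float]] = {
    "algo_a": [0.90, 0.20, 0.50],
    "algo_b": [0.60, 0.70, 0.60],
    "algo_c": [0.30, 0.90, 0.40],
}


def borda_aggregate(features: Dict[str, List[float]]) -> List[str]:
    """Merge per-metric rankings with a Borda count (an assumed rule).

    For each metric, the worst algorithm earns 0 points, the next one 1,
    and so on. Algorithms are returned by decreasing total points.
    """
    names = list(features)
    n_metrics = len(next(iter(features.values())))
    points = {name: 0 for name in names}
    for m in range(n_metrics):
        # Sort from worst to best on metric m.
        ordered = sorted(names, key=lambda a: features[a][m])
        for pts, name in enumerate(ordered):
            points[name] += pts
    # sorted() is stable, so ties keep their input order.
    return sorted(names, key=lambda a: -points[a])


ranking = borda_aggregate(features)
print(ranking)
```

In this toy input no algorithm wins on more than one metric, yet the aggregated ranking places the balanced `algo_b` first, which is the kind of trade-off a single quality index would hide.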