Analyzing Missing Data in Metric Spaces

Authors

  • Safia Brinis No affiliation declared
  • Agma J. M. Traina No affiliation declared
  • Caetano Traina Jr. No affiliation declared

DOI:

https://doi.org/10.5753/jidm.2014.1538

Keywords:

Distance Concentration, Data Distribution, Missing attribute values, Similarity Search

Abstract

Similarity search in multimedia databases has challenged researchers for the last two decades, and their studies have produced several achievements. However, searching in incomplete databases, i.e., databases with missing attribute values, has received far less attention.
In this article, we present a set of experimental analyses that evaluate the impact of missing data on query performance in metric spaces. The results show that missing data cause severe skew in the metric space even with only 2% of missing values and drastically degrade the performance of metric indexing techniques. Interestingly, our analyses, confirmed by the presented experiments, show that data missing not at random are more prone to skew and create the conditions for the distance concentration phenomenon, in which the distances between pairs of elements in the space become homogeneous. Thus, this study provides an understanding of the issues that arise in metric spaces when indexing incomplete databases and lays the groundwork for research on advanced metric access methods that handle missing attribute values.

Published

2014-10-02

How to Cite

Brinis, S., Traina, A. J. M., & Traina Jr., C. (2014). Analyzing Missing Data in Metric Spaces. Journal of Information and Data Management, 5(3), 224. https://doi.org/10.5753/jidm.2014.1538

Section

SBBD Articles