A Strategy for Selecting Relevant Attributes in the Entity Resolution Process

  • Gabrielle K. Canalle, Federal University of Pernambuco
  • Bernadette F. Lóscio, Federal University of Pernambuco
  • Ana Carolina Salgado, Federal University of Pernambuco

Abstract


Data integration is an essential task for achieving a unified view of data stored in heterogeneous and distributed sources. A key step in this process is entity resolution, which consists of identifying instances that refer to the same real-world entity. Equivalent instances are identified using functions that evaluate the similarity between attribute values. This work proposes a strategy for selecting the relevant attributes to be considered in the instance matching phase of the entity resolution process. The strategy uses characteristics of the attributes, such as the number of duplicated and null values, to identify those most relevant to instance matching.
Keywords: Data Integration, Entity Resolution
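
As an illustration only, the idea of profiling attributes by their null and duplicated values could be sketched as follows. The abstract does not give the paper's actual scoring formula, so the function names (`attribute_profile`, `rank_attributes`) and the score itself (completeness multiplied by distinctiveness) are assumptions made for this sketch, not the authors' method.

```python
from collections import Counter


def attribute_profile(records, attribute):
    """Return (null_ratio, duplicate_ratio) for one attribute.

    Hypothetical helper: None and the empty string are treated as null.
    """
    values = [record.get(attribute) for record in records]
    non_null = [v for v in values if v is not None and v != ""]
    # Share of records with no usable value for this attribute.
    null_ratio = 1 - len(non_null) / len(values) if values else 1.0
    counts = Counter(non_null)
    # Share of non-null values that occur more than once.
    duplicated = sum(c for c in counts.values() if c > 1)
    duplicate_ratio = duplicated / len(non_null) if non_null else 1.0
    return null_ratio, duplicate_ratio


def rank_attributes(records, attributes):
    """Order attributes from most to least relevant under an assumed score."""
    scores = {}
    for attribute in attributes:
        null_ratio, duplicate_ratio = attribute_profile(records, attribute)
        # Assumed score: reward attributes that are complete and distinctive.
        scores[attribute] = (1 - null_ratio) * (1 - duplicate_ratio)
    return sorted(attributes, key=lambda a: scores[a], reverse=True)
```

Under this assumed score, an attribute that is mostly null or dominated by a few repeated values ranks low, since it offers little evidence for telling instances apart during matching.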

Published
2016-10-04
CANALLE, Gabrielle K.; LÓSCIO, Bernadette F.; SALGADO, Ana Carolina. A Strategy for Selecting Relevant Attributes in the Entity Resolution Process. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 31., 2016, Salvador/BA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 259-264. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2016.24338.