Incremental Unsupervised Name Disambiguation in Cleaned Digital Libraries

Ana Paula de Carvalho; Anderson A. Ferreira; Alberto H. F. Laender; Marcos A. Gonçalves

doi:10.5753/jidm.2011.1410

Authors

Ana Paula de Carvalho UFMG
Anderson A. Ferreira UFMG, UFOP
Alberto H. F. Laender UFMG
Marcos A. Gonçalves UFMG

DOI:

https://doi.org/10.5753/jidm.2011.1410

Keywords:

Bibliographic Citation, Digital Library, Name Disambiguation

Abstract

Name ambiguity in bibliographic citations is one of the hardest problems currently faced by the Digital Library (DL) community. Here we address the problem of disambiguating new citation records inserted into a cleaned DL without reprocessing the whole collection, which unsupervised methods usually require. Although supervised solutions can handle this situation, they carry the costly burden of generating training data, and they do not handle well the insertion of records of new authors who are not yet in the repository. In this article, we propose a new unsupervised method that identifies the correct authors of the new citation records to be inserted into a DL. The method is based on heuristics that are also used to determine whether the new records belong to authors already in the digital library, correctly identifying new authors in most cases. Our experimental evaluation, using synthetic and real datasets, shows gains of up to 19% over a state-of-the-art method, without the cost of disambiguating the whole DL at each new load (as unsupervised methods do) or the need for any training (as supervised methods require).
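
To make the incremental setting concrete, the following is a minimal sketch of inserting one new citation record into an already disambiguated library without reprocessing the existing records. It is not the method proposed in the article: the abstract does not specify the heuristics, so the name-compatibility test, the shared-coauthor rule, the title-overlap score, the threshold value, and all names (`Citation`, `compatible`, `assign`, `title_threshold`) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    author: str            # the ambiguous author name as it appears in the record
    coauthors: frozenset   # names of the other authors of the record
    title: str


def tokens(text):
    """Lowercased word set of a title (assumed, very crude tokenization)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compatible(name_a, name_b):
    """Assumed heuristic: same last name and same first initial."""
    pa, pb = name_a.lower().split(), name_b.lower().split()
    return pa[-1] == pb[-1] and pa[0][0] == pb[0][0]


def assign(citation, library, title_threshold=0.3):
    """Attach `citation` to an existing author or create a new one.

    `library` maps integer author ids (assumed to be 0..n-1) to lists of
    records and is updated in place. Only records with a compatible name
    are compared, so the rest of the collection is never touched.
    """
    best_id, best_score = None, 0.0
    for author_id, records in library.items():
        for rec in records:
            if not compatible(citation.author, rec.author):
                continue
            if citation.coauthors & rec.coauthors:
                score = 1.0  # a shared coauthor is treated as strong evidence
            else:
                score = jaccard(tokens(citation.title), tokens(rec.title))
            if score > best_score:
                best_id, best_score = author_id, score
    if best_id is None or best_score < title_threshold:
        best_id = len(library)  # no convincing match: treat as a new author
        library[best_id] = []
    library[best_id].append(citation)
    return best_id
```

Calling `assign` once per incoming record keeps the cost proportional to the number of name-compatible records rather than to the size of the whole DL, and the fallback branch is where a record of a previously unseen author gets its own entry.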
