Towards Auditable and Intelligent Privacy-Preserving Record Linkage

  • Thiago Nóbrega Universidade Federal de Campina Grande (UFCG)
  • Carlos Eduardo S. Pires Universidade Federal de Campina Grande (UFCG)
  • Dimas Cassimiro Nascimento Universidade Federal de Campina Grande (UFCG)

Abstract


Privacy-Preserving Record Linkage (PPRL) aims to identify records (e.g., persons or objects) that represent the same real-world entity across private or sensitive data sources held by different custodians. Due to recent laws and regulations (e.g., the General Data Protection Regulation), PPRL approaches are increasingly in demand in real-world application areas such as health care, credit analysis, public policy evaluation, and national security. As a result, the PPRL process must address both efficacy (linkage quality) and privacy. For instance, it may need to run over data sources such as a database containing personal information from governmental income distribution and assistance programs, linking entities accurately while protecting the privacy of the information. This work therefore intends to simplify the PPRL process so that real-world applications (such as medical, epidemiological, and population studies) require less legal and bureaucratic effort to access and process the data, making them more straightforward for companies and governments to carry out. In this context, this work presents two major contributions to PPRL: i) improving linkage quality and simplifying the process by employing Machine Learning techniques to decide whether two records represent the same entity; and ii) enabling the auditability of the computations performed during PPRL.

Keywords: PPRL

Published
2021-10-04
NÓBREGA, Thiago; PIRES, Carlos Eduardo S.; NASCIMENTO, Dimas Cassimiro. Towards Auditable and Intelligent Privacy-Preserving Record Linkage. In: WORKSHOP DE TESES E DISSERTAÇÕES (WTDBD) - SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 36., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 99-105. DOI: https://doi.org/10.5753/sbbd_estendido.2021.18170.