A formal quantitative study of privacy in the publication of official educational censuses in Brazil

  • Gabriel H. Nunes UFMG
  • Mário S. Alvim UFMG
  • Annabelle McIver Macquarie University

Abstract


We summarize the dissertation "A formal quantitative study of privacy in the publication of official educational censuses in Brazil", including its contributions and its impact so far. The dissertation presents a systematic refactoring of the conventional treatment of privacy analyses, based on mathematical concepts from the framework of Quantitative Information Flow (QIF). This brings three principal advantages: flexibility, allowing precise quantification and comparison of privacy risks for both known and novel attacks; computational tractability for very large, longitudinal datasets; and results that can be explained both to politicians and to the general public. We apply our approach to a very large case study: the educational censuses in Brazil, which comprise over 90 attributes of approximately 50 million individuals and have been released every year since 2007.
Keywords: Quantitative Information Flow, Disclosure Control, Microdata, Differential Privacy, Privacy, Utility

Published
31/07/2022
NUNES, Gabriel H.; ALVIM, Mário S.; MCIVER, Annabelle. A formal quantitative study of privacy in the publication of official educational censuses in Brazil. In: CONCURSO DE TESES E DISSERTAÇÕES (CTD), 35., 2022, Niterói. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 61-70. ISSN 2763-8820. DOI: https://doi.org/10.5753/ctd.2022.223158.