Data stratification analysis on the propagation of discriminatory effects in binary classification

  • Diego Minatel, Universidade de São Paulo (USP)
  • Angelo Cesar Mendes da Silva, Universidade de São Paulo (USP)
  • Nícolas Roque dos Santos, Universidade de São Paulo (USP)
  • Mariana Curi, Universidade de São Paulo (USP)
  • Ricardo Marcondes Marcacini, Universidade de São Paulo (USP)
  • Alneu de Andrade Lopes, Universidade de São Paulo (USP)


Unfair decisions supported by machine learning, which harm or benefit a specific group of people, are frequent. In many cases, the models merely reproduce the biases present in the data, but this does not absolve them of responsibility for these decisions. Thus, as more activities are automated with machine learning models, it is essential to seek solutions that add fairness to the models and make the decisions they support transparent. One way to mitigate model discrimination is to quantify the proportion of instances belonging to each target class and use it to build data sets that approximate the actual data distribution. This alternative aims to reduce the data's contribution to discriminatory effects and shift the task of handling them to the models. In this sense, we analyze different types of data stratification, including stratification by historically unprivileged sociodemographic groups, and relate each stratification type to fairer or less fair models. According to our results, stratification by class and group of people helps to develop fairer models, reducing discriminatory effects in binary classification.
Keywords: analysis, binary classification, data bias, data stratification, discriminatory effects, fairness, machine learning, unfairness
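The abstract contrasts stratifying only by target class with stratifying by class and sociodemographic group, but it does not spell out the splitting procedure. The following is a minimal sketch, assuming a simple holdout split, of what the two options mean in practice. The function names `stratified_split` and `stratum_counts`, the `test_fraction`, `by_group` and `seed` parameters, and the toy groups "A" and "B" are hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, groups, test_fraction=0.2, by_group=True, seed=0):
    """Hypothetical holdout split that preserves stratum proportions.

    A stratum is the target class alone (by_group=False) or the
    (class, group) pair (by_group=True), where `groups` holds a
    sensitive attribute such as a sociodemographic group.
    Returns sorted (train_indices, test_indices).
    """
    strata = defaultdict(list)
    for i, (y, g) in enumerate(zip(labels, groups)):
        key = (y, g) if by_group else (y,)
        strata[key].append(i)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for key in sorted(strata):
        members = strata[key][:]
        rng.shuffle(members)
        # Each stratum contributes the same fraction of its members to the test set.
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratum_counts(indices, labels, groups):
    """Count (class, group) pairs among the selected indices."""
    return Counter((labels[i], groups[i]) for i in indices)


if __name__ == "__main__":
    # Toy data: positives are rarer, and group "A" is under-represented among them.
    labels = [1] * 20 + [0] * 80
    groups = ["A"] * 5 + ["B"] * 15 + ["A"] * 40 + ["B"] * 40

    for by_group in (False, True):
        train, test = stratified_split(labels, groups, by_group=by_group)
        print("by_group =", by_group, dict(stratum_counts(test, labels, groups)))
```

With `by_group=True`, the test set keeps the 1:3 ratio of groups A and B among positive instances; with class-only stratification, the class ratio is preserved but the group composition within each class is left to chance.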


MINATEL, Diego; DA SILVA, Angelo Cesar Mendes; DOS SANTOS, Nícolas Roque; CURI, Mariana; MARCACINI, Ricardo Marcondes; LOPES, Alneu de Andrade. Data stratification analysis on the propagation of discriminatory effects in binary classification. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 11., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 73-80. ISSN 2763-8944. DOI: