On the evaluation of example-dependent cost-sensitive models for tax debts classification

  • Helton Souza Lima IFPB
  • Damires Yluska de Souza Fernandes IFPB
  • Thiago José Moura IFPB


Example-dependent cost-sensitive classification methods are suitable for many real-world classification problems in which misclassification costs vary from one example to another. Tax administration applications fall into this class of problems, since they deal with tax payments of different values. This work presents an experimental evaluation that aims to verify whether cost-sensitive learning algorithms are, on average, more cost-effective than traditional ones. The evaluation is carried out in a tax administration application domain, which requires a cost matrix based on debt values. The results show that cost-sensitive methods avoid situations such as erroneously granting a request involving a debt of millions of reais. In terms of the savings score, the cost-sensitive classification methods achieved higher results than their traditional counterparts.
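
As a rough illustration of the two notions the abstract relies on, the sketch below computes an example-dependent total cost and a savings score, taken here as the relative cost reduction against the cheaper of the two trivial classifiers (deny everything or grant everything), a definition commonly used in the example-dependent cost-sensitive literature. The cost-matrix layout, the debt values, and the fixed cost of 500 for wrongly denying a request are illustrative assumptions, not figures or definitions from the paper.

```python
import numpy as np


def total_cost(y_true, y_pred, cost_matrix):
    """Sum of per-example costs.

    cost_matrix has one row per example and four columns,
    assumed here to be ordered as [FP, FN, TP, TN].
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    c = np.asarray(cost_matrix, dtype=float)
    fp, fn, tp, tn = c[:, 0], c[:, 1], c[:, 2], c[:, 3]
    cost = (y_true * (y_pred * tp + (1 - y_pred) * fn)
            + (1 - y_true) * (y_pred * fp + (1 - y_pred) * tn))
    return float(cost.sum())


def savings_score(y_true, y_pred, cost_matrix):
    """Relative cost reduction versus the cheaper trivial classifier."""
    y_true = np.asarray(y_true)
    base = min(total_cost(y_true, np.zeros_like(y_true), cost_matrix),
               total_cost(y_true, np.ones_like(y_true), cost_matrix))
    return (base - total_cost(y_true, y_pred, cost_matrix)) / base


# Hypothetical data: 1 = request should be granted, 0 = should be denied.
y_true = np.array([1, 0, 1, 0])
debts = np.array([1_000.0, 5_000_000.0, 200.0, 30_000.0])

# Assumption: wrongly granting (FP) costs the debt value; wrongly denying (FN)
# costs a fixed 500; correct decisions cost nothing.
cost_matrix = np.column_stack([
    debts,                 # FP
    np.full(4, 500.0),     # FN
    np.zeros(4),           # TP
    np.zeros(4),           # TN
])

pred_a = np.array([1, 1, 1, 0])  # wrongly grants the 5,000,000 debt
pred_b = np.array([1, 0, 0, 0])  # wrongly denies the 200 debt

print(total_cost(y_true, pred_a, cost_matrix), savings_score(y_true, pred_a, cost_matrix))
print(total_cost(y_true, pred_b, cost_matrix), savings_score(y_true, pred_b, cost_matrix))
```

Both hypothetical predictors make exactly one mistake out of four, so their accuracy is identical, yet their costs differ by four orders of magnitude. This is the kind of situation in which a cost-based score separates models that accuracy cannot.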


How to Cite

LIMA, Helton Souza; FERNANDES, Damires Yluska de Souza; MOURA, Thiago José. On the evaluation of example-dependent cost-sensitive models for tax debts classification. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 19., 2022, Campinas/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 425-436. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2022.227607.