Death Registry Prediction in Brazilian Male Prisons with a Random Forest Ensemble

  • Nathan Formentin, Universidade Federal do Rio Grande
  • Eduardo Borges, Universidade Federal do Rio Grande
  • Giancarlo Lucca, Universidade Federal do Rio Grande
  • Helida Santos, Universidade Federal do Rio Grande
  • Gracaliz Dimuro, Universidade Federal do Rio Grande

Abstract

Brazil has the third-largest prison population in the world, and it has been growing steadily for more than two decades. Constant growth combined with low investment in prisons has led to serious problems such as overcrowding and the spread of disease. This study proposes a Random Forest classifier to predict the occurrence of deaths in prisons. We extracted data from the National Survey of Penitentiary Information for the years 2015 and 2016. The best-fitted classifier achieved an accuracy of 87% and correctly identified up to 84% of death occurrences. The results relate the mined data to the reality of Brazilian prisons and point to areas of the penitentiary system that need investment.
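The abstract reports two different figures: accuracy (87%), the share of all records classified correctly, and the share of death occurrences correctly identified (84%), which is the recall of the positive class. The sketch below shows how the two are computed from true and predicted labels. It is not the authors' code: it assumes binary labels where `1` marks a record with a registered death, and the function name `accuracy_and_recall` and the sample labels are hypothetical.

```python
from typing import Sequence


def accuracy_and_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> tuple[float, float]:
    """Return (accuracy, recall), with recall computed for the positive class.

    Assumption (illustrative only): label 1 means "death registered",
    label 0 means "no death registered".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    pairs = list(zip(y_true, y_pred))
    # Accuracy: fraction of all records whose prediction matches the label.
    correct = sum(t == p for t, p in pairs)
    # Recall: of the records that truly have a death, how many were flagged.
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_neg = sum(t == positive and p != positive for t, p in pairs)

    accuracy = correct / len(pairs)
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return accuracy, recall


if __name__ == "__main__":
    # Hypothetical labels: 4 records with deaths, 6 without.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(accuracy_and_recall(y_true, y_pred))  # (0.8, 0.75)
```

Keeping the two metrics separate matters when deaths are the minority class: a classifier can reach high accuracy by mostly predicting "no death", so recall on the death class is the more informative figure for identifying at-risk prisons.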

Keywords: Death Prediction, Ensemble, Prison System

Published
20/10/2020
FORMENTIN, Nathan; BORGES, Eduardo; LUCCA, Giancarlo; SANTOS, Helida; DIMURO, Gracaliz. Death Registry Prediction in Brazilian Male Prisons with a Random Forest Ensemble. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 17., 2020, Online Event. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 330-341. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2020.12140.