Employing Gradient Boosting and Anomaly Detection for Prediction of Frauds in Energy Consumption

  • Beatriz Albiero, Latam Datalab Serasa Experian
  • Estevo Uyrá, Latam Datalab Serasa Experian
  • Ramon Vilarino, Latam Datalab Serasa Experian
  • Juliano Silva, CPFL Energia
  • Tales Souza, CPFL Energia
  • Ricardo dos Santos, Latam Datalab Serasa Experian
  • Sami Yamouni, Latam Datalab Serasa Experian
  • Renato Vicente, Latam Datalab Serasa Experian


Energy fraud is a critical economic burden for electric power organizations in Brazil. In this paper we apply two cutting-edge Machine Learning algorithms, XGBoost and Isolation Forest, to the prediction of irregularities in electrical energy consumption. Using a Logistic Regression model as a benchmark, we show that XGBoost yields a significant improvement in the F1-score for fraud prediction in two scenarios: with and without inspection history features. We also propose using the Isolation Forest algorithm to detect anomalies in electrical energy consumption, and we show that this approach may be useful when inspection history features are unavailable, since it outperforms dummy classifiers.
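
As a rough illustration of why the F1-score and dummy classifiers serve as reference points here, the following minimal sketch computes F1 for two trivial baselines and for a hypothetical detector. The labels and predictions are synthetic toy values invented for this example, not data or results from the paper, and the names `f1_score`, `dummy_all_fraud`, `dummy_no_fraud`, and `detector_pred` are assumptions introduced only for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = irregular consumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic toy labels (assumption): 2 irregular customers out of 10.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

# Dummy baselines: flag every customer, or flag none.
dummy_all_fraud = [1] * len(y_true)
dummy_no_fraud = [0] * len(y_true)

# Hypothetical detector output (assumption): both frauds found, one false alarm.
detector_pred = [1, 0, 0, 1, 1, 0, 0, 0, 0, 0]

for name, pred in [
    ("dummy, all flagged", dummy_all_fraud),
    ("dummy, none flagged", dummy_no_fraud),
    ("hypothetical detector", detector_pred),
]:
    print(f"{name}: F1 = {f1_score(y_true, pred):.3f}")
```

With imbalanced labels, a baseline that flags no one can reach high accuracy while its F1 is zero, which is one reason F1 against dummy baselines is a more informative comparison than raw accuracy.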

Keywords: Fraud, Energy Consumption, XGBoost


ALBIERO, Beatriz; UYRÁ, Estevo; VILARINO, Ramon; SILVA, Juliano; SOUZA, Tales; SANTOS, Ricardo dos; YAMOUNI, Sami; VICENTE, Renato. Employing Gradient Boosting and Anomaly Detection for Prediction of Frauds in Energy Consumption. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 16., 2019, Salvador. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 916-925. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2019.9345.