Statistical Audit via Gaussian Mixture Models in Business Intelligence Systems

  • Bruno Pilon, Universidade de Brasília
  • João Costa, Universidade de Brasília
  • Juan Murillo-Fuentes, Universidade de Brasília
  • Rafael Júnior, Universidade de Brasília


A Business Intelligence (BI) system employs tools from several areas of knowledge to collect, integrate and analyze data in order to improve business decision making. The Brazilian Ministry of Planning, Budget and Management (MP) uses a BI system, designed with the University of Brasília, to detect irregularities in the payroll of the Brazilian federal government by running audit trails on selected items and fields of the payroll database. The current auditing approach is entirely deterministic: the audit trails look for previously known signatures of irregularities, which are built with an ontological method that represents auditors' concept maps. In this work, we propose incorporating a statistical filter into the existing BI system to increase its performance in terms of processing speed and overall system responsiveness. The proposed filter is based on a generative Gaussian Mixture Model (GMM) whose goal is to provide a complete stochastic model of the process, especially the latent probability density function of the generative mixture, and to use that model to filter the most probable payrolls. Inserting this statistical filter as a pre-processing stage ahead of the deterministic auditing proved effective in reducing the amount of data analyzed by the audit trails, despite the penalty intrinsically associated with stochastic models: false negative outcomes are not processed further. In our approach, the gains obtained with the proposed pre-processing stage outweigh the impact of the false negative outcomes.
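The idea of a GMM-based pre-filter can be illustrated with a minimal sketch. It assumes a single numeric payroll field, a fixed number of mixture components, and a hand-picked density threshold. None of these are specified in the abstract, and the function names (`fit_gmm_1d`, `statistical_filter`) are hypothetical. The sketch keeps the records whose density under the fitted mixture reaches the threshold; which population the mixture is trained on, and whether high-density records are kept or discarded, is also an assumption here. The mixture density is p(x) = sum_k w_k N(x | mu_k, var_k), with parameters estimated by the EM algorithm.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """GMM density: weighted sum of the component densities."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


def fit_gmm_1d(data, k, iterations=100, min_var=1e-6):
    """Fit a k-component univariate GMM with plain EM.

    Assumption: a fixed iteration count stands in for a convergence test.
    """
    ordered = sorted(data)
    n = len(ordered)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, min_var)] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2
                        for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)
    return weights, means, variances


def statistical_filter(records, weights, means, variances, threshold):
    """Keep only records whose mixture density is at least `threshold`.

    Assumption: kept records go on to the deterministic audit trails;
    dropped records are not processed further.
    """
    return [x for x in records
            if mixture_pdf(x, weights, means, variances) >= threshold]
```

In use, the model would be fitted once on historical values and the filter applied to each new batch before the audit trails run. Raising the threshold shrinks the batch passed to the deterministic stage but discards more records, which is the trade-off against false negatives that the abstract describes.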

Keywords: Business intelligence, statistical audit, Gaussian mixtures


How to Cite

PILON, Bruno; COSTA, João; MURILLO-FUENTES, Juan; JÚNIOR, Rafael. Statistical Audit via Gaussian Mixture Models in Business Intelligence Systems. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 11., 2015, Goiânia. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2015. p. 683-690. DOI: