Application of the Probabilistic Suffix Tree in Predicting Outcomes of the Soluble Coffee Extraction Process
Abstract
The extraction of instant coffee is an industrial process that generates large amounts of data in real time, including yield, pH, temperature, concentration, and percentage of soluble solids. However, the collected data is still rarely exploited to improve the instant coffee process. This work presents a methodology for summarizing the results of the coffee extractor using probabilistic suffix trees, in which past observations are used to estimate the probability of each class given a variable-length context. These probabilities can indicate whether the extractor is operating properly. Our methodology is under study at Cia Iguaçu de Café Solúvel and may be extended to other applications in the near future.
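As an illustration of estimating class probabilities from a variable-length context, the following is a minimal Python sketch. It is not the implementation used in this work: it only counts which class follows each recent context and predicts from the longest context seen often enough, without the pruning criteria a full probabilistic suffix tree would apply. The function names, the `max_depth` and `min_count` parameters, and the class labels `"L"` and `"H"` (for example, low and high yield) are assumptions made for the example.

```python
from collections import Counter, defaultdict


def fit_contexts(sequence, max_depth=3):
    """Count which symbol follows each context of length 0..max_depth."""
    counts = defaultdict(Counter)
    for i, symbol in enumerate(sequence):
        for depth in range(max_depth + 1):
            if depth > i:
                break  # not enough history for a context this long
            context = tuple(sequence[i - depth:i])
            counts[context][symbol] += 1
    return counts


def predict(counts, history, max_depth=3, min_count=2):
    """Return (context, probabilities) for the longest suffix of `history`
    that was observed at least `min_count` times during fitting."""
    for depth in range(min(max_depth, len(history)), -1, -1):
        context = tuple(history[len(history) - depth:])
        seen = counts.get(context)
        if seen and sum(seen.values()) >= min_count:
            total = sum(seen.values())
            return context, {s: n / total for s, n in seen.items()}
    return (), {}


# Hypothetical discretized extractor readings (illustrative only).
observations = list("LLHLLHLLHLLH")
counts = fit_contexts(observations)
context, probs = predict(counts, ["H", "L", "L"])
```

With these toy observations, a history ending in `H, L, L` is matched at full depth, while a history containing an unseen symbol falls back to the empty context, that is, to the overall class frequencies. A predicted distribution that differs sharply from what is later observed could then be treated as a sign that the extractor deserves attention.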
