On the use of online clustering for anomaly detection in trace streams
Abstract
Identifying anomalies in business processes is a challenge that organizations, whether public or private, face daily, and it is critical to the flow of their operational data. Most current techniques address this challenge by requiring prior knowledge of business process models or the intervention of specialists to support state-of-the-art methods such as supervised machine learning. Moreover, these techniques tend to operate offline to achieve consistent predictive results. In this work, we assess the effectiveness of an online clustering method, namely Autocloud, which can detect anomalies in trace streams while meeting real-life requirements. Autocloud is an autonomous, evolving, recursive online clustering algorithm that requires little memory and provides insights into anomalous patterns in real time. Furthermore, it requires neither previous training nor prior knowledge of the application domain. Experiments were carried out with six process schemes and six different anomaly types over 1,000, 5,000, and 10,000 event traces, generating a total of 630 datasets. The experiments confirmed the algorithm's ability to detect anomalies in those event traces, paving the way for more reliable information systems grounded in automatic conformance checking of the desired business process execution.
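To make "recursive" and "little memory" concrete, here is a simplified, hypothetical sketch of an Autocloud-like detector; it is not the authors' implementation. Assumptions: traces are already encoded as numeric vectors; each data cloud keeps only a mean, a population variance, and a sample count, updated recursively in the style of TEDA (typicality and eccentricity data analytics); a sample is typical for a cloud when its normalized eccentricity ζ does not exceed (m² + 1)/(2k), with m = 3 assumed; a sample atypical for every cloud is flagged as anomalous and starts a new cloud; and the cloud-merging step is omitted. The names `DataCloud` and `AutoCloudSketch` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DataCloud:
    """Hypothetical recursive summary of one data cloud: O(d) memory, no stored samples."""
    mean: List[float]
    var: float = 0.0  # population variance (mean squared distance to the cloud mean)
    k: int = 1        # number of samples absorbed so far

    def _updated(self, x: Sequence[float]) -> Tuple[int, List[float], float, float]:
        """Statistics the cloud would have after absorbing x (recursive mean/variance)."""
        k = self.k + 1
        mean = [((k - 1) * m + xi) / k for m, xi in zip(self.mean, x)]
        dist2 = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
        var = ((k - 1) / k) * self.var + dist2 / (k - 1)
        return k, mean, var, dist2

    def normalized_eccentricity(self, x: Sequence[float]) -> Tuple[float, int]:
        """zeta = xi / 2, where xi = 1/k + ||x - mean||^2 / (k * var)."""
        k, _, var, dist2 = self._updated(x)
        # var == 0 implies dist2 == 0 (all samples identical), so the second term vanishes
        xi = 1.0 / k if var == 0.0 else 1.0 / k + dist2 / (k * var)
        return xi / 2.0, k

    def add(self, x: Sequence[float]) -> None:
        self.k, self.mean, self.var, _ = self._updated(x)


class AutoCloudSketch:
    """Simplified Autocloud-like stream detector (assumption: no cloud merging)."""

    def __init__(self, m: float = 3.0) -> None:
        self.m = m  # Chebyshev-style sensitivity parameter (assumed value)
        self.clouds: List[DataCloud] = []

    def process(self, x: Sequence[float]) -> bool:
        """Consume one sample; return True if it is atypical for every existing cloud."""
        if not self.clouds:
            self.clouds.append(DataCloud(mean=list(x)))
            return False
        typical = []
        for cloud in self.clouds:
            zeta, k = cloud.normalized_eccentricity(x)
            if zeta <= (self.m ** 2 + 1) / (2 * k):
                typical.append(cloud)
        for cloud in typical:  # a sample may join several overlapping clouds
            cloud.add(x)
        if not typical:
            self.clouds.append(DataCloud(mean=list(x)))
            return True
        return False


# Usage: 20 "normal" 2-D vectors, then one distant vector.
corners = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
stream = corners * 5 + [(10.0, 10.0)]
detector = AutoCloudSketch(m=3.0)
flags = [detector.process(x) for x in stream]
print(flags[-1], len(detector.clouds))  # True 2
```

One consequence of this test worth noting: ζ of a single sample never exceeds 1/2, while the threshold (m² + 1)/(2k) drops below 1/2 only when k > m² + 1. With m = 3, a cloud therefore needs at least ten prior samples before it can flag any newcomer as anomalous.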
Keywords:
Clustering algorithms, Process mining, Anomalies, Data mining, Data stream mining, Information systems, Information systems applications
Published
7 June 2021
How to Cite
VERTUAM NETO, Renato; TAVARES, Gabriel; CERAVOLO, Paolo; BARBON, Sylvio. On the use of online clustering for anomaly detection in trace streams. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 17., 2021, Uberlândia. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021.