Vector space models for trace clustering: a comparative study

  • Mateus Alex dos Santos Luna USP
  • André Paulino Lima USP
  • Thaís Rodrigues Neubauer USP
  • Marcelo Fantinato USP
  • Sarajane Marques Peres USP


Process mining explores event logs to offer valuable insights to business process managers. Some types of business processes are hard to mine, including unstructured and knowledge-intensive processes. For such processes, trace clustering is usually applied to break an event log into sublogs that are more amenable to typical process mining tasks. However, applying clustering algorithms involves decisions, such as how traces are represented, that affect the quality of the results. In this paper, we compare four vector space models for trace clustering, using them with an agglomerative clustering algorithm on synthetic and real-world event logs. Our analyses suggest that the embeddings-based vector space model can properly handle trace clustering in unstructured processes.
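As a rough illustration of the pipeline the abstract describes (traces mapped to vectors, then grouped by agglomerative clustering), the sketch below uses only the Python standard library. It is not the authors' implementation: the abstract does not name the four vector space models, so the activity-frequency representation, the cosine distance, the average-linkage criterion, the toy log, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def to_count_vectors(traces):
    """Map each trace (a list of activity labels) to an activity-frequency vector.

    Assumption: a simple bag-of-activities model, one dimension per activity.
    """
    vocabulary = sorted({activity for trace in traces for activity in trace})
    vectors = []
    for trace in traces:
        counts = Counter(trace)
        vectors.append([counts[activity] for activity in vocabulary])
    return vocabulary, vectors


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering over trace vectors.

    Starts with one cluster per trace and repeatedly merges the two clusters
    with the smallest mean pairwise distance until n_clusters remain.
    Returns a list of clusters, each a list of trace indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(
                cosine_distance(vectors[i], vectors[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            mean = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or mean < best[0]:
                best = (mean, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical toy event log: each trace is a sequence of activity labels.
toy_log = [
    ["register", "check", "approve"],
    ["register", "check", "check", "approve"],
    ["register", "reject"],
    ["register", "reject", "notify"],
]

vocabulary, vectors = to_count_vectors(toy_log)
sublogs = agglomerative(vectors, n_clusters=2)
print(vocabulary)
print(sublogs)
```

Each resulting cluster of trace indices plays the role of a sublog. A frequency vector ignores the order of activities within a trace, which is one reason a study like this one also considers other representations, such as embeddings.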


LUNA, Mateus Alex dos Santos; LIMA, André Paulino; NEUBAUER, Thaís Rodrigues; FANTINATO, Marcelo; PERES, Sarajane Marques. Vector space models for trace clustering: a comparative study. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 18., 2021, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 446-457. ISSN 2763-9061. DOI:

