An Annotated Dataset for Automatic Extraction of Entities and Restrictions from Business Process Models

  • Diogo S. Candido IFES
  • João Victor Berti Lima IFES
  • Hilário Oliveira IFES
  • Mateus B. Costa IFES

Abstract


Business Process Modeling is often perceived as a high-potential activity that is difficult to implement. Various techniques and methods have been proposed and investigated to support it, with emphasis on natural language processing. However, the scarcity of datasets built specifically for this purpose is an important limitation recognized in the literature. This work proposes an annotated dataset for identifying typical business process entities and restrictions. Experiments focused on entity recognition suggest that the BiLSTM-CRF architecture, with word embeddings extracted from the GloVe, Flair, and BERT models, achieved the best performance according to the micro-averaged F1-score.
Keywords: Business Process Modeling, Natural Language Processing, Named Entity Recognition
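
The micro-averaged F1-score mentioned in the abstract pools true positives, false positives, and false negatives over all entity types before computing precision and recall. The following is a minimal sketch of that computation, assuming BIO-tagged sequences and exact span matching. The evaluation protocol and label set actually used in the paper are not stated here, so the `Actor` and `Activity` labels and the function names are illustrative assumptions.

```python
def extract_spans(tags):
    """Return the set of (label, start, end) entity spans in a BIO tag sequence.

    Assumption: an I- tag that does not continue an open span of the same
    label is ignored (treated like O).
    """
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def micro_f1(gold_sequences, pred_sequences):
    """Micro-averaged F1: counts are pooled over all sentences and labels."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        gold_spans = extract_spans(gold)
        pred_spans = extract_spans(pred)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one sentence, one of two gold entities recovered.
gold = [["B-Actor", "I-Actor", "O", "B-Activity"]]
pred = [["B-Actor", "I-Actor", "O", "O"]]
score = micro_f1(gold, pred)
```

In this example precision is 1.0 and recall is 0.5, so the score is 2/3. Because counts are pooled before averaging, frequent entity types weigh more heavily than rare ones, which is the main difference from a macro average.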

Published
2024-11-17
CANDIDO, Diogo S.; LIMA, João Victor Berti; OLIVEIRA, Hilário; COSTA, Mateus B. An Annotated Dataset for Automatic Extraction of Entities and Restrictions from Business Process Models. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 978-989. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245085.