An Annotated Dataset for Automatic Extraction of Entities and Restrictions from Business Process Models
Abstract
Business Process Modeling is often perceived as an activity with high potential return that is nonetheless difficult to carry out. Several techniques and methods have been proposed and investigated to support this activity, notably the use of natural language processing techniques. However, the scarcity of datasets built specifically for this purpose is an important limitation acknowledged in the literature. This work proposes an annotated dataset for identifying entities and restrictions typical of business processes. Experiments focused on entity recognition suggest that the BiLSTM-CRF architecture, with word embeddings drawn from the GloVe, Flair, and BERT models, achieved the best performance according to the micro-averaged F1-score.
Keywords:
Business Process Modeling, Natural Language Processing, Named Entity Recognition
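The abstract ranks models by the micro-averaged F1-score. As a minimal sketch of how that metric is usually computed for entity recognition, the snippet below pools true positives, false positives, and false negatives across all entity types before computing precision and recall. The function name `micro_f1`, the labels `ACTOR` and `ACTIVITY`, and the counts are hypothetical and chosen only for illustration; they are not taken from the paper's dataset or evaluation code.

```python
from typing import Dict, Tuple


def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-label (tp, fp, fn) counts.

    Counts are summed over all labels first, so frequent labels
    weigh more than rare ones (unlike a macro average).
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two illustrative entity types.
example_counts = {
    "ACTOR": (8, 2, 2),       # tp, fp, fn
    "ACTIVITY": (30, 10, 20),
}
score = micro_f1(example_counts)
print(f"micro F1 = {score:.4f}")
```

Because the counts are pooled, the larger `ACTIVITY` class dominates the result here, which is the usual reason to report a micro average alongside per-type scores.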
References
Ackermann, L., Neuberger, J., and Jablonski, S. (2021). Data-driven annotation of textual process descriptions based on formal meaning representations. In International Conference on Advanced Information Systems Engineering, pages 75–90. Springer.
Akbik, A., Blythe, D., and Vollgraf, R. (2018). Contextual string embeddings for sequence labeling. In Proceedings of the 27th international conference on computational linguistics, pages 1638–1649.
Barba, I., Del Valle, C., Weber, B., and Jimenez, A. (2013). Automatic generation of optimized business process models from constraint-based specifications. International Journal of Cooperative Information Systems, 22(02):1350009.
Beerepoot, I., Di Ciccio, C., Reijers, H. A., Rinderle-Ma, S., Bandara, W., Burattin, A., Calvanese, D., Chen, T., Cohen, I., Depaire, B., et al. (2023). The biggest business process management problems to solve before we die. Computers in Industry, 146:103837.
Bellan, P., Dragoni, M., and Ghidini, C. (2020). A qualitative analysis of the state of the art in process extraction from text. DP@AI*IA, pages 19–30.
Bellan, P., Dragoni, M., and Ghidini, C. (2022a). Extracting business process entities and relations from text using pre-trained language models and in-context learning. In International Conference on Enterprise Design, Operations, and Computing, pages 182–199. Springer.
Bellan, P., van der Aa, H., Dragoni, M., Ghidini, C., and Ponzetto, S. P. (2022b). PET: an annotated dataset for process extraction from natural language text tasks. In International Conference on Business Process Management, pages 315–321. Springer.
Bellan, P., van der Aa, H., Dragoni, M., Ghidini, C., and Ponzetto, S. P. (2023). Process extraction from text: Benchmarking the state of the art and paving the way for future challenges. arXiv preprint arXiv:2110.03754.
Costa, M. B. and Tamzalit, D. (2017). Recommendation patterns for business process imperative modeling. In Proceedings of the Symposium on Applied Computing, pages 735–742.
da Silva, M. G. and de Oliveira, H. T. A. (2022). Combining word embeddings for Portuguese named entity recognition. In International Conference on Computational Processing of the Portuguese Language, pages 198–208. Springer.
Deng, S., Wang, D., Li, Y., Cao, B., Yin, J., Wu, Z., and Zhou, M. (2016). A recommendation system to facilitate business process modeling. IEEE transactions on cybernetics, 47(6):1380–1394.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
do Amaral, D. O. F. and Vieira, R. (2014). NERP-CRF: uma ferramenta para o reconhecimento de entidades nomeadas por meio de conditional random fields. Linguamática, 6(1):41–49.
Dumas, M., La Rosa, M., Mendling, J., Reijers, H. A., et al. (2018). Fundamentals of business process management, volume 2. Springer.
Epure, E. V., Martín-Rodilla, P., Hug, C., Deneckère, R., and Salinesi, C. (2015). Automatic process model discovery from textual methodologies. In 2015 IEEE 9th International Conference on Research Challenges in Information Science (RCIS), pages 19–30. IEEE.
Ferreira, R. C. B., Thom, L. H., and Fantinato, M. (2017). A semi-automatic approach to identify business process elements in natural language texts. In International Conference on Enterprise Information Systems, volume 2, pages 250–261. SCITEPRESS.
Fionda, V. and Guzzo, A. (2020). Control-flow modeling with Declare: Behavioral properties, computational complexity, and tools. IEEE Transactions on Knowledge & Data Engineering, 32(05):898–911.
Friedrich, F., Mendling, J., and Puhlmann, F. (2011). Process model generation from natural language text. In Advanced Information Systems Engineering: 23rd International Conference, CAiSE 2011, London, UK, June 20-24, 2011. Proceedings 23, pages 482–496. Springer.
Klievtsova, N., Benzin, J.-V., Kampik, T., Mangler, J., and Rinderle-Ma, S. (2023). Conversational process modelling: State of the art, applications, and implications in practice. arXiv preprint arXiv:2304.11065.
Li, J., Sun, A., Han, J., and Li, C. (2020). A survey on deep learning for named entity recognition. IEEE transactions on knowledge and data engineering, 34(1):50–70.
López, H. A., Strømsted, R., Niyodusenga, J.-M., and Marquard, M. (2021). Declarative process discovery: Linking process and textual views. In Nurcan, S. and Korthaus, A., editors, Intelligent Information Systems, pages 109–117, Cham. Springer International Publishing.
Mangler, J. and Klievtsova, N. (2023). Textual process descriptions and corresponding BPMN models. DOI: 10.5281/zenodo.7783492.
Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543.
Qian, C., Wen, L., Kumar, A., Lin, L., Lin, L., Zong, Z., Li, S., and Wang, J. (2020). An approach for process model extraction by multi-grained text classification. In Advanced Information Systems Engineering: 32nd International Conference, CAiSE 2020, Grenoble, France, June 8–12, 2020, Proceedings 32, pages 268–282. Springer.
Quishpi, L., Carmona, J., and Padró, L. (2020). Extracting annotations from textual descriptions of processes. In Business Process Management: 18th International Conference, BPM 2020, Seville, Spain, September 13–18, 2020, Proceedings 18, pages 184–201. Springer.
Shilov, N., Othman, W., Fellmann, M., and Sandkuhl, K. (2023). Machine learning for enterprise modeling assistance: an investigation of the potential and proof of concept. Software and Systems Modeling, 22(2):619–646.
Van der Aa, H., Di Ciccio, C., Leopold, H., and Reijers, H. A. (2019). Extracting declarative process models from natural language. In Advanced Information Systems Engineering: 31st International Conference, CAiSE 2019, Rome, Italy, June 3–7, 2019, Proceedings 31, pages 365–382. Springer.
Yuan, A., Ippolito, D., Nikolaev, V., Callison-Burch, C., Coenen, A., and Gehrmann, S. (2021). SynthBio: A case study in faster curation of text datasets. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2).
Published
17 November 2024
How to Cite
CANDIDO, Diogo S.; LIMA, João Victor Berti; OLIVEIRA, Hilário; COSTA, Mateus B. An Annotated Dataset for Automatic Extraction of Entities and Restrictions from Business Process Models. In: ENCONTRO NACIONAL DE INTELIGÊNCIA ARTIFICIAL E COMPUTACIONAL (ENIAC), 21., 2024, Belém/PA. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 978-989. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2024.245085.