Avaliação da Capacidade de LLMs para Especificar Workflows
Abstract
This paper evaluates the use of Large Language Models (LLMs) for specifying workflows from natural-language descriptions. Three LLMs (GPT-4o, DeepSeek V3, and Command-A), two prompt versions, and four workflow systems (Nextflow, Parsl, Dask, and Airflow) were compared, applied to workflows at three levels of complexity. The results indicate that prompts containing examples produce specifications that are more syntactically correct and more semantically aligned with the natural-language specification, with GPT-4o and Dask standing out. Nevertheless, challenges remain in generating complex workflows and workflows involving parallelism.
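To make concrete what a "workflow specification" means in one of the evaluated systems, the following is a minimal sketch of a toy pipeline written with Dask's delayed API. It is an illustration under our own assumptions (the task names extract, transform, and load and the input path are hypothetical), not an example taken from the paper or from the prompts the study used.

```python
# Minimal illustrative Dask workflow (hypothetical tasks, not from the paper).
from dask import delayed


@delayed
def extract(path):
    # Placeholder task: pretend to read raw records from the given path.
    return [1, 2, 3]


@delayed
def transform(records):
    # Placeholder task: apply a simple transformation to each record.
    return [r * 2 for r in records]


@delayed
def load(records):
    # Placeholder task: aggregate the transformed records.
    return sum(records)


# Chaining the delayed calls builds the task graph extract -> transform -> load;
# nothing runs until .compute() is called, which schedules and executes the graph.
result = load(transform(extract("input.csv"))).compute()
print(result)  # 12
```

Generating this kind of specification from a natural-language description is the task the paper evaluates across Nextflow, Parsl, Dask, and Airflow.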
Keywords: LLM, Workflows
Published
September 29, 2025
How to Cite
WOYAMES, Paula; PINA, Débora; KUNSTMANN, Liliane; MATTOSO, Marta; DE OLIVEIRA, Daniel. Avaliação da Capacidade de LLMs para Especificar Workflows. In: BRAZILIAN E-SCIENCE WORKSHOP (BRESCI), 19., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 81-88. ISSN 2763-8774. DOI: https://doi.org/10.5753/bresci.2025.248218.
