A Retrieval-Augmented Generation Information System for the Oil and Gas Industry

  • Rhuan Garcia de Assis Teixeira UFES
  • Luciano Henrique Peixoto da Silva UFES
  • Thiago Oliveira-Santos UFES
  • Alexandre Rodrigues UFES
  • Marcos Pellegrini Ribeiro Petróleo Brasileiro S.A.
  • Flávio Miguel Varejão UFES

Abstract


Research Context: The oil and gas industry relies on extensive and complex technical documentation, making manual information retrieval slow and inefficient for engineers and technicians who need quick, reliable answers to support decisions. Scientific and/or Practical Problem: Generalist Large Language Models (LLMs) often struggle with specialized domains, leading to inaccuracies, contextual errors (“hallucinations”), and an inability to handle specific technical jargon. Proposed Solution and/or Analysis: This paper presents “RAG Petrolês”, an information-system assistant built using Retrieval-Augmented Generation (RAG) over “Petrolês”, a collection of Portuguese theses and dissertations on the topic. It uses the IBM Granite model for embeddings, FAISS for similarity search, and a combination of the DeepSeek R1 and Mistral Small 3.2 LLMs to generate and refine answers. Related IS Theory: The work is grounded in Information Processing Theory, in which a Retrieval-Augmented Generation (RAG) system integrates external knowledge bases with LLMs to improve the accuracy and contextual relevance of their responses and to reduce hallucinations. Research Method: A quantitative and qualitative evaluation was performed. The system’s performance was tested against a custom dataset of 1500 questions. The evaluation involved a manual analysis of 150 answers and a broader statistical analysis of all 1500 responses using an “LLM-as-a-Judge” approach combined with Prediction-Powered Inference (PPI) to ensure robust results. Summary of Results: The proposed system achieved an accuracy of 88.88% in the manual evaluation. The statistical analysis yielded an accuracy between 81.48% and 97.71% at a 95% confidence level, outperforming baseline models and demonstrating superior performance compared to a direct adaptation of a similar existing framework.
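The PPI evaluation step described above can be sketched as follows. This is a minimal illustration of the mean-estimation form of Prediction-Powered Inference (Angelopoulos et al., 2023): a small manually labeled sample is used to rectify the bias of judge scores computed on the full set. The data here are synthetic stand-ins, not the paper's actual labels, though the sizes (150 labeled out of 1500) mirror the study.

```python
# Prediction-Powered Inference (PPI) for a mean: combine judge scores on the
# full question set with a small manually labeled subset that corrects the
# judge's bias. Synthetic data; sizes mirror the paper (150 of 1500 labeled).
import math
import random
import statistics

def ppi_mean_ci(labels, judge_on_labeled, judge_on_all, z=1.96):
    """Approximate 95% CI for true accuracy, rectifying the judge's bias."""
    n, N = len(labels), len(judge_on_all)
    rectifier = [y - f for y, f in zip(labels, judge_on_labeled)]
    estimate = statistics.fmean(judge_on_all) + statistics.fmean(rectifier)
    var = (statistics.pvariance(judge_on_all) / N
           + statistics.pvariance(rectifier) / n)
    half = z * math.sqrt(var)
    return estimate - half, estimate + half

random.seed(0)
truth = [1 if random.random() < 0.9 else 0 for _ in range(1500)]  # true labels
judge = [y if random.random() < 0.95 else 1 - y for y in truth]   # noisy judge
lo, hi = ppi_mean_ci(truth[:150], judge[:150], judge)
print(f"accuracy in [{lo:.3f}, {hi:.3f}] at ~95% confidence")
```

The rectifier term is what distinguishes PPI from simply averaging the judge's verdicts: the interval remains valid even when the LLM judge is systematically too lenient or too harsh.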
Contributions and Impact to the IS Area: This research demonstrates the effectiveness of a specialized RAG system in a technical, non-English domain. It provides a viable architecture for building highly accurate, specialized information-system assistants without retraining foundation models.
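The retrieve-then-generate pattern the abstract describes can be sketched in a few lines. The sketch below is an illustrative stand-in only: bag-of-words cosine similarity replaces the Granite embeddings and FAISS index, the toy corpus replaces the Petrolês passages, and the assembled prompt replaces the DeepSeek R1 / Mistral generation step.

```python
# Minimal sketch of the RAG pattern: embed the query, retrieve the most
# similar passages, and ground the LLM prompt in the retrieved context.
import math
from collections import Counter

corpus = [  # toy stand-ins for indexed thesis passages
    "drilling fluid viscosity control in deep wells",
    "seismic imaging of pre-salt reservoirs",
    "corrosion inhibitors for offshore pipelines",
]

def embed(text):
    """Bag-of-words vector; a real system would use a neural embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=2):
    """Top-k passages by similarity; FAISS would do this at scale."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

question = "How is viscosity of drilling fluid controlled?"
context = retrieve(question)
prompt = ("Answer using only this context:\n" + "\n".join(context)
          + f"\n\nQuestion: {question}")
print(prompt)
```

The key design point is that the LLM never answers from its parametric memory alone: every prompt is grounded in passages retrieved from the domain corpus, which is what curbs hallucination on specialized jargon.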

References

Angelopoulos, A. N., Bates, S., Fannjiang, C., Jordan, M. I., and Zrnic, T. (2023). Prediction-powered inference.

Awasthy, P., Trivedi, A., Li, Y., Bornea, M., Cox, D., Daniels, A., Franz, M., Goodhart, G., Iyer, B., Kumar, V., Lastras, L., McCarley, S., Murthy, R., P, V., Rosenthal, S., Roukos, S., Sen, J., Sharma, S., Sil, A., Soule, K., Sultan, A., and Florian, R. (2025). Granite embedding models.

Chase, H. et al. (2022). Langchain: Building applications with llms through composability. [link]. Accessed: July 15, 2025.

Chen, X. and Wiseman, S. (2023). Bm25 query augmentation learned end-to-end.

Cordeiro, F. C. (2020). Petrolês – como construir um corpus especializado em óleo e gás em português. Master’s thesis, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, RJ, Brasil.

Dam, S. K., Hong, C. S., Qiao, Y., and Zhang, C. (2024). A complete survey on llm-based ai chatbots.

Dong, W., Moses, C., and Li, K. (2011). Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th International Conference on World Wide Web, WWW ’11, page 577–586, New York, NY, USA. Association for Computing Machinery.

Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazaré, P.-E., Lomeli, M., Hosseini, L., and Jégou, H. (2025). The faiss library.

DeepSeek-AI et al. (2025a). Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.

Gemma Team et al. (2025b). Gemma 3 technical report.

Brown, T. B. et al. (2020). Language models are few-shot learners.

Fu, C., Xiang, C., Wang, C., and Cai, D. (2025). Fast approximate nearest neighbor search with the navigating spreading-out graph.

Gu, J., Jiang, X., Shi, Z., Tan, H., Zhai, X., Xu, C., Li, W., Shen, Y., Ma, S., Liu, H., Wang, S., Zhang, K., Wang, Y., Gao, W., Ni, L., and Guo, J. (2025). A survey on llm-as-a-judge.

Guo, R., Sun, P., Lindgren, E., Geng, Q., Simcha, D., Chern, F., and Kumar, S. (2020). Accelerating large-scale inference with anisotropic vector quantization.

Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., de las Casas, D., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L. R., Lachaux, M.-A., Stock, P., Scao, T. L., Lavril, T., Wang, T., Lacroix, T., and Sayed, W. E. (2023). Mistral 7b.

Johnson, J., Douze, M., and Jégou, H. (2017). Billion-scale similarity search with gpus.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., tau Yih, W., Rocktäschel, T., Riedel, S., and Kiela, D. (2021). Retrieval-augmented generation for knowledge-intensive nlp tasks.

Long, C., Liu, Y., Ouyang, C., and Yu, Y. (2024). Bailicai: A domain-optimized retrieval-augmented generation framework for medical applications.

Malkov, Y. A. and Yashunin, D. A. (2018). Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.

Meta Platforms, Inc. (2025). Meta llama 3.3. [link]. Accessed: July 15, 2025.

Mistral AI (2025). Mistral small 3.2. [link]. Accessed: September 26, 2025.

Sun, H., Wang, Y., and Zhang, S. (2024). Retrieval-augmented generation for domain-specific question answering: A case study on pittsburgh and cmu.

Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., and Zhou, M. (2020). Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers.

Xu, F., Hao, Q., Zong, Z., Wang, J., Zhang, Y., Wang, J., Lan, X., Gong, J., Ouyang, T., Meng, F., Shao, C., Yan, Y., Yang, Q., Song, Y., Ren, S., Hu, X., Li, Y., Feng, J., Gao, C., and Li, Y. (2025). Towards large reasoning models: A survey of reinforced reasoning with large language models.

Yu, H., Gan, A., Zhang, K., Tong, S., Liu, Q., and Liu, Z. (2025). Evaluation of Retrieval-Augmented Generation: A Survey, page 102–120. Springer Nature, Singapore.

Zhang, T., Patil, S. G., Jain, N., Shen, S., Zaharia, M., Stoica, I., and Gonzalez, J. E. (2024). Raft: Adapting language model to domain specific rag.
Published
25/05/2026
TEIXEIRA, Rhuan Garcia de Assis; SILVA, Luciano Henrique Peixoto da; OLIVEIRA-SANTOS, Thiago; RODRIGUES, Alexandre; RIBEIRO, Marcos Pellegrini; VAREJÃO, Flávio Miguel. A Retrieval-Augmented Generation Information System for the Oil and Gas Industry. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 22., 2026, Vitória/ES. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026. p. 1007-1023. DOI: https://doi.org/10.5753/sbsi.2026.248691.
