Contrato360: an application for questions and answers using language models, documents, and databases
Abstract
We present a methodology for building question-and-answer (Q&A) applications, validated in a contract management process. We capture information from contract documents in PDF format and data from the support system, and submit both to GPT-4 to obtain detailed answers. Retrieval-Augmented Generation (RAG) and text-to-SQL techniques improve the relevance of the responses without retraining the model. We also explored prompt engineering to better focus the responses. Throughout our work, we observed that combining these techniques increased the relevance of the answers. We highlight the potential of Large Language Models (LLMs) for building information systems that use natural language as an interface.
Keywords:
Contracts, large language models, question answering, prompt engineering, RAG, text-to-SQL
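The abstract names two routes to an answer: retrieval over contract text (RAG) and generated SQL over the support system's database. The sketch below illustrates both routes under stated assumptions; it is not the authors' implementation. The sample contract excerpts, the `contracts` table, the word-overlap ranking (a stand-in for embedding search), and `stub_llm` (a placeholder for a real GPT-4 call) are all hypothetical.

```python
import re
import sqlite3

# Hypothetical excerpts standing in for text extracted from contract PDFs.
CHUNKS = [
    "Contract 001: the supplier must deliver maintenance services within 30 days.",
    "Contract 002: penalties of 2% apply for late payment.",
    "Contract 003: the agreement may be renewed for 12 months.",
]

# Hypothetical schema standing in for the support system's database.
SCHEMA = "CREATE TABLE contracts (id INTEGER, supplier TEXT, status TEXT)"


def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, chunks, k=2):
    """Rank chunks by word overlap with the question (stand-in for vector search)."""
    q = tokens(question)
    return sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)[:k]


def build_rag_prompt(question, chunks):
    """RAG route: place the retrieved excerpts in the prompt as context."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\nQuestion: {question}"
    )


def build_sql_prompt(question, schema):
    """Text-to-SQL route: show the model the schema and ask for a query."""
    return (
        f"Given the schema:\n{schema}\n"
        f"Write one SQLite SELECT statement answering: {question}"
    )


def answer_from_database(question, conn, schema, llm):
    """Ask the model for SQL, accept only SELECT statements, and run the query."""
    sql = llm(build_sql_prompt(question, schema)).strip()
    if not sql.lower().startswith("select"):
        raise ValueError("only SELECT statements are executed")
    return conn.execute(sql).fetchall()


def stub_llm(prompt):
    # Placeholder for a real model call; returns a canned query.
    return "SELECT COUNT(*) FROM contracts WHERE status = 'active'"


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [(1, "Acme", "active"), (2, "Globex", "expired"), (3, "Initech", "active")],
)

print(build_rag_prompt("What penalties apply for late payment?", CHUNKS))
print(answer_from_database("How many contracts are active?", conn, SCHEMA, stub_llm))
```

In both routes the model itself is unchanged: the retrieved excerpts or the schema are supplied in the prompt, which is how relevance can improve without retraining.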
Published
2024-10-14
How to Cite
MEDEIROS, Antony Seabra de; CAVALCANTE, Claudio; NEPOMUCENO, João; LAGO, Lucas; RUBERG, Nicolaas; LIFSCHITZ, Sérgio. Contrato360: an application for questions and answers using language models, documents, and databases. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 39., 2024, Florianópolis/SC. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 155-166. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2024.240871.
