Benchmarking LLMs in Geoscience: A Serverless Approach using GeoBench and AWS

  • Otávio Parraga PUCRS
  • Arthur Fachel PUCRS
  • Rodolfo S. Antunes UNISINOS
  • Luiz Gonzaga Jr UNISINOS
  • Maurício Roberto Veronez UNISINOS
  • Rodrigo C. Barros PUCRS
  • Lucas S. Kupssinskü PUCRS

Resumo


Este artigo apresenta uma avaliação sistemática de Large Language Models (LLMs) de pesos abertos para tarefas geocientíficas, utilizando o benchmark GeoBench. Para superar restrições de hardware local ao avaliar modelos massivos, implementamos uma infraestrutura em nuvem serverless na AWS, utilizando API Gateway, Lambda e Amazon Bedrock. Essa arquitetura permitiu inferência em larga escala e o aumento automatizado de dados.

Referências

DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song, J.-M., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X., Zhang, X., et al. (2025). Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.

Deng, C., Zhang, T., He, Z., Chen, Q., Shi, Y., Xu, Y., Fu, L., Zhang, W., Wang, X., Zhou, C., et al. (2024). K2: A foundation language model for geoscience knowledge understanding and utilization. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, pages 161–170.

Dramsch, J. S. (2020). 70 years of machine learning in geoscience in review. Advances in geophysics, 61:1–55.

Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al. (2024). The llama 3 herd of models. arXiv preprint arXiv:2407.21783.

Garcez, V. H., Parraga, O., Marques, A., Spigolon, A. L. D., De Barros, G., Gonzaga, L., Veronez, M. R., Barros, R. C., and Kupssinskü, L. S. (2025). Which is the best llm for geosciences? In IGARSS 2025-2025 IEEE International Geoscience and Remote Sensing Symposium, pages 6374–6378. IEEE.

Lin, Z., Deng, C., Zhou, L., Zhang, T., Xu, Y., Xu, Y., He, Z., Shi, Y., Dai, B., Song, Y., et al. (2023). Geogalactica: A scientific large language model in geoscience. arXiv preprint arXiv:2401.00434.

Marques Jr, A., Horota, R. K., De Souza, E. M., Kupssinskü, L., Rossa, P., Aires, A. S., Bachi, L., Veronez, M. R., Gonzaga Jr, L., and Cazarin, C. L. (2020). Virtual and digital outcrops in the petroleum industry: A systematic review. Earth-Science Reviews, 208:103260.

Meta AI (2025). Llama 4: Multimodal intelligence. [link]. Accessed: 2026-01-08.

Mistral AI (2025). Introducing mistral 3. [link]. Accessed: 2026-01-29.

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. (2022). Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35:27730–27744.

Parraga, O., More, M. D., Oliveira, C. M., Gavenski, N. S., Kupssinskü, L. S., Medronha, A., Moura, L. V., Simões, G. S., and Barros, R. C. (2023). Fairness in deep learning: A survey on vision and language research. ACM Computing Surveys.

Whitmeyer, S. J., Nicoletti, J., and De Paor, D. G. (2010). The digital revolution in geologic mapping. Gsa Today, 20(4/5):4–10.

Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223.

Zhong, J., Shen, W., Li, Y., Gao, S., Lu, H., Chen, Y., Zhang, Y., Zhou, W., Gu, J., and Zou, L. (2025). A comprehensive survey of reward models: Taxonomy, applications, challenges, and future. arXiv preprint arXiv:2504.12328.
Publicado
19/07/2026
PARRAGA, Otávio; FACHEL, Arthur; ANTUNES, Rodolfo S.; GONZAGA JR, Luiz; VERONEZ, Maurício Roberto; BARROS, Rodrigo C.; KUPSSINSKÜ, Lucas S.. Benchmarking LLMs in Geoscience: A Serverless Approach using GeoBench and AWS. In: SIMPÓSIO DE INFRAESTRUTURA DIGITAL/NUVEM PARA PESQUISA (PESQUISA@NUVEM), 1. , 2026, Gramado/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2026 . p. 109-114. DOI: https://doi.org/10.5753/pesquisanuvem.2026.22263.