Towards statistical-based prediction models for seasonal precipitation in the southeast of Brazil

  • Helder Arruda ITV
  • Anita Drumond ITV
  • Ewerton Oliveira ITV
  • Julio Freitas ITV / UFPA
  • Rafael Rocha ITV
  • Nikolas Carneiro ITV
  • Renata Tedeschi ITV
  • Ronnie Alves ITV / UFPA
  • Sergio Viademonte ITV
  • Eduardo Carvalho ITV / UFPA

Abstract


This study evaluates precipitation forecasting at 12 monitoring points in southeastern Brazil using five statistical models (MLR, ARIMA, SARIMA, SARIMAX, and VARMAX). Forecast accuracy was assessed at two time points (M1 and M2) using the root mean square error (RMSE). In the short term (M1), MLR performed best at six of the 12 locations (50%), while SARIMAX led at four (33%). In the long term (M2), VARMAX outperformed the other models at seven locations (58%), highlighting its strength in capturing multivariate dynamics. The results underscore the value of statistical models for localized weather forecasting and infrastructure planning.
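
As a minimal sketch of the kind of comparison the abstract describes, the snippet below computes the RMSE of several forecasts against observations and reports the model with the lowest error for one location. The function names `rmse` and `best_model` and all numeric values are hypothetical illustrations and are not taken from the paper; the study's actual evaluation pipeline may differ.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def best_model(observed, forecasts):
    """Return (model_name, rmse) for the forecast with the lowest RMSE."""
    scores = {name: rmse(observed, pred) for name, pred in forecasts.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# Hypothetical precipitation values (mm) for a single location;
# these numbers are illustrative only and do not come from the study.
observed = [120.0, 80.0, 30.0]
forecasts = {
    "MLR": [110.0, 90.0, 40.0],
    "SARIMAX": [100.0, 60.0, 50.0],
}

print(best_model(observed, forecasts))
```

Repeating this selection for each monitoring point and each time point would yield per-location winners like those summarized in the abstract.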

Published
2025-09-29
ARRUDA, Helder et al. Towards statistical-based prediction models for seasonal precipitation in the southeast of Brazil. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 261-272. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.12385.