Interpreting ML in Ecology: A RAG-Based Approach to Explainability in Species Distribution Modeling
Abstract
Species Distribution Modeling (SDM) relies increasingly on Machine Learning (ML), but many models remain opaque, limiting their usability for non-experts. While SHAP and LIME improve interpretability, they still require technical expertise. This study proposes an agentic Retrieval-Augmented Generation (RAG) framework that integrates ML models (Logistic Regression, Random Forests, MLP), XAI techniques (SHAP, LIME), and an LLM-powered explanation system to enhance explainability. Using GoAmazon 2014/15 environmental data and GBIF species occurrences, we evaluated explanations based on completeness and context-awareness, achieving a significant improvement in this case study. Results indicate that LLMs combined with XAI can significantly enhance explainability in SDM.
References
Amâncio, S., Souza, V. B., and Melo, C. (2008). Columba livia e Pitangus sulphuratus como indicadoras de qualidade ambiental em área urbana. Revista Brasileira de Ornitologia, 16(1):32–37.
ARM (2025). ARM research facility. Available at: [link]. Accessed on 23 March 2025.
Beery, S., Cole, E., Parker, J., Perona, P., and Winner, K. (2021). Species distribution modeling for machine learning practitioners: A review. In Proceedings of ACM SIGCAS Conference on Computing and Sustainable Societies (COMPASS) 2021.
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
Caseli, H. and Nunes, M. (2023). Processamento de Linguagem Natural: Conceitos, Técnicas e Aplicações em Português. Brasileiras - Processamento de Linguagem Natural.
Cueva, D., Bravo, G., and Silveira, L. (2022). Systematics of Thraupis (Aves, Passeriformes) reveals an extensive hybrid zone between T. episcopus (Blue-gray Tanager) and T. sayaca (Sayaca Tanager). PLoS ONE, 17(10).
Doran, D., Schulz, S., and Besold, T. R. (2017). What does explainable AI really mean? A new conceptualization of perspectives. In Proceedings of the First International Workshop on Comprehensibility and Explanation in AI and ML.
Elith, J. and Leathwick, J. R. (2009). Species distribution models: Ecological explanation and prediction across space and time. The Annual Review of Ecology, Evolution and Systematics, 40:677–697.
Global Biodiversity Information Facility (2025). GBIF occurrence download. DOI: 10.15468/dl.ppwbzv. Accessed on 23 March 2025.
Hutchinson, G. E. (1991). Population studies: Animal ecology and demography. Bulletin of Mathematical Biology, 53(1-2):193–213.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer, Londres.
Lundberg, S. M. and Lee, S. (2017). A unified approach to interpreting model predictions. Proceedings of the 31st International Conference on Neural Information Processing Systems.
Martin, S. T., Artaxo, P., Machado, L., Manzi, A. O., Souza, R. A. F. d., Schumacher, C., Wang, J., Biscaro, T., Brito, J., Calheiros, A., et al. (2017). The green ocean amazon experiment (goamazon2014/5) observes pollution affecting gases, aerosols, clouds, and rainfall over the rain forest. Bulletin of the American Meteorological Society, 98(5):981–997.
Miyaji, R. O., Almeida, F. V., and Corrêa, P. L. P. (2023). Evaluating the explainability of machine learning classifiers: A case study of species distribution modeling in the Amazon. In Symposium on Knowledge Discovery, Mining and Learning (KDMiLe), pages 49–56. SBC.
Miyaji, R. O., Almeida, F. V., Bauer, L. O., Ferrari, V., Corrêa, P. L. P., Rizzo, L. V., and Prakash, G. (2021). Spatial interpolation of air pollutant and meteorological variables in central amazonia. Data, 6(12):126.
OpenAI (2025). GPT-4o mini: advancing cost-efficient intelligence. Available at: [link]. Accessed on 23 March 2025.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Ryo, M., Angelov, B., Mammola, S., Kass, J. M., Benito, B. M., and Hartig, F. (2021). Explainable artificial intelligence enhances the ecological interpretability of black-box species distribution models. Ecography, 44:199–205.
Spitzer, P., Celis, S., Martin, D., Kühl, N., and Satzger, G. (2024). Looking through the deep glasses: How large language models enhance explainability of deep learning models. Proceedings of Mensch und Computer 2024.
Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., Zhao, W., Wei, Z., and Wen, J. (2023). A survey on large language model based autonomous agents. Frontiers of Computer Science.
Zhu, Y., Yuan, H., Wang, S., Liu, S., Liu, W., Deng, C., Chen, H., Liu, Z., Dou, Z., and Wen, J. (2024). Large language models for information retrieval: A survey. arXiv preprint.
Zytek, A., Pidò, S., and Veeramachaneni, K. (2024). Llms for xai: Future directions for explaining explanations. ACM CHI Workshop on Human-Centered Explainable AI.
Published
2025-07-20
How to Cite
MIYAJI, Renato O.; CORRÊA, Pedro L. P. Interpreting ML in Ecology: A RAG-Based Approach to Explainability in Species Distribution Modeling. In: WORKSHOP ON COMPUTING APPLIED TO THE MANAGEMENT OF THE ENVIRONMENT AND NATURAL RESOURCES (WCAMA), 16., 2025, Maceió/AL. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 117-126. ISSN 2595-6124. DOI: https://doi.org/10.5753/wcama.2025.8219.
