Evaluating the Explainability of Machine Learning Classifiers: A case study of Species Distribution Modeling in the Amazon

Abstract


Machine learning models are widely used in computational ecology. One application is Species Distribution Modeling, which aims to estimate the probability of occurrence of a species given the environmental conditions. For ecologists, however, these models are often regarded as "black boxes", since interpreting them requires at least basic machine learning knowledge. In this work, four Explainable Artificial Intelligence techniques, namely Local Interpretable Model-Agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), BreakDown and Partial Dependence Plots, were evaluated on a Random Forests classifier for Coragyps atratus in the Amazon Basin region. SHAP and Partial Dependence Plots were found to improve the explainability of the model.

Keywords: explainable artificial intelligence, machine learning, random forests

Published
26/09/2023
MIYAJI, Renato Okabayashi; ALMEIDA, Felipe Valencia; CORRÊA, Pedro Luiz Pizzigatti. Evaluating the Explainability of Machine Learning Classifiers: A case study of Species Distribution Modeling in the Amazon. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 11., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 49-56. ISSN 2763-8944. DOI: https://doi.org/10.5753/kdmile.2023.232929.