Machine Learning post-hoc interpretability: a systematic mapping study

Abstract

Context: In the pre-algorithm world, humans and organizations made decisions in hiring and criminal sentencing. Nowadays, some of these decisions are entirely made by, or influenced by, Machine Learning algorithms. Problem: Research is beginning to reveal troubling examples in which algorithmic decision-making risks replicating, and even amplifying, human biases. In addition, most algorithmic decision systems are opaque and not interpretable, which makes potential biases harder to detect and mitigate. Solution: This paper reports an overview of the current literature on machine learning interpretability. IS Theory: This work was conceived under the aegis of Sociotechnical theory: Artificial Intelligence systems can only be understood and improved if both ‘social’ and ‘technical’ aspects are brought together and treated as interdependent parts of a complex system. Method: The overview presented in this article resulted from a systematic mapping study. Summary of Results: We find that, currently, the majority of XAI studies are aimed not at the end-users affected by a model but at the data scientists who use explainability as a debugging tool; there is thus a gap in the quality assessment and deployment of interpretable methods. Contributions and Impact in the IS area: The main contribution of the paper is to serve as the motivating background for a series of challenges faced by XAI, such as combining different interpretable methods, evaluating interpretability, and building human-centered methods. We end by discussing concerns raised regarding explainability and presenting a series of questions that can serve as an agenda for future research in the field.
Keywords: xai, machine learning, explainability, interpretability, fairness, black-box
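For readers unfamiliar with what "post-hoc interpretability" looks like in practice, the sketch below is a minimal, hypothetical illustration of the usage pattern the results point to: a data scientist probing an already-trained, opaque model with a model-agnostic explanation method and using the output as a debugging aid rather than as an explanation for affected end-users. The dataset, model, and library choices (scikit-learn's permutation importance) are assumptions made for illustration and are not drawn from the paper itself.

```python
# Illustrative sketch only: a post-hoc, model-agnostic explanation of a
# black-box classifier, used as a debugging aid by the model developer.
# Assumes scikit-learn is installed; dataset and model are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The "black box": an opaque ensemble model trained as usual.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Post-hoc explanation: permutation importance computed on held-out data,
# after training, without inspecting the model's internals.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Rank features by how much shuffling them degrades performance; a developer
# would scan this list to spot suspicious or unexpected dependencies.
ranked = sorted(
    zip(X.columns, result.importances_mean, result.importances_std),
    key=lambda item: item[1],
    reverse=True,
)
for name, mean, std in ranked[:5]:
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```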

Published
16/05/2022
How to Cite

VIEIRA, Carla Piazzon; DIGIAMPIETRI, Luciano Antonio. Machine Learning post-hoc interpretability: a systematic mapping study. In: SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 18., 2022, Curitiba. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022.