Multilayer Counterfactual Explanations for Machine Learning Classifiers - MSc student project

The last decade has seen rapid growth in machine learning (ML) technologies, which are now deployed across many domains, including sensitive ones. Despite this widespread use, many ML models offer no transparency or explanation of how they reach their decisions, raising substantial concerns about trust, safety, and accountability. In this project we focus on one type of explanation method, counterfactual explanations, which are simple yet sufficient to meet the explanatory requirements of the EU General Data Protection Regulation (GDPR). A counterfactual explanation answers the why-question by identifying the minimal change to an input that would make the desired outcome materialise.

The primary contribution of this project is a novel state-of-the-art counterfactual analysis framework: Approximate Layered Influences Counterfactual Explanations (ALICE). ALICE generates layered counterfactual explanations that are richer and more scalable, in both computation time and space, than its predecessor CFX, while preserving CFX's desirable properties; this also makes it superior to other existing frameworks such as MCX and PIX. Beyond the framework, our research uncovers a connection between counterfactual and feature attribution explanations: we show that counterfactual explanations can be used to derive feature attribution scores grounded in actual causality. Our experiments are promising, with counterfactual-based scores attaining performance comparable to popular state-of-the-art feature attribution methods such as LIME and SHAP.

Student: Maleakhi Wijaya
Supervisor: Professor Francesca Toni
Project report: https://bit.ly/2TUr6v0
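To illustrate the idea of a counterfactual explanation described above, here is a minimal sketch, not the ALICE algorithm itself: given a classifier and an input with an undesired outcome, search for the smallest perturbation to a single feature that flips the decision. The toy "loan approval" model, the step size, and all values are illustrative assumptions.

```python
def predict(x):
    # Toy linear "loan approval" classifier (illustrative, not from the project):
    # approve (1) when the score exceeds zero, otherwise reject (0).
    income, debt = x
    return 1 if 0.5 * income - 1.0 * debt - 2.0 > 0 else 0

def counterfactual(x, step=0.1, max_steps=1000):
    """Grow a perturbation of one feature at a time, smallest first,
    and return the first minimally changed input that flips the label."""
    target = 1 - predict(x)
    for k in range(1, max_steps + 1):
        delta = k * step
        for i in range(len(x)):
            for sign in (+1.0, -1.0):
                candidate = list(x)
                candidate[i] += sign * delta
                if predict(candidate) == target:
                    return candidate
    return None  # no counterfactual found within the search budget

applicant = [6.0, 2.0]          # rejected: predict(applicant) == 0
cf = counterfactual(applicant)  # minimally changed input that gets approved
```

The counterfactual found (here, a small reduction in debt) reads directly as an explanation: "the application would have been approved had the debt been this much lower". Real counterfactual generators add constraints such as plausibility and actionability on top of this minimality criterion.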