This article discusses the need for a general framework for explainability and presents some insights and advances.
Explainability is the science of interpreting what an algorithm did or might have done. The new GDPR regulation demands that every prediction be accompanied by an explanation. Explanations are produced in order to build trust and obtain robust predictions.
Despite existing tools and techniques for working with explainability, we are still far from a general framework. Current tools provide no formal measure or standard output for explanations, so it is impossible to state that model A is more interpretable than model B, or that explanation X is better than explanation Y.
Current tools are based, mostly, on post-hoc analysis (such as LIME or SHAP): the explanation is produced post-mortem. The algorithm is first trained and used to predict; after that, each prediction is explained independently.
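As a minimal sketch of the post-hoc idea behind tools like LIME and SHAP, the snippet below treats an already-trained model as a black box and explains a single prediction afterwards by perturbing one feature at a time. All names here are illustrative assumptions, not the API of any real library, and the occlusion-style attribution is only one simple variant of perturbation-based explanation.

```python
def black_box_model(x):
    # Stand-in for an already-trained model: a fixed rule
    # whose internals the explainer does not see.
    return 3.0 * x[0] - 2.0 * x[1] + 0.5 * x[2]

def explain_prediction(model, x, baseline=0.0):
    """Occlusion-style attribution: replace one feature at a time with a
    baseline value and record how much the prediction changes."""
    base_pred = model(x)
    contributions = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline
        contributions.append(base_pred - model(perturbed))
    return contributions

x = [1.0, 2.0, 4.0]
print(explain_prediction(black_box_model, x))  # per-feature contributions
```

Note that the explanation happens entirely after training and prediction, which is exactly the "post-mortem" property discussed above.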
Post-hoc analysis is not very flexible: improving the system, eliminating bias, or forcing the algorithm to learn particular patterns specified by the user is hard. These limitations motivate a Human-in-the-Loop approach, in which the algorithm learns from live feedback: the model outputs an explanation of what it is learning, and the user adjusts it toward the desired behaviour. With this methodology, problems such as bias are controlled during the learning stage rather than afterwards (as in post-hoc analysis).
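The human-in-the-loop cycle can be sketched as follows. This is a toy assumption-laden illustration, not the article's system: the "model" is a trivial scoring rule, the "explanation" is its feature weights, and the simulated user vetoes a feature (e.g. a biased attribute) so the next training round excludes it.

```python
def fit(data, labels, banned):
    """Toy learner: a correlation-like weight per allowed feature."""
    n = len(data)
    weights = {}
    for feature in data[0]:
        if feature in banned:
            continue
        weights[feature] = sum(row[feature] * y
                               for row, y in zip(data, labels)) / n
    return weights

data = [{"income": 1.0, "zip_code": 0.9},
        {"income": 0.2, "zip_code": 0.8}]
labels = [1.0, 0.0]
banned = set()

for round_ in range(2):
    weights = fit(data, labels, banned)
    print("explanation:", weights)   # the model shows what it is learning
    if "zip_code" in weights:        # simulated user feedback:
        banned.add("zip_code")       # "zip_code encodes bias, drop it"
```

After the first round the user's veto is applied, so the second round's explanation no longer contains the biased feature: bias is handled during learning, not post-hoc.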
Some properties of a Human-in-the-Loop approach for explainability are the following:
Causal Discovery: by iterating explanations between human and algorithm, the system must converge to the desired output. The system suggests relationships in the data, and the user indicates whether they are mere correlations or causal relationships.
Ontology Alignment: determining correspondences between each dimension of the dataset and concepts in ontologies may support causal discovery and user-friendly explanations.
Traceability: at each iteration the algorithm is updated; this information needs to be stored and remain accessible to the user.
… TBD
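The traceability property above could be realised with something as simple as a versioned log of every update in the loop. The structure below is an assumption for illustration (class and method names are hypothetical), showing only the essential requirement: each iteration's state and the user feedback that shaped it stay accessible afterwards.

```python
class TraceLog:
    """Stores every model update in the loop so the user can audit it."""

    def __init__(self):
        self._history = []

    def record(self, iteration, weights, user_feedback):
        # Copy the weights so later mutations do not rewrite history.
        self._history.append({"iteration": iteration,
                              "weights": dict(weights),
                              "feedback": user_feedback})

    def at(self, iteration):
        """Return the stored state of a given iteration."""
        return self._history[iteration]

log = TraceLog()
log.record(0, {"income": 0.5, "zip_code": 0.45}, "drop zip_code")
log.record(1, {"income": 0.5}, "ok")
print(log.at(0)["feedback"])  # the user can inspect any past iteration
```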
Explanations are subjective, influenced by human knowledge and by the objective of the model. By injecting (prior) human knowledge into the system, we can measure "interpretability": a confidence score can be estimated by measuring whether or not the explanation contains the relations traced by the user.