Hands-on Global Model Interpretation

Example and Interpretation

What features does a model think are important for determining whether or not a patient has heart disease?

This question can be answered using feature importance.

As I already mentioned at the start of the article, we will work with the Heart Disease dataset. You can find all the code used in this tutorial on my GitHub or as a Kaggle kernel.

Most machine learning libraries, including Scikit-Learn and XGBoost, ship their own feature importance methods, but if you want scores that are comparable across models from multiple libraries, it is advantageous to use the same method to calculate the feature importance for every model.

To ensure this, we will use the ELI5 library. ELI5 allows users to visualize and debug various machine learning models. It offers far more than just feature importance, including library-specific explanations as well as a text explainer.

To calculate the feature importance we can use ELI5's PermutationImportance, which repeatedly shuffles the values of a single feature and measures how much the model's score drops as a result. After calculating the feature importance of a given model, we can visualize it using the show_weights function.
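As a minimal sketch, this is how the two calls fit together. The file name 'heart.csv' and the label column 'target' are assumptions based on the Kaggle version of the dataset, and the train/test split and Logistic Regression model are placeholders for whatever model you want to inspect:

```python
import eli5
import pandas as pd
from eli5.sklearn import PermutationImportance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical loading step: 'heart.csv' and 'target' stand in for the
# Kaggle Heart Disease dataset and its label column.
df = pd.read_csv('heart.csv')
X = df.drop(columns=['target'])
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Shuffle each feature on the held-out set and measure how much the
# model's score drops; a large drop means the feature was important.
perm = PermutationImportance(model, random_state=42).fit(X_test, y_test)

# Render the importance scores as a table (displays inline in a notebook).
eli5.show_weights(perm, feature_names=X_test.columns.tolist())
```

Note that the importance is computed on the test set here, so it reflects how much each feature contributes to generalization rather than to fitting the training data.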

Using the approach above we can compute the feature importance for each of our models and compare the results, as in the sketch below.
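Because PermutationImportance only requires a Scikit-Learn-compatible estimator, the same two calls work unchanged for other libraries. Continuing the sketch above with XGBoost:

```python
from xgboost import XGBClassifier

# Train an XGBoost model on the same split and score it the same way.
xgb_model = XGBClassifier().fit(X_train, y_train)
perm_xgb = PermutationImportance(xgb_model, random_state=42).fit(X_test, y_test)
eli5.show_weights(perm_xgb, feature_names=X_test.columns.tolist())
```

Since both tables come from the same permutation procedure, the scores are directly comparable across the two models.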

Figure 3: Feature Importance of Logistic Regression
Figure 4: Feature Importance of XGBoost

You can see that the two models assign very different importance scores to some of the features, which can reduce how much trust you can place in the results.

Nonetheless, we can see that features like ca, sex, and thal are quite useful for getting to the right predictions, whilst age and cp aren't.