mustafac
5 min read · Dec 26, 2022

Explainable Machine Learning: Shapley Values

Machine Learning tools are getting richer and better. Lately I see everyone beginning to use the SHAP and SAGE libraries. There are already nice tutorials, so I will not explain everything in detail; as usual, I will just create a simple base case to test and debug the framework.

The code is on GitHub: Link

SHAP and SAGE are two game-theory-related methods based on the Shapley value, which explains the importance of our variables for an ML model.
SHAP answers the question: how much does each feature contribute to this individual prediction? (local interpretability)
SAGE answers the question: how much does the model depend on each feature overall? (global interpretability) SAGE values can be calculated by computing LossSHAP values for the whole dataset and then taking the mean.

How can we find the effect of one variable on the result? Simply recalculate without that variable and check the difference. The Shapley value follows a similar approach. (For rigorous explanations, search online; here I will give a very simple example.)
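In formula form, this averaging is exactly the Shapley value: feature i gets its marginal contribution averaged over all coalitions S of the other features,

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(f(S \cup \{i\}) - f(S)\bigr)$$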

- We have 3 features: n = {a, b, c}

- We have a value function f, and the feature coalitions produce the f values below:
f{a} = 20
f{b} = 30
f{c} = 50
f{a,b} = 60
f{a,c} = 50
f{b,c} = 80
f{a,b,c} = 130 (grand coalition)

- We want to understand the importance (contribution) of each of a, b, c.
1) Start with feature "a" alone: f{a} = 20, so a = 20.
2) Check the coalition (a, b):
Given: a + b = 60, a = 20
Then: b = 40

3) Check the grand coalition (a, b, c):
Given: a = 20, b = 40, a + b + c = 130
Then: c = 70

- So for the 1st ordering (a, b, c), from feature "a"'s point of view:
a = 20, b = 40, c = 70

- Now do the same for all permutations (starting with b, then with c) and take the mean of each feature's contributions.

The real algorithm is similar to this; I explained it in the simplest possible way, at the level of an ML developer. A small sketch of this permutation averaging is shown below.
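Here is a minimal sketch of that averaging in Python, using the toy values above (I assume f of the empty coalition is 0, which is what step 1 implicitly does):

from itertools import permutations

# Exact Shapley values for the toy 3-feature game above
# (v maps each coalition to the f value given in the example)
v = {
    frozenset(): 0,
    frozenset("a"): 20, frozenset("b"): 30, frozenset("c"): 50,
    frozenset("ab"): 60, frozenset("ac"): 50, frozenset("bc"): 80,
    frozenset("abc"): 130,
}

players = ["a", "b", "c"]
orderings = list(permutations(players))
phi = {p: 0.0 for p in players}
for order in orderings:
    coalition = frozenset()
    for p in order:
        # marginal contribution of p when it joins the features already present
        phi[p] += v[coalition | {p}] - v[coalition]
        coalition = coalition | {p}
phi = {p: total / len(orderings) for p, total in phi.items()}
print(phi)  # roughly a=28.3, b=48.3, c=53.3; they sum to 130, the grand coalition value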

But also keep in mind that this is a kind of approximation. As far as I have read, these libraries do not try every possible combination of features (that would take too many resources and too much time). Instead, they work on sampled subsets, so the result will not be exact. A tiny illustration of the sampling idea, on the same toy game as above, follows below.
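A rough way to see it: instead of averaging the marginal contributions over all orderings, average them over a random sample of orderings. This reuses the toy game v from the sketch above; the function name is just for illustration.

import random

def sampled_shapley(v, players, n_samples=1000, seed=0):
    # Monte Carlo estimate: average marginal contributions over random orderings
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_samples):
        order = players[:]
        rng.shuffle(order)
        coalition = frozenset()
        for p in order:
            phi[p] += v[coalition | {p}] - v[coalition]
            coalition = coalition | {p}
    return {p: total / n_samples for p, total in phi.items()}

print(sampled_shapley(v, ["a", "b", "c"]))  # close to the exact values above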
Also keep in mind that the explaining power is not the same for all Machine Learning models.

As expected, Neural Networks are hard to explain: starting from the 1st hidden layer we generate combinations of the inputs, and in fact we scale up the dimension (like a kernel function), so the parameters no longer contribute to the result individually. These methods are therefore mostly for simpler models, not for Neural Networks.

As always, I tried to generate the simplest basic test case to test this framework. A feature can be linearly related, nonlinearly related, or NOT related to the target. I loaded a real dataset from the framework, the famous Titanic set, where the target variable is 0 or 1. I added 3 features as below: depending on the target y value, I sample from different number ranges. You can play with it, change the ranges, or change the divisor in random.choice(sample_positive) / 10 to see the effect…

import random

small_list = [i for i in range(0, 20)]
big_list = [i for i in range(80, 100)]
big_small_list = small_list + big_list
medium_list = [i for i in range(30, 70)]

def assign_nonlinear(dfy, iloc, sample_positive, sample_negative):
    # Sample from one range when the target is 1 and from another when it is 0
    if dfy.iloc[iloc] == 1:
        return random.choice(sample_positive) / 10
    else:
        return random.choice(sample_negative) / 10

# NONLINEAR
# To make it nonlinear, select either small or big values when the target is 1
nonlinear_vals_train = [assign_nonlinear(y_train, i, big_small_list, medium_list) for i in range(len(y_train))]
nonlinear_vals_test = [assign_nonlinear(y_test, i, big_small_list, medium_list) for i in range(len(y_test))]

# LINEAR
# To make it linear, select big values when the target is 1
linear_vals_train = [assign_nonlinear(y_train, i, big_list, small_list) for i in range(len(y_train))]
linear_vals_test = [assign_nonlinear(y_test, i, big_list, small_list) for i in range(len(y_test))]

# NO RELATION
# To make it unrelated, always select medium values
norelation_vals_train = [assign_nonlinear(y_train, i, medium_list, medium_list) for i in range(len(y_train))]
norelation_vals_test = [assign_nonlinear(y_test, i, medium_list, medium_list) for i in range(len(y_test))]
[Plot: nonlinear_vals — low and high values result in 1, nonlinear relationship]
[Plot: linear_vals — high values result in 1, linear relationship]
[Plot: norelation_vals — no relation between x and y]
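In the post, the framework wires these lists into the Titanic data and the dashboard itself; that code is not shown here, so the sketch below is only my guess at how the columns could be attached and how similar SHAP values could be computed with the shap package directly. The column names, the GradientBoostingClassifier choice, and the assumption that X_train/X_test are already numeric are mine.

import shap
from sklearn.ensemble import GradientBoostingClassifier

# Attach the generated columns to the existing Titanic frames (names are my choice)
X_train_aug, X_test_aug = X_train.copy(), X_test.copy()
X_train_aug["linear_vals"] = linear_vals_train
X_test_aug["linear_vals"] = linear_vals_test
X_train_aug["nonlinear_vals"] = nonlinear_vals_train
X_test_aug["nonlinear_vals"] = nonlinear_vals_test
X_train_aug["norelation_vals"] = norelation_vals_train
X_test_aug["norelation_vals"] = norelation_vals_test

# Any tree model works with TreeExplainer; this particular model is an arbitrary choice
model = GradientBoostingClassifier(random_state=0).fit(X_train_aug, y_train)

explainer = shap.TreeExplainer(model)
shap_vals = explainer.shap_values(X_test_aug)  # one SHAP value per feature per test row
shap.summary_plot(shap_vals, X_test_aug)       # global view, similar to the importance chart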

After running the standard sample functions we get charts, which I will explain below. The framework generates a URL, and you go to that URL to see the results.

Below we see the SHAP values; a bigger value means a bigger weight. In the left graph (Shap Feature Importances) we can see that our linear and nonlinear variables have high contributions. On the right (Shap Dependence Graph) we can see the individual graph of “linear_vals” vs. SHAP value. As expected, high values have high SHAP values.

If we check “nonlinear_vals”, we see that the high SHAP values occur at the low and high values of the nonlinear_vals variable. (That is how we set it up, so this is as expected.)

For the NOT related variable, we see no pattern.
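The dependence graphs can also be reproduced outside the dashboard from the values computed in the sketch above; again, this is only an illustration, not the post's actual code.

import shap

for col in ["linear_vals", "nonlinear_vals", "norelation_vals"]:
    # feature value on the x-axis, its SHAP value on the y-axis
    shap.dependence_plot(col, shap_vals, X_test_aug, interaction_index=None)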

Individual Results
A great feature of this framework is checking individual items: select an index and see the prediction.

For the sample above, we predicted 1 with probability 96.5%. In the Contribution plot below, we see how much each variable contributed:
linear_vals +42.27, nonlinear_vals +16.56, …
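A rough shap-package stand-in for this Contribution plot, reusing the explainer from the earlier sketch (the row index is arbitrary, so the numbers will not match the dashboard's exactly):

import shap

i = 0  # any test row
shap.force_plot(explainer.expected_value, shap_vals[i], X_test_aug.iloc[i], matplotlib=True)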

If you comment out the part below, you can run the sample with the default data plus my generated data added.

# Comment here if you want to see the standard sample without the added variables.
# X_train = pd.DataFrame()
# X_test = pd.DataFrame()
Our generated values are at the top. You can play with the parameters and try different versions to understand the usage.

Here I created a simple base test case to check whether the framework works as I understand it. I verified the 3 types of variables. You can take this idea and apply it to different algorithms.