*NOTE*: You will now get the XAI course for free if you sign up (not the SHAP course)
SHAP course: adataodyssey.com/courses/shap-with-python/
XAI course: adataodyssey.com/courses/xai-with-python/
Newsletter signup: mailchi.mp/40909011987b/signup
These are some of the best data science tutorials I've seen on RUclips. Don't give up, keep making them. I know you'll make it big =)
Thank you so much! I really appreciate the support :)
Starting an internship in September where the goal is to explain a lot of models with SHAP. This channel is a gold mine.
Hi Louis, I am glad I could help. Good luck for your internship!
nice explanation! :)
Thank you Erica!
At 5:27 you mention the formula for calculating val_x(S). Don't we also need to subtract E_X[f(X)] from that?
You can, but the two E_X[f(X)] terms will cancel out when you subtract val(S) from val(S U {i}).
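The cancellation can be written out explicitly. This is a short sketch that assumes val_x(S) is the expected prediction with the features in S fixed to the instance's values (the formula from the video is not reproduced in this thread, so that definition is an assumption). The primed value function val'_x is a hypothetical name for the version with the mean prediction subtracted, and the block assumes the amsmath package.

```latex
% Assumption: val_x(S) is the expected prediction with the features in S
% fixed to the instance's values x_S; \bar{S} denotes the features not in S.
\begin{align*}
\mathrm{val}_x(S) &= E_{X_{\bar{S}}}\left[ f(x_S, X_{\bar{S}}) \right] \\
\mathrm{val}'_x(S) &= \mathrm{val}_x(S) - E_X[f(X)] \\
\mathrm{val}'_x(S \cup \{i\}) - \mathrm{val}'_x(S)
  &= \left( \mathrm{val}_x(S \cup \{i\}) - E_X[f(X)] \right)
   - \left( \mathrm{val}_x(S) - E_X[f(X)] \right) \\
  &= \mathrm{val}_x(S \cup \{i\}) - \mathrm{val}_x(S)
\end{align*}
```

Since the Shapley value is a weighted sum of these differences, it comes out the same whether or not the mean prediction is subtracted from the value function.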
Aaaaah of course, that makes sense! Thanks, you helped me a lot - not only with this comment but the entire video series:)
@NeverHadMakingsOfAVarsityAthle No problem Matthias! I'm glad I could help :)
Hi, your video is very well summarized, but there is an error in the formula: 1 must be excluded in Val(S U {1}).
I don't see the mistake. At what timestamp do I introduce Val(S U {1})?
{1} refers to a coalition containing feature 1, i.e. x1, and not to the feature's values.
What I don’t understand is that, in the game video, the value of a coalition is calculated by re-running the game with each coalition. Here it would seem to me that finding the value of a feature coalition would mean re-training the model for each coalition. That doesn’t just seem expensive, it seems downright prohibitive for complex or large models. You started with a fixed regression model where the weights are already determined, but for, e.g., a neural network, leaving out features could change the weights significantly, right?
@cauchyschwarz3295 This is a bit confusing! But no, you will only have one model (the one you want to explain). You marginalise the prediction function (i.e. the model) over the features that are not in set S. You do not have to retrain a new model that excludes the features not in S.
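A minimal sketch of what marginalising over the features outside S can look like with a single fixed model. Everything here is hypothetical and for illustration only: the background data, the weights in `model`, and the helper name `coalition_value`. The expectation is approximated by averaging over rows of the dataset, which is one common choice and not necessarily the exact procedure used in the video.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical background data; columns are [age, experience, degree].
X = np.column_stack([
    rng.integers(20, 60, size=500),
    rng.integers(0, 30, size=500),
    rng.integers(0, 2, size=500),
]).astype(float)


def model(X):
    """A fixed, already-trained model (the weights are made up)."""
    return 1000.0 + 50.0 * X[:, 0] + 200.0 * X[:, 1] + 5000.0 * X[:, 2]


def coalition_value(model, x, S, background):
    """Estimate val(S) without retraining.

    Features in S are fixed to the instance's values; the remaining
    features keep the values observed in the background rows, and the
    model's predictions are averaged over those rows.
    """
    data = background.copy()
    idx = np.array(sorted(S), dtype=int)
    data[:, idx] = x[idx]
    return model(data).mean()


x = np.array([35.0, 10.0, 1.0])  # the instance being explained

v_empty = coalition_value(model, x, set(), X)      # no features known
v_age = coalition_value(model, x, {0}, X)          # only age known
v_full = coalition_value(model, x, {0, 1, 2}, X)   # all features known

print(v_empty, v_age, v_full)
```

The same `model` is called for every coalition. Only its inputs change, so the cost comes from the number of coalitions and background rows, not from retraining.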
I think there's a subtlety here in the application. In the theoretical model the goal is to calculate a fair payout for an individual feature based on how much it contributes to building a good model. Good models are measured by how well they predict things. So, we would want to think about how much the feature contributes to a reduction in model error, and you would want to train a model with and without a given feature to figure this out. (But instead we use other methods for determining which features are good that are easier to compute.)
In the use case in this video (and what is the norm in ML), they are using Shapley values to explain how a feature contributed to the model prediction, regardless of how good the model actually is. This is helpful because Shapley values still have desirable properties like additivity in this use case. If you trained a new model without this feature, it wouldn't answer the question of how much the feature contributed to the prediction value in the old model.
Can you share a link to the previous video, which explains the Shapley formula?
Sure, Avijeet! You can find all the videos in this playlist: SHAP
ruclips.net/p/PLqDyyww9y-1SJgMw92x90qPYpHgahDLIK
Nice video.
But I have a question: what you showed in the video is the case where we are trying to "exclude" a categorical column (degree).
What about a continuous (numeric) column, such as age?
What value would we use?
Thanks! If the continuous variable is in the coalition, then we use the actual value for that instance (i.e. the person's actual age). If the continuous variable is not in the coalition, then we integrate over the values of the variable w.r.t. its probability distribution.
However, in practice, we will not know the probability distribution of a variable. So we will have to randomly sample different values for the variable from our dataset. We do this a bunch of times so we end up approximating the distribution.
I hope that makes sense? There is a lot of statistical theory that underlies this explanation!
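A small sketch of the sampling idea for a continuous feature. The ages, the `predict` function, and the helper name `sampled_value` are all made up for illustration. Degree is treated as being in the coalition and age as being outside it, so age is replaced by values drawn at random from the dataset.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dataset column of observed ages (made-up values).
ages = rng.normal(40, 10, size=2000).clip(18, 70)


def predict(age, degree):
    """A made-up non-linear model of salary, for illustration only."""
    return 20000 + 800 * age - 5 * (age - 45) ** 2 + 10000 * degree


def sampled_value(n, degree, rng):
    """Average prediction with age drawn n times from the observed ages."""
    sample = rng.choice(ages, size=n, replace=True)
    return predict(sample, degree).mean()


degree = 1  # in the coalition: fixed to the instance's actual value

# Reference: average over every observed age (the empirical distribution).
full_average = predict(ages, degree).mean()

# More samples bring the estimate closer to the reference value.
for n in (10, 100, 1000, 20000):
    print(n, round(sampled_value(n, degree, rng), 1))

print("reference", round(full_average, 1))
```

With few samples the estimate is noisy. It settles towards the average over the observed ages as the number of samples grows, which is the sense in which repeated sampling approximates the distribution.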
@adataodyssey Ahh I see, got it.
But out of curiosity, can you give me references for that statistical theory?
@chrisleenatra Unfortunately, I don't have any specific references. I'm using the knowledge from back in my undergrad. If you want to understand it, take a look at "stochastic calculus".
First you say that it's efficient. One minute later you say that it is very computationally expensive?
Yes, that's correct. The efficiency property of Shapley values doesn't have anything to do with computational efficiency. It just means that if you add the Shapley values of an instance and the mean prediction across all instances, you will get the prediction for that instance.
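The efficiency property can be checked numerically on a toy example. This is a sketch under stated assumptions: the data, the `model` function, and the helper names `val` and `shapley_values` are hypothetical, and the value function averages predictions over the dataset rows. The loop over all coalitions also shows where the computational cost comes from.

```python
from itertools import combinations
from math import factorial

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))  # hypothetical data with 3 features


def model(X):
    """A made-up fixed model with an interaction term."""
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2] - 0.5 * X[:, 2]


def val(x, S, background):
    """Average prediction with the features in S fixed to x's values."""
    data = background.copy()
    idx = np.array(sorted(S), dtype=int)
    data[:, idx] = x[idx]
    return model(data).mean()


def shapley_values(x, background):
    """Exact Shapley values by enumerating every coalition."""
    p = len(x)
    phi = np.zeros(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                with_i = val(x, set(S) | {i}, background)
                without_i = val(x, set(S), background)
                phi[i] += weight * (with_i - without_i)
    return phi


x = X[0]
phi = shapley_values(x, X)
prediction = model(x[None, :])[0]
mean_prediction = model(X).mean()

# Efficiency: Shapley values plus the mean prediction give the prediction.
print(phi.sum() + mean_prediction, prediction)
```

The number of coalitions doubles with every added feature, which is why exact computation becomes expensive even though the efficiency property itself always holds.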