SHAP with Python (Code and Explanations)

  • Published: 2 Oct 2024

Comments • 95

  • @adataodyssey
    @adataodyssey 7 months ago +3

    *NOTE*: You will now get the XAI course for free if you sign up (not the SHAP course)
    SHAP course: adataodyssey.com/courses/shap-with-python/
    XAI course: adataodyssey.com/courses/xai-with-python/
    Newsletter signup: mailchi.mp/40909011987b/signup

    • @mohadesehkeshavarz9107
      @mohadesehkeshavarz9107 6 months ago

      Why can't I get the XAI course for free? Has the offer ended?

    • @adataodyssey
      @adataodyssey 6 months ago

      @mohadesehkeshavarz9107 If you sign up for the newsletter, you will get a coupon that gives you free access to the XAI course. If you are still having trouble, send me your email on Instagram.

  • @adeauy2294
    @adeauy2294 27 days ago

    Nice video! The plots will be different for a Keras model, right? I followed your code, but it doesn't seem to work for a neural network model.

    • @adataodyssey
      @adataodyssey 27 days ago

      @adeauy2294 The plots should be the same if you train a NN on tabular data. However, I've had a lot of trouble trying to get the package to work with PyTorch. I'm not sure about Keras, but I expect you are having similar problems.

  • @rafaelagd0
    @rafaelagd0 1 year ago +2

    Great video! Could you comment on the future of SHAP? It seems the project was abandoned. The latest commit is from June 2022 and there is a pile of 1.5k issues. I couldn't find much information about it, and the other packages seem to depend on it, so there may be no alternative.

    • @adataodyssey
      @adataodyssey 1 year ago +3

      That is a good point, Rafael! I think SHAP has a good future regardless of the package. The method is widely used in industry and is based on solid theory: Shapley values, which have been around for a long time.
      For now the package works well for me. The 1.5k issues are more an indication of its popularity than of major problems with the package. Hopefully, if it does run into serious issues, updates will be made. If not, I'm sure something will take its place.
      As I mentioned, it is very popular, so someone is sure to take advantage of that. The code and method are open source, so it shouldn't be too hard to replicate. I know there are already other implementations in R (see the IML package).

  • @pilarangelicarodriguezcaba8199
    @pilarangelicarodriguezcaba8199 8 months ago +3

    Really easy to understand, a lot better than the official documentation for the SHAP plots.

    • @adataodyssey
      @adataodyssey 8 months ago

      Thank you! This was my motivation for the content. Had to do a lot of work to understand the method fully :)

  • @mahsadehghan-ws1kn
    @mahsadehghan-ws1kn 4 months ago

    Thank you so much for this awesome video. When I use this code in the #Train model section, I encounter the error below. What is the solution?
    [17:50:59] C:\buildkite-agent\builds\buildkite-windows-cpu-autoscaling-group-i-0b3782d1791676daf-1\xgboost\xgboost-ci-windows\src\data\array_interface.h:492: Unicode-7 is not supported.

    • @adataodyssey
      @adataodyssey 4 months ago

      There could be many things going wrong. You can try creating a fresh Python environment and installing only XGBoost and the other packages needed to train the model.

  • @brenoingwersen784
    @brenoingwersen784 27 days ago

    For categorical features (@3:35), wouldn't it make sense to create a full pipeline in which all raw features are preprocessed (scaled, encoded, etc.) and run through the model to generate predictions, and then calculate the SHAP values afterwards? This way you would have the categorical feature contribution in an interpretable form...

    • @adataodyssey
      @adataodyssey 27 days ago

      @brenoingwersen784 The problem is that if you have a categorical feature with many categories (say 10), you will have 10 dummy features after encoding. This means you will have 10 SHAP values for the categorical feature, making it difficult to understand the overall effect of that feature.
      You can solve this by adding the SHAP values for each dummy feature or by using CatBoost.
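
The idea of adding up the SHAP values of the dummy features can be sketched with plain NumPy. This is only an illustration under stated assumptions: the feature names (`sex.M`, `sex.F`, `sex.I`) and all the numbers are made up, and `combine_dummies` is a hypothetical helper, not a function from the shap package. Summing is reasonable because SHAP values are additive, so the row totals stay the same.

```python
import numpy as np

# Made-up SHAP values for 2 rows and 5 encoded features (illustration only).
feature_names = ["length", "sex.M", "sex.F", "sex.I", "weight"]
shap_values = np.array([
    [0.10, 0.78, 0.05, 0.42, -0.30],
    [-0.20, -0.10, 0.30, 0.02, 0.15],
])

def combine_dummies(values, names, prefix, new_name):
    """Sum the SHAP columns whose name starts with `prefix` into one column.

    Hypothetical helper: the combined column is appended after the
    remaining features.
    """
    is_dummy = np.array([n.startswith(prefix) for n in names])
    combined = values[:, is_dummy].sum(axis=1, keepdims=True)
    kept = values[:, ~is_dummy]
    kept_names = [n for n, d in zip(names, is_dummy) if not d]
    return np.hstack([kept, combined]), kept_names + [new_name]

combined_values, combined_names = combine_dummies(
    shap_values, feature_names, prefix="sex.", new_name="sex"
)
print(combined_names)
print(combined_values)
```

The combined array has one column per original feature, so each row still sums to the same total contribution as before.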

  • @mulusewwondieyaltaye4937
    @mulusewwondieyaltaye4937 6 months ago

    I can't access the SHAP Python course. Could you please give me access?

    • @adataodyssey
      @adataodyssey 6 months ago

      Hi Mulusew, the SHAP course is no longer free. But you will now get free access to my XAI course if you sign up for the newsletter.

  • @mayuribhandari2224
    @mayuribhandari2224 11 days ago

    I have subscribed to the newsletter but am not getting access to the XAI course.

    • @adataodyssey
      @adataodyssey 11 days ago

      You should receive a coupon code in your mail. Let me know if you don't get it!

  • @ShotClockHoops
    @ShotClockHoops 7 months ago

    This is the best way to explain explanations 😁
    I would be interested to see a video of yours on more complex models, like deep neural networks on signal data, and how we can use SHAP on them.
    Great work!

    • @adataodyssey
      @adataodyssey 7 months ago

      Thank you! I will keep that in mind

  • @fouedhamouda7356
    @fouedhamouda7356 1 month ago

    Thanks,
    can I use SHAP with a GAN model?

    • @adataodyssey
      @adataodyssey 1 month ago

      SHAP is model agnostic so it could be used. However, SHAP can be difficult to implement for neural networks in general. I'm not aware of it being used for GANs.

  • @cutestbear3327
    @cutestbear3327 11 months ago +2

    thank you for the awesome video~ really like the way you explain everything thoroughly and meticulously. really friendly to people like us who have just begun our journey into data science

    • @adataodyssey
      @adataodyssey 11 months ago

      I'm glad you found it useful! Are there any other related concepts you are interested in learning about?

    • @cutestbear3327
      @cutestbear3327 11 months ago +1

      @adataodyssey hi Conor, thanks for your kind reply. i am happy to go with whatever topic you dive into. maybe random forest (and its hyperparameter tuning) since it is such a classic?
      may you have fun and enjoy continued success on youtube~~ cheers

  • @yael123gut
    @yael123gut 7 days ago

    It was so clearly and well explained, thank you!

  • @Irades
    @Irades 1 month ago

  • @shamkhalmammadov4083
    @shamkhalmammadov4083 1 year ago +2

    Can you please make another example with categorical variables?

    • @adataodyssey
      @adataodyssey 1 year ago +1

      Hi Shamkhal, there is a video in the course that explains categorical features :) Otherwise, you might find this article useful (no-paywall link): towardsdatascience.com/shap-for-categorical-features-7c63e6a554ea?sk=2eca9ff9d28d1c8bfde82f6784bdba19

    • @shamkhalmammadov4083
      @shamkhalmammadov4083 1 year ago +1

      @adataodyssey Thank you very much! I am a big fan of yours. I loved the way you explained SHAP. I got Medium 3 days ago just to read your article. I still have a big problem with the waterfall plot: my target variable has 3 classes (0, 1, 2) and for some reason I cannot create a waterfall plot.

    • @adataodyssey
      @adataodyssey 1 year ago

      @shamkhalmammadov4083 Okay, in this case you have a categorical target variable. I assumed you meant a categorical input feature. I have only worked with binary target variables.
      Can you send me a link to your dataset?

  • @soniaspisak645
    @soniaspisak645 4 months ago

    Hi, I'm struggling with explaining GRU and LSTM models with SHAP. Encouraged by your videos, I am considering buying the course, but does it cover working with 3D data? Is it even possible to implement SHAP and obtain reliable plots (without flattening the data) for time-series models?

    • @adataodyssey
      @adataodyssey 4 months ago

      Hi Sonia, unfortunately, the course focuses on tabular data and models like XGBoost, Random Forest and CatBoost. There is one lesson on SHAP for image data but it doesn't sound like that will help you much.
      If you are working with PyTorch, these articles might help you get started with applying SHAP:
      towardsdatascience.com/image-classification-with-pytorch-and-shap-can-you-trust-an-automated-car-4d8d12714eea?sk=b04dcbb8a09f049f605d2110b5c8d851
      towardsdatascience.com/using-shap-to-debug-a-pytorch-image-regression-model-4b562ddef30d?sk=7eb3016839186f1ba2a6f1f105f8ff64

  • @possakornkittipipatthanapo1737
    @possakornkittipipatthanapo1737 3 months ago

    Hi, Shapley values are amazing for interpretation and model understanding. However, I haven't seen applications to multimodal models, for example vision-language models like CLIP. Could you please provide an explanation or a reference for further research?

    • @adataodyssey
      @adataodyssey 3 months ago

      Hi, I'm not too familiar with this area. I think SHAP is not the best fit for LLMs or generative models, as you are not making predictions.

  • @noazamstein5795
    @noazamstein5795 7 months ago

    What does it mean that being a male increases the prediction by 0.78, AND ALSO not being an infant FURTHER increases it by 0.42? These two are obviously mutually exclusive, so I would expect either one of them being the sum of 0.78+0.42 or something else

    • @adataodyssey
      @adataodyssey 7 months ago

      Your confusion is warranted as there is not a clear interpretation for this feature. In the model, there are three sex features (M, F and I). Together they are mutually exclusive. You are right: by summing up the values you get a clear interpretation of the contribution of the original categorical feature.
      Unfortunately, there is no easy way to do this with the SHAP package. We discuss this in my SHAP course. You can also find a solution in this article:
      towardsdatascience.com/shap-for-categorical-features-7c63e6a554ea?sk=2eca9ff9d28d1c8bfde82f6784bdba19

  • @slimanearbaoui1237
    @slimanearbaoui1237 1 year ago +1

    Can this library work with an LSTM model?

    • @adataodyssey
      @adataodyssey 1 year ago +1

      Hi Slimane :) I've never applied it to an LSTM model. Applying SHAP to deep learning models can be challenging. You may be able to apply SHAP to an LSTM model with some work.
      I have applied it to convolutional neural networks used for image classification and regression tasks. I've linked to two articles below. I used PyTorch. I know that SHAP also works with Keras.
      towardsdatascience.com/image-classification-with-pytorch-and-shap-can-you-trust-an-automated-car-4d8d12714eea?sk=b04dcbb8a09f049f605d2110b5c8d851
      towardsdatascience.com/using-shap-to-debug-a-pytorch-image-regression-model-4b562ddef30d?sk=7eb3016839186f1ba2a6f1f105f8ff64

  • @NasirUddin-im2zb
    @NasirUddin-im2zb 1 year ago

    When I ran my code, I got this issue regarding shap:
    FutureWarning: In the future `np.long` will be defined as the corresponding NumPy scalar.
    long_ = _make_signed(np.long)
    I did pip install with 1.20.0, 1.24.2, 1.22.2 and so on, but none of them worked. What can I do? If you can suggest something, that would be great.

    • @adataodyssey
      @adataodyssey 1 year ago

      Hi Nasir, sorry about that. I've never seen that issue before. To confirm, do you mean that you installed different versions of NumPy?
      This link might help: github.com/neonbjb/tortoise-tts/issues/379
      They suggest trying:
      pip install numpy==1.20.0

  • @Gustavo-nn7zc
    @Gustavo-nn7zc 4 months ago

    Hi @adataodyssey, great video, thanks! Is there a way to use SHAP for ARIMA/SARIMA?

    • @adataodyssey
      @adataodyssey 3 months ago

      Hi Gustavo, it's been a while since I've done time series analysis. If I remember correctly, those models are "intrinsically interpretable." This means you can look directly at the parameters in the model to understand how it works and don't need model-agnostic methods like SHAP.
      That being said, you can still apply SHAP to linear models (see the article below). So it may be useful for ARIMA, but I haven't seen it applied before.
      medium.com/towards-data-science/8-plots-for-explaining-linear-regression-to-a-layman-489b753da696?sk=ae508ca38771f36045312a27b81ffa75
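
To see why a linear model is easy to read off directly, here is a rough sketch for a plain linear regression (not ARIMA itself). It assumes the features are treated as independent, in which case the SHAP value of feature i for one row is the coefficient times the feature's deviation from its mean. The data, weights and intercept are synthetic, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # synthetic feature matrix
weights = np.array([2.0, -1.0, 0.5])   # hypothetical fitted coefficients
intercept = 4.0                        # hypothetical fitted intercept

def predict(features):
    """Prediction of the assumed linear model."""
    return features @ weights + intercept

# Under the independence assumption, feature i contributes
# w_i * (x_i - mean_i) to the prediction of each row.
feature_means = X.mean(axis=0)
linear_shap = weights * (X - feature_means)

# The base value is the average prediction over the data.
base_value = predict(X).mean()

# Additivity: base value plus the row's contributions recovers the prediction.
reconstructed = base_value + linear_shap.sum(axis=1)
print(np.allclose(reconstructed, predict(X)))
```

Each contribution is just a rescaled coefficient, which is why inspecting the parameters already tells most of the story for such models.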

  • @ShrijaSheth
    @ShrijaSheth 8 months ago

    I tried XGBoost on a different dataset, but it did not give a good scatter plot, nor a red line that clearly separates the observations. Which other model should one use if the number of features is 870?

    • @adataodyssey
      @adataodyssey 8 months ago +1

      This is too many features! You will never be able to get good explanations. Try to reduce the number of features by removing the highly correlated ones.
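
One possible way to remove highly correlated features is a greedy filter on the absolute correlation matrix, sketched below with NumPy. The threshold of 0.9, the helper name `drop_correlated` and the synthetic columns are all assumptions made for illustration; other selection strategies would work too.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Greedily keep a feature only if its absolute correlation with every
    feature already kept is at or below `threshold` (hypothetical helper)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]

# Synthetic data: "a_copy" is almost a rescaled duplicate of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
X = np.column_stack([a, 2 * a + rng.normal(scale=0.01, size=500), b])

X_reduced, kept_names = drop_correlated(X, ["a", "a_copy", "b"])
print(kept_names)
```

The filter keeps the first feature of each correlated group, so the column order decides which one survives.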

  • @apogounte8239
    @apogounte8239 11 months ago

    Hi! Interesting video! Just wanted to mention that if you just run shap.plots.waterfall(shap_values[0]), the y-axis never shows the actual names of the features; you get feature 5, feature 2, etc. instead. Is there a quick fix?

    • @adataodyssey
      @adataodyssey 11 months ago

      Yes, you should be able to fix that. You can try:
      1) Make sure your X feature matrix (that you pass into the explainer function i.e. shap_values = explainer(X)) is a pandas dataframe and the column names are the correct feature names. You can check these using X.columns
      2) Update the shap_values after they have been created using something like:
      shap_values.feature_names = list(["feature 1","feature 2", ... ]). It is important to pass the new names as a list.
      Let me know if that helps

  • @bakerb-rz6lv
    @bakerb-rz6lv 1 year ago +2

    I ran into a strange bug. I copied your code and ran it. This morning the code worked correctly, but now it doesn't. I did not change anything!
    After I run the code "explainer = shap.Explainer(model)", the error message is:
    TypeError: The passed model is not callable and cannot be analyzed directly with the given masker! Model: XGBRegressor(base_score=None, booster=None, callbacks=None,
    colsample_bylevel=None, colsample_bynode=None,
    colsample_bytree=None, early_stopping_rounds=None,
    enable_categorical=False, eval_metric=None, feature_types=None,
    gamma=None, gpu_id=None, grow_policy=None, importance_type=None,
    interaction_constraints=None, learning_rate=None, max_bin=None,
    max_cat_threshold=None, max_cat_to_onehot=None,
    max_delta_step=None, max_depth=None, max_leaves=None,
    min_child_weight=None, missing=nan, monotone_constraints=None,
    n_estimators=100, n_jobs=None, num_parallel_tree=None,
    predictor=None, random_state=None, ...)

    • @adataodyssey
      @adataodyssey 1 year ago

      Can you try to run this code:
      explainer = shap.Explainer(model,X[0:10])
      where X is the feature matrix used to train your model. For some models, you need to pass this in as a mask. You can see the full example for a random forest here:
      github.com/conorosully/SHAP-tutorial/blob/main/src/project_1_solution.ipynb

    • @bakerb-rz6lv
      @bakerb-rz6lv 1 year ago

      @adataodyssey It still doesn't work. Strangely, it says "AttributeError: module 'numpy' has no attribute 'bool'". I do not understand why this code involves numpy. All the packages I use are the newest versions.

    • @bakerb-rz6lv
      @bakerb-rz6lv 1 year ago

      @adataodyssey And I found another difference. In your GitHub code, at step 9 (Train model), your output is
      XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None,
      colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
      early_stopping_rounds=None, enable_categorical=False,
      eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise',
      importance_type=None, interaction_constraints='',
      learning_rate=0.300000012, max_bin=256, max_cat_to_onehot=4,
      max_delta_step=0, max_depth=6, max_leaves=0, min_child_weight=1,
      missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=0,
      num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,
      reg_lambda=1, ...)
      But my output and your video's output are:
      XGBRegressor(base_score=None, booster=None, callbacks=None,
      colsample_bylevel=None, colsample_bynode=None,
      colsample_bytree=None, early_stopping_rounds=None,
      enable_categorical=False, eval_metric=None, feature_types=None,
      gamma=None, gpu_id=None, grow_policy=None, importance_type=None,
      interaction_constraints=None, learning_rate=None, max_bin=None,
      max_cat_threshold=None, max_cat_to_onehot=None,
      max_delta_step=None, max_depth=None, max_leaves=None,
      min_child_weight=None, missing=nan, monotone_constraints=None,
      n_estimators=100, n_jobs=None, num_parallel_tree=None,
      predictor=None, random_state=None, ...)

    • @adataodyssey
      @adataodyssey 1 year ago +1

      @bakerb-rz6lv Sometimes, if you are using the newest versions, other packages have not caught up yet. It could be that SHAP uses an older version of numpy. See this similar issue: stackoverflow.com/questions/74893742/how-to-solve-attributeerror-module-numpy-has-no-attribute-bool#:~:text=This%20means%20you%20are%20using,while%20that%20isn't%20fixed.
      The important point is: "Then, in version NumPy 1.24.0, the deprecated np.bool was entirely removed. This means you are using a NumPy version that removed the deprecated ways AND the library you are using wasn't updated to match that version (uses something like np.bool instead of just bool)."
      You could try to install an earlier version of numpy. But this is just a guess on my part.

    • @bakerb-rz6lv
      @bakerb-rz6lv 1 year ago

      @adataodyssey God damn it! You are right. I installed numpy==1.22.3 and it works correctly. Maybe you can pin this comment to the top so other beginners notice it.
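
A quick way to check whether an installed NumPy is older than 1.24.0, the release named in the quoted answer, is sketched below. The two helper functions are hypothetical names introduced here, and the check only compares the major and minor version numbers.

```python
def parse_major_minor(version):
    """Return (major, minor) from a version string such as '1.22.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def predates_alias_removal(version):
    """True if this NumPy version is older than 1.24, the release that
    removed the deprecated np.bool alias according to the quoted answer."""
    return parse_major_minor(version) < (1, 24)

try:
    import numpy as np
    installed = np.__version__
    print(installed, "predates alias removal:", predates_alias_removal(installed))
except ImportError:
    print("NumPy is not installed")
```

If the check prints False and an older library still fails on `np.bool`, pinning an earlier NumPy (for example with `pip install numpy==1.22.3`, the version that worked in this thread) or upgrading the failing library are the two options discussed above.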

  • @sirireddy3102
    @sirireddy3102 4 months ago

    I am getting an error at model.fit. My data has both text and numeric features.
    Can you help me resolve it?

    • @adataodyssey
      @adataodyssey 3 months ago

      You will probably need to use the SHAP text explainer.

  • @yukiwang5825
    @yukiwang5825 1 year ago +1

    Wonderful video! Thanks for this.

  • @ooplectures3828
    @ooplectures3828 1 year ago

    Please explain how I can use SHAP to determine feature importance per class in a multiclass classification problem. I need to know which features, or values of features, contribute to the prediction of each class.

    • @adataodyssey
      @adataodyssey 1 year ago

      This has been on the list for a while. I'm not sure when I'll be able to do it but hopefully soon!
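
Until such a video exists, here is a rough NumPy sketch of one way to summarise SHAP values per class. It assumes the SHAP values are available as an array of shape (n_samples, n_features, n_classes); depending on the shap version and explainer, you may instead get a list with one array per class, which would need stacking first. The array below is random stand-in data and the feature names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 50, 4, 3

# Stand-in for the SHAP values of a 3-class model (assumed layout:
# samples x features x classes). Real values would come from an explainer.
shap_values = rng.normal(size=(n_samples, n_features, n_classes))
feature_names = ["f0", "f1", "f2", "f3"]

# Mean absolute SHAP value per feature and class: an (n_features, n_classes) table.
importance = np.abs(shap_values).mean(axis=0)

# Rank the features for each class, most important first.
ranking = {
    c: [feature_names[i] for i in np.argsort(-importance[:, c])]
    for c in range(n_classes)
}

# SHAP values of one observation for one class: a single vector with
# one value per feature, which is what a per-observation plot needs.
single = shap_values[0, :, 2]

for c in range(n_classes):
    print("class", c, ranking[c])
```

Selecting one class at a time keeps each summary comparable to the binary case shown in the video.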

  • @DarkKnight7_1
    @DarkKnight7_1 1 year ago

    Hi Conor, when discussing the limitations of SHAP values you mentioned that "highly correlated features are a problem when using shap values technique", but in this video the heat map shows that the features are highly correlated?

    • @adataodyssey
      @adataodyssey 1 year ago

      The problem with correlated features is that they can potentially lead to unexpected model predictions. That is when we sample pairs of feature values that do not exist in the dataset. Some models will still produce reasonable predictions even if there are correlated features.
      The point is you can still use SHAP even if you have correlated features. You just need to be aware that the results may be negatively impacted. It is important to validate the results using other methods and visualisations. For example, it's not included here, but in the course, we use SHAP interaction values to find an interaction between two features. We then confirm this interaction using a scatter plot. In other words, we had a useful result even with highly correlated features.
      I hope that makes sense!

  • @markfedenia3383
    @markfedenia3383 1 year ago

    I see that cuML computes Shapley values; however, it does not look like the Explainer object is compatible with shap. Do you know if there is any way to use the cuML Explainer object and model with the shap package? (By the way, excellent videos.)

    • @adataodyssey
      @adataodyssey 1 year ago

      Thanks! I'm not too familiar with cuML but I think it should be possible. You would have to replace all SHAP values and base_values in a SHAP explainer object with those from the cuML explainer object.
      It's not exactly what you are looking for but this article explains how you can manipulate the SHAP values object and then use the SHAP plots as normal: towardsdatascience.com/shap-for-categorical-features-7c63e6a554ea?sk=2eca9ff9d28d1c8bfde82f6784bdba19

  • @digitama
    @digitama 9 months ago

    Your explanation is very interesting, but I ran into the problem "Numba needs NumPy 1.20 or less", and no matter how much I downgrade NumPy and Numba, the problem doesn't go away. Any suggestions?

    • @adataodyssey
      @adataodyssey 9 months ago

      Sorry to hear that! Did you try downgrading only the NumPy package? You could also try upgrading the Numba package instead so it is in line with the latest version of NumPy. Remember to restart your kernel after installing a new package if you are working in a notebook.

    • @digitama
      @digitama 9 months ago

      @adataodyssey I downgraded Numba and haven't tried upgrading it. Which version should I upgrade to?

  • @famin7794
    @famin7794 2 months ago

    Can't thank you enough. You solved my problem.

  • @elenal8494
    @elenal8494 2 months ago

    Thank you! Your YouTube videos are very helpful!

  • @anki8136
    @anki8136 1 year ago

    Hey Conor, thanks for the course.
    I just have one question: how do I interpret the stacked force plot? I am having some problems with that.
    Can you make a video or something?

    • @adataodyssey
      @adataodyssey 1 year ago

      Hi Anki, I am sorry that the explanation was not clear. Yet, I am reluctant to make a video on the stacked force plot. This is because, in practice, I have not found it very useful. It is used to explore relationships between features and shap values. But you can do this using the dependence plots which are also easier to understand.
      In the course, I go into a bit more detail on the stacked force plot. Did you see that section?

    • @anki8136
      @anki8136 1 year ago

      @adataodyssey No, I haven't seen that video yet, but I will watch it now.

    • @adataodyssey
      @adataodyssey 1 year ago +1

      @anki8136 Okay, hopefully that clears things up for you. It is in the aggregations lesson.

  • @bakerb-rz6lv
    @bakerb-rz6lv 1 year ago

    love you, bro.😀

  • @thegerman1239
    @thegerman1239 10 months ago

    Thank you so much for this awesome video! I'm currently writing a term paper about this topic and other machine learning explainability techniques. This helped me out a lot while creating my examples!
    Kind regards from Germany!

    • @adataodyssey
      @adataodyssey 10 months ago +1

      Guten Tag! I'm glad this helped. I also have videos about the maths behind Shapley values:
      ruclips.net/video/UJeu29wq7d0/видео.htmlsi=-s-QTmLoQmSiYwFD
      ruclips.net/video/b9qqbFudVhI/видео.htmlsi=uMpSUk7ue6Tzs8SQ

    • @thegerman1239
      @thegerman1239 10 months ago

      Hey I'm done with the paper! The videos about the math really helped me as well. You're a champ

    • @adataodyssey
      @adataodyssey 10 months ago +1

      @thegerman1239 Great stuff! All the best with the result.

  • @KOTESWARARAOMAKKENAPHD
    @KOTESWARARAOMAKKENAPHD 1 year ago

    I got an error in the boxplot code.

    • @adataodyssey
      @adataodyssey 1 year ago

      Sorry to hear that. Can you describe the error in more detail?

  • @felicebugge
    @felicebugge 5 months ago

    Really useful, thank you

  • @tamojitmaiti
    @tamojitmaiti 7 months ago

    This is so clear and concise! Thank you!

    • @adataodyssey
      @adataodyssey 7 months ago

      No problem Tamojit! This is my goal. More XAI content is on the way.

  • @murilopalomosebilla2999
    @murilopalomosebilla2999 1 year ago

    Really well explained. Thanks ^^

    • @adataodyssey
      @adataodyssey 1 year ago

      No problem! I'm glad you found it useful

  • @wangchris5468
    @wangchris5468 1 year ago

    Lovely ~~~~ 👍👍👍

  • @bakerb-rz6lv
    @bakerb-rz6lv 1 year ago

    Hello, teacher. I used another method to train my model. Here is some of the code:
    import numpy as np
    import shap
    import xgboost as xgb
    from sklearn.model_selection import train_test_split
    # df is the DataFrame loaded earlier (not shown)
    # Extract feature and target arrays
    X, y = df.drop('Grade', axis=1), df[['Grade']]
    # Extract text features
    cats = X.select_dtypes(exclude=np.number).columns.tolist()
    # Convert to Pandas category
    for col in cats:
        X[col] = X[col].astype('category')
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
    dtrain_reg = xgb.DMatrix(X_train, y_train, enable_categorical=True)
    dtest_reg = xgb.DMatrix(X_test, y_test, enable_categorical=True)

    • @bakerb-rz6lv
      @bakerb-rz6lv 1 year ago

      params = {"objective": "reg:squarederror", "tree_method": "gpu_hist"}
      n = 100
      model = xgb.train(
          params=params,
          dtrain=dtrain_reg,
          num_boost_round=n,
      )
      explainer = shap.Explainer(model)
      shap_values = explainer(X)

    • @bakerb-rz6lv
      @bakerb-rz6lv 1 year ago

      And something goes wrong:
      TypeError: The passed model is not callable and cannot be analyzed directly with the given masker! Model:
      How can I fix it?

    • @adataodyssey
      @adataodyssey 1 year ago

      Sorry I missed this comment! But I think I answered your question on the other comment :)