- Videos: 41
- Views: 288,396
A Data Odyssey
Ireland
Joined 15 Feb 2014
Data exploration, interpretable machine learning, explainable AI and algorithm fairness
Applying Permutation Channel Importance (PCI) to a Remote Sensing Model | Python Tutorial
🚀 Course 🚀
Free: adataodyssey.com/permutation-channel-importance
Paid: adataodyssey.com/courses/xai-for-cv/
We dive into Permutation Channel Importance (PCI) and show you how to apply it using Python. We'll work with the Landsat Irish Coastal Segmentation (LICS) Dataset, a resource for advancing deep learning methods in coastal water body segmentation. This dataset includes 100 multispectral test images, each with a binary segmentation mask that classifies pixels as either land or ocean. This is a great start if you are interested in applying Explainable AI methods to remote sensing machine learning models.
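If you want a feel for the technique before opening the course notebook, here is a minimal sketch of the permutation idea (not the course code; `model`, `metric` and the image arrays are stand-ins): one spectral channel at a time is shuffled across the test images and the resulting drop in the segmentation score is taken as that channel's importance.

```python
import numpy as np

def permutation_channel_importance(model, images, masks, metric, n_repeats=5, seed=42):
    """Sketch of PCI: permute one channel at a time and measure the score drop.

    Assumes model(images) returns predicted masks and metric(preds, masks)
    returns a scalar score (e.g. accuracy or IoU). images has shape (N, H, W, C).
    """
    rng = np.random.default_rng(seed)
    baseline = metric(model(images), masks)
    importance = np.zeros(images.shape[-1])

    for c in range(images.shape[-1]):
        drops = []
        for _ in range(n_repeats):
            permuted = images.copy()
            # Shuffle channel c across samples, breaking its link to the masks
            permuted[..., c] = permuted[rng.permutation(len(images)), ..., c]
            drops.append(baseline - metric(model(permuted), masks))
        importance[c] = np.mean(drops)  # larger drop => more important channel

    return importance
```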
🚀 Useful playlists 🚀
XAI for CV: ruclips.net/p/PLqDyyww9y-1QA4-o4tTAF_iD5cKCC1qEA&si=o...
Views: 143
Videos
Explaining Computer Vision Models with PCI
193 views · 21 days ago
🚀 Course 🚀 Free: adataodyssey.com/permutation-channel-importance Paid: adataodyssey.com/courses/xai-for-cv/ In this video, we dive into Permutation Channel Importance (PCI) - a powerful explainability technique that helps identify which color channels (like RGB in regular images) contribute to a machine learning model's predictions. PCI is a simple yet effective approach within Explainable AI (...
Explaining Anomalies with Isolation Forest and SHAP | Python Tutorial
1.2K views · 1 month ago
In this video, we dive deep into the world of anomaly detection with a focus on the Isolation Forest algorithm. Isolation Forest is a powerful machine learning model for identifying outliers in high-dimensional data, but understanding why an anomaly is detected can be a challenge. That's where SHAP (SHapley Additive exPlanations) comes in. We'll explore how to use both KernelSHAP and TreeSHAP t...
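As a rough, self-contained illustration of the workflow described above (not the notebook from the video), the sketch below fits scikit-learn's IsolationForest on toy data and explains its anomaly scores with TreeSHAP and KernelSHAP. It assumes a recent version of the shap package, which supports IsolationForest in TreeExplainer.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import IsolationForest

# Toy data with an injected outlier; in practice X is your feature matrix
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["f1", "f2", "f3"])
X.loc[0] = [8.0, -7.5, 9.0]  # obvious anomaly

iso = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores = iso.decision_function(X)  # lower => more anomalous
print("Most anomalous row:", scores.argmin())

# TreeSHAP: fast, exploits the tree structure of the isolation forest
tree_shap = shap.TreeExplainer(iso).shap_values(X)

# KernelSHAP: model-agnostic and slower, wraps the scoring function directly
background = shap.sample(X, 50)
kernel_explainer = shap.KernelExplainer(iso.decision_function, background)
kernel_shap = kernel_explainer.shap_values(X.iloc[[0]])  # explain the anomaly
```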
SHAP with CatBoostClassifier for Categorical Features | Python Tutorial
989 views · 2 months ago
Combining CatBoost and SHAP can provide powerful insight into your machine learning models, especially when you are working with categorical features. With other modelling packages, we need to first transform categorical features using one-hot encodings. The problem is that each binary variable will have its own SHAP value. This makes it difficult to see the overall contribution of the origina...
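A minimal sketch of that pattern with a made-up toy frame: CatBoost is trained directly on the raw categorical column (no one-hot encoding), so TreeSHAP returns a single SHAP value per original feature.

```python
import pandas as pd
import shap
from catboost import CatBoostClassifier, Pool

# Toy data; replace with your own DataFrame
df = pd.DataFrame({
    "colour": ["red", "blue", "blue", "green", "red", "green"] * 50,
    "size": [1.0, 2.5, 0.7, 3.1, 1.8, 2.2] * 50,
    "target": [0, 1, 0, 1, 0, 1] * 50,
})
X, y = df[["colour", "size"]], df["target"]
cat_features = ["colour"]

model = CatBoostClassifier(iterations=200, verbose=0)
model.fit(X, y, cat_features=cat_features)

# CatBoost handles the categorical feature natively, so it keeps ONE SHAP value
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(Pool(X, y, cat_features=cat_features))
shap.summary_plot(shap_values, X)
```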
Applying LIME with Python | Local & Global Interpretations
981 views · 5 months ago
LIME is a popular local explainable AI (XAI) method. It can be used to understand the individual predictions made by a black-box model. We will be applying the method using Python. We will see that although LIME is a local method, we can still aggregate LIME weights to get global interpretations of a machine learning model. We do this using feature trends, absolute mean weights and a beeswarm p...
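The following is a rough sketch of the two uses described above, with a stand-in dataset and model: LIME weights for one prediction, then absolute weights aggregated across many instances for a global picture.

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

data = load_diabetes()
X, y, names = data.data, data.target, list(data.feature_names)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(X, feature_names=names, mode="regression")

# Local: weights for a single prediction
exp = explainer.explain_instance(X[0], model.predict, num_features=len(names))
print(exp.as_list())

# Global-ish: aggregate absolute LIME weights over many instances (can be slow)
totals = np.zeros(len(names))
for i in range(100):
    exp = explainer.explain_instance(X[i], model.predict, num_features=len(names))
    label_key = list(exp.as_map().keys())[0]
    for feat_idx, weight in exp.as_map()[label_key]:
        totals[feat_idx] += abs(weight)
print(dict(zip(names, totals / 100)))
```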
An introduction to LIME for local interpretations | Intuition and Algorithm |
1.3K views · 5 months ago
LIME is a popular explainable AI (XAI) method. It is known as a local model agnostic method. This means it can be used to explain the individual predictions of any machine learning model. It does this by building simple surrogate models around the black-box model’s prediction for an individual instance. We will: - Explain the algorithm used by LIME to get local interpretations. - Discuss in det...
Friedman's H-statistic Python Tutorial | Artemis Package
797 views · 5 months ago
Friedman’s h-statistic, also known as the H-stat or H-index, is a metric used to analyse interactions in a machine learning model. We apply the explainable AI (XAI) method using the artemis Python package. We also explain how to interpret the interaction heatmaps and bar plots. This includes the overall, pairwise, normalised and unnormalised h-stat. These can be used to understand the percentage...
Friedman's H-statistic for Analysing Interactions | Maths and Intuition
2.4K views · 5 months ago
We dive deep into Friedman's H-statistic, also known as the H-stat or H-index. This is a popular explainable AI (XAI) method. It is a powerful metric for analyzing interactions between features in your machine learning model. We will: - Build intuition for the method by comparing it to PDPs and ICE Plots. - Explain the mathematics behind the pairwise, overall and unnormalised formulas. - Discuss...
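For reference, the pairwise statistic covered in the video is usually written in terms of centred partial dependence functions evaluated at the observed data points:

$$
H_{jk}^{2} \;=\; \frac{\sum_{i=1}^{n}\Big[PD_{jk}\big(x_j^{(i)}, x_k^{(i)}\big) - PD_j\big(x_j^{(i)}\big) - PD_k\big(x_k^{(i)}\big)\Big]^{2}}{\sum_{i=1}^{n} PD_{jk}\big(x_j^{(i)}, x_k^{(i)}\big)^{2}}
$$

A value of 0 means the two features do not interact; values close to 1 mean almost all of their joint effect comes from the interaction.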
Accumulated Local Effect Plots (ALEs) | Explanation & Python Code
1.1K views · 6 months ago
Highly correlated features can wreak havoc on your machine-learning model interpretations. To overcome this, we could rely on good feature selection. But there are still cases when a feature, although highly correlated, will provide some unique information leading to a more accurate model. So we need a method that can provide clear interpretations, even with multicollinearity. Thankfully we can...
PDPs and ICE Plots | Python Code | scikit-learn Package
807 views · 6 months ago
Both Partial Dependence Plots (PDPs) and Individual Conditional Expectation (ICE) plots are popular explainable AI (XAI) methods. They can visualise the relationships used by a machine learning model to make predictions. In this video, we will see how to apply the methods using Python. We will use the scikit-learn package and the PartialDependenceDisplay & partial_dependence functions. We will...
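A short sketch of the scikit-learn calls mentioned above, using a stand-in dataset and model:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay, partial_dependence

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# PDP with ICE curves overlaid for two features (kind="both")
PartialDependenceDisplay.from_estimator(model, X, features=["bmi", "s5"], kind="both")
plt.show()

# Raw partial dependence values if you want to build your own plots
pd_result = partial_dependence(model, X, features=["bmi"], kind="average")
print(pd_result["average"].shape)
```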
Partial Dependence (PDPs) and Individual Conditional Expectation (ICE) Plots | Intuition and Math
1.9K views · 6 months ago
Both Partial Dependence (PDPs) and Individual Conditional Expectation (ICE) Plots are used to understand and explain machine learning models. PDPs can tell us if a relationship between a model feature and target variable is linear, non-linear or if there is no relationship. Similarly, ICE plots are used to visualise interactions. Now, at first glance, these plots may look complicated. But you w...
Permutation Feature Importance from Scratch | Explanation & Python Code
1.3K views · 7 months ago
Feature importance scores are a collection of methods all used to answer one question: which machine learning model features have contributed the most to predictions in general? Amongst all these methods, permutation feature importance is the most popular. This is due to its intuitive calculation and because it can be applied to any machine learning model. Understanding PFI is also an importan...
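A from-scratch sketch of the calculation, assuming a fitted scikit-learn regressor and a held-out test set (dataset and model are stand-ins):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

def permutation_feature_importance(model, X, y, n_repeats=10, seed=0):
    """Importance = average increase in error after shuffling one column."""
    rng = np.random.default_rng(seed)
    baseline = mean_squared_error(y, model.predict(X))
    scores = {}
    for col in X.columns:
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[col] = rng.permutation(X_perm[col].values)
            increases.append(mean_squared_error(y, model.predict(X_perm)) - baseline)
        scores[col] = np.mean(increases)
    return scores

print(sorted(permutation_feature_importance(model, X_test, y_test).items(),
             key=lambda kv: -kv[1]))
```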
Model Agnostic Methods for XAI | Global v.s. Local | Permutation v.s. Surrogate Models
632 views · 7 months ago
Model agnostic methods can be used with any model. In Explainable AI (XAI), this means we can use them to interpret models without looking at their inner workings. This gives us a powerful way to interpret and explain complex black-box machine learning models. We will elaborate on this definition. We will also discuss the taxonomy of model agnostic methods for interpretability. They can be classi...
8 Plots for Explaining Linear Regression | Residuals, Weight, Effect & SHAP
1.1K views · 7 months ago
For data scientists, a regression summary might be all that's needed to understand a linear model. However, when explaining these models to a non-technical audience, it’s crucial to employ more digestible visual explanations. These 8 methods not only make linear regression more accessible but also enrich your analytical storytelling, making your findings resonate with any audience. We understan...
Feature Selection using Hierarchical Clustering | Python Tutorial
2.7K views · 7 months ago
In this comprehensive Python tutorial, we delve into feature selection for machine learning with hierarchical clustering. We guide you through the essentials of partitioning features into cohesive groups to minimize redundancy in model training. This technique is particularly important as your dataset expands, offering a structured alternative to manual grouping. What you'll learn: - The import...
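One possible sketch of the approach described above, using SciPy's hierarchical clustering on a Spearman-correlation distance matrix (the dataset and the 0.3 cut-off are illustrative choices):

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer

X = load_breast_cancer(as_frame=True).data  # many highly correlated features

# Distance = 1 - |Spearman correlation|, so redundant features cluster together
corr = np.abs(spearmanr(X).correlation)
dist = 1 - corr
np.fill_diagonal(dist, 0)
linkage = hierarchy.linkage(squareform(dist, checks=False), method="average")

# Cut the dendrogram and keep one representative feature per cluster
cluster_ids = hierarchy.fcluster(linkage, t=0.3, criterion="distance")
selected = (pd.Series(X.columns, index=cluster_ids)
              .groupby(level=0).first().tolist())
print(f"Kept {len(selected)} of {X.shape[1]} features:", selected)
```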
8 Characteristics of a Good Machine Learning Feature | Predictive, Variety, Interpretability, Ethics
389 views · 7 months ago
8 Characteristics of a Good Machine Learning Feature | Predictive, Variety, Interpretability, Ethics
Interpretable Feature Engineering | How to Build Intuitive Machine Learning Features
660 views · 8 months ago
Interpretable Feature Engineering | How to Build Intuitive Machine Learning Features
Modelling Non-linear Relationships with Regression
734 views · 8 months ago
Modelling Non-linear Relationships with Regression
Explaining Machine Learning to a Non-technical Audience
886 views · 8 months ago
Explaining Machine Learning to a Non-technical Audience
Get more out of Explainable AI (XAI): 10 Tips
1.1K views · 8 months ago
Get more out of Explainable AI (XAI): 10 Tips
The 6 Benefits of Explainable AI (XAI) | Improve accuracy, decrease harm and tell better stories
1.5K views · 9 months ago
The 6 Benefits of Explainable AI (XAI) | Improve accuracy, decrease harm and tell better stories
Introduction to Explainable AI (XAI) | Interpretable models, agnostic methods, counterfactuals
6K views · 9 months ago
Introduction to Explainable AI (XAI) | Interpretable models, agnostic methods, counterfactuals
Data Science vs Science | Differences & Bridging the Gap
312 views · 1 year ago
Data Science vs Science | Differences & Bridging the Gap
About the Channel and my Background | ML, XAI and Remote Sensing
1.4K views · 1 year ago
About the Channel and my Background | ML, XAI and Remote Sensing
SHAP for Binary and Multiclass Target Variables | Code and Explanations for Classification Problems
13K views · 1 year ago
SHAP for Binary and Multiclass Target Variables | Code and Explanations for Classification Problems
Introduction to Algorithm Fairness | Causes, Measuring & Preventing Unfairness in Machine Learning
2.3K views · 1 year ago
Introduction to Algorithm Fairness | Causes, Measuring & Preventing Unfairness in Machine Learning
SHAP Violin and Heatmap Plots | Interpretations and New Insights
6K views · 1 year ago
SHAP Violin and Heatmap Plots | Interpretations and New Insights
Correcting Unfairness in Machine Learning | Pre-processing, In-processing, Post-processing
1.2K views · 1 year ago
Correcting Unfairness in Machine Learning | Pre-processing, In-processing, Post-processing
Definitions of Fairness in Machine Learning | Equal Opportunity, Equalized Odds & Disparate Impact
4K views · 1 year ago
Definitions of Fairness in Machine Learning | Equal Opportunity, Equalized Odds & Disparate Impact
Exploratory Fairness Analysis | Quantifying Unfairness in Data
1.1K views · 1 year ago
Exploratory Fairness Analysis | Quantifying Unfairness in Data
Hi, will there be any videos on XAI for a fine-tuned LLM on this channel?
bro you should be in hollywood
Haha will stick to RUclips for now :)
I cannot use SHAP and LDA together; I am experiencing an index error.
Could someone help me?
Hi Mustafa, it's not possible to solve your problem based on the information you provided. What package are you using? Are there any other examples of where SHAP has been implemented for that package?
The difference would be: explainable or unexplainable model 😅
Thank you for the amazing video! So what do you think are the main differences for global interpretations of the LIME and SHAP methods? Also, this is the first time I'm seeing LIME used as a global "interpreter". Why do you think LIME is mostly used for local points whereas it can be aggregated just like SHAP? Thanks!
Great questions! SHAP is used to estimate Shapley values. So it inherits properties that are useful for understanding and comparing models. See these videos on the theory behind Shapley values for ML: ruclips.net/video/UJeu29wq7d0/видео.html ruclips.net/video/b9qqbFudVhI/видео.html I think LIME is not used in this way for two reasons: 1) it is slower than SHAP (especially TreeSHAP) to get individual feature weights. This can make it time-consuming to aggregate and analyse many instances. 2) There are no built-in functions to create aggregated plots for LIME like you see in the SHAP package. In general, LIME has fallen out of favour due to the solid theory and speed of SHAP values. So perhaps the creators have decided not to develop the method further.
Thank you for the video. Don’t you think it might be better to simply ignore a certain channel and re-run the convolutional network with fewer input channels? This approach could give a more accurate assessment of overall performance compared to changing a channel to complete noise. Respectfully, I find it hard to accept this as a reliable method for evaluating a channel's importance.
Hi Eli, yes I agree that there are better approaches if you want to analyse the "potential" of different channels. You could use metrics like correlation to avoid training a bunch of models (each model would take me about 10 to 16 hours to train). However, these would not tell you which channels the trained model is using to do segmentation. "Importance" is about what is important to a model and not to the problem in general. The method you described could not be used to explain a trained model as you would be evaluating different models. Lastly, PCI can be thought of as an extension of permutation feature importance (PFI). This is a commonly accepted method for explaining models built on tabular data. In the same way, correlation or models built on individual features would not provide the same information as PFI scores.
@@adataodyssey Oh i see. Thanks!
Excellent videos. Well done. I have 2 questions. (1) Is SHAP unsupervised learing? and (2) Can it be used for time-to event (Survival), where there are censored data, analysis? Many thanks.
Thanks! (1) No, SHAP is not a model, so it is not a supervised or unsupervised learning algorithm. It is a method used to explain a model. (2) I'm not familiar with this use case. But SHAP can be used whenever you have input variables, a function and an output, and you want to explain the contributions of each of the input variables to the output. In the context of predictive machine learning, you can use SHAP to explain the contributions of each model feature to a prediction.
@@adataodyssey Thank you for your prompt reply and advice. Best wishes.
Hey, great explanation! I have a question: Say I have time series of how many items I sold over 3 years for different items. The items can be sold in multiple stores across the world. My task is to detect an anomaly on the item level (not on the aggregate level). Do I run this isolation forest on each individual time series and add the store (as a one hot encoded variable) to the feature matrix? Running it individually for each item seems to lose potential information that can be extracted when looking at global patterns across different items. What would you advise in this case? It seems to be a hierarchical time series anomaly detection problem
Hi, good question. You may want to include additional features that capture this global information. For example, the average value for one of the features in the month or year before the reading. This will allow you to understand if the current feature value is high/low relative to the average. However, in this case, IsolationForest (IF) may not be the best model. On further investigation, IF cannot model interactions. You would either need to explicitly add a feature that captures the interaction, like the ratio of the current reading to the average, or use a model for anomaly detection that does capture interactions, like an autoencoder.
thanks
@@maxpain6666 no problem!
Do it with the 2020 election
Thank you for the great explanation! Though I would've expected the sum of Shapley values to be smaller than 1 for a classification problem. Am I missing something?
For classification problems, Shapley values are interpreted in terms of log odds. See this video for more details: ruclips.net/video/2xlgOu22YgE/видео.html
Thank you so much!
No problem!
Incredible! Thank you! Are there any other resources or papers on this topic? And I'm wondering how it relates to superpixels
You can read my conference paper on the topic: arxiv.org/abs/2405.11500 Permutation with superpixels is more aimed at explaining important regions/ groups of pixels. Like how they are used in SHAP or occlusion.
🚀 Course 🚀 Free: adataodyssey.com/permutation-channel-importance Paid: adataodyssey.com/courses/xai-for-cv/
Thanks!
Wow, my first super thanks. Much appreciated Mary
Very informative video! I’m looking forward to the next video on PCI application in satellite imagery classification.
Thanks Danish :)
🚀 Course 🚀 Free: adataodyssey.com/permutation-channel-importance Paid: adataodyssey.com/courses/xai-for-cv/
Hi, I cannot understand negative partial dependence values. What does a negative partial dependence value, such as a change from -0.7 to -0.5 with an increase in input, indicate for a parameter? Does this mean the parameter has a negative relationship with the output, or does it imply that the predictions are generally lower than average? See this for example: i.sstatic.net/Jmbs5s2C.png
Hey! Thank you for the video. Just a note: XGBoost now automatically deals with categorical features like Catboost. You just need to pass enable_categorical=True when creating the XGBClassifier!
That's great! I guess this is the end of CatBoost then 😅
clear as day! Thank you:)
This was so great
Thanks Joseph :)
could you please link this notebook in the comments or the video description?
Here you go: github.com/conorosully/SHAP-tutorial/blob/main/src/shap_tutorial.ipynb
Hi. Could you please clarify if the model that is given as input to SHAP is built for each possible combination of n features? Say n=3, will the model experiment using several combinations of 3 features and then calculate the values/output value for each case?
Hi Sukhsehaj, good question. I discuss the application to ML in this video: ruclips.net/video/b9qqbFudVhI/видео.html To summarise, you train one model with all features. To get the contribution of different feature sets, you "marginalise" over the features that are not in the set.
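For anyone following along, the marginalisation the reply refers to can be written down directly. With a single trained model $f$, full feature set $F$ and coalition $S$ (with $\bar{S}$ the remaining features), the value function and Shapley value are:

$$
v(S) = \mathbb{E}_{X_{\bar{S}}}\!\left[f\big(x_S, X_{\bar{S}}\big)\right],
\qquad
\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\,\big[v(S \cup \{j\}) - v(S)\big]
$$

So the same model is evaluated for every coalition; no retraining is needed.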
Thanks for the detailed video. It is really helpful. Can I get the code base which was used in your demo?
Sure! You can find it here: github.com/conorosully/SHAP-tutorial/blob/main/src/additional_resources/IsolationForest.ipynb
Thanks for being the sole XAI youtuber! as an active learner this is very nice<3
No problem Vic! Will be out with some new content soon :)
Thank you so much for making a video about this topic!
No problem Pablo!
Thanks a lot for the video. You talked about a robust method to counter multicollinearity but I can't get the name you said. AI ease ?
No problem! They are called Accumulated Local Effects (ALE) Plots. I have a video on the topic: ruclips.net/video/5KCA1FMy6U4/видео.html
Thanks 😁 @@adataodyssey
Hello - this is wonderful content ! May I reference this video in a LinkedIn post giving you credit? If so, can I have your LinkedIn handle?
Hi Togy, thanks! Yes, that's no problem at all. My linkedin page is: www.linkedin.com/in/conor-o-sullivan-14b267142/
Amazing video! Thank you very much! I do have one question regarding the coalition values of a 3-player game: if the value of the prize (first, second, or third) can only be obtained when there is a coalition of 3, then what is the meaning of a coalition of 2 players: C_12, C_13, C_23?
These are the coalition values (i.e. prize money) if only the 2 players were in a team. To get these, I said "we need to go back in time a few more times". This is so that each alternative team could play the game and win prize money. Obviously, this is not possible. In reality, we must estimate these values. So, the actual prize money of $10000 must be split across the 3 players. However, to find a fair way to divide the money we need to know how much players would have won in alternative teams of 2 or less members.
@@adataodyssey thank you very much for the detailed answer. the marginal coalition values are calculated in advance right ?
@@yaelefrat7677 They will need to be determined in advance. For some applications they can be observed or estimated but not calculated. For other applications, like machine learning, they are calculated.
Great video!!!
Thanks Yash!
🚀 SHAP Course 🚀 SHAP course: adataodyssey.com/courses/shap-with-python/ The first 20 people to use the coupon code "CATSHAP24" will get 100% off!
It was so clearly and well explained, thank you!
My pleasure :)
I have subscribed to the newsletter but am not getting access to the XAI course
You should receive a coupon code in your email. Let me know if you don't get it!
Very good explanation
Thanks Jhaoui!
WOW! Such an amazing explanation on SHAP! I really enjoyed. Thank you.
No problem Zahra! I'm glad you found it useful
Clear explanation. Thanks!
No problem :)
Thank you!
No problem Panh!
Thank you! You clearly have a talent for explaining complex topics. At my university, it feels like the XAI course is covering almost exclusively topics from your channel! I appreciate your work and hope you find the time and enjoy making these videos! Best, Tim
Thanks Tim! It's good to hear I am covering relevant topics :)
You always bring topics that nobody has heard of or not popular, that's just why I love your content. Best wishes from India.
Thank you! The hope is that someone out there will find it useful :)
@@adataodyssey you're a superhero without a cape.
@@adataodyssey this is my the topic of my Master's so you're really helping lol
@@Daniel-RedTsar I'm glad I could help. Good luck with the masters!
Thanks for the video, really useful. I am facing an issue in understanding how you calculated the value for Player (P2) in your companion article. A little explanation on the weights would help there as I am struggling to get the value $3750. Thanks
Embrace the struggle! You will learn more through trying to solve it :)
Is there any way to deal with limitation 2: Feature Dependencies ?
Often, even if you have highly correlated features, SHAP will still work. It is just important to keep in mind that it may have problems if you do have highly correlated features. In this case, you just need to confirm the results from SHAP using a method that is robust to them, like ALEs or simple data exploration methods.
Awesome lecture
@@civilengineeringonlinecour7143 thanks glad it could help!
@@civilengineeringonlinecour7143 I’m glad you found it useful!
Nice video! The plots will be different for a Keras model, right? I followed your code but it seems that it won't work for a neural network model though.
@@adeauy2294 The plots should be the same if you train a NN on tabular data. However, I’ve had a lot of trouble trying to get the package to work with PyTorch. I’m not sure about Keras but I expect you are having similar problems.
For categorical features @3:35 wouldn't it make sense to just create a full pipeline in which all raw features are preprocessed (scaled, encoded, etc) and run through the model to generate predictions and afterwards calculating the shap values? This way you have the categorical feature contribution in an interpretable way...
@@brenoingwersen784 The problem is if you have a categorical feature with many categories (say 10), you will have 10 dummy features after encoding. This means you will have 10 SHAP values for the categorical feature making it difficult to understand the overall effect of that feature. You can solve this by adding the SHAP values for each dummy feature or using catboost.
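As a hypothetical illustration of the "adding the SHAP values for each dummy feature" option mentioned in the reply (the function and the column-naming convention are assumptions, not code from the video):

```python
import numpy as np

def group_dummy_shap(shap_values, feature_names, prefix, sep="_"):
    """Sum the SHAP values of one-hot columns back into one categorical column.

    shap_values: array of shape (n_samples, n_encoded_features)
    feature_names: encoded column names, e.g. ["colour_red", "colour_blue", "size"]
    prefix: name of the original categorical feature, e.g. "colour"
    """
    dummy_idx = [i for i, name in enumerate(feature_names)
                 if name.startswith(prefix + sep)]
    other_idx = [i for i in range(len(feature_names)) if i not in dummy_idx]
    grouped = np.column_stack([shap_values[:, other_idx],
                               shap_values[:, dummy_idx].sum(axis=1)])
    grouped_names = [feature_names[i] for i in other_idx] + [prefix]
    return grouped, grouped_names  # SHAP additivity makes this sum valid
```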
What I don't understand is that, in the game video, the value of a coalition is calculated by re-running the game with each coalition. Here it would seem to me that finding the value of a feature coalition would mean re-training the model for each coalition. That doesn't just seem expensive, it seems downright prohibitive for complex or large models. You started with a fixed regression model where the weights are already determined, but for, e.g., a neural network model, leaving out features could change the weights significantly, right?
@@cauchyschwarz3295 This is a bit confusing! But no, you will only have one model (the one you want to explain). You marginalise the prediction function (i.e. the model) over the features that are not in set S. You do not have to retrain a new model by excluding the features that are not in S.
I think there's a subtlety here in the application. In the theoretical model, the goal is to calculate a fair payout for an individual feature based on how much it contributes to building a good model. Good models are measured by how well they predict things. So, we would want to think about how much the feature contributes to a reduction in model error, and you would want to train a model with and without a given feature to figure this out. (But instead we use other methods for determining which features are good that are easier to compute.) In the use case in this video (and what is the norm in ML), they are using Shapley to explain how a feature contributed to the model prediction, regardless of how good the model actually is. This is helpful because Shapley still has desirable properties like additivity in this use case. If you trained a new model without this feature, it wouldn't answer the question of how much the feature contributed to the prediction value in the old model.