Great explanation, you are the best!
Thanks Dương!
Why does the SHAP waterfall plot show different tag names every time we run it for the same shap_values?
The shap values output should always be the same if you are using the same shap_values object. Perhaps you are rerunning the piece of code where a random sample is created?
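In case it helps, here is a minimal sketch of what I mean. It assumes the rows passed to the explainer come from a random sample of the feature matrix; the DataFrame `X` and its column names are made up for illustration and are not taken from the notebook. Fixing `random_state` makes the sample, and so anything computed from it, the same on every run.

```python
import numpy as np
import pandas as pd

# Hypothetical feature matrix standing in for the real data
X = pd.DataFrame(np.arange(40).reshape(20, 2), columns=["f1", "f2"])

# Without random_state, each call (or each rerun of the cell) can pick different rows
unseeded_sample = X.sample(n=5)

# With a fixed random_state, the same rows are drawn every time
sample_a = X.sample(n=5, random_state=42)
sample_b = X.sample(n=5, random_state=42)

same_rows = sample_a.index.equals(sample_b.index)
print("Seeded samples pick identical rows:", same_rows)
```

If the sample is seeded like this and the plot still changes, the cause is probably somewhere else in the notebook.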
Hey, great explanation! I have a question: say I have time series of how many units I sold over 3 years for different items. The items can be sold in multiple stores across the world. My task is to detect anomalies at the item level (not at the aggregate level). Do I run this isolation forest on each individual time series and add the store (as a one-hot encoded variable) to the feature matrix? Running it individually for each item seems to lose information that could be extracted by looking at global patterns across different items. What would you advise in this case? It seems to be a hierarchical time series anomaly detection problem.
Hi, good question. You may want to include additional features that capture this global information, for example the average value of one of the features in the month or year before the reading. This will allow you to understand whether the current feature value is high or low relative to the average. However, in this case, IsolationForest (IF) may not be the best model.
On further investigation, IF cannot model interactions. You would either need to explicitly add a feature that captures the interaction, like the ratio of the current reading to the average, or use an anomaly detection model that does capture interactions, like an autoencoder.
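Here is a rough sketch of the ratio idea, under some assumptions of mine: the sales table, its column names (`item`, `store`, `week`, `units`), the 8-week window and the injected spike are all invented for illustration, and one model is fit on the ratio across every item and store. Treat it as a starting point, not as the approach from the video.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical long-format sales data: one row per (item, store, week)
frames = []
for item in ["A", "B"]:
    for store in ["S1", "S2"]:
        base = rng.integers(20, 200)
        units = rng.poisson(base, size=52).astype(float)
        frames.append(pd.DataFrame(
            {"item": item, "store": store, "week": np.arange(52), "units": units}
        ))
sales = pd.concat(frames, ignore_index=True)

# Inject one artificial spike so there is something to find
spike = (sales["item"] == "A") & (sales["store"] == "S1") & (sales["week"] == 40)
sales.loc[spike, "units"] *= 5

# Average of the previous 8 weeks within each series (shift(1) excludes the current week)
sales["trailing_mean"] = sales.groupby(["item", "store"])["units"].transform(
    lambda s: s.shift(1).rolling(8, min_periods=4).mean()
)

# Explicit interaction feature: current reading relative to its own recent average
sales["ratio_to_trailing"] = sales["units"] / sales["trailing_mean"]

# Drop the first weeks of each series, where there is no history yet
features = sales.dropna(subset=["ratio_to_trailing"]).copy()
X = features[["ratio_to_trailing"]]

# One model shared across all items and stores
model = IsolationForest(random_state=0).fit(X)
features["score"] = model.score_samples(X)  # lower score = more anomalous

most_anomalous = features.sort_values("score").iloc[0]
print(most_anomalous[["item", "store", "week", "units", "ratio_to_trailing"]])
```

Because the ratio puts every series on the same scale, a single model can be shared across items and stores even when their sales volumes differ a lot.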
thanks
@maxpain6666 no problem!
Thanks for the detailed video. It is really helpful. Can I get the code base which was used in your demo?
Sure! You can find it here: github.com/conorosully/SHAP-tutorial/blob/main/src/additional_resources/IsolationForest.ipynb
Do it with the 2020 election