This type of video is my favorite. It's unique on YouTube.
s/o from South Africa
Thanks! I have a similar one coming out in a few weeks on how not to do cross validation and sampling.
Absolutely love this content and these kinds of videos. This was nuanced and came with real-world examples and experience. Beats anything I ever got from my stats grad school years. All the best.
Thanks! I'll try and make a few more of these types of videos.
I've lost this one for some reason. Anyway, as someone who knows decision trees quite "in-depth" I'd say this is a very clear lesson, very good material as always
Can't wait.
Please make more ✅
As always, great quality video, Dimitri!
If you are looking for video suggestions, I would really like to see one on the possible risks of building models on observed Granger causality between financial time series (perhaps not explainable because of the high number of independent variables) that may have led to good out-of-sample prediction performance. I'd also like to hear a practical example of model monitoring (perhaps some of the more popular metrics you have used) that could help detect whether a model is deteriorating.
Thanks again for the effort you put into making these types of videos.
I'll look into making some videos around these ideas.
Hi Dimitri. Great video. I’m currently halfway through my first year of undergrad. I’m doing a dual math/CS degree.
I’ve chosen the Stats ‘track’ for the math part of my degree, but I’m not sure what the optimal ‘track’ for CS would be if I’m looking to best prepare myself for quant work.
My options are data science, machine learning and scientific computing. I’m sure they’re all valuable skills to learn, but which do you think is the best foundation for quant work?
Thanks in advance.
I would do scientific computing, but all of them are decent choices as ML and data science are taking off. Scientific computing should give you some nice math overlap, and numerical analysis is a key part of quant finance.
Great stuff as always! I was wondering what's the average age group of your viewers?
85% are between 18 and 34 years old.
Great video!
How would one avoid such a situation? In a scenario where there are thousands of predictors, I can hardly imagine looking at correlations before building the model could help, as there are just too many to manually go through. The same would apply when pruning a tree.
Cluster analysis. You create clusters based on statistical relationships using something like PCA. There will be a point when the value added from adding more clusters becomes trivial. Often we end up with around 20 clusters for 500 variables. Then you manually review the top few variables in each cluster and build a model with those variables which would give you around 60 final variables.
@DimitriBianco Makes sense, thank you!
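To make the clustering workflow above concrete, here is a minimal sketch (not from the video) that groups correlated predictors and keeps a few representatives per cluster for manual review. It swaps hierarchical clustering on the correlation matrix in for the PCA step mentioned above, and the names `X`, `n_clusters`, and `top_k` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_variables(X: pd.DataFrame, n_clusters: int = 20, top_k: int = 3):
    """Group correlated columns of X and return a few representatives per cluster."""
    corr = X.corr().abs()
    dist = 1.0 - corr.values              # highly correlated pairs are "close"
    np.fill_diagonal(dist, 0.0)
    labels = fcluster(
        linkage(squareform(dist, checks=False), method="average"),
        t=n_clusters,
        criterion="maxclust",
    )

    selected = []
    for c in np.unique(labels):
        members = corr.columns[labels == c]
        # Rank members by how strongly they correlate with the rest of their
        # cluster, and keep the top few for manual review.
        strength = corr.loc[members, members].sum().sort_values(ascending=False)
        selected.extend(strength.index[:top_k].tolist())
    return selected


# With ~500 variables, 20 clusters, and 3 picks per cluster, this yields
# roughly 60 candidates for the manual review step described above.
# shortlist = cluster_variables(X, n_clusters=20, top_k=3)
```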
How do we fix it?
Hi Dimitri, informative and great video as always! Just a quick question: do you personally think a degree in statistics followed by an MFE would give me a better chance of becoming a quant analyst, or a financial mathematics degree followed by an MFE? Which degree do you think would prepare me better for an MFE? Thanks
Do you have any suggestions for scientific articles on the topic you mentioned in the video? Thank you...
No but you could Google and see if any come up. Multicollinearity can be logically drawn from the math and method of trees. You don't need a paper to come to this conclusion.
If one of the correlated variables is used in a split, then the other ones automatically become unlikely to be chosen since they won't reduce impurity. Won't this help a decision tree be more robust?
(Adding the question from the premiere in case someone has the same doubt.)
The strength of a decision tree is that it will prevent multicollinearity further down a branch. The issue is when variables are blindly selected based on correlation: if the wrong variable is used, it is highly likely the tree will fail quickly, which reduces robustness.
@DimitriBianco It is true, blind selection of variables into a model is a very dangerous business in ML/data science, and especially in XAI, where a Partial Dependence Plot interpreted blindly can come with a wrong and/or noisy sign. Thanks for the good explanations. From an Asian (Cambodian) applied and theoretical economist and econometrics/mathematical statistician.
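A small simulated sketch (not from the video) of the point in this thread: two nearly duplicate features compete for the same splits, the tree keeps one and largely ignores the other, and which one it keeps can flip between resamples. All data and parameters here are made up for illustration; it uses scikit-learn's DecisionTreeClassifier.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=n)
x1 = signal + rng.normal(scale=0.05, size=n)   # two highly correlated copies
x2 = signal + rng.normal(scale=0.05, size=n)   # of the same underlying signal
noise = rng.normal(size=n)
X = np.column_stack([x1, x2, noise])
y = (signal + 0.3 * rng.normal(size=n) > 0).astype(int)

for seed in range(3):
    idx = rng.choice(n, size=n, replace=True)          # bootstrap resample
    tree = DecisionTreeClassifier(max_depth=3, random_state=seed)
    tree.fit(X[idx], y[idx])
    print(np.round(tree.feature_importances_, 3))      # [x1, x2, noise]

# Importance tends to concentrate on x1 OR x2, and the winner can change
# between resamples: the tree still predicts, but the variable it relies on
# is fragile, which is the robustness issue discussed above.
```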
Imagine you were the FBI and you predicted crime stats based on ice cream sales. Suddenly in November a video posted on Facebook sparks mass riots. Ice cream sales wouldn't change, yet crime would rise.