Bayesian Linear Regression : Data Science Concepts
- Published: 21 Aug 2024
- The crazy link between Bayes Theorem, Linear Regression, LASSO, and Ridge!
LASSO Video : • Lasso Regression
Ridge Video : • Ridge Regression
Intro to Bayesian Stats Video : • What the Heck is Bayes...
My Patreon : www.patreon.co...
As soon as you explained the Bayesian results, my jaw was wide open for like 3 minutes. This is so interesting.
Read it in a book. Didn't understand jack back then. Your videos are awesome: rich, short, concise. Please make a video on Linear Discriminant Analysis and how it's related to Bayes' theorem. This video will be saved in my data science playlist.
This video is a true gem, informative and simple at once. Thank you so much!
Glad it was helpful!
Regardless of how they were really initially devised, seeing the regularization formulas pop out of the bayesian linear regression model was eye-opening - thanks for sharing this insight
Yes. This really blew my mind. Boom.
Love you, bro. I got my joining letter from NASA as a Scientific Officer-1. Believe me, your videos always helped me in my research work.
I used to be afraid of Bayesian Linear Regression until I saw this vid. Thank you sooo much
Awesome! You're welcome!
For me, the coolest thing about statistics is that every time I do a refresh on these topics, I get some new ideas or understandings. It's lucky that I came across this video after a year, which could also explain why we need to "normalize" the X (0-centered, with stdev = 1) before we feed it into the MLP model, if we use regularization terms in the layers.
Unbelievable: you explained linear regression, explained Bayesian stats in simple terms, and showed the connection in under 20 min.... Perfect.
At last!! I could find an explanation for the lasso and ridge regression lambdas!!! Thank you!!!
Happy to help!
Amazing, you kept it simple and showed how regularization terms in linear regression originated from Bayesian approach!! Thank U!
My mind exploded with this video. Thank you.
At last!!! Now I can see what lambda was doing in the lasso and ridge regression!! Great video!!
Glad you liked it!
Man I'm going to copy-paste your video whenever I want to explain regularization to anyone! I knew the concept but I would never explain it the way you did. You nailed it!
This is my favorite video out of a large set of fantastic videos that you have made. It just brings everything together in such a brilliant way. I keep getting back to it over and over again. Thank you so much!
This is the best explanation of L1 and L2 I've ever heard
I've seen everything in this video many, many times, but no one had done as good a job as this in pulling these ideas together in such an intuitive and understandable way. Well done and thank you!
This is truly cool. I had the same thing with the lambda. It’s good to know that it was not some engineering trick.
Awesome explanation! Especially the details on the prior were so helpful!
Glad it was helpful!
Amazing video! Really clearly explained! Keep em coming!
Glad you liked it!
Awesome explanation!
This is incredible. Clear, well paced and explained. Thank you!
I'd never considered a Bayesian approach to linear regression let alone its relation to lasso/ridge regression. Really enlightening to see!
Thanks!
Awesome video. I didn't realize that the L1, L2 regularization had a connection with the Bayesian framework. Thanks for shedding some much needed light on the topic. Could you please also explain the role of MCMC Sampling within Bayesian Regression models? I recently implemented a Bayesian Linear Regression model using PyMC3, and there's definitely a lot of theory involved with regards to MCMC NUTS (No U-Turn) Samplers and the associated hyperparameters (Chains, Draws, Tune, etc.). I think it would be a valuable video for many of us.
And of course, keep up the amazing work! :D
good suggestion!
Thanks a lot! Great! I am reading Elements of Statistical Learning and did not understand what they were talking about. Now I got it.
This was an excellent introduction to Bayesian Regression. Thanks a lot!
Brilliant and clear explanation, I was struggling to grasp the main idea for a Machine Learning exam but your video was a blessing. Thank you so much for the amazing work!
This blew my mind. Thanks!
Wow, killer video. This was a topic where it was especially nice to see everything written on the board in one go. Was cool to see how a larger lambda implies a more pronounced prior belief that the parameters lie close to 0.
I also think it’s pretty cool 😎
you are so good at this, this video is amazing
Thank you so much!!
very cool the link you explained between regularization and prior
Mind blown on the connection between regularization and priors in linear regression
It just blew my mind too. I can feel you, brother. Thank you!
Best of all videos on Bayesian regression; other videos are so boring and long, but this one has quality as well as ease of understanding. Thank you so much!
Thank you for this amazing video, It clarified many things to me!
Crystal clear! Thank you so much; the explanation is very structured and detailed.
This was incredible, thank you so much.
This is brilliant, man! Brilliant! Literally solved where the lambda comes from!
This is sooo clear. Thank you so much!
Really good explanation. I really like how you gave context and connected all the topics together so that it makes perfect sense, while maintaining the perfect balance between math and intuition. Great work. Thank you!
This is an awesome explanation
Excellent tutorial! I have applied RIDGE as the loss function in different models.
However, it is the first time I understand the mathematical meaning of lambda. It is really cool!
Thank you for sharing this fantastic content.
Glad you enjoy it!
Your videos are a Godsend!
There is an error at the beginning of the video: in frequentist approaches, X is treated as non-random covariate data and y is the random part, so the high variance of OLS should be expressed as small changes to y => big changes to the OLS estimator.
Small changes to the covariate matrix becoming big changes to the OLS estimator is more like non-robustness of OLS with respect to outlier contamination.
Also, the lambda should be 1/(2τ^2), not σ^2/τ^2, since:
ln(P(β)) = -p * ln(τ * √(2π)) - ||β||₂² / (2τ^2)
Overall this was very helpful cheers!
Legendary video
Notes for my future revision.
*Prior β*
10:30
The value of the prior on β is normally distributed. The by-product of using a Normal distribution is regularisation, because the prior values of β won't stray too far (too large or too small) from the mean.
Regularisation keeps the values of β small.
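Spelling the note out (following the video's setup: likelihood y ~ N(Xβ, σ²I) and independent priors β_j ~ N(0, τ²)), the MAP estimate is

```latex
\hat{\beta}_{\text{MAP}}
= \arg\max_{\beta}\; \log P(y \mid \beta) + \log P(\beta)
= \arg\min_{\beta}\; \frac{1}{2\sigma^{2}}\lVert y - X\beta\rVert_2^{2}
  + \frac{1}{2\tau^{2}}\lVert \beta\rVert_2^{2}
= \arg\min_{\beta}\; \lVert y - X\beta\rVert_2^{2}
  + \frac{\sigma^{2}}{\tau^{2}}\lVert \beta\rVert_2^{2}
```

which is exactly ridge regression with λ = σ²/τ²: a tighter prior (smaller τ) means a larger λ and stronger shrinkage toward 0.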
You are THE LEGEND
Incredible explanation!
Man! What a great explanation of Bayesian Stats. It's all starting to make sense now. Thank you!!!
Max ( P(this is the best vid explaining these regressions | RUclips) )
Can you please please do a series on the categorical distribution, multinomial distribution, Dirichlet distribution, Dirichlet process, and finally non-parametric Bayesian tensor factorisation, including clustering of streaming data. I will personally pay you for this. I mean it!!
There are a few videos on these things on youtube, some are good, some are way high-level. But, no one can explain the way you do.
This simple video has such profound importance!!
This was awesome, thanks a lot for your time :)
Your videos are a true gem, and an inspiration even. I hope to be as instructive as you are if I ever become a teacher!
your videos are awesome so much better than my prof
Super informative and clear lesson! Thank you very much!
Great thanks! .. was feeling the same discomfort about the origin of these...
This video is super informative! It gave me the actual perspective on regularization.
such a nice explanation. I mean thats the first time I actually understood it.
Love this content! More examples like this are appreciated
More to come!
mind boggling
Your videos are great. Love the connections you make so that stats is intuitive as opposed to plug and play formulas.
you got a subscriber, awesome explanation. I spent hours learning it from other source, but no success. You are just great
Thanks, man. A really good and concise explanation of the approach (together with the video on Bayesian statistics).
Great video, do you have some sources I can use for my university presentation? You helped me a lot 🙏 thank you!
What a wonderful explanation!!
Glad you think so!
Tks a lot for this clear explanation !
In the end, I finally understand it too. A hint for people who also struggle with Bayesian regression like me: do a Bayesian linear regression in Python from any tutorial that you find online; you are going to understand, trust me. I think that one of the initial problems for a person who faces a Bayesian approach is the fact that you are actually obtaining a posterior *of the weights*! Now it looks kinda obvious, but at the beginning I was really stuck; I could not understand what the posterior was actually doing.
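Following that hint, here is a minimal sketch of Bayesian linear regression in plain numpy (no PyMC needed). This is my own illustration, not code from the video: it assumes a known noise variance sigma2 and a zero-mean Gaussian prior with variance tau2 on the weights, under which the posterior over the weights is Gaussian in closed form.

```python
import numpy as np

# Synthetic data: y = X @ true_beta + Gaussian noise.
rng = np.random.default_rng(42)
n, p = 100, 3
X = rng.standard_normal((n, p))
true_beta = np.array([1.5, -0.5, 2.0])
sigma2, tau2 = 1.0, 0.5          # noise variance and prior variance (assumed known)
y = X @ true_beta + np.sqrt(sigma2) * rng.standard_normal(n)

# Closed-form posterior over the weights:
#   cov  = (X'X / sigma2 + I / tau2)^{-1}
#   mean = cov @ X'y / sigma2
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)
post_mean = post_cov @ X.T @ y / sigma2

# The posterior mean coincides with the ridge solution for
# lambda = sigma2 / tau2 -- the connection the video derives.
lam = sigma2 / tau2
ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.allclose(post_mean, ridge))  # True
```

The point the comment makes is visible here: the output of the procedure is a full distribution over the weights (post_mean, post_cov), not a single coefficient vector.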
thank you so much for the great explanation
truly excellent explanation; well done
This video is amazing!!! so helpful and clear explanation
Most insightful! L1 as Laplacian toward the end was a bit skimpy, though. Maybe I should watch your LASSO clip. Could you do a video on elastic net? Insight on balancing the L1 and L2 norms would be appreciated.
Yea, Elasticnet and comparison to Ridge/Lasso would be very helpful
you are a great teacher!!!🏆🏆🏆
Thank you! 😃
Thanks a lottttt! I had so much difficulty understanding this.
You are a great teacher thank you for your videos!!
Thanks for the video. It's really helpful. I was trying to understand where the regularization terms come from. Now I get it. Thanks!
perfect explanation thank you
My mind is blown.....woow...
Thanks, that was a good one. Keep up the good work!
wonderful stuff! thank you
Holy shit! This is amazing. Mind blown :)
Wonderfully explained! Mathematicians like you deserve more subscribers!
Great video. The relation between the prior and the LASSO penalty was a "wow" moment for me. It would be helpful to see an actual computation example in Python or R. A common problem I see in Bayesian lectures is too much focus on the math rather than showing how, and by how much, the resulting parameters actually differ -- especially when to consider the Bayesian approach over OLS.
Excellent!
Thank you! Cheers!
Fantastic! You are my savior!
Great video with a very clear explanation. Could you also do a video on Bayesian logistic regression?
Thank you! I saw this before but I didn't understand. Please, where can I find the complete derivation? And maybe you can do a complete series on this topic.
You are the go-to for me when I need to understand topics better. I understand Bayesian parameter estimation thanks to this video!
Any chance you can do something on the difference between Maximum Likelihood and Bayesian parameter estimation? I think anyone that watches both of your videos will be able to pick up the details but seeing it explicitly might go a long way for some.
Nice i never thought that 👍🏼👍🏼
It's a great video. Few people manage to boil things down to that point while retaining some of the key steps involved and you nailed it. Now, a few points -- mostly for the benefit of your viewers.
First, reading Tibshirani's original paper, it seems like the Bayesian interpretation is more of a happy coincidence than the primary motivation, and that would make sense, because a Bayesian statistician most likely wouldn't bother looking for a maximum a posteriori estimator. They almost always use the mean, median or mode of their marginal posteriors -- or just show you the whole distribution.
Second, there is a frequentist justification for LASSO: see, for example, Zou (2006), "The Adaptive Lasso and Its Oracle Properties." Zou shows that there are some conditions under which LASSO will *correctly* do what it was intended to do -- that is, jointly solve your model selection and estimation problems. However, in general, there's a tension between getting consistent model selection and consistent estimation. Fortunately, Zou also gives a very simple solution that involves only a very mild modification of LASSO: (1) allow each parameter to be penalized slightly differently and (2) cleverly choose the penalty weights. If you do that, LASSO will choose the correct model and yield asymptotic normality (and root-n convergence). If you care about what the coefficients mean and not just the forecast, that might be important. That said, adaptive LASSO will occasionally perform *better* at forecasting.
Third, when you mention linear regression, you should include an asterisk somewhere: you don't need Gaussian iid errors for OLS to have some desirable properties. I'm sure you're well aware of all of that, but I'll include a few examples for the benefit of your viewers:
1. Conditionally mean-zero errors (E(e|X) = 0) give you unbiasedness (E(bhat|X) = b, the true value);
2. (1) and homoskedastic errors (E(e(i)^2|X) = sigma^2 for all i = 1,...,N) make OLS the lowest-variance unbiased linear estimator (Gauss-Markov theorem);
3. (1), (2) and e ~ iid P, where P is another elliptically symmetric distribution (say, a Student), mean your scale-invariant statistics like t and F retain their exact finite-sample distributions (that's in King's 1979 thesis);
4. And there's a whole host of situations where none of the above applies, but you can invoke asymptotic arguments to justify some properties as approximately holding in finite samples. Since your viewers seem to be interested mostly in forecasting, say all X's are covariance stationary (*unconditional* means, variances and covariances are all finite and don't depend on time) and the error term follows a weak white-noise process (not serially correlated, not contemporaneously correlated with the X's, and homoskedastic). Then both X'e/N and X'X/N satisfy a law of large numbers, so OLS will be consistent by a continuous-mapping argument. Similarly, X'e/sqrt(N) satisfies a central limit theorem, so it will also be asymptotically normal. In other words, you get a property kind of like (1) and another similar to (3), except they apply in a much broader setting.
Fourth, if people are curious, I have two published papers with my coauthors that look into deep comparisons of many forecasting tools in the context of macroeconomic forecasting. Variants of LASSO are included and, for macro data, that concern of dimension reduction seems to be best handled using some kind of factor model (think, PCA or something like that). They can look me up on Scholar to find them.
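For readers curious about the adaptive LASSO mentioned above, here is a rough coordinate-descent sketch in plain numpy. This is my own illustration, not code from the video or from Zou's paper; it minimizes (1/2)||y - Xb||² + lam * sum_j w_j |b_j|, and setting w_j = 1/|b_ols_j| gives the adaptive variant.

```python
import numpy as np

def soft_threshold(rho, lam):
    # Soft-thresholding: the closed-form minimizer of the 1-D lasso subproblem.
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, weights=None, n_iter=200):
    """Lasso via coordinate descent: minimize
       (1/2)||y - X b||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = X.shape
    w = np.ones(p) if weights is None else weights
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's contribution removed.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            beta[j] = soft_threshold(rho, lam * w[j]) / (X[:, j] @ X[:, j])
    return beta

# Synthetic data: only 2 of 5 features are truly active.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_beta + 0.5 * rng.standard_normal(n)

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_lasso = lasso_cd(X, y, lam=20.0)
# Adaptive lasso: penalize each coefficient inversely to its OLS magnitude,
# so likely-irrelevant features get a much heavier penalty.
b_adaptive = lasso_cd(X, y, lam=20.0, weights=1.0 / np.abs(b_ols))
print(np.round(b_lasso, 2), np.round(b_adaptive, 2))
```

The data-driven weights let the adaptive version zero out the irrelevant coefficients while barely shrinking the active ones, which is the "oracle" behavior Zou describes.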
Thank you very much. Pretty helpful video!
Beautiful!
Thank you! Cheers!
very great, thank you
A-👏ma-👏zing 👏
Excellent thank you
very good
Excellent
Thank you very much
thank you!
I wonder if this is related to the BIC (Bayesian Information Criterion). It's about choosing the simpler model with fewer variables, similar to regularization.
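There is a kinship: BIC also trades fit against complexity, via the penalty k*ln(n) on the number of parameters. A rough sketch (my own illustration, not from the video), using the standard Gaussian-linear-model form BIC = n*ln(RSS/n) + k*ln(n) up to a constant:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X_full = rng.standard_normal((n, 4))
# Only the first two columns actually matter.
y = 2 * X_full[:, 0] - X_full[:, 1] + 0.3 * rng.standard_normal(n)

def bic(X, y):
    # BIC for a Gaussian linear model, up to an additive constant:
    # goodness of fit (n * ln(RSS/n)) plus complexity penalty (k * ln(n)).
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

# The true 2-variable model should score lower (better) than the full one.
print(bic(X_full[:, :2], y), bic(X_full, y))
```

Like the lambda in ridge/lasso, the ln(n) penalty has a Bayesian reading: it approximates the marginal likelihood of the model, so minimizing BIC is roughly choosing the a posteriori most probable model.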
Great video!!