Bayesian Linear Regression : Data Science Concepts
- Published: 21 Aug 2024
- The crazy link between Bayes Theorem, Linear Regression, LASSO, and Ridge!
LASSO Video : • Lasso Regression
Ridge Video : • Ridge Regression
Intro to Bayesian Stats Video : • What the Heck is Bayes...
My Patreon : www.patreon.co...
As soon as you explained the Bayesian results, my jaw was wide open for like 3 minutes. This is so interesting.
Read it in a book. Didn't understand jack back then. Your videos are awesome: rich, short, concise. Please make a video on Linear Discriminant Analysis and how it's related to Bayes' theorem. This video will be saved in my data science playlist.
This video is a true gem, informative and simple at once. Thank you so much!
Glad it was helpful!
Regardless of how they were really initially devised, seeing the regularization formulas pop out of the bayesian linear regression model was eye-opening - thanks for sharing this insight
Yes. This really blew my mind. Boom.
Love you, bro. I got my joining letter from NASA as a Scientific Officer-1. Believe me, your videos always helped me in my research work.
I used to be afraid of Bayesian Linear Regression until I saw this vid. Thank you sooo much
Awesome! You're welcome!
For me, the coolest thing about statistics is that every time I do a refresh on these topics, I get some new ideas or understandings. It's lucky that I came across this video after a year, which could also explain why we need to "normalize" the X (0-centered, with stdev = 1) before we feed it into the MLP model, if we use regularization terms in the layers.
Unbelievable: you explained linear regression, explained Bayesian stats in simple terms, and showed the connection in under 20 min.... Perfect.
At last!! I could find an explanation for the lasso and ridge regression lambdas!!! Thank you!!!
Happy to help!
Amazing, you kept it simple and showed how regularization terms in linear regression originated from Bayesian approach!! Thank U!
My mind exploded with this video. Thank you.
At last!!! Now I can see what lambda was doing in the lasso and ridge regression!! Great video!!
Glad you liked it!
Man I'm going to copy-paste your video whenever I want to explain regularization to anyone! I knew the concept but I would never explain it the way you did. You nailed it!
This is my favorite video out of a large set of fantastic videos that you have made. It just brings everything together in such a brilliant way. I keep getting back to it over and over again. Thank you so much!
This is the best explanation of L1 and L2 I've ever heard
I've seen everything in this video many, many times, but no one had done as good a job as this in pulling these ideas together in such an intuitive and understandable way. Well done and thank you!
This is truly cool. I had the same thing with the lambda. It’s good to know that it was not some engineering trick.
Awesome explanation! Especially the details on the prior were so helpful!
Glad it was helpful!
Amazing video! Really clearly explained! Keep em coming!
Glad you liked it!
Awesome explanation!
This is incredible. Clear, well paced and explained. Thank you!
I'd never considered a Bayesian approach to linear regression let alone its relation to lasso/ridge regression. Really enlightening to see!
Thanks!
Awesome video. I didn't realize that the L1, L2 regularization had a connection with the Bayesian framework. Thanks for shedding some much needed light on the topic. Could you please also explain the role of MCMC Sampling within Bayesian Regression models? I recently implemented a Bayesian Linear Regression model using PyMC3, and there's definitely a lot of theory involved with regards to MCMC NUTS (No U-Turn) Samplers and the associated hyperparameters (Chains, Draws, Tune, etc.). I think it would be a valuable video for many of us.
And of course, keep up the amazing work! :D
good suggestion!
Thanks a lot! Great! I am reading Elements of Statistical Learning and did not understand what they were talking about. Now I got it.
This was an excellent introduction to Bayesian Regression. Thanks a lot!
Brilliant and clear explanation, I was struggling to grasp the main idea for a Machine Learning exam but your video was a blessing. Thank you so much for the amazing work!
This blew my mind. Thanks!
Wow, killer video. This was a topic where it was especially nice to see everything written on the board in one go. Was cool to see how a larger lambda implies a more pronounced prior belief that the parameters lie close to 0.
I also think it’s pretty cool 😎
you are so good at this, this video is amazing
Thank you so much!!
very cool the link you explained between regularization and prior
Mind blown on the connection between regularization and priors in linear regression
It just blew my mind too. I can feel you, brother. Thank you!
Best of all videos on Bayesian regression; other videos are so boring and long, but this one has quality as well as ease of understanding. Thank you so much!
Thank you for this amazing video, It clarified many things to me!
Crystal clear! Thank you so much; the explanation is very structured and detailed.
This was incredible, thank you so much.
This is brilliant, man! Brilliant! Literally solved where the lambda comes from!
This is sooo clear. Thank you so much!
Really good explanation. I really like how you gave context and connected all the topics together so that it makes perfect sense, while maintaining the perfect balance between math and intuition. Great work. Thank you!
This is an awesome explanation
Excellent tutorial! I have applied RIDGE as the loss function in different models.
However, it is the first time I understand the mathematical meaning of lambda. It is really cool!
Thank you for sharing this fantastic content.
Glad you enjoy it!
Your videos are a Godsend!
There is an error at the beginning of the video: in frequentist approaches, X is treated as non-random covariate data and y is the random part, so the high variance of OLS should be expressed as small changes to y => big changes to the OLS estimator.
Small changes to the covariate matrix becoming big changes to the OLS estimator is more like non-robustness of OLS with respect to outlier contamination.
Also, the lambda should be 1/(2τ^2), not σ^2/τ^2, since:
ln(P(β)) = -p * ln(τ * √(2π)) - ||β||₂² / (2τ^2)
Overall this was very helpful cheers!
Legendary video
Notes for my future revision.
*Prior β*
10:30
The value of the prior on β is normally distributed. The by-product of using a Normal distribution is regularisation, because the prior values of β won't stray too far (too large or too small) from the mean.
Regularisation keeps the values of β small.
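Spelling the note out (following the video's setup: likelihood y ~ N(Xβ, σ²I) and independent priors β_j ~ N(0, τ²)), the MAP estimate is

```latex
\hat{\beta}_{\text{MAP}}
= \arg\max_{\beta}\; \log P(y \mid \beta) + \log P(\beta)
= \arg\min_{\beta}\; \frac{1}{2\sigma^{2}}\lVert y - X\beta\rVert_2^{2}
  + \frac{1}{2\tau^{2}}\lVert \beta\rVert_2^{2}
= \arg\min_{\beta}\; \lVert y - X\beta\rVert_2^{2}
  + \frac{\sigma^{2}}{\tau^{2}}\lVert \beta\rVert_2^{2}
```

which is exactly ridge regression with λ = σ²/τ²: a tighter prior (smaller τ) means a larger λ and stronger shrinkage toward 0.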
You are THE LEGEND
Incredible explanation!
Man! What a great explanation of Bayesian Stats. It's all starting to make sense now. Thank you!!!
Max ( P(this is the best vid explaining these regressions | RUclips) )
Can you please please do a series on the categorical distribution, multinomial distribution, Dirichlet distribution, Dirichlet process, and finally non-parametric Bayesian tensor factorisation, including clustering of streaming data. I will personally pay you for this. I mean it!!
There are a few videos on these things on youtube, some are good, some are way high-level. But, no one can explain the way you do.
This simple video has such profound importance!!
This was awesome, thanks a lot for your time :)
Your videos are a true gem, and an inspiration even. I hope to be as instructive as you are if I ever become a teacher!
your videos are awesome so much better than my prof
Super informative and clear lesson! Thank you very much!
Great thanks! .. was feeling the same discomfort about the origin of these...
This video is super informative! It gave me the actual perspective on regularization.
such a nice explanation. I mean thats the first time I actually understood it.
Love this content! More examples like this are appreciated
More to come!
mind boggling
Your videos are great. Love the connections you make so that stats is intuitive as opposed to plug and play formulas.
you got a subscriber, awesome explanation. I spent hours learning it from other source, but no success. You are just great
Thanks, man. A really good and concise explanation of the approach (together with the video on Bayesian statistics).
Great video, do you have some sources I can use for my university presentation? You helped me a lot 🙏 thank you!
What a wonderful explanation!!
Glad you think so!
Tks a lot for this clear explanation !
In the end, I finally understand it too. A hint for people who also struggle with Bayesian regression like me: do a Bayesian linear regression in Python from any tutorial that you find online; you are going to understand, trust me. I think that one of the initial problems for a person who faces a Bayesian approach is the fact that you are actually obtaining a posterior *of the weights*! Now it looks kinda obvious, but at the beginning I was really stuck; I could not understand what the posterior was actually doing.
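Following that hint, here is a minimal sketch of Bayesian linear regression in plain numpy (no PyMC needed). This is my own illustration, not code from the video: it assumes a known noise variance sigma2 and a zero-mean Gaussian prior with variance tau2 on the weights, under which the posterior over the weights is Gaussian in closed form.

```python
import numpy as np

# Synthetic data: y = X @ true_beta + Gaussian noise.
rng = np.random.default_rng(42)
n, p = 100, 3
X = rng.standard_normal((n, p))
true_beta = np.array([1.5, -0.5, 2.0])
sigma2, tau2 = 1.0, 0.5          # noise variance and prior variance (assumed known)
y = X @ true_beta + np.sqrt(sigma2) * rng.standard_normal(n)

# Closed-form posterior over the weights:
#   cov  = (X'X / sigma2 + I / tau2)^{-1}
#   mean = cov @ X'y / sigma2
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)
post_mean = post_cov @ X.T @ y / sigma2

# The posterior mean coincides with the ridge solution for
# lambda = sigma2 / tau2 -- the connection the video derives.
lam = sigma2 / tau2
ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.allclose(post_mean, ridge))  # True
```

The point the comment makes is visible here: the output of the procedure is a full distribution over the weights (post_mean, post_cov), not a single coefficient vector.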
thank you so much for the great explanation
truly excellent explanation; well done
This video is amazing!!! so helpful and clear explanation
Most insightful! L1 as Laplacian toward the end was a bit skimpy, though. Maybe I should watch your LASSO clip. Could you do a video on elastic net? Insight on balancing the L1 and L2 norms would be appreciated.
Yea, Elasticnet and comparison to Ridge/Lasso would be very helpful
you are a great teacher!!!🏆🏆🏆
Thank you! 😃
Thanks a lottttt! I had so much difficulty understanding this.
You are a great teacher thank you for your videos!!
Thanks for the video. It's really helpful. I was trying to understand where the regularization terms come from. Now I get it. Thanks!
perfect explanation thank you
My mind is blown.....woow...
Thanks, that was a good one. Keep up the good work!
wonderful stuff! thank you
Holy shit! This is amazing. Mind blown :)
Wonderfully explained! Mathematicians like you deserve more subscribers!
Great video. The relation between the prior and the LASSO penalty was a "wow" moment for me. It would be helpful to see an actual computation example in Python or R. A common problem I see in Bayesian lectures is too much focus on the math rather than showing how, and by how much, the resulting parameters actually differ -- especially when to consider the Bayesian approach over OLS.
Excellent!
Thank you! Cheers!
Fantastic! You are my savior!
Great video with a very clear explanation. Could you also do a video on Bayesian logistic regression?
Thank you! I saw this before but I didn't understand. Please, where can I find the complete derivation? And maybe you can do a complete series on this topic.
You are the go-to for me when I need to understand topics better. I understand Bayesian parameter estimation thanks to this video!
Any chance you can do something on the difference between Maximum Likelihood and Bayesian parameter estimation? I think anyone that watches both of your videos will be able to pick up the details but seeing it explicitly might go a long way for some.
Nice i never thought that 👍🏼👍🏼
It's a great video. Few people manage to boil things down to that point while retaining some of the key steps involved and you nailed it. Now, a few points -- mostly for the benefit of your viewers.
First, reading Tibshirani's original paper, it seems like the Bayesian interpretation is more of a happy coincidence than the primary motivation, and that would make sense, because a Bayesian statistician most likely wouldn't bother looking for a maximum a posteriori estimator. They almost always use the mean, median or mode of their marginal posteriors -- or just show you the whole distribution.
Second, there is a frequentist justification for LASSO: see, for example, Zou (2006), "The Adaptive Lasso and Its Oracle Properties." Zou shows that there are some conditions under which LASSO will *correctly* do what it was intended to do -- that is, jointly solve your model selection and estimation problems. However, in general, there's a tension between getting consistent model selection and consistent estimation. Fortunately, Zou also gives a very simple solution that involves only a very mild modification of LASSO: (1) allow each parameter to be penalized slightly differently and (2) cleverly choose the penalty weights. If you do that, LASSO will choose the correct model and yield asymptotic normality (and root-n convergence). If you care about what the coefficients mean and not just the forecast, that might be important. That said, adaptive LASSO will occasionally perform *better* at forecasting.
Third, when you mention linear regression, you should include an asterisk somewhere: you don't need Gaussian iid errors for OLS to have some desirable properties. I'm sure you're well aware of all of that, but I'll include a few examples for the benefit of your viewers:
1. Conditionally mean-zero errors (E(e|X) = 0) give you unbiasedness (E(bhat|X) = b, the true value);
2. (1) and homoskedastic errors (E(e(i)^2|X) = sigma^2 for all i = 1,...,N) make OLS the lowest-variance unbiased linear estimator (Gauss-Markov theorem);
3. (1), (2) and e ~ iid P, where P is another elliptically symmetric distribution (say, a Student), mean your scale-invariant statistics like t and F retain their exact finite-sample distributions (that's in King's 1979 thesis);
4. And there's a whole host of situations where none of the above applies, but you can invoke asymptotic arguments to justify some properties as approximately holding in finite samples. Since your viewers seem to be interested mostly in forecasting, say all X's are covariance stationary (*unconditional* means, variances and covariances are all finite and don't depend on time) and the error term follows a weak white-noise process (not serially correlated, not contemporaneously correlated with the X's, and homoskedastic). Then both X'e/N and X'X/N satisfy a law of large numbers, so OLS will be consistent by a continuous-mapping argument. Similarly, X'e/sqrt(N) satisfies a central limit theorem, so it will also be asymptotically normal. In other words, you get a property kind of like (1) and another similar to (3), except they apply in a much broader setting.
Fourth, if people are curious, I have two published papers with my coauthors that look into deep comparisons of many forecasting tools in the context of macroeconomic forecasting. Variants of LASSO are included and, for macro data, that concern of dimension reduction seems to be best handled using some kind of factor model (think, PCA or something like that). They can look me up on Scholar to find them.
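For readers curious about the adaptive LASSO mentioned above, here is a rough coordinate-descent sketch in plain numpy. This is my own illustration, not code from the video or from Zou's paper; it minimizes (1/2)||y - Xb||² + lam * sum_j w_j |b_j|, and setting w_j = 1/|b_ols_j| gives the adaptive variant.

```python
import numpy as np

def soft_threshold(rho, lam):
    # Soft-thresholding: the closed-form minimizer of the 1-D lasso subproblem.
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, weights=None, n_iter=200):
    """Lasso via coordinate descent: minimize
       (1/2)||y - X b||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = X.shape
    w = np.ones(p) if weights is None else weights
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's contribution removed.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            beta[j] = soft_threshold(rho, lam * w[j]) / (X[:, j] @ X[:, j])
    return beta

# Synthetic data: only 2 of 5 features are truly active.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_beta + 0.5 * rng.standard_normal(n)

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_lasso = lasso_cd(X, y, lam=20.0)
# Adaptive lasso: penalize each coefficient inversely to its OLS magnitude,
# so likely-irrelevant features get a much heavier penalty.
b_adaptive = lasso_cd(X, y, lam=20.0, weights=1.0 / np.abs(b_ols))
print(np.round(b_lasso, 2), np.round(b_adaptive, 2))
```

The data-driven weights let the adaptive version zero out the irrelevant coefficients while barely shrinking the active ones, which is the "oracle" behavior Zou describes.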
Thank you very much. Pretty helpful video!
Beautiful!
Thank you! Cheers!
very great, thank you
A-👏ma-👏zing 👏
Excellent thank you
very good
Excellent
Thank you very much
thank you!
I wonder if this is related to the BIC (Bayesian Information Criterion). It's about choosing the simpler model with fewer variables, similar to regularization.
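There is a kinship: BIC also trades fit against complexity, via the penalty k*ln(n) on the number of parameters. A rough sketch (my own illustration, not from the video), using the standard Gaussian-linear-model form BIC = n*ln(RSS/n) + k*ln(n) up to a constant:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X_full = rng.standard_normal((n, 4))
# Only the first two columns actually matter.
y = 2 * X_full[:, 0] - X_full[:, 1] + 0.3 * rng.standard_normal(n)

def bic(X, y):
    # BIC for a Gaussian linear model, up to an additive constant:
    # goodness of fit (n * ln(RSS/n)) plus complexity penalty (k * ln(n)).
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

# The true 2-variable model should score lower (better) than the full one.
print(bic(X_full[:, :2], y), bic(X_full, y))
```

Like the lambda in ridge/lasso, the ln(n) penalty has a Bayesian reading: it approximates the marginal likelihood of the model, so minimizing BIC is roughly choosing the a posteriori most probable model.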
Great video!!