Ridge Regression

  • Published: 7 Sep 2024
  • My Patreon : www.patreon.co...

Comments • 177

  • @xavierfournat8264
    @xavierfournat8264 3 years ago +51

    This shows that the quality and value of a video don't depend on how fancy the animations are, but on how expert and pedagogical the speaker is. Really brilliant! I assume you spent a lot of time designing that course, so thank you for this!

    • @ritvikmath
      @ritvikmath  3 years ago +1

      Wow, thanks!

    • @backstroke0810
      @backstroke0810 2 years ago

      Totally agree. I learn a lot from his short videos. Precise, concise, enough math, enough ludic examples. True professor mind.

  • @tzu-chunchen5139
    @tzu-chunchen5139 9 months ago +4

    This is the best explanation of Ridge regression that I have ever heard! Fantastic! Hats off!

  • @rez_daddy
    @rez_daddy 4 years ago +45

    "Now that we understand the REASON we're doing this, let's get into the math."
    The world would be a better place if more abstract math concepts were approached this way, thank you.

  • @GreenEyesVids
    @GreenEyesVids 3 years ago +3

    Watched these 5 years ago to understand the concept and I passed an exam. Coming back to it now to refresh my memory, still very well explained!

  • @nadekang8198
    @nadekang8198 5 years ago +2

    This is awesome! Lots of machine learning books and online courses don't bother explaining the reasoning behind Ridge regression; you helped me a lot by pulling out the algebraic and linear algebra proofs to show WHY IT IS THE WAY IT IS! Thanks!

  • @siddharthkshirsagar2545
    @siddharthkshirsagar2545 4 years ago +4

    I was searching the whole internet for ridge regression and stumbled upon this video, which is by far the best explanation you can find anywhere. Thanks.

  • @zgbjnnw9306
    @zgbjnnw9306 2 years ago +2

    It's so inspiring to see how you get rid of the c^2! I learned Ridge but didn't know why! Thank you for making this video!

  • @SarahPourmolamohammadi
    @SarahPourmolamohammadi 1 year ago

    You are the best of all... you explained all the things, so nobody is gonna have problems understanding them.

  • @taareshtaneja7523
    @taareshtaneja7523 5 years ago +8

    This is, by far, the best explanation of Ridge Regression that I could find on YouTube. Thanks a lot!

  • @RobertWF42
    @RobertWF42 7 months ago

    Excellent video! One more thing to add - if you're primarily interested in causal inference, like estimating the effect of daily exercise on blood pressure while controlling for other variables, then you want an unbiased estimate of the exercise coefficient and standard OLS is appropriate. If you're more interested in minimizing error on blood pressure predictions and aren't concerned with coefficients, then ridge regression is better.
    Also left out is how we choose the optimal value of lambda by using cross-validation on a selection of lambda values (don't think there's a closed form expression for solving for lambda, correct me if I'm wrong).
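
    For readers wondering what that cross-validation step looks like in practice, here is a minimal sketch (assuming scikit-learn and made-up data; none of it comes from the video itself): the penalty is chosen from a grid of candidate values by 5-fold cross-validation.

    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = X @ np.array([1.0, 0.5, -2.0, 0.0, 3.0]) + rng.normal(scale=0.5, size=100)

    X_std = StandardScaler().fit_transform(X)             # ridge is scale-sensitive
    lambdas = np.logspace(-3, 3, 50)                      # candidate penalty values
    model = RidgeCV(alphas=lambdas, cv=5).fit(X_std, y)   # 5-fold CV over the grid
    print("chosen lambda:", model.alpha_)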

  • @surajshivakumar5124
    @surajshivakumar5124 3 years ago

    This is literally the best video on ridge regression

  • @q0x
    @q0x 8 years ago +14

    I think it's explained very fast, but still very clearly; for my level of understanding it's just perfect!

  • @yxs8495
    @yxs8495 7 years ago +38

    This really is gold, amazing!

  • @nikunjgattani999
    @nikunjgattani999 2 years ago

    Thanks a lot.. I watched many videos and read blogs before this, but none of them clarified it at this depth

  • @BhuvaneshSrivastava
    @BhuvaneshSrivastava 4 years ago +3

    Your data science videos are the best I have seen on YouTube till now. :)
    Waiting to see more

  • @aarshsachdeva5785
    @aarshsachdeva5785 7 years ago

    You should add that all the variables (dependent and independent) need to be normalized prior to doing a ridge regression. This is because betas can vary in regular OLS depending on the scale of the predictors, and a ridge regression would penalize those predictors that must take on a large beta due to the scale of the predictor itself. Once you normalize the variables, your A^t*A matrix becomes a correlation matrix of the predictors. The regression is called "ridge" regression because you add (lambda*I + A^t*A), which is adding the lambda value to the diagonal of the correlation matrix, which is like a ridge. Great video overall though to start understanding this regression.
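
    To make that concrete, here is a small sketch of the closed-form ridge estimate on standardized predictors (numpy only; the data and the lambda value are made up for illustration, not taken from the video):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(200, 3))
    y = A @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=200)

    # standardize columns so the penalty treats every coefficient on the same scale
    A = (A - A.mean(axis=0)) / A.std(axis=0)
    y = y - y.mean()

    lam = 1.0
    # ridge solution: (A^T A + lambda*I)^(-1) A^T y -- lambda added to the diagonal
    beta_ridge = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    print(beta_ridge)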

  • @xwcao1991
    @xwcao1991 3 years ago +1

    Thank you. I make the comment because I know I will never need to watch it again! Clearly explained..

  • @alecvan7143
    @alecvan7143 4 years ago

    Amazing video, you really explained why we do things which is what really helps me!

  • @abhichels1
    @abhichels1 7 years ago +12

    This is gold. Thank you so much!

  • @charlesity
    @charlesity 4 years ago +2

    Stunning! Absolute gold!

  • @mohamedgaal5340
    @mohamedgaal5340 1 year ago

    I was looking for the math behind the algorithm. Thank you for explaining it.

  • @OmerBoehm
    @OmerBoehm 2 years ago

    Brilliant simplification of this topic. No need for fancy presentation to explain the essence of an idea!!

  • @teegnas
    @teegnas 4 years ago +1

    These explanations are by far the best ones I have seen so far on youtube ... would really love to watch more videos on the intuitions behind more complicated regression models

  • @TahaMVP
    @TahaMVP 6 years ago

    best explanation of any topic i've ever watched , respect to you sir

  • @Viewfrommassada
    @Viewfrommassada 4 years ago +1

    I'm impressed by your explanation. Great job

    • @ritvikmath
      @ritvikmath  4 years ago

      Thanks! That means a lot

  • @Lisa-bp3ec
    @Lisa-bp3ec 7 years ago

    Thank you soooo much!!! You explain everything so clear!! and there is no way I couldn't understand!

  • @mikeperez4222
    @mikeperez4222 3 years ago

    Anyone else get anxiety when he wrote with the marker?? Just me?
    Felt like he was going to run out of space 😂
    Thank you so much though, very helpful :)

  • @akino.3192
    @akino.3192 6 years ago

    You, Ritvik, are simply amazing. Thank you!

  • @bettychiu7375
    @bettychiu7375 4 years ago

    This really helps me! Definitely the best ridge and lasso regression explanation videos on YouTube. Thanks for sharing! :D

  • @cu7695
    @cu7695 6 years ago +2

    I subscribed just after watching this. Great foundation for ML basics

  • @babakparvizi2425
    @babakparvizi2425 6 years ago

    Fantastic! It's like getting the Cliff's Notes for Machine Learning. These videos are a great supplement/refresher for concepts I need to knock the rust off of. I think he takes about 4 shots of espresso before each recording though :)

  • @theoharischaritidis4173
    @theoharischaritidis4173 6 years ago

    This really helped a lot. A big thanks to you Ritvik!

  • @aDifferentHandle
    @aDifferentHandle 6 years ago

    The best ridge regression lecture ever.

  • @soudipsanyal
    @soudipsanyal 6 years ago

    Superb. Thanks for such a concise video. It saved a lot of time for me. Also, the subject was discussed in a fluent manner and it was clearly understandable.

  • @nickb6811
    @nickb6811 7 years ago +1

    So so so very helpful! Thanks so much for this genuinely insightful explanation.

  • @Krishna-me8ly
    @Krishna-me8ly 9 years ago

    Very good explanation in an easy way!

  • @e555t66
    @e555t66 1 year ago

    I don't have money to pay him so leaving a comment instead for the algo. He is the best.

  • @mortezaabdipour5584
    @mortezaabdipour5584 5 years ago

    It's just awesome. Thanks for this amazing explanation. Settled in mind forever.

  • @murraystaff568
    @murraystaff568 8 years ago +2

    Brilliant! Just found your channel and can't wait to watch them all!!!

  • @nicolasmanelli7393
    @nicolasmanelli7393 1 year ago

    I think it's the best video ever made

  • @yanlinwang5703
    @yanlinwang5703 2 years ago

    The explanation is so clear!! Thank you so much!!

  • @RAJIBLOCHANDAS
    @RAJIBLOCHANDAS 2 years ago

    Excellent approach to discussing Lasso and Ridge regression. It could have been better if you had discussed how Lasso yields sparse solutions! Anyway, nice discussion.

  • @wi8shad0w
    @wi8shad0w 4 years ago

    THIS IS ONE HELL OF A VIDEO !!!!

  • @jhhh0619
    @jhhh0619 9 years ago +1

    Your explanation is extremely good!

  • @ethanxia1288
    @ethanxia1288 8 years ago +6

    Excellent explanation! Could you please do a similar video for Elastic-net?

  • @canernm
    @canernm 3 years ago +2

    Hi, and thanks for the video. Can you explain briefly why, when the m_i and t_i variables are highly correlated, the estimators β0 and β1 are going to have very big variance? Thanks a lot in advance!

    • @lanag873
      @lanag873 2 years ago

      Hi same question here😶‍🌫
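
      A rough simulation of what the question is asking (numpy only; the correlation level, sample size, and true betas are arbitrary choices, not from the video): the more collinear the two predictors are, the more the fitted OLS coefficients swing from one resampled dataset to the next.

      import numpy as np

      rng = np.random.default_rng(2)

      def ols_beta_spread(rho, n=50, trials=2000):
          betas = []
          for _ in range(trials):
              x1 = rng.normal(size=n)
              x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(x1, x2) ~ rho
              y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
              X = np.column_stack([x1, x2])
              betas.append(np.linalg.lstsq(X, y, rcond=None)[0])
          return np.std(betas, axis=0)  # spread of each coefficient across trials

      print("std of OLS betas, rho=0.0 :", ols_beta_spread(0.0))
      print("std of OLS betas, rho=0.99:", ols_beta_spread(0.99))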

  • @sanketchavan8
    @sanketchavan8 7 years ago

    best explanation on ridge reg. so far

  • @shiva6016
    @shiva6016 7 years ago

    simple and effective video, thank you!

  • @prabhuthomas8770
    @prabhuthomas8770 5 years ago

    SUPER !!! You have to become a professor and replace all those other ones !!

  • @vishnu2avv
    @vishnu2avv 7 years ago

    Awesome, thanks a million for the great video! Searching to see whether you have done a video on LASSO regression :-)

  • @youyangcao3837
    @youyangcao3837 8 years ago +1

    great video, the explanation is really clear!

  • @HeduAI
    @HeduAI 7 years ago

    I would trade diamonds for this explanation (well, allegorically! :) ) Thank you!!

  • @prateekcaire4193
    @prateekcaire4193 7 months ago

    It is unintuitive that we are constraining the weights (betas) to lie within the value c^2, yet the regularization expression does not include c, but rather the sum of squared weights. Certainly I am missing something here. Alternatively, why does adding a sum of squared betas (or weights) to the cost function help optimize a beta that stays within the constraint, so that the betas don't become large and vary across datasets?
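
    One way to see why c drops out (a standard Lagrangian argument, sketched here rather than quoted from the video) is that the constrained problem and the penalized problem are linked as follows:

    \min_{\beta} \; \|y - A\beta\|^2 \quad \text{subject to} \quad \|\beta\|^2 \le c^2
    \qquad\Longleftrightarrow\qquad
    \min_{\beta} \; \|y - A\beta\|^2 + \lambda\left(\|\beta\|^2 - c^2\right) \;\; \text{for some } \lambda \ge 0.

    For a fixed \lambda, the term -\lambda c^2 is a constant with respect to \beta, so it has no effect on which \beta minimizes the expression; that is why only the sum of squared betas appears in the penalty. The constraint level c is not gone, though: each choice of \lambda corresponds implicitly to some c (larger \lambda means a tighter effective constraint), which is also why \lambda is tuned directly in practice.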

  • @JC-dl1qr
    @JC-dl1qr 7 years ago

    great video, brief and clear.

  • @Sytch
    @Sytch 6 years ago

    Finally, someone who talks quickly.

  • @kamesh7818
    @kamesh7818 6 years ago

    Excellent explanation, thanks!

  • @sasanosia6558
    @sasanosia6558 5 years ago

    Amazingly helpful. Thank you.

  • @zw7453
    @zw7453 2 years ago

    best explanation ever!

  • @meysamsojoudi3947
    @meysamsojoudi3947 3 years ago

    It is a brilliant video. Great

  • @Thaifunn1
    @Thaifunn1 8 years ago +1

    excellent video! Keep up the great work!

  • @TURBOKNUL666
    @TURBOKNUL666 8 years ago +1

    great video! thank you very much.

  • @zhongshanhu7376
    @zhongshanhu7376 8 years ago +1

    very good explanation in an easy way!!

  • @nickwagner5173
    @nickwagner5173 6 years ago +3

    We start out by adding a constraint that beta 1 squared + beta 2 squared must be less than c squared, where c is some number we choose. But then, after choosing lambda, we minimize F, and c ends up having no effect at all on our choice of the betas. I may be wrong, but it doesn't seem like c has any effect on our choice of lambda either. I find it strange that we start out with the criterion that beta 1 squared + beta 2 squared must be less than c squared, yet the choice of c is irrelevant. If someone can help me un-boggle my mind that would be great.

    • @RobertWF42
      @RobertWF42 7 months ago

      Good question - I think it has to do with using the method of Lagrange multipliers to solve the constrained OLS optimization problem. The lambda gets multiplied by the expression in the parentheses at 11:17, which includes the c squared term. So whatever c squared value you choose, it's going to be changed anyways when you multiply by the lambda.

  • @kartikkamboj295
    @kartikkamboj295 4 years ago

    Dude ! Hats off 🙏🏻

  • @adrianfischbach9496
    @adrianfischbach9496 1 year ago

    Huge thanks!

  • @LossAndWaste
    @LossAndWaste 6 years ago

    you are the man, keep doing what you're doing

  • @adityakothari193
    @adityakothari193 7 years ago

    Excellent explanation .

  • @garbour456
    @garbour456 2 years ago

    great video - thanks

  • @Hazit90
    @Hazit90 7 years ago +2

    excellent video, thanks.

  • @myazdani2997
    @myazdani2997 7 years ago

    I love this video, really informative! Thanks a lot

  • @intom1639
    @intom1639 6 years ago

    Brilliant! Could you make more videos about Cross validation, RIC, BIC, and model selection.

  • @sachinrathi7814
    @sachinrathi7814 3 years ago

    Can anyone explain the statement "The efficiency property of an estimator says that the estimator is the minimum variance unbiased estimator"? What does minimum variance denote here?

  • @tamoghnamaitra9901
    @tamoghnamaitra9901 7 years ago

    Beautiful explanation

  • @zehuilin8783
    @zehuilin8783 4 years ago

    Hey Ritvik, I have a question about this one: I don't really know why we are choosing the point that is far from the origin. So which direction does the gradient descend, and why? Please help me out here, thank you so much!

  • @abhijeetsingh5049
    @abhijeetsingh5049 8 years ago +1

    Stunning!! Need more access to your coursework

  • @jakobforslin6301
    @jakobforslin6301 2 years ago

    You are awesome!

  • @xiaoguangzhao34
    @xiaoguangzhao34 6 years ago

    awesome video, thank you very much!

  • @sagarsitap3540
    @sagarsitap3540 4 years ago

    Thanks! Why can't lambda be negative? What if, to improve variance, the slope needs to increase rather than decrease?

  • @jamiewilliams9271
    @jamiewilliams9271 6 years ago

    Thank you so much!!!!

  • @hunarahmad
    @hunarahmad 7 years ago

    thanks for the nice explanation

  • @kevinwong8020
    @kevinwong8020 4 years ago

    I was taught that the name Ridge Regression comes from the lambda I matrix. It looks like a ridged staircase shape.

  • @zhilingpan2486
    @zhilingpan2486 7 years ago

    Very clear. Thank you!

  • @danahn5819
    @danahn5819 6 years ago

    Thank you!

  • @kxdy8yg8
    @kxdy8yg8 6 years ago

    This is gold indeed!

  • @JuPeggy
    @JuPeggy 6 years ago

    excellent video! thank you!

  • @sergioperezmelo3090
    @sergioperezmelo3090 5 years ago

    Super clear

  • @abeaumont10
    @abeaumont10 6 years ago

    Great videos thanks for making it

  • @JoonHeeKim
    @JoonHeeKim 6 years ago +1

    Great video. A (very minor) question: isn't it c instead of c^2 when you draw the radius vector of the circle for \beta restriction?

    • @Viewfrommassada
      @Viewfrommassada 4 years ago

      think of it as an equation of a circle with center (0,0)
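
      For what it's worth, the boundary of the constraint region in two dimensions is

      \beta_1^2 + \beta_2^2 = c^2,

      which is a circle centered at the origin with radius c (not c^2); the c^2 only appears on the right-hand side of the equation, so labelling the drawn radius as c is the mathematically consistent choice.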

  • @faeritaaf
    @faeritaaf 7 years ago

    Thank you! Your explanations are really good, Sir. Do you have time to make a video explaining the adaptive lasso too?

  • @brendachirata2283
    @brendachirata2283 5 years ago

    hey, great video and excellent job

  • @SUBHRASANKHADEY
    @SUBHRASANKHADEY 6 years ago +2

    Shouldn't the radius of the Circle be c instead of c^2 (at time around 7:00)?

  • @Theateist
    @Theateist 6 years ago

    Is the reason not to choose a big LAMBDA that we might get underfitting? If we choose a big LAMBDA we get small W, and then the output function (hypothesis) won't reflect our data and we might see underfitting.

  • @adarshnamdev5834
    @adarshnamdev5834 3 years ago

    @ritvik when you said that the estimated coefficients have small variance, does that imply a tendency to obtain different estimated values of those coefficients? I tend to confuse this term 'variance' with the statistic Variance (spread of the data!).

    • @benxneo
      @benxneo 3 years ago

      Variance is the change in prediction accuracy of ML model between training data and test data.
      Simply what it means is that if a ML model is predicting with an accuracy of "x" on training data and its prediction accuracy on test data is "y"
      Variance = x - y
      A smaller variance would thus mean the model is fitting less noise on the training data, reducing overfitting.
      this definition was taken from: datascience.stackexchange.com/questions/37345/what-is-the-meaning-of-term-variance-in-machine-learning-model
      Hope this helps.

    • @adarshnamdev5834
      @adarshnamdev5834 3 years ago

      @@benxneo thanks mate!

  • @vinceb8041
    @vinceb8041 3 years ago

    Can anyone help me understand the effects of multicollinearity? I understand that the estimators will be highly variable, but why would they be very large?

    • @benxneo
      @benxneo 3 years ago

      That's actually an interesting question, have you found an explanation for it? I can only say that regression relies on the variables being independent of each other, and multicollinearity makes it sensitive to small changes. But why the coefficients become larger, I can't seem to understand.
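
      One standard way to see it (sketched here, not quoted from the video): under the usual OLS assumptions,

      \hat{\beta} = (A^{\top}A)^{-1}A^{\top}y, \qquad \operatorname{Var}(\hat{\beta}) = \sigma^2 (A^{\top}A)^{-1}.

      When two predictors are nearly collinear, A^{\top}A is nearly singular, so its inverse has very large entries and the coefficient estimates have huge sampling variance. The estimates are still correct on average, but any single fit can land on large offsetting values (say +50 and -48 instead of +1 and +1). Ridge replaces (A^{\top}A)^{-1} with (A^{\top}A + \lambda I)^{-1}, which is well conditioned, so the fitted coefficients stay moderate.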

  • @hyunpang8267
    @hyunpang8267 1 year ago

    If only all the research papers explained things this way.

  • @bnouadam
    @bnouadam 4 years ago +1

    Impressive

  • @ibrahimkarabayir8963
    @ibrahimkarabayir8963 9 years ago

    Nice video. I have a question: lambda depends on c, doesn't it?

  • @yassersaeid3424
    @yassersaeid3424 7 years ago +1

    a big thanks

  • @mariodamianorusso9045
    @mariodamianorusso9045 5 years ago

    kudos

  • @uwerich9885
    @uwerich9885 5 years ago

    Hi,
    OLS is unbiased. So even for data with multicollinearity, OLS generates an unbiased model (training data) but with high variance (test data). Ridge regression adds bias to the model (training data) to reduce the variance and make it more accurate (test data). In many real-world applications not only the model accuracy but also the "true" betas (test set) are relevant for making accurate statements about the prediction and the importance of the variables. I tried ridge regression in a simulation; however, the betas were far away from the "true" betas. Are the shrunken betas more like the "true" betas (test set), or do they only have less variance (e.g. a lower RMSE) than the betas from the training data?
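
    A rough way to probe that question is to simulate it (numpy only; the true betas, collinearity level, noise, and lambda below are arbitrary assumptions, not anything from the video) and compare how far the OLS and ridge coefficients land from the true betas across many datasets. Ridge coefficients are biased toward zero, so they are not "true" betas in the unbiased sense, but when collinearity makes the OLS variance large they can still sit closer to the truth on average, depending on the chosen lambda.

    import numpy as np

    rng = np.random.default_rng(3)
    beta_true = np.array([1.0, 1.0])
    lam, n, trials = 5.0, 40, 2000
    err_ols, err_ridge = [], []

    for _ in range(trials):
        x1 = rng.normal(size=n)
        x2 = 0.98 * x1 + np.sqrt(1 - 0.98**2) * rng.normal(size=n)  # strong collinearity
        X = np.column_stack([x1, x2])
        y = X @ beta_true + rng.normal(size=n)
        b_ols = np.linalg.solve(X.T @ X, X.T @ y)
        b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
        err_ols.append(np.sum((b_ols - beta_true) ** 2))
        err_ridge.append(np.sum((b_ridge - beta_true) ** 2))

    print("mean squared distance from true betas, OLS:  ", np.mean(err_ols))
    print("mean squared distance from true betas, ridge:", np.mean(err_ridge))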