Regularization Part 3: Elastic Net Regression

  • Published: 9 Feb 2025
  • Elastic-Net Regression combines Lasso Regression with Ridge Regression to give you the best of both worlds. It works well when there are lots of useless variables that need to be removed from the equation, and it works well when there are lots of useful variables that need to be retained. And it does better than either one when it comes to handling correlated variables. Dang!!!! (A small code sketch illustrating this follows the description below.)
    NOTE: This StatQuest follows up on the StatQuest on Ridge Regression...
    • Regularization Part 1:...
    ...and the StatQuest on Lasso Regression....
    • Regularization Part 2:...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/...
    Also, here are some references that helped me put this video together:
    The original manuscript on Elastic-Net Regression: web.stanford.e...
    A webpage at North Carolina State University that shows different situations for Ridge, Lasso and Elastic-Net Regression: www4.stat.ncsu...
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumr...
    Paperback - www.amazon.com...
    Kindle eBook - www.amazon.com...
    Patreon: / statquest
    ...or...
    RUclips Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshi...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer....
    ...or just donating to StatQuest!
    www.paypal.me/...
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    #statquest #regularization
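    The short sketch below is my own illustration of the point above, not something from the video (the follow-up video demonstrates these methods in R); it assumes Python with NumPy and scikit-learn, where lambda1 and lambda2 are folded into an overall penalty strength, alpha, and a mixing proportion, l1_ratio, and the values used here are purely illustrative.

        # A minimal sketch, assuming scikit-learn: with a few useful variables and
        # many useless ones, the Lasso part can shrink coefficients to exactly 0,
        # while the Ridge part only shrinks them toward 0.
        import numpy as np
        from sklearn.linear_model import Ridge, Lasso, ElasticNet

        rng = np.random.default_rng(42)
        n, p = 100, 20                      # 100 samples, 20 candidate variables
        X = rng.normal(size=(n, p))
        true_coef = np.zeros(p)
        true_coef[:3] = [3.0, -2.0, 1.5]    # only the first 3 variables are useful
        y = X @ true_coef + rng.normal(scale=0.5, size=n)

        for model in (Ridge(alpha=1.0),                      # squared (L2) penalty only
                      Lasso(alpha=0.1),                      # absolute-value (L1) penalty only
                      ElasticNet(alpha=0.1, l1_ratio=0.5)):  # both penalties combined
            model.fit(X, y)
            n_zero = int(np.sum(model.coef_ == 0.0))
            print(f"{type(model).__name__:>10}: {n_zero} of {p} coefficients are exactly 0")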

Comments • 337

  • @statquest
    @statquest  5 years ago +38

    NOTE: In this video, for some reason I used the word "variable" instead of "parameter" in the equations for elastic-net. We are trying to shrink the parameters, not the variables.
    Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @badoiuecristian
      @badoiuecristian 5 years ago +13

      Clarification: parameters = {slope, intercept}; variables = {weight, height}, for anyone who got confused

    • @statquest
      @statquest  5 years ago +2

      @@badoiuecristian Exactly.

    • @s25412
      @s25412 3 years ago +2

      3:56 "...when there are correlations between parameters..." this should be between variables instead. Similarly at 4:48 "...job dealing with correlated parameters..."

    • @statquest
      @statquest  3 years ago +1

      @@s25412 Oops! You are correct.

    • @falaksingla6242
      @falaksingla6242 3 years ago

      Hi Josh,
      Love your content. It has helped me learn a lot and grow. You are doing awesome work; please continue to do so.
      I wanted to support you, but unfortunately your PayPal link seems to be broken. Please update it.

  • @jbridger9
    @jbridger9 4 years ago +16

    When I first watched one of your videos I was struck by how entertaining it was. But the more videos I watch, the more I notice how well I'm understanding the explanations in your videos. Thanks a lot for your amazing efforts!

  • @Paul-tl1du
    @Paul-tl1du 6 years ago +16

    You have an uncanny way of explaining this material well. Thank you so much for creating these videos!

  • @zhizhongzhu9524
    @zhizhongzhu9524 5 years ago +91

    Lasso: Yee-ha! Ridge: Brrr... Elastic Net: ...

    • @statquest
      @statquest  5 years ago +17

      Great Question!!! :) "snap"?

    • @swaggcam412
      @swaggcam412 4 years ago

      Lol my favorite part of the video by far

  • @becayebalde3820
    @becayebalde3820 2 years ago +15

    I have now finished the 3 parts, phew! Thank you a thousand times for the awesome content you provide 👏🏾

    • @statquest
      @statquest  2 years ago +4

      BAM! There's one more part here: ruclips.net/video/Xm2C_gTAl8c/видео.html

  • @zdeacon
    @zdeacon 1 year ago +1

    Thank you so much Josh!! I was struggling with Lasso, Ridge, and ElasticNet Regression for my graduate class. Your 3 videos cleared up all the confusion. Thank you SO much for all that you do to make these topics accessible for all!

  • @prodigyprogrammer3187
    @prodigyprogrammer3187 2 years ago +1

    Man, that intro is the best; it forces me to listen to the rest of the lecture. Thanks :)

  • @rachithrr
    @rachithrr 5 years ago +7

    I can't say which is better, your albums or this amazing series.

    • @statquest
      @statquest  5 years ago +1

      Thank you very much! :)

  • @resonatingvoice1
    @resonatingvoice1 6 years ago +1

    I mean, how easy can it get... these videos are the perfect example of how complex algorithms can be explained simply, so that later people can dive into the actual math behind them to get the full picture... Awesome... thanks

    • @statquest
      @statquest  6 years ago

      That's exactly the idea. I'm so glad you like the videos. :)

    • @resonatingvoice1
      @resonatingvoice1 6 years ago +1

      @@statquest You're welcome, and thank you for creating awesome videos..... I really enjoyed the PCA ones, as for the first time I understood SVD in a simple way :-)

  • @amitgoel2810
    @amitgoel2810 5 years ago +2

    Thank you very much Josh for explaining regularization so clearly! The visuals that you use in your videos make the learning easy.

  • @jinlee1874
    @jinlee1874 3 years ago +1

    The best explanations I could find online for Stats!!! Thank you Josh!

  • @tymothylim6550
    @tymothylim6550 4 years ago +2

    Thank you, Josh, for this excellent video on Elastic-Net Regression! It was a great finish to this 3-part series on Regularization!

  • @pratiknabriya5506
    @pratiknabriya5506 4 years ago +1

    Hey Josh, thanks for the crisp explanation. Today after long procrastination, I managed to watch all three of the videos - L1, L2 and Elastic Net.

  • @TheParkitny
    @TheParkitny 2 years ago +1

    This clears things up a lot. 4 years on, still the best explanation online. Yeeha

  • @yannisskoupras4138
    @yannisskoupras4138 5 years ago +3

    Thank you for the amazing videos! Your ability to explain the concepts simply is incomparable..!

  • @dannychan0510
    @dannychan0510 4 years ago +8

    At first, I came here for the stats revision. Lately, I've been finding myself visiting to remind myself of the tunes instead!

  • @redpantherofmadrid
    @redpantherofmadrid 1 year ago +1

    you explain really well, better than the course I am following! thanks 🙏

  • @SOLONASSYMEOU
    @SOLONASSYMEOU 1 year ago +1

    I sometimes come just for the intros! Amazing work!!!

  • @cosimocuriale8871
    @cosimocuriale8871 4 years ago +1

    Hi Josh, your lessons are so nice that I decided to support you. I bought your digital album "Made from TV". You rock!

    • @statquest
      @statquest  4 years ago +1

      WOW!!!! Awesome! Thank you!

  • @sachof
    @sachof 6 years ago +4

    You are awesome... I'm gonna buy a t-shirt with "I love StatQuest" written on it!

    • @statquest
      @statquest  6 years ago

      Hooray!!! One day I'll have those shirts for sale.... One day.

  • @naveedmazhar5186
    @naveedmazhar5186 4 years ago +7

    Sir! I really liked your style. Thank you for such an entertaining and informative lecture 🙏

  • @Ramakrishna-je6xn
    @Ramakrishna-je6xn 2 months ago +1

    Thanks for making this so simple, you are a gifted trainer. Thanks a lot

  • @viralthegenius
    @viralthegenius 5 years ago +2

    The only reason I subscribed is your singing before every video! No doubt you explain very well

  • @chitracprabhu2922
    @chitracprabhu2922 4 years ago +1

    You explain the concepts so well...... Thanks a lot for these videos

  • @javiermenac
    @javiermenac 6 years ago +1

    I'm from Chile... I've loved your videos on regularization, especially each intro!!!

    • @statquest
      @statquest  6 years ago

      Hooray!!!! Thank you so much! :)

    • @javiermenac
      @javiermenac 6 years ago +1

      @@statquest do you have any videos about SVM or Neural Net models?

    • @statquest
      @statquest  6 years ago

      @@javiermenac Not yet, but I'm working on both. SVM will probably come first, followed by Neural Net.

  • @phamphuminhphuc5859
    @phamphuminhphuc5859 2 years ago +1

    Thanks Josh, it's 2022 and your videos are still saving me!

  • @_subrata
    @_subrata 2 years ago +1

    OMG! I'm so happy I found your channel.

  • @hanadiam8910
    @hanadiam8910 2 years ago +1

    Another great video!!! Keep it up!! Always a big fan

  • @VishalBalaji
    @VishalBalaji 4 years ago +1

    Wow, your channel is a boon to beginners like me in the world of Data Science..... Thanks a lot

  • @divyyy4358
    @divyyy4358 4 years ago +1

    I love your channel, man, it's the major reason I'll be majoring in Data Science in college!

    • @statquest
      @statquest  4 years ago +1

      Thank you very much and good luck with your degree! :)

  • @malinkata1984
    @malinkata1984 2 years ago +1

    Ihaaa!😀 All tutorials are brilliant! A huge thank you.

  • @cookiemonster6900
    @cookiemonster6900 3 years ago +2

    Your explanations are on point and easy to understand. (Can be used as quick reference) 🙆🏻👍🏻💯

  • @rezahay
    @rezahay 5 years ago

    Wonderful, brilliant, awesome. What a relief! Finally, I understand some important concepts of statistics. Thank you very much Josh.

  • @yourboyfazal
    @yourboyfazal 2 years ago +1

    Man, I love your way of teaching

  • @sophiepan6182
    @sophiepan6182 3 years ago +1

    You are the best! I indeed learned a lot from you! Thanks!

  • @Neuroszima
    @Neuroszima 2 months ago

    Sometimes you just stand in awe when you see a conference talk on a subject that reduces to simply:
    "Yeah, and this is so *EXCITING!!* Because, you see, we just added 2 terms together and we had a super special awesome new thing to play with!"

  • @xiangnan-oz9hs
    @xiangnan-oz9hs 6 years ago +1

    Thanks very much, StatQuest. Each lecture is fantastic and interesting. Looking forward to your clear explanations of Bayesian statistics, MCMC, MH, Gibbs sampling, etc.

    • @statquest
      @statquest  6 years ago +1

      Glad you like the video! All of those topics are on the To-Do list, and hopefully I can get to them sooner than later. :)

  • @omnia_credit
    @omnia_credit 3 years ago +2

    You saved my life

  • @Abrar_Ahmed05
    @Abrar_Ahmed05 5 years ago +5

    whoa

  • @angelbueno7810
    @angelbueno7810 6 years ago +1

    Fantastic video Josh!! Thanks a lot, keep up the good work! :)

  • @wenwensi9597
    @wenwensi9597 1 year ago +1

    I like it a lot when he says the super fancy thing is actually xxx.

  • @rishipatel7998
    @rishipatel7998 2 years ago +1

    I know answering this many comments is very boring.
    I guess you use NLP to filter the comments, answer the important ones, and auto-reply to the others.
    The above video was wonderful! Thank you again, Sir 😁

  • @mariapaulasancineto8871
    @mariapaulasancineto8871 5 years ago +2

    Love your clearly explained videos. And your songs are as sweet as Phoebe Buffay's 😉

    • @statquest
      @statquest  5 years ago +2

      Ha! Thanks. I sing the smelly song every day as a warm up. ;)

  • @moeinhasani8718
    @moeinhasani8718 6 years ago +1

    Really useful series, keep making these great tutorials!

    • @statquest
      @statquest  6 years ago

      Will do! I'm glad you like the videos. :)

  • @glaswasser
    @glaswasser 3 years ago +1

    Thanks, I ended up googling that airspeed of a swallow thing and watching a Monty Python scene instead of learning how to do elastic net lol

  • @georgfaustmann852
    @georgfaustmann852 5 years ago

    Thanks for the entertaining and informative channel. Keep up the good work!

  • @niklasfelix7126
    @niklasfelix7126 3 years ago +2

    Thank you so much! Once again, your videos are of invaluable help to my PhD dissertation! And the "Brrr" made me laugh out loud :D

  • @nguyenthituyetnhung1780
    @nguyenthituyetnhung1780 1 year ago +1

    Excellent explanation of model complexity 👍

  • @kkviks
    @kkviks 4 years ago +1

    I love your channel!

  • @rsandeepnttf
    @rsandeepnttf 5 years ago

    Crisp and clear! Thanks for the video

  • @shruti5472
    @shruti5472 3 years ago

    In the intro song, I thought you would say "simpler.. than you might expect **it to be**" cause that rhymes. Anyways, love your videos. Thanks for doing such great work.

  • @yousufali_28
    @yousufali_28 6 years ago +1

    thanks for making elastic net this easy

  • @rahulsadanandan5076
    @rahulsadanandan5076 4 years ago +1

    Statquest staaaaat quest whaaat are we learning today....

    • @statquest
      @statquest  4 years ago

      Looks like Elastic Net! :)

  • @markaitkin
    @markaitkin 6 years ago +1

    Legendary as always! 😁🤘🤙👌👍

  • @jiayoongchong2606
    @jiayoongchong2606 4 years ago +1

    POISSON REGRESSION, PARTIAL LEAST SQUARES AND PRINCIPAL COMPONENT REGRESSION PLEASEEE DR JOSHHH # WE LOVE YOU

  • @yulinliu850
    @yulinliu850 6 years ago +1

    Great! Thank-you Josh!

  • @hung717
    @hung717 4 years ago +1

    Thank you, your video is very useful

    • @statquest
      @statquest  4 years ago

      Glad it was helpful!

  • @haneensuradi
    @haneensuradi 6 years ago +5

    Thanks for this amazing series! It is making my life way easier while I am taking a Machine Learning course at university.
    Can you please 'clearly explain' what you mean by correlated variables? And what Elastic Net regression does to them?

    • @statquest
      @statquest  6 years ago +7

      An example of correlated variables is if I wanted to use "weight" and "height" measurements to predict something. Since small people tend to weigh less than tall people, weight and height are correlated. Elastic-Net Regression would shrink the parameters associated with those variables together.
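      A quick numeric sketch of this example (my own, assuming Python with scikit-learn; the penalty values are illustrative): with two nearly collinear variables, Lasso tends to put most of the weight on one of them, while Elastic-Net tends to shrink the two coefficients together.

          # Hypothetical weight/height data: two strongly correlated variables.
          import numpy as np
          from sklearn.linear_model import Lasso, ElasticNet

          rng = np.random.default_rng(0)
          n = 200
          height = rng.normal(size=n)
          weight = height + rng.normal(scale=0.1, size=n)  # nearly collinear with height
          X = np.column_stack([height, weight])
          y = height + weight + rng.normal(scale=0.5, size=n)

          # Lasso tends to favor one of the pair; Elastic-Net tends to split the weight.
          print("Lasso:      ", Lasso(alpha=0.5).fit(X, y).coef_)
          print("Elastic-Net:", ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y).coef_)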

  • @babasupremacy5225
    @babasupremacy5225 4 years ago +1

    It doesn't get any easier than this

  • @rezab314
    @rezab314 4 years ago +1

    The intro song at 2.0 speed is a nice alternative to the original version :D

  • @sowmy525
    @sowmy525 6 years ago +14

    Thank you StatQuest...awesome series :)
    Can you do videos on time series methods as well ...clearly explained :P

    • @statquest
      @statquest  6 years ago +4

      You're welcome! I'll add time-series to the to-do list. The more people that ask for it, the more I'll move it up the list. :)

    • @purneshdasari5667
      @purneshdasari5667 6 years ago +1

      @@statquest Thanks for the video lectures, Josh sir. I am also waiting for time-series forecasting classes

    • @saikumartadi8494
      @saikumartadi8494 6 years ago +1

      @@statquest it would be great if you do it

    • @statquest
      @statquest  6 years ago +2

      @@saikumartadi8494 Your vote has been noted and I bumped time series up on the list. :)

    • @saikumartadi8494
      @saikumartadi8494 6 years ago +2

      @@statquest awaiting the video :)

  • @jonibeki53
    @jonibeki53 6 years ago +1

    So amazing, thank you!

  • @heteromodal
    @heteromodal 3 years ago

    Hey Josh! Thank you for this, watched your 4 regularization videos today and am happy! And a suggestion for a related, follow-up video - collinearity & multi-collinearity :)

    • @statquest
      @statquest  3 years ago

      Thanks! I'll keep those topics in mind.

    • @heteromodal
      @heteromodal 3 years ago +1

      @@statquest Thank YOU for all you do!

  • @arunasingh8617
    @arunasingh8617 1 year ago +1

    nice explanation

  • @ayushbhardwaj4168
    @ayushbhardwaj4168 6 years ago +1

    Awesome man .. Thanks a lot

  • @조동민-f6o
    @조동민-f6o 6 years ago +2

    I love this channel BAM~~~

  • @scutily
    @scutily 6 years ago +1

    Really nice videos! They're very well explained and helpful!
    Can you also do videos on adaptive elastic net and multi-step elastic net? Thank you so much!

  • @tanphan3970
    @tanphan3970 3 years ago

    Hello @Josh Starmer,
    Thank you for your videos, they are so easy to understand.
    But we are talking about the Elastic-Net (also Ridge/Lasso) technique in a regression model.
    So what about other models? Can it be applied to solve overfitting there, as in regression?

    • @statquest
      @statquest  3 years ago

      Yes. Ridge, Lasso and Elastic net style penalties can be added to all kinds of models.

    • @tanphan3970
      @tanphan3970 3 years ago

      @@statquest All kinds of models with the same formulas as regression?

    • @statquest
      @statquest  3 years ago

      @@tanphan3970 No, pretty much any formula will work. For example, regularization can be applied to Neural Networks, which are very different.

  • @sslxxbc4092
    @sslxxbc4092 3 years ago +1

    Great video, thank you!! I'm just a bit unsure about the scaling of features. Say, if we scale a feature, what would change for lasso and ridge?

    • @statquest
      @statquest  3 years ago +1

      Scaling the data will change the original least squares parameter estimates, but it will not change the process that Elastic-Net uses to reduce the influence of features that are less useful.
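      In case a concrete setup helps (a sketch of my own, assuming Python with scikit-learn; R's glmnet standardizes internally by default): the common practice is to standardize the features first, so the penalty treats every coefficient on the same scale.

          # Standardize, then fit Elastic-Net; X_train and y_train are placeholders.
          from sklearn.pipeline import make_pipeline
          from sklearn.preprocessing import StandardScaler
          from sklearn.linear_model import ElasticNet

          model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5))
          # model.fit(X_train, y_train)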

  • @datascienceandmachinelearn3807
    @datascienceandmachinelearn3807 5 years ago

    Thank you so much. I have learned a lot and look forward to new videos. Good luck

  • @박세영-i8z
    @박세영-i8z 5 years ago +1

    I have a question...
    I don't know what the advantage of ridge regression is.
    Ridge doesn't eliminate the trivial variables, but lasso does.
    Then why do we have to combine them?
    I thought that ridge had a computational advantage because it doesn't use the absolute value.
    But when we put them together, in the elastic-net algorithm, that advantage disappears.
    Why do we have to use elastic-net, and not lasso?
    What is the advantage of keeping ridge's penalty term?

    • @statquest
      @statquest  5 years ago +1

      This is a good question. It turns out that there are some technical issues with Lasso. To quote from the documentation:
      "It is known that the ridge penalty shrinks the coefficients of correlated predictors towards each other while the lasso tends to pick one of them and discard the others. The elastic-net penalty mixes these two; if predictors are correlated in groups, an α=0.5 tends to select the groups in or out together."
      You can read more here: web.stanford.edu/~hastie/glmnet/glmnet_alpha.html

    • @박세영-i8z
      @박세영-i8z 5 years ago +1

      @@statquest wow, thank you very much!

  • @antoniovivaldi2270
    @antoniovivaldi2270 5 years ago

    Your presentations are good: short, clear, and well explained. However, the "lambda" parameters have to be chosen arbitrarily, so the Lasso Regression and Ridge Regression methods lose objectivity; the result depends on the observer. I wonder where those methods are used. In my opinion, the classic Least Mean Squares (LMS) or LMS with statistical weights (in different variations) are still the best methods/techniques for the reduction of experimental data and modeling.

    • @statquest
      @statquest  5 years ago

      Elastic Net is used all the time in Machine Learning and lambda is determined using cross validation.

    • @antoniovivaldi2270
      @antoniovivaldi2270 5 years ago

      @@statquest Thank you! Would you kindly recommend a link to that "cross validation"?

    • @statquest
      @statquest  5 years ago

      @@antoniovivaldi2270 Here's a link to the StatQuest on cross validation: ruclips.net/video/fSytzGwwBVw/видео.html

  • @shoto6018
    @shoto6018 2 years ago

    Hi Josh, thank you for the StatQuest. I am still slightly confused as to how Elastic Net improves things when there are correlations between variables.
    I get that
    1) lasso regression would bring a variable's weight or parameter to 0 if it is useless
    2) ridge regression would not be able to do that, but can moderate a parameter's influence more than lasso
    but I am still confused about the idea of correlated variables for the lasso+ridge combination

    • @statquest
      @statquest  2 years ago +1

      For more details, see: ruclips.net/video/ctmNq7FgbvI/видео.html

    • @shoto6018
      @shoto6018 2 years ago

      @@statquest Thanks for the answer :D, I was thinking of skipping this since it was coded in R and I don't know R, but I will watch it :D

    • @statquest
      @statquest  2 years ago +1

      @@shoto6018 You can ignore the details about R and focus on the results.

  • @innamemetova5923
    @innamemetova5923 6 years ago +1

    Sorry, maybe it's a bit silly, but... don't we need brackets after lambda1 for all the absolute-value parameters, and brackets after lambda2? 3:46 in the video

    • @statquest
      @statquest  6 years ago +3

      Yes! You are correct. That was a slight omission. I hope it's not too confusing.
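      For reference, here is the Elastic-Net loss with the parentheses in place, written with beta for the parameters (per the pinned note) and SSR for the sum of squared residuals:

          \text{SSR} + \lambda_1 \left( |\beta_1| + \cdots + |\beta_p| \right) + \lambda_2 \left( \beta_1^2 + \cdots + \beta_p^2 \right)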

  • @mrcharm767
    @mrcharm767 2 years ago +1

    Brilliant!! But I have a doubt: how are we sure that elastic net regression would not cause high variance, since it sums both the ridge and lasso penalties and could therefore push the model to change over a wider range?

    • @statquest
      @statquest  2 years ago

      I'm not sure I understand your question, but, by using validation, we can test to see if elastic-net is increasing variance, and if so, not use it.

    • @mrcharm767
      @mrcharm767 2 years ago +1

      @@statquest What I meant was: the line tries to get as close to the lowest possible error with gradient descent and all, but using ridge and lasso regression shifts the line slightly away from the data (the predicted points' line versus the actual data points' line), and the accuracy is slightly increased or decreased depending on the data. So if we use elastic net regression, which is a combination of both ridge and lasso, would it cause higher variance, and is it certain that the accuracy would be a bit reduced? That was the question.

    • @mrcharm767
      @mrcharm767 2 years ago

      Here by variance I mean the distance between the predicted data points' line and the actual data line

    • @statquest
      @statquest  2 years ago

      @@mrcharm767 To be honest, I still don't understand your question. But I think part of the problem is that the term "variance" has two meanings - the statistical one ( ruclips.net/video/SzZ6GpcfoQY/видео.html ) - and the machine learning one ( ruclips.net/video/EuBBz3bI-aA/видео.html ). The whole point of regularization is to reduce variance in the sense used in machine learning (and thus increase long-term accuracy), and we do that by desensitizing the model to the variables in the model. To see this in action, and to verify that it works correctly, see: ruclips.net/video/ctmNq7FgbvI/видео.html

    • @mrcharm767
      @mrcharm767 2 years ago +1

      @@statquest Yes, you got it right; I actually made a mistake, interchanging bias and variance in my explanation

  • @JSS11
    @JSS11 3 years ago

    Why do we not use parentheses after each lambda? I got confused, since we did use them in the two earlier videos on regularization. Thanks for helping out!

    • @statquest
      @statquest  3 years ago

      Oops. Looks like I forgot to add the parentheses. Sorry about the confusion that caused. :(

    • @JSS11
      @JSS11 3 years ago +1

      @@statquest No worries! Thank you for keeping my motivation level up there and getting back to me so quickly.

  • @nashaeshire6534
    @nashaeshire6534 3 years ago

    Thanks a lot for your great videos.
    I don't understand why we would use Lasso regression or Ridge regression when we can use Elastic-Net regression.
    What is the drawback of Elastic-Net regression?

    • @statquest
      @statquest  3 years ago +1

      None that I know of. However, not every ML method implements the full elastic net.

    • @nashaeshire6534
      @nashaeshire6534 3 years ago +1

      @@statquest Thanks!
      I don't get why you don't get more thumbs up...
      Great show, thanks again

  • @elinkim7212
    @elinkim7212 5 years ago +4

    I think you are missing parentheses in the penalty terms at 3:31.
    But thank you so much for the videos!

  • @MohamedIbrahim-qk3tk
    @MohamedIbrahim-qk3tk 4 years ago +1

    Lasso does the job of shrinking the coefficients AND removing the useless parameters, right?

    • @statquest
      @statquest  4 years ago

      In this video I show the roles that both Ridge and Lasso play in Elastic Net: ruclips.net/video/ctmNq7FgbvI/видео.html

  • @ashwinmanickam
    @ashwinmanickam 2 years ago

    Hey Josh, great video as usual! I have a question for you; grateful if you can answer.
    Let's say I do a market mix modelling exercise and I have close to 250 variables and close to 180 line items. Which of these methods would be most suitable?
    Info about the data:
    A lot of these variables are super correlated, but I cannot afford to drop any of them, since I need to present the contribution of every channel to the business, and they are naturally correlated since business spending usually happens in clusters and is similar for similar channels like Facebook and Instagram.
    Any pointers on this will be very useful, thanks!

    • @statquest
      @statquest  2 years ago

      Try just using Ridge Regression and see how that works.

  • @basavarajtm5018
    @basavarajtm5018 4 years ago +1

    Hi Josh, thank you for another awesome video. I have one question: how do we decide which parameters to group for the lasso and ridge penalties in Elastic Net regression? Are they selected randomly? Thanks in advance

    • @statquest
      @statquest  4 years ago

      Elastic-net takes care of all of that for you. See it in action here: ruclips.net/video/ctmNq7FgbvI/видео.html

  • @龔冠宇-u9x
    @龔冠宇-u9x 3 years ago +1

    Thanks a lot for your amazing videos. I was just wondering: when I use Elastic Net, the coefficient of a useless variable seems like it will not go to zero because of the Ridge Regression part of the equation. So why not just use Lasso Regression first to eliminate the useless variables and then use Ridge Regression to regularize?

    • @statquest
      @statquest  3 years ago

      Interesting. You could try that. However, in theory, elastic net is supposed to do that for you. So there may be some aspect specific to your data that is giving you strange results.

  • @carloszarzar
    @carloszarzar 5 years ago +1

    Hi everyone. Could I consider the lambda as a hyperparameter in Ridge Regression and Lasso Regression?

  • @Han-ve8uh
    @Han-ve8uh 3 years ago

    1:12 says "Ridge works best when most variables in the model are useful". I don't understand how you would know "most variables are useful" even before using Ridge, and how that can be used to choose between Lasso and Ridge. Isn't the purpose of fitting a model to discover what is useful? Do you mean a Lasso is fit first to see which coefficients are non-zero to check usefulness, and then Ridge is fit in hopes that it will help test set performance? What exactly is done first to ascertain that "most variables are useful"?
    Another thing I'm unsure about when watching these regularization videos is whether the purpose of the regression (test set prediction performance vs studying coefficients) affects the interpretations. Is a larger coefficient more useful? Or are small coefficients on variables that give good test set prediction performance useful?

      @statquest  3 years ago
      @statquest  3 года назад

      You don't know in advance if most of the variables are useful or not, but the good news is that Elastic-Net means we do not need to know. Elastic-Net automatically (through the process of applying cross validation to the different penalties) figures out how to best balance Ridge and Lasso regression penalties. For an example of this being done in practice, see: ruclips.net/video/ctmNq7FgbvI/видео.html
      Also, regularization does affect interpretation, and not in a good way. It is mostly used for predictions, rather than interpretations.
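      A sketch of what that cross validation can look like in code (my own, assuming Python with scikit-learn; the follow-up video does the equivalent with glmnet in R): the different penalty mixes are tried automatically and the best one is kept.

          # ElasticNetCV searches penalty strengths (alpha) and Lasso/Ridge mixes (l1_ratio) by CV.
          from sklearn.linear_model import ElasticNetCV

          model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=10)
          # model.fit(X, y)                 # X, y are placeholders for your data
          # model.l1_ratio_, model.alpha_   # the mix and strength chosen by cross validation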

  • @Yzhang250
    @Yzhang250 6 years ago +1

    Hi Josh, just revisited this video; very clearly explained. But what are the disadvantages of elastic net? Is this model more computationally expensive?

    • @statquest
      @statquest  6 years ago

      As far as I know, it's pretty efficient.

  • @tanbui7569
    @tanbui7569 3 years ago

    Awesome explanation, but I think you forgot the brackets in the equations for Lasso and Ridge Regression :)). The lambda is supposed to multiply the whole sum of squared slopes or absolute slopes.

    • @statquest
      @statquest  3 years ago

      Yes, you are right. Sorry about that.

    • @tanbui7569
      @tanbui7569 3 years ago +1

      @@statquest Thanks for your reply. I only realized it after taking notes

  • @adhiyamaanpon4168
    @adhiyamaanpon4168 4 years ago

    Hi Josh!! I'm still unable to understand the advantage of elastic net regression. I can easily understand how ridge regression is advantageous compared to lasso (i.e., instead of completely eliminating correlated features, it shrinks the parameters associated with those features). Now please tell me how elastic net is advantageous compared to using ridge regression alone!!

    • @statquest
      @statquest  4 years ago

      I talk about that in this video: ruclips.net/video/ctmNq7FgbvI/видео.html

  • @merrimac1
    @merrimac1 5 years ago

    Thank you for the tutorial. One thing I don't get is why Elastic Net can remove some variables. It has the Ridge regression component, so a variable won't be removed altogether. How come?

  • @RealSlimShady7
    @RealSlimShady7 4 years ago +1

    I am in love with your videos Josh! BAM! I just wanted to ask: when we have so many features and multicollinear variables (real-case datasets), is applying Elastic Net Regression always better than Ridge and Lasso? I mean, we cannot actually check that when there are so many variables (your Deep Learning example), so can we say that Elastic Net is the best of both worlds? Can we apply it in most of the scenarios where making a hypothesis about the features is not very simple?

    • @statquest
      @statquest  4 years ago +1

      I talk about this in my video that shows how to do Elastic Net regression in R. The answer is, "Yes, elastic-net gives you the best possible situation". See: ruclips.net/video/ctmNq7FgbvI/видео.html

    • @RealSlimShady7
      @RealSlimShady7 4 years ago

      @@statquest Thank you so much!!! You are a savior!! BAMMMMM!!!

  • @anmoldeep3588
    @anmoldeep3588 4 years ago

    Notes:
    - The hybrid Elastic-Net Regression is especially good at dealing with situations where there are correlations between variables.
    - Lasso Regression tends to pick just one of the correlated items and eliminates the others
    - Ridge Regression tends to shrink all the parameters for the correlated variables together
    - By combining Lasso and Ridge regression, Elastic-Net Regression groups and shrinks the parameters associated with the correlated variables and either leaves them in the equation or removes them all at once.

  • @maskew
    @maskew 6 years ago +2

    Hi Josh, great video! There's just one thing that I'm confused about. I understand that Elastic-Net is meant to provide the best of both worlds out of Lasso and Ridge regression but I'm struggling to get my head around what this means. You said that
    "Elastic-Net regression - groups and shrinks the parameters associated with the correlated variables and leaves them in the equation or removes them all at once".
    What's the advantage to keeping all of the correlated variables in the equation? I thought that this was a bad thing to do since they are likely providing the same information to the model more than once. Also, does Lasso always keep a single variable of a correlated variable group, even if the group doesn't actually help at all to make predictions?

    • @statquest
      @statquest  6 years ago

      This is a great question, and there may be more than one good answer. However, here's my take on it. In a pure "machine learning" setting, retaining correlated variables may not be very useful, but in a research setting, it is very useful. If you have thousands of variables, it may be very useful to see which groups of variables are correlated, because that could give you insight into your data that you didn't have before. Does that make sense?
      And if a variable or a group of correlated variables is not useful, then the corresponding coefficients will shrink, all of them.

    • @maskew
      @maskew 6 years ago +2

      @@statquest Thanks for the quick answer. I can understand why this would be useful in a research setting but surely the purpose of regularization is to find the best set of parameter values to model the function? By holding onto these variables, I can only see them having a negative effect on the optimality of the model

    • @statquest
      @statquest  6 years ago +6

      @@maskew So true! So I looked into this, and the answer is that correlated variables don't get in the way of predictions. They get in the way of trying to make sense of the effect of each variable on the prediction, but not in making the prediction itself. Said another way, if we used elastic-net regression and left a group of correlated variables in the model, we could conclude that they helped make good predictions, but we would not be able to make any conclusions about the relationship between any one variable and the predicted response based on the coefficients. For more details, see point 5 on this page: newonlinecourses.science.psu.edu/stat501/node/346/

    • @maskew
      @maskew 6 years ago +1

      @@statquest Right okay that makes sense then! Thanks a lot for getting back to me

  • @paulstevenconyngham7880
    @paulstevenconyngham7880 6 years ago +1

    hey man, might you be able to do a wee vid on z-score?

  • @yc6768
    @yc6768 2 years ago

    Thanks for the video Josh. Your explanation makes sense, but I can't wrap my head around why this still works. If we know some variables are less important (e.g., Age in your previous example), don't we still have those variables in the loss function? Is it just that their impact will sit somewhere between none and what it would be using L2 alone?

    • @statquest
      @statquest  2 years ago

      I'm not sure I understand your question. However, for less important variables, we can reduce their associated coefficients without drastically reducing the fit of the model to the data, and this will result in a significant reduction in the "penalty" that we add to the loss function.

  • @ruihanli7241
    @ruihanli7241 2 years ago

    Thanks for the clear explanation. So, since elastic net regression is the best, should we just use elastic net regression every time instead of lasso or ridge?

    • @statquest
      @statquest  2 years ago

      Yes, because you can use Elastic Net to be pure Lasso or pure Ridge, and everything in between, so you can have it all.
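      In scikit-learn terms (my own gloss on the point above, not from the video), a single mixing parameter slides Elastic-Net from pure Ridge to pure Lasso:

          from sklearn.linear_model import ElasticNet

          ridge_like = ElasticNet(alpha=0.1, l1_ratio=0.0)  # pure Ridge penalty (sklearn suggests Ridge() for this case)
          blended    = ElasticNet(alpha=0.1, l1_ratio=0.5)  # half Lasso, half Ridge
          lasso_like = ElasticNet(alpha=0.1, l1_ratio=1.0)  # pure Lasso penalty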

  • @basmaal-ghali9174
    @basmaal-ghali9174 4 years ago

    Thanks. But what if we want to estimate a variable from only one other variable? Does using elastic net regression improve the results? I already used OLS; r_squared = 0.65 with RMS = 0.04. How can I improve the model?

    • @statquest
      @statquest  4 years ago

      For details on the pros/cons of elastic vs ridge vs lasso, see: ruclips.net/video/ctmNq7FgbvI/видео.html

  • @ankitbiswas8380
    @ankitbiswas8380 1 year ago

    So will Lasso remove the highly correlated features in elastic net regression, or will the combined effect of both lasso and ridge just shrink the highly correlated features? I am still unclear on that. Who wins in terms of handling correlated features, Lasso or Ridge?

    • @statquest
      @statquest  1 year ago

      It depends on how you configure it. For more details, see: ruclips.net/video/ctmNq7FgbvI/видео.html

  • @weiqingwang1202
    @weiqingwang1202 4 years ago

    Why would elastic net either remove the parameters of correlated variables altogether or keep them and shrink them? I thought it was a little bit of both.

  • @jy2502
    @jy2502 3 years ago

    Thank you for your wonderful videos. They really helped me understand ridge/lasso/elastic net. I still have one question though: it seems like elastic net regression can delete some variables even when both lambda 1 and lambda 2 are not 0 (I found this in other papers), but I am not sure how that is possible if lambda 2 is not 0..... Do you have any idea about this? Thanks again!

    • @statquest
      @statquest  3 years ago

      As long as the lasso penalty is in use, then you can eliminate variables.

  • @JMRG2992
    @JMRG2992 5 years ago

    I want to know something: what is the minimum sample size for ridge and lasso? I have checked tons of papers; some journals use at least 4, others use 30, and others (like Greene's) require about 250 observations. Would this change with ridge and lasso regressions?

  • @johannesgengenbach6701
    @johannesgengenbach6701 5 years ago +2

    Thanks for all the great videos with decent musical intros! ;)
    I have a question concerning this one:
    You mention "lambdaX * variableX", but shouldn't it rather be "lambdaX * parameterX" (except for the y-intercept)?

    • @statquest
      @statquest  5 years ago +1

      You are exactly right.