Ridge, Lasso & Elastic Net Regression with R | Boston Housing Data Example, Steps & Interpretation

Dr. Bharatendra Rai

51K views

Published: Dec 2, 2024

Comments • 191

@bassamabdelnabi3117 5 years ago ⁺⁴
Thanking you is not enough. You have done a lot to share knowledge with people. Great people like you are the people who make a difference. Please continue to be generous and kind.
@bkrai 5 years ago
Thanks for your comments!
@flamboyantperson5936 6 years ago ⁺²⁰
Sir, many, many, many thanks to you. I really have no words to thank you for making this video. I have been looking for these three methods on the internet for a very long time, but I had not found them anywhere, and then you made this. I'm extremely happy and extremely thankful to you. You are the best professor in the world. Respect.
@bkrai 6 years ago ⁺¹
Thanks for your comments!
@ashok6644 6 years ago ⁺³
You explained Ridge, Lasso and Elastic Net Regression clearly, and your teaching style is awesome. Thanks, Sir.
@bkrai 6 years ago
Thanks for comments!
@okolosundayojotule5484 3 years ago ⁺¹
This video is so helpful, I now have a glimpse of how to go about comparing Ridge, Lasso, and Elastic Net. Thank you so much, Sir.
@bkrai 3 years ago
Thanks for your comments!
@ramchandersrivastava6352 6 years ago ⁺¹
Dear Sir, your videos make our lives so simple and save us many hours.
@bkrai 6 years ago
Good to know it helps save time!
@samueljuma5905 1 year ago ⁺¹
You just solved one of the biggest problems in my article, THANK YOU!
@bkrai 1 year ago
You are welcome!
@samueljuma5905 1 year ago
Please help: at 23:43, what can I do so that xyplot() compares different models, e.g., LASSO vs Ridge? Thanks!
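One way to get that comparison is caret's resamples(), whose xyplot() method plots two models' resampled metrics against each other. A minimal sketch, assuming the MASS copy of the Boston data, the caret and glmnet packages, and the lambda grid quoted elsewhere in this thread (seq(0.0001, 1, length = 5)); the object names and seed are illustrative, not the video's exact script.

```r
library(caret)    # train(), trainControl(), resamples()
library(lattice)  # xyplot()
data(Boston, package = "MASS")

ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
lambda_grid <- seq(0.0001, 1, length = 5)

# Re-seeding before each train() call gives both models the same folds,
# so their RMSE values can be paired resample by resample.
set.seed(1234)
ridge <- train(medv ~ ., data = Boston, method = "glmnet",
               tuneGrid = expand.grid(alpha = 0, lambda = lambda_grid),
               trControl = ctrl)
set.seed(1234)
lasso <- train(medv ~ ., data = Boston, method = "glmnet",
               tuneGrid = expand.grid(alpha = 1, lambda = lambda_grid),
               trControl = ctrl)

res <- resamples(list(Ridge = ridge, Lasso = lasso))
summary(res)
# Each point is one resample: Ridge RMSE on one axis, Lasso RMSE on the other.
print(xyplot(res, models = c("Ridge", "Lasso"), metric = "RMSE"))
```

Points close to the diagonal mean the two models perform about the same on that resample. More models can be added to the list (trained with the same seed), and bwplot(res) then shows all of them side by side.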
@AkshatJha 2 years ago ⁺¹
Thanks for the video. For Lasso and Elastic Net, shouldn't we scale the data before running the model?
@bkrai 2 years ago
That's always a good idea.
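If you want caret to do the scaling, train() accepts a preProcess argument; the centering and scaling are then re-estimated inside each resample, so the held-out fold does not leak into them. A minimal sketch, assuming caret, glmnet and the MASS Boston data; the grid and seed are illustrative.

```r
library(caret)
data(Boston, package = "MASS")

set.seed(1234)
lasso_scaled <- train(
  medv ~ ., data = Boston, method = "glmnet",
  preProcess = c("center", "scale"),   # standardize predictors within each resample
  tuneGrid = expand.grid(alpha = 1, lambda = seq(0.0001, 1, length = 5)),
  trControl = trainControl(method = "cv", number = 10)
)
lasso_scaled$bestTune   # alpha and lambda chosen by cross-validation
```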
@Dejia_Space 4 years ago ⁺¹
Dr. Rai,
After I run the following code:
ridge
@bkrai 4 years ago
You can try that and see if it helps to improve the model.
@abc_def789 4 years ago
Isn't the value of lambda supposed to be between 0 and 1 if you're using the values in a sequence?
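For reference, glmnet's documentation writes the Gaussian elastic-net objective as below. Only the mixing weight alpha is restricted to [0, 1]; lambda can be any non-negative penalty strength, so a lambda sequence does not have to stay below 1.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2
+\lambda\Bigl[\tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\Bigr],
\qquad \lambda\ge 0,\quad 0\le\alpha\le 1
```

Setting alpha = 0 gives ridge, alpha = 1 gives lasso, and values in between give elastic net; larger lambda means stronger shrinkage.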
@aleksandrpugachev1035 2 years ago ⁺¹
Thank you for the clear and simple explanations. After you use Elastic Net to predict, how do you interpret your results? Please provide more explanation, e.g., the prediction on train is 4.11 and on test is 6.15. What does this actually tell us?
@bkrai 2 years ago
Those are root mean square error (RMSE) values that tell us how accurate our model is. Lower values of RMSE indicate better prediction performance by the model.
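RMSE is in the units of the response; for Boston, medv is the median home value in $1000s, so an RMSE of 4.11 means predictions are typically off by roughly $4,110, and a higher test RMSE than train RMSE points to some overfitting. A small sketch of both metrics in base R (the numbers are made up for illustration):

```r
# Root mean square error: square root of the average squared prediction error.
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}
# R-squared as 1 - SSE/SST. Note: caret reports Rsquared as the squared
# correlation between predictions and observations, which can differ slightly.
r_squared <- function(observed, predicted) {
  1 - sum((observed - predicted)^2) / sum((observed - mean(observed))^2)
}

obs  <- c(24, 21.6, 34.7, 33.4)   # illustrative observed values
pred <- c(26, 20.6, 31.7, 33.4)   # illustrative predictions
rmse(obs, pred)
r_squared(obs, pred)
```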
@cruisedog22 4 years ago ⁺¹
Thank you for this. Question: how would you code it to group your factor variables so that elastic net throws out or keeps all levels of a factor variable together?
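glmnet penalizes each dummy column separately, so levels of one factor can be kept or dropped individually. One swapped-in technique that treats all levels as a unit is the group lasso, available in the grpreg package. A sketch under these assumptions: grpreg is installed, and rad from the MASS Boston data is treated as a 9-level factor purely for illustration. The "assign" attribute of model.matrix() maps each dummy column back to its original variable, which gives the group labels.

```r
library(grpreg)
data(Boston, package = "MASS")

df <- Boston
df$rad <- factor(df$rad)        # illustration: treat rad as a multi-level factor

mm  <- model.matrix(medv ~ ., data = df)
grp <- attr(mm, "assign")[-1]   # variable index per column; rad dummies share one
X   <- mm[, -1]                 # drop the intercept column (grpreg adds its own)

set.seed(1234)
cv_grp <- cv.grpreg(X, df$medv, group = grp, penalty = "grLasso")
b <- coef(cv_grp)               # coefficients at the cross-validated lambda
b
```

With the group penalty, the rad dummies end up either all zero or all nonzero together.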
@kylmaz5782 1 year ago
Hello. I want to implement the ridge regression method on a small dataset, but I want to get it by solving the model manually (by hand). How can I do it? I will be glad if you can help.
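By hand, ridge has a closed form: after centering, the slopes are (X'X + lambda I)^(-1) X'y, and the unpenalized intercept is recovered from the means. A minimal sketch in base R with a tiny made-up dataset; the function name ridge_by_hand is hypothetical. Its lambda is not on the same scale as glmnet's, because glmnet divides the squared error by 2N and standardizes the predictors, so the numbers will not match glmnet's lambda one-to-one.

```r
# Ridge regression "by hand": beta = (X'X + lambda * I)^(-1) X'y on centered data.
ridge_by_hand <- function(X, y, lambda) {
  X <- as.matrix(X)
  x_means <- colMeans(X)
  Xc <- sweep(X, 2, x_means)            # center each predictor column
  yc <- y - mean(y)
  p  <- ncol(Xc)
  beta <- solve(crossprod(Xc) + lambda * diag(p), crossprod(Xc, yc))
  intercept <- mean(y) - sum(x_means * beta)   # intercept is not penalized
  c(intercept = intercept, setNames(drop(beta), colnames(X)))
}

# Tiny made-up example: 5 observations, 2 predictors.
X <- cbind(x1 = c(1, 2, 3, 4, 5), x2 = c(2, 1, 4, 3, 6))
y <- c(3, 4, 8, 9, 13)
ridge_by_hand(X, y, lambda = 0)    # lambda = 0 reproduces ordinary least squares
ridge_by_hand(X, y, lambda = 10)   # larger lambda shrinks the slopes toward 0
```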
@adriennewelch5891 5 years ago ⁺³
Hi, thanks for the nice video tutorial, one of the best. I wanted to go through your tutorial, but unfortunately I also got stopped at Ridge. I got the same problem there: > ridge
@jarrelldunson 4 years ago
Getting the same error... is there a solution?
@bkrai 4 years ago
I don't see any error. Probably you can delete the code and rewrite it, as sometimes it is something small that we may miss.
@Yllemanden 6 years ago ⁺²
Thank you for this nice explanatory video. However, I do have one question. In the end, you end up with some variables having the highest coefficients, i.e., the ones explaining the outcome data best. But how do you determine a cut-off? In your case, "nox" and "rm" have higher coefficients than most, making them the easy choice. But what if some are closer together? Do you drop variables with coefficients lower than 1 (or higher than -1)? Or how do you actually pick the right variables, and the right number of them, instead of ending up with 10 variables in your model?
@akshaytakhi8016 5 years ago ⁺¹
Please reply to this, sir.
@bkrai 4 years ago ⁺¹
We don't need to pick variables manually, as feature selection happens automatically here: lasso and elastic net shrink the coefficients of less useful variables to exactly zero.
@bkrai 4 years ago ⁺¹
I saw your comment only today.
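In practice the "cut-off" is whether a coefficient is exactly zero, not its size (coefficient size depends on each variable's units, so 1 or -1 is not a meaningful threshold, and ridge never sets coefficients to zero at all). A sketch, assuming the glmnet package and the MASS Boston data, that lists the variables lasso keeps at lambda.1se, a common and somewhat sparser alternative to the lambda with the lowest CV error:

```r
library(glmnet)
data(Boston, package = "MASS")

x <- as.matrix(Boston[, names(Boston) != "medv"])
y <- Boston$medv

set.seed(1234)
cv_lasso <- cv.glmnet(x, y, alpha = 1)   # 10-fold CV over glmnet's own lambda path
b <- as.matrix(coef(cv_lasso, s = "lambda.1se"))
# Variables whose coefficient is exactly zero have been dropped by the lasso.
selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
selected
```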
@NextCR7Seb 3 years ago ⁺¹
You're the real MVP! Thanks a lot for sharing this.
@bkrai 3 years ago
You are welcome!
@rohitnath5545 3 years ago
Can we have an example with multi-level categorical variables in the model, to see how elastic net works? Will it keep insignificant factor levels or drop them?
@alifmaulida4475 3 years ago
Thank you for your video, it's very helpful. I've got a question: how about SEE for penalized spline regression? What is the difference from ridge regression?
Thanks in advance.
@melissanichol4051 5 years ago ⁺¹
Thanks for a fantastic video. I have run this tutorial without issue on the Boston data set. I am replicating it with my own data set of 10 variables and 226 observations. When running lasso and elastic net, I keep getting this message: "In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures." However, I have no missing data in the data set. Any thoughts? Thank you.
@bkrai 4 years ago
I saw this only today; you have probably already solved this problem.
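A common cause of that warning, offered as a likely explanation rather than a diagnosis of this particular dataset: when lambda is large enough that every coefficient is shrunk to zero, the model predicts a constant, and the R-squared caret computes (a squared correlation) is undefined for a constant prediction, so it is stored as NA. The sketch below shows the effect in base R with made-up numbers; narrowing the lambda grid, or inspecting the train object's results table for NA Rsquared rows, can confirm it.

```r
observed  <- c(22.1, 19.8, 30.5, 15.2)   # illustrative values
predicted <- rep(21.9, 4)                # all-zero coefficients give a constant prediction
# The standard deviation of a constant vector is 0, so the correlation is NA.
r2 <- suppressWarnings(cor(observed, predicted)^2)
r2
```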
@anthonysun2193 4 years ago ⁺¹
Thank you for the video, Dr. Rai.
Does anyone know how to make the font size of the labels in the plot larger? It is extremely small. I searched online and tried for a while but couldn't get it done. Also, how about adding a legend to the plot?
@bkrai 4 years ago
To change font size you can use 'cex'.
@anthonysun2193 4 years ago
@@bkrai
Thank you for your reply, Dr. Rai. I tried to use cex, but none of the options change the size of the labels on the lines. (I only know of cex.axis, cex.lab, cex.main, and cex.sub, and none of them work in this case.)
- Anthony
@anthonysun2193 4 years ago
@@bkrai
Also, the plots (log lambda vs. coefficients, and fraction of deviance explained vs. coefficients) showed that variable 6 is better than variable 5, as variable 6 is not overfitted. What are Var 5 and Var 6? Is Var 5 nox and Var 6 rm? But based on the importance order, nox is much more important than rm, so did I get the variable numbering mixed up?
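The numbers that plot(..., label = TRUE) draws on glmnet coefficient paths are column positions in the predictor matrix, so with the MASS Boston columns (medv removed) 5 is nox and 6 is rm. As the commenter found, those built-in labels do not seem to respond to the usual cex arguments, so one workaround is to redraw the path with base matplot(), where label size and a legend are under your control. A sketch, assuming the glmnet package and an illustrative lasso fit on the full data:

```r
library(glmnet)
data(Boston, package = "MASS")

x   <- as.matrix(Boston[, names(Boston) != "medv"])
fit <- glmnet(x, Boston$medv, alpha = 1)

# Map the numbers shown by plot(fit, label = TRUE) to variable names.
data.frame(number = seq_len(ncol(x)), variable = colnames(x))

# Redraw the coefficient path so font sizes and a legend can be controlled.
paths <- t(as.matrix(fit$beta))          # rows = lambda values, columns = predictors
cols  <- rainbow(ncol(paths))
matplot(log(fit$lambda), paths, type = "l", lty = 1, col = cols,
        xlab = "log(lambda)", ylab = "Coefficient", cex.lab = 1.3, cex.axis = 1.2)
legend("bottomright", legend = colnames(paths), col = cols, lty = 1,
       cex = 0.8, ncol = 2)
```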
@shadyjii2154 5 years ago ⁺²
You are a legend, this was super helpful.
@bkrai 5 years ago ⁺¹
Thanks for comments!
@mandava5103 5 years ago
Thanks for the video, Prof. Rai. I have a couple of questions: What is finalModel? You have set lambda between 0.001 and 1, but I see large log lambda values in the plot. How do we explain this? Thanks.
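In a caret model, finalModel is the underlying glmnet object refit on the full training data. caret's glmnet method fits glmnet's own lambda path (which for ridge starts at large values) and only uses your grid to pick the best lambda, which is why the coefficient plot shows log lambda values far outside the grid. A sketch to inspect both, assuming caret, glmnet and the MASS Boston data, with an illustrative grid and seed:

```r
library(caret)
data(Boston, package = "MASS")

set.seed(1234)
ridge <- train(medv ~ ., data = Boston, method = "glmnet",
               tuneGrid = expand.grid(alpha = 0, lambda = seq(0.0001, 1, length = 5)),
               trControl = trainControl(method = "cv", number = 10))

class(ridge$finalModel)                # the glmnet fit behind the caret object
range(log(ridge$finalModel$lambda))    # glmnet's own path, used by plot(ridge$finalModel)
ridge$bestTune$lambda                  # the grid value chosen by cross-validation
```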
@ramchandersrivastava6352 6 years ago ⁺²
You must have worked so hard on all this; hats off to you.
@bkrai 6 years ago
Thanks for comments!
@chriss7771 6 years ago ⁺²
Great music... :) What a great video. I learn so much from you. Thank you.
@bkrai 6 years ago
Thanks for comments!
@davidli2294 4 years ago ⁺¹
Thank you, thank you so much. Can I ask: is it the same procedure for a logistic model?
@bkrai 4 years ago
In a logistic model, the response is a factor variable, so these two situations are different.
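For a binary factor response, glmnet fits penalized logistic regression with family = "binomial", and the same alpha/lambda ideas apply. A sketch, assuming the glmnet package and the MASS Boston data; the "high"/"low" target below is hypothetical, created only so there is a two-class outcome to model.

```r
library(glmnet)
data(Boston, package = "MASS")

# Hypothetical binary target: is the median home value above $25k?
y <- factor(ifelse(Boston$medv > 25, "high", "low"))
x <- as.matrix(Boston[, names(Boston) != "medv"])

set.seed(1234)
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1, type.measure = "class")
pred <- predict(cv_fit, newx = x, s = "lambda.min", type = "class")
table(predicted = pred[, 1], actual = y)   # confusion matrix (training data here)
```

For an honest error estimate, the confusion matrix should be computed on held-out data rather than on the rows used for fitting.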
@dilshadkhanum6953 6 years ago ⁺¹
Thank you so much for explaining in such a simple and efficient way. Does Ridge/Lasso/Elastic Net fall into the dummy variable trap? I have a data set where I have created dummy variables for categorical variables and then aggregated them. Can I use all of them in these methods?
@bkrai 6 years ago
Yes, that should work fine.
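With ordinary least squares, one dummy per level plus an intercept is perfectly collinear (the dummy variable trap). With a ridge component in the penalty (alpha < 1) the objective becomes strictly convex, so the solution stays unique even with a full set of dummies. A sketch of building such a matrix, assuming the glmnet package and a small hypothetical data frame:

```r
library(glmnet)

# Hypothetical toy data with one 3-level factor.
set.seed(1)
df <- data.frame(
  y      = rnorm(60),
  region = factor(sample(c("north", "south", "west"), 60, replace = TRUE)),
  income = rnorm(60)
)

# contrasts = FALSE keeps one column per level (no reference level dropped).
full_contrasts <- lapply(df["region"], contrasts, contrasts = FALSE)
x <- model.matrix(y ~ ., data = df, contrasts.arg = full_contrasts)[, -1]
colnames(x)

fit <- glmnet(x, df$y, alpha = 0.5)   # elastic net on the full dummy set
```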
@SaveSoilSaveSoil 4 years ago ⁺¹
Thank you, Dr. Rai. This was helpful. Do you know how to change the selection criterion (e.g., to AIC, BIC, SBC, etc.) by any chance?
@bkrai 4 years ago
See if this helps: www.methodology.psu.edu/resources/AIC-vs-BIC/
@SaveSoilSaveSoil 3 years ago ⁺¹
Thank you very much Dr. Rai. I posted my question in the link below. I am leaving a record here in case anyone has the same question.
stats.stackexchange.com/questions/502356/lasso-regression-with-aic-or-bic-as-model-selection-criterion
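glmnet itself selects lambda by cross-validation, but an information criterion can be computed along its path. A sketch under one stated assumption: the number of nonzero coefficients (plus 1 for the intercept) is used as the degrees of freedom, a common approximation for the lasso. It assumes the glmnet package and the MASS Boston data.

```r
library(glmnet)
data(Boston, package = "MASS")

x <- as.matrix(Boston[, names(Boston) != "medv"])
y <- Boston$medv
n <- nrow(x)

fit <- glmnet(x, y, alpha = 1)
rss <- colSums((y - predict(fit, newx = x))^2)   # residual sum of squares per lambda
k   <- fit$df + 1                                # nonzero slopes + intercept (approximation)

aic <- n * log(rss / n) + 2 * k
bic <- n * log(rss / n) + log(n) * k
best <- which.min(bic)
fit$lambda[best]                      # lambda preferred by BIC
coef(fit, s = fit$lambda[best])
```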
@bkrai 3 years ago
You are welcome!
@sandysanju9675 4 years ago ⁺¹
This is an excellent explanation, sir, hats off to you. Sir, I have a query: could you please tell me how I can use ridge, lasso and elastic net regression for unbalanced panel data? Looking for your help. Kindly also point me to a source on ridge, lasso and elastic net regression for panel data.
@bkrai 4 years ago
For the class imbalance problem, you may find the following useful:
ruclips.net/video/Ho2Klvzjegg/видео.html
@sandysanju9675 4 years ago
Sir, my data is in the form Yit = a + bXit + eit,
where i is the firm and t is the time period.
I have collected data for n firms over a period of time t, i.e., data for each firm for t periods. Now my problem is that I have several explanatory variables, so I want to use normalisation for variable selection and inference for unbalanced panel data. Kindly help me.
@@bkrai
@mehmetfatihyildirim3449 4 years ago ⁺¹
Thank you for a great tutorial, Dr. Bharatendra Rai! I wonder how you handled the categorical variable. Did you put it into the LASSO model as numeric or as a factor?
@bkrai 4 years ago
Note that chas is a factor variable in this example.
@adityapatnaik7078 6 years ago ⁺¹
Excellent work! Could you please make a video on checking the assumptions of linear regression?
@bkrai 6 years ago
Thanks for the suggestion, I've added it to my list.
@abhinavmishra7786 6 years ago ⁺²
Hi Sir, really very helpful interpretations of the outcomes for lasso, ridge and elastic net. I have a doubt: can we apply ridge, lasso and elastic net to classification problems?
@bkrai 6 years ago
Yes, they can be used for classification problems too.
@abhinavmishra7786 6 years ago ⁺¹
Hi sir, thanks a lot for replying... I am practicing on churn analysis data with logistic regression. I request you to please guide us with a video demonstrating customer lifetime value analysis in R, and how and where it is used.
@bkrai 6 years ago ⁺¹
I've added this to my list of future videos.
@abhinavmishra7786 6 years ago
Thank you, sir.
@abdinardooliveira9109 6 years ago ⁺¹
Hello, I am trying to run your algorithm, but when I try to run the linear model (7:53) it fails, showing the following message: predictions failed for Fold01.Rep1: intercept=TRUE Error : $ operator is invalid for atomic vectors. How can I solve this? Thank you! PS: Never mind... I solved it myself: I just had to restart R. It works now. Anyway, thanks!
@bkrai 6 years ago ⁺¹
Thanks for the update!
@abhishekagnihotri9233 6 years ago ⁺²
Impressive explanations, Sir, I liked it very much...
@bkrai 6 years ago
Thanks for comments!
@tomishrabarua9168 3 years ago ⁺¹
Hello sir,
For my dataset I am getting the same lower RMSE and the same higher R-squared value for the lasso and elastic net methods... how can I choose the better algorithm of these two?
@bkrai 3 years ago ⁺¹
You can choose whichever is simpler.
@jubayerbiswas1356 4 years ago ⁺¹
Thank you for your excellent and informative lecture. I want to ask about a problem I have been facing recently. I am trying quantile regression with a LASSO penalty. Is it possible to use the "train" function by changing method and tuneGrid? If not, what would be the solution? Would you please make a video on quantile regression with a LASSO penalty?
@bkrai 4 years ago ⁺²
Thanks, I've added this to my list.
@angelali6437 2 years ago
Thanks for the great tutorial! Would anyone know why coef(ridge) returns many coefficients for each predictor? Thanks.
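coef() on a glmnet fit returns one column of coefficients per lambda on glmnet's internal path, which is why many values appear for each predictor. Passing the cross-validated lambda through s gives a single column: the intercept and slopes of the final equation. A sketch, assuming a caret model called ridge trained roughly as in the video (the grid and seed below are illustrative):

```r
library(caret)
data(Boston, package = "MASS")

set.seed(1234)
ridge <- train(medv ~ ., data = Boston, method = "glmnet",
               tuneGrid = expand.grid(alpha = 0, lambda = seq(0.0001, 1, length = 5)),
               trControl = trainControl(method = "cv", number = 10))

# Without s: one column per lambda on glmnet's path.
dim(coef(ridge$finalModel))
# With s set to the chosen lambda: a single set of coefficients.
best_coef <- coef(ridge$finalModel, s = ridge$bestTune$lambda)
best_coef
```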
@farygreenart 6 years ago ⁺¹
Thanks a thousand times for the useful video. I have a question about "Error in plot.new() : figure margins too large" after running pairs.panels(Data[c(-2, -55)]). I have 55 factors and 57686 observations. Can it only handle a limited number of factors?
@bkrai 6 years ago
You can try with a smaller number of variables. With 55 variables, even if the plot is made, it will be difficult to see patterns.
@kartikjha5704 3 years ago ⁺¹
Sir, if the independent variables are highly correlated in a classification problem, should we also use these types of procedures, or can we ignore that? Kindly help.
@bkrai 3 years ago ⁺¹
For classification problems, try this:
ruclips.net/video/hCLKMiZBTrU/видео.html
@kartikjha5704 3 years ago ⁺¹
@@bkrai Thank you, sir. Your videos helped me a lot in learning ML and R coding. Thanks a lot again, sir.
@bkrai 3 years ago
You are welcome!
@John-dw6jb 3 years ago ⁺¹
Amazing tutorial. Thank you very much, I just subscribed to your channel.
@bkrai 3 years ago
Thanks and welcome!
@John-dw6jb 3 years ago ⁺¹
@@bkrai I am a graduate student currently doing a research project with sg-LASSO, and your video really helped me. Thanks again.
@bkrai 3 years ago
Thanks for comments!
@Thamizhadi 4 years ago ⁺²
Sir, can you do a tutorial on using the glmnet package for logistic regression?
@bkrai 4 years ago ⁺¹
Thanks, I've added it to my list of future videos. Meanwhile you may also refer to this:
ruclips.net/video/AVx7Wc1CQ7Y/видео.html
@prarthanawajpai5180 5 years ago ⁺¹
Thank you for sharing this knowledge, sir.
I am getting an error. Following is the code and the error:
airglasso
@prarthanawajpai5180 5 years ago
Please help out, sir.
@bkrai 5 years ago
I see you have used "
@bkrai 5 years ago
There is a typo in the code.
@prarthanawajpai5180 5 years ago ⁺¹
@@bkrai It's a typo from copy-pasting the code here, sir. Even after that correction I am getting the same error.
@bkrai 5 years ago
Share your code so that I can take a look.
@abc_def789 4 years ago
Sir, I want to ask about the sequence: in ridge regression you set the sequence from 0.0001 to 1 with length 5, and for this sequence I got lambda = 0.00001. When I changed the sequence to 0.001 to 1, my lambda also changed.
Can you please tell me which lambda I should select?
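Cross-validation can only choose among the lambda values it is given, so changing the grid changes the answer. glmnet's cv.glmnet() avoids hand-made grids by cross-validating over its own lambda path; a common convention is to report lambda.min (lowest CV error) or the more conservative lambda.1se. A sketch, assuming the glmnet package and the MASS Boston data:

```r
library(glmnet)
data(Boston, package = "MASS")

x <- as.matrix(Boston[, names(Boston) != "medv"])
y <- Boston$medv

set.seed(1234)
cv_ridge <- cv.glmnet(x, y, alpha = 0, nfolds = 10)  # glmnet picks its own lambda path
cv_ridge$lambda.min   # lambda with the lowest cross-validated error
cv_ridge$lambda.1se   # largest lambda within one standard error of that minimum
plot(cv_ridge)        # CV error against log(lambda)
```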
@andresbaron8557 4 years ago ⁺¹
How do you find the final equation of the regression? And the confusion matrix?
@bkrai 4 years ago
Line 95 gives the intercept and coefficients for the model. In regression problems, a confusion matrix is not required. You can use RMSE or R-squared.
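To evaluate a tuned model on held-out data, caret's postResample() compares predictions with observed values and returns RMSE, R-squared and MAE. A sketch, assuming caret, glmnet and the MASS Boston data; the random 70/30 split, seeds and grid are illustrative choices, not necessarily the video's.

```r
library(caret)
data(Boston, package = "MASS")

# Illustrative random split: about 70% training, 30% test.
set.seed(222)
ind <- sample(2, nrow(Boston), replace = TRUE, prob = c(0.7, 0.3))
train_df <- Boston[ind == 1, ]
test_df  <- Boston[ind == 2, ]

set.seed(1234)
en <- train(medv ~ ., data = train_df, method = "glmnet",
            tuneGrid = expand.grid(alpha = seq(0, 1, length = 5),
                                   lambda = seq(0.0001, 1, length = 5)),
            trControl = trainControl(method = "cv", number = 10))

pred_test <- predict(en, newdata = test_df)
postResample(pred = pred_test, obs = test_df$medv)   # RMSE, Rsquared, MAE
```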
@ainaumairahmazlan2029 4 years ago ⁺¹
Sir, I have a question: can this method be applied to a gene expression dataset for feature selection?
@bkrai 4 years ago ⁺²
I'll have to look at the data to make a suggestion.
@ainaumairahmazlan2029 4 years ago ⁺¹
@@bkrai The data usually contains a small sample and hundreds to thousands of gene variables, Sir.
@bkrai 4 years ago ⁺²
You can follow the steps in the video and let me know if there is any issue. You can also try:
ruclips.net/video/VEBax2WMbEA/видео.html
@ainaumairahmazlan2029 4 years ago ⁺¹
@@bkrai I will do. Thank you very much, Sir 😊
@bkrai 4 years ago
Welcome!
@FarooqiA1 4 years ago
Very nice. Are the lambdas in SS's not different?
@kalyanasundaramsp8267 6 years ago ⁺¹
Excellent, sir... please continue posting videos.
@bkrai 6 years ago
Thanks for comments!
@99chintu 6 years ago ⁺¹
Sir, you did not perform any standardization on the data before running the algorithm. Is standardization required before running these techniques or not?
@bkrai 4 years ago
Not required here, because glmnet standardizes the predictors internally by default, but even if you do it yourself, it should be fine.
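glmnet's standardize argument defaults to TRUE: predictors are scaled internally before fitting, and the coefficients are reported back on the original scale. One way to see this is that changing the units of a predictor leaves the fitted values unchanged. A sketch, assuming the glmnet package, the MASS Boston data and an arbitrary lambda of 0.5:

```r
library(glmnet)
data(Boston, package = "MASS")

x <- as.matrix(Boston[, names(Boston) != "medv"])
y <- Boston$medv

x_rescaled <- x
x_rescaled[, "tax"] <- x_rescaled[, "tax"] / 1000   # change the units of one predictor

lam <- 0.5
fit_a <- glmnet(x, y, alpha = 1, lambda = lam)      # standardize = TRUE by default
fit_b <- glmnet(x_rescaled, y, alpha = 1, lambda = lam)
pred_a <- predict(fit_a, newx = x)
pred_b <- predict(fit_b, newx = x_rescaled)
max(abs(pred_a - pred_b))   # essentially zero: the fit does not depend on units
```

With standardize = FALSE the two fits would generally differ, because the penalty would then treat a one-unit change in tax differently from a one-unit change in tax/1000.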
@xuxiaoliu8113 5 years ago ⁺¹
So good!!!! The video helps a lot! My first comment on YouTube. Thank you very much!
@bkrai 5 years ago
Thanks for comments!
@Vish_27-v8x 5 years ago ⁺¹
Excellent video.
@bkrai 5 years ago
Thanks for comments!
@kalyanasundaramsp8267 6 years ago ⁺¹
Thanks so much, sir. When the target variable is a class (0 or 1), will the same approach work?
@bkrai 6 years ago
For a factor target variable, you can use logistic regression.
@kalyanasundaramsp8267 6 years ago ⁺¹
Thanks, sir.
@kalyanasundaramsp8267 6 years ago
Sir, I have a lot of independent factor variables. Is there a model to crunch them?
@chubathien9 3 years ago
What if my lambda = 1? Is it an issue with my model and CV?
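If the chosen lambda equals the largest (or smallest) value in the grid, as lambda = 1 does for seq(0.0001, 1, length = 5), the cross-validated optimum may lie outside the range that was searched. One way to check is to retrain with a wider, log-spaced grid and see whether the winner still sits on an edge. A sketch, assuming caret, glmnet and the MASS Boston data; the grid limits are arbitrary.

```r
library(caret)
data(Boston, package = "MASS")

grid <- expand.grid(alpha = 1, lambda = 10^seq(-4, 1, length = 20))  # wider, log-spaced
set.seed(1234)
lasso <- train(medv ~ ., data = Boston, method = "glmnet", tuneGrid = grid,
               trControl = trainControl(method = "cv", number = 10))

best <- lasso$bestTune$lambda
# TRUE means the winner is on the boundary, so the grid should be widened further.
at_edge <- best %in% range(grid$lambda)
at_edge
```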
@varunamal 4 years ago ⁺¹
Very informative. Thank you.
@bkrai 4 years ago
You are welcome!
@hans11996 5 years ago ⁺²
Dear sir,
Please help with the error below, which I am getting while running the same code:
set.seed(1234)
ridge
@bkrai 5 years ago
It looks like your data has missing values, which are causing this error.
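Before calling train(), it can help to count NAs column by column and decide whether to drop or impute incomplete rows. A sketch on a copy of the MASS Boston data with one value deliberately blanked out (the NA is artificial, for illustration only):

```r
data(Boston, package = "MASS")

df <- Boston
df$rm[3] <- NA                             # artificial missing value for illustration

colSums(is.na(df))                         # NA count per column
df_complete <- df[complete.cases(df), ]    # same rows as na.omit(df)
nrow(df) - nrow(df_complete)               # number of rows dropped
```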
@chriss7771 6 years ago ⁺¹
Is there any way you could do a video on caretEnsemble?
@bkrai 6 years ago
Thanks for the suggestion, I've added this to my list.
@chriss7771 6 years ago
@@bkrai Thank you, sir.
@anabruhn 2 years ago
What if I wanted the program to find the best value for lambda? I mean, without me putting a sequence in the code.
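Within caret, one option is to drop tuneGrid and set tuneLength, in which case caret generates candidate alpha and lambda values itself for method = "glmnet" before cross-validating them. A sketch, assuming caret, glmnet and the MASS Boston data; outside caret, cv.glmnet() offers the same convenience for lambda alone.

```r
library(caret)
data(Boston, package = "MASS")

set.seed(1234)
enet_auto <- train(medv ~ ., data = Boston, method = "glmnet",
                   tuneLength = 10,   # caret builds the alpha/lambda candidates itself
                   trControl = trainControl(method = "cv", number = 10))
enet_auto$bestTune                    # chosen alpha and lambda
```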
@subaganesh552 5 years ago ⁺¹
Sir, will you make a sequential feature selection algorithm in R...
@bkrai 5 years ago
Thanks for the suggestion, I've added this to my list. For feature selection you can also refer to this:
ruclips.net/video/VEBax2WMbEA/видео.html
@laiqashafique569 6 years ago ⁺¹
Sir, kindly tell me: if we don't want to partition the data, can we omit that part? And why do we use the word train in each model?
@bkrai 6 years ago
It is always a good idea to partition data to avoid over-fitting. "train" is the function used for developing a model, that's why we need to use it.
@laiqashafique569 6 years ago
Thanks for your reply, but when I run your program I get lots of errors.
@laiqashafique569 6 years ago
Sir, waiting for your video on multilevel regression using lasso and ridge regression.
@bkrai 6 years ago
I would suggest first trying the same data that is used in the video. That will make sure the code has no errors.
@amgdhussein7040 3 years ago ⁺¹
In this code, the train function is regressing against all the data, which doesn't allow for any feature engineering. Can anyone clarify?
@bkrai 3 years ago
The methodology takes care of that.
@optoed 6 years ago
Dear Bharatendra, thank you a lot for the bright explanation! I would like to ask how I can get p-values for the coefficients from elastic net. Thank you in advance!
@jwck7 6 years ago
It's not simple, and it's a somewhat new area of research. Usually you don't worry about it; after all, the only reason you see the variables is because the elastic net procedure decided they should be kept, remember?
@parasrai145 6 years ago ⁺¹
Very well explained.
@bkrai 6 years ago
Thanks for comments!
@praveenk9237 4 years ago ⁺²
Thanks a lot for the tutorial, Sir. Really helpful. Just a small question though.
Is there code I can use to see the p-value of the regression equation along with the R-squared and RMSE values?
@bkrai 4 years ago
The output includes all of them.
@jvickers1879 4 years ago ⁺¹
When doing the scatterplot, why eliminate the target variable? Isn't it helpful to see which variables are, and aren't, correlated with it?
@bkrai 4 years ago
It was mainly for discussing multicollinearity, where the target variable is not needed.
@kalyanasundaramsp8267 6 years ago
Also, when the dependent variables are discrete, can we convert them to factors and apply the same approach?
@bkrai 6 years ago
You need logistic regression for a discrete target variable.
@kalyanasundaramsp8267 6 years ago
Sorry, sir, I am asking a lot of questions. In my case I have a lot of independent variables that are factors, and I would like to crunch them into a few to make it simple. My target variable is "yes" or "no". The glmnet function in R says it also supports binomial.
@jwck7 6 years ago ⁺¹
20:40 You say the purple variable (#6) is more important than the blue one (#5), but this disagrees with the variable importance plots, which say nox (variable 5) is the most important, with rm (variable 6) the second most important. Did you mean to say the blue line was more important? You also say this at 16:58.
@bkrai 6 years ago
You are right, nox is contributing more to the model.
@jwck7 6 years ago ⁺¹
Okay, thanks. Great video, by the way; there's a lot of great stuff in here in addition to what's in the title, so thanks for making it. You're also the only person I've seen who speeds up the code-writing portions instead of putting us to sleep with the sound of the keyboard slowly typing code, so that was a smart touch too :)
@bkrai 6 years ago
Thanks for feedback!
@send2milan 6 years ago ⁺¹
Thank you, sir. I look forward to more of your videos on YouTube.
@bkrai 6 years ago
Thanks for the comments! You can find some useful links below:
Introductory R Videos: goo.gl/NZ55SJ
Machine Learning videos: goo.gl/WHHqWP
Deep Learning with TensorFlow: goo.gl/5VtSuC
Image Analysis & Classification: goo.gl/Md3fMi
Text mining: goo.gl/7FJGmd
Data Visualization: goo.gl/Q7Q2A8
@lelabachiashvili2107 5 years ago ⁺¹
Many thanks, one of the most helpful videos.
@bkrai 5 years ago
Thanks for comments!
@rolandkouakou9235 5 years ago ⁺¹
Thanks, Mr. Rai.
@bkrai 5 years ago
Thanks for comments!
@humawaleed7489 7 months ago ⁺¹
Hats off, sir.
@bkrai 7 months ago
Thanks!
@yuanjunlu9219 5 years ago ⁺¹
Love you so much. This video helped me a lot.
@bkrai 5 years ago
Thanks for comments!
@sandysanju9675 4 years ago ⁺¹
Sir, I have got this problem, and I have checked every value; I do not have any missing values. Please help.
Error in na.fail.default(list(LTDA = c(0.26683158, 0.37309602, 0.40553323, :
missing values in object
@bkrai 4 years ago
Look at a summary of the data using the 'summary' function. If there is missing data, it will show up as NAs.
@sandysanju9675 4 years ago
@@bkrai Sir, I have looked at the summary, but it shows no missing data.
@sandysanju9675 4 years ago
lm
@sandysanju9675 4 years ago
Sir, please help.
@sandysanju9675 4 years ago
Sir, I have checked my summary statistics, but I did not find any missing values.
@me3jab1 5 years ago ⁺¹
Wow, thank you, Boss.
@bkrai 5 years ago
Thanks for comments!
@dobariyahardik3388 2 years ago ⁺¹
Thanks a lot, sir.
@bkrai 2 years ago
You are welcome!
@mabelezeonugo5023 4 years ago ⁺¹
I'm getting an error when trying to run my prediction, and my dependent variable is a factor.
@bkrai 4 years ago ⁺¹
For a factor-type dependent variable, try this:
ruclips.net/p/PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG
@mabelezeonugo5023 4 years ago
@@bkrai Thank you, Sir. Is it possible to get a confusion matrix showing the F1-score?
@drvishalsrivastava2331 5 years ago ⁺¹
Amazing
@bkrai 5 years ago
Thanks!
@yogitasolanki4638 5 years ago ⁺¹
How do we check accuracy, sir?
@bkrai 4 years ago
Using RMSE.
@hhangjohn75 5 years ago ⁺¹
I got stuck at the ridge step; it shows "Error: Stopping".
In addition: There were 26 warnings (use warnings() to see them)
DAMNNNNNNN
@bkrai 5 years ago
If you are already using the R code that I provided in the description area, then I'd suggest reviewing the code, as it may be a small error. Also check whether you have missing data.
@okan702 4 years ago
@@bkrai Sir, the code you provided has some missing parts. I tried to fill them in by looking at your video, but it doesn't work. Can you share the exact code used in this video? Thank you.
@sudheersk5727 6 years ago ⁺¹
Hi sir, can you please tell me your email ID? I have a problem statement, and I would like your suggestions on that dataset.
@bkrai 4 years ago
seemabharat@gmail.com
@ammar46 4 years ago ⁺¹
I get a lower RMSE for the test data 😂
@bkrai 4 years ago
It can happen sometimes, but mostly it is the other way around.
