Logistic Regression with R: Categorical Response Variable at Two Levels (2018)

  • Published: Dec 1, 2024

Comments • 335

  • @diliniherath1299
    @diliniherath1299 4 years ago +4

    No words to express my gratitude to you. Found your channel days before submitting the project and you saved me!

    • @bkrai
      @bkrai  4 years ago

      Great to hear!

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +12

    2:46 two-way table of factor variables, 3:23 data partition, 5:21 logistic regression model, 8:36 prediction, 10:05 probability calculation, 14:24 interpretation of coefficients, 15:03 error on training data, 16:04 confusion matrix, 17:32 error on test data, 18:28 goodness of fit

    • @bkrai
      @bkrai  4 years ago

      Thanks!

  • @Lilian.Chidinma.Nwafor
    @Lilian.Chidinma.Nwafor 2 months ago +1

    Dr., I don't know how to tell God to bless you for me. Your MLR video saved me during my research presentation; see me explaining like a pro!! All thanks to you. This one is also very helpful, especially how you gave detailed explanations.

    • @bkrai
      @bkrai  2 months ago

      Glad it was helpful!

  • @wardhereadan1187
    @wardhereadan1187 5 years ago +2

    Dr. Bharatendra Rai, that video was amazing!! I hope you continue to post more videos like this! Seriously amazing!!!

    • @bkrai
      @bkrai  5 years ago

      Thanks for comments!

  • @saipri
    @saipri 3 years ago +4

    Extremely crisp and accurate! Hope you get many more views! By far the best on this topic...

    • @bkrai
      @bkrai  3 years ago

      Thanks for the comments!

  • @youroldmangaming8150
    @youroldmangaming8150 4 years ago +1

    Thanks mate. Been struggling to find a practical person to show how to do this. Very clear and well thought out. Thank you.

    • @bkrai
      @bkrai  4 years ago

      You're very welcome!

  • @crossray974
    @crossray974 3 years ago +2

    Thank you, Dr. Bharatendra - your content and methods are at the top of YouTube. Greetings from Europe!

    • @bkrai
      @bkrai  3 years ago +1

      Most welcome!

  • @mnorberta24
    @mnorberta24 4 years ago +2

    Thank you for helping save my grades for this module!!!!! I might just watch all your videos because they're so helpful!!!!

    • @bkrai
      @bkrai  4 years ago

      Glad to hear it!

  • @shaikhalishams4065
    @shaikhalishams4065 4 years ago +10

    Finally I've got a perfect video on this topic.

    • @bkrai
      @bkrai  4 years ago +2

      Thanks for comments!

  • @kir66846037
    @kir66846037 3 years ago +2

    The best teaching of logistic regression!!!! Thanks a lot.

    • @bkrai
      @bkrai  3 years ago

      Most welcome!

  • @allabtlyf
    @allabtlyf 5 years ago +2

    Wonderful video. I was struggling to calculate the probability from the estimates in a notebook, but you made it quite simple.
    Thanks a lot

    • @bkrai
      @bkrai  5 years ago +1

      Thanks for comments!

  • @PA_hunter
    @PA_hunter 3 years ago +1

    Thank you Dr. Bharatendra Rai. Can you explain more why Rank 1 is not included in the model, please?

    • @bkrai
      @bkrai  3 years ago +1

      For factor independent variables, we convert them to dummy variables. For more detailed coverage see:
      ruclips.net/video/s23CMIjfwHk/видео.html
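
On why rank 1 has no coefficient of its own: with R's default treatment contrasts, the first level of a factor acts as the reference category, and each remaining level gets a 0/1 dummy column. A minimal sketch, assuming a hypothetical six-row `rank` vector with the four levels used in the video:

```r
# Hypothetical factor with four levels, standing in for 'rank' from the video
rank <- factor(c(1, 2, 3, 4, 2, 3))

# model.matrix() shows the design matrix R builds behind a formula
X <- model.matrix(~ rank)

# Only rank2, rank3 and rank4 get columns; rank 1 is absorbed by the intercept
print(colnames(X))
print(X)
```

A row with all three dummy columns equal to 0 is a rank 1 observation, so the rank2 to rank4 coefficients are read as differences from rank 1.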

  • @jbei9981
    @jbei9981 4 years ago +1

    Thank you so much. Excellent video. I was really thinking I would fail my assignment until I found this.

    • @bkrai
      @bkrai  4 years ago +1

      You're very welcome!

  • @philippdegens6776
    @philippdegens6776 4 years ago +1

    Thanks. You gave a clear and concise explanation, and a bonus was that it was in R, which I am learning.

    • @bkrai
      @bkrai  4 years ago

      You're very welcome!

    • @bkrai
      @bkrai  4 years ago

      You may also find this useful:
      ruclips.net/p/PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG

  • @viigeminalegio
    @viigeminalegio 4 years ago +1

    Thank you very much, Dr. Bharatendra. I was looking to resolve some of my doubts and I finally did. Thanks for sharing your knowledge. I hope I have the opportunity to help you on some occasion. Thanks for everything, great job.

    • @bkrai
      @bkrai  4 years ago

      Thanks for your comments and feedback!

  • @williamstan1780
    @williamstan1780 2 years ago +1

    Very informative video, explained in a manner that is easy to understand.
    I have a question though: what is the difference between logistic regression and multinomial logistic regression?

    • @bkrai
      @bkrai  2 years ago

      In multinomial logistic regression, the response variable has more than 2 levels. See this for details:
      ruclips.net/video/ftjNuPkPQB4/видео.html

  • @nth.education
    @nth.education 4 years ago +3

    Amazing explanation, loved the way you went through the code and how to proceed step by step.
    I have a doubt about the p-value calculation at the end. Can you explain a bit more about the "with" command you used? I couldn't understand the parameters used in it. The interpretation of the p-value is fine, but I would like to know how the command works so I can use it in other places as well.
    Thanks

    • @bkrai
      @bkrai  4 years ago +1

      You can run ?with in the console; it will give you all the details and also examples.
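
As a rough illustration of what `with` does: it evaluates an expression using the components of a list or data frame as if they were ordinary variables, which saves repeating `model$` in front of every name. The sketch below fits a model to simulated data (the variables `x` and `y` are assumptions, not the video's data) and shows that both spellings give the same number:

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.5 + x))
m <- glm(y ~ x, family = "binomial")

# with(m, ...) looks up null.deviance and deviance inside the fitted object
d1 <- with(m, null.deviance - deviance)

# The same calculation written out with $
d2 <- m$null.deviance - m$deviance

print(c(d1, d2))
```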

  • @jared1122
    @jared1122 4 years ago +2

    Thank you Dr Rai for the wonderful explanation👍 God bless you 🙏

    • @bkrai
      @bkrai  4 years ago

      Welcome!

  • @dmukherjee4049
    @dmukherjee4049 6 years ago +3

    Sir, can you explain the "goodness of fit test"? What is df.null - df.residual, what is lower.tail, and why is it 'F'?
    Thank You

    • @bkrai
      @bkrai  6 years ago

      When in RStudio, you can run ?glm. This will provide you with more details.
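
For readers with the same question, here is a hedged sketch of a deviance-based goodness-of-fit test on simulated data (the variables `gpa` and `admit` and the coefficients used to generate them are assumptions). The drop in deviance from the null model to the fitted model is compared against a chi-square distribution whose degrees of freedom are `df.null - df.residual`, that is, the number of coefficients added. `F` is R shorthand for `FALSE`, so `lower.tail = F` asks for the upper-tail probability, which is the p-value.

```r
# Simulated data, for illustration only
set.seed(1234)
n <- 400
gpa <- runif(n, 2, 4)
admit <- rbinom(n, 1, plogis(-6 + 1.8 * gpa))
model <- glm(admit ~ gpa, family = "binomial")

# Test statistic: how much the deviance drops when predictors are added
chisq <- model$null.deviance - model$deviance

# Degrees of freedom: number of coefficients beyond the intercept
df <- model$df.null - model$df.residual

# Upper-tail probability P(chi-square >= chisq); FALSE is spelled out here
p_value <- pchisq(chisq, df, lower.tail = FALSE)
print(c(chisq = chisq, df = df, p_value = p_value))
```

A small p-value suggests the model with predictors fits significantly better than the intercept-only model.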

  • @pranatim
    @pranatim 3 years ago +1

    Best tutorial on logistic regression. Thank you so much for sharing.

    • @bkrai
      @bkrai  3 years ago

      You're very welcome!

  • @rajeshtukdeo
    @rajeshtukdeo 4 years ago +1

    Amazing video to understand the logistic regression concepts thoroughly!!!

    • @bkrai
      @bkrai  4 years ago

      Thanks for comments!

  • @lindanidube5714
    @lindanidube5714 4 years ago +2

    This was amazing... you explain everything step by step nicely :-)

    • @bkrai
      @bkrai  4 years ago

      Thanks for your feedback!

  • @fernandoflores3161
    @fernandoflores3161 3 years ago +1

    Excellent explanation! How do you deal with ordinal and nominal categorical variables?

    • @bkrai
      @bkrai  3 years ago

      If response variable is ordinal, refer to this:
      ruclips.net/video/qkivJzjyHoA/видео.html

  • @reubenmarfo9855
    @reubenmarfo9855 3 years ago +1

    Professor, can you please comment on why, in your previous video on logistic regression, you trained the model and predicted on the same data without splitting?

    • @bkrai
      @bkrai  3 years ago

      I mainly wanted to show how to run logistic regression. After getting feedback, I created this one, which is more complete.

  • @laxmanbisht2638
    @laxmanbisht2638 2 years ago +2

    Sir, thanks a lot!

    • @bkrai
      @bkrai  2 years ago

      Most welcome!

  • @dipeshpatel3106
    @dipeshpatel3106 3 years ago +2

    Sir, please make a video on Monte Carlo simulation in R.

    • @bkrai
      @bkrai  3 years ago

      Thanks for suggestion!

  • @flamboyantperson5936
    @flamboyantperson5936 6 years ago +1

    Excellent video Sir. You are a great statistician and expert in R. Thank you for the video Sir

    • @bkrai
      @bkrai  6 years ago

      Thanks!

  • @maheswarivemula141
    @maheswarivemula141 3 years ago +1

    Thank you sir for the wonderful video.
    Sir, I have a problem: I'm not getting a value while running on the test dataset. Could you please help me with this error?
    It shows 'all arguments must have same length'.

    • @bkrai
      @bkrai  3 years ago

      Check your data again.

  • @anuchowdarybds
    @anuchowdarybds 5 years ago +1

    Very clear explanation. Thank you. Do you have any more videos on logit regression?

    • @bkrai
      @bkrai  5 years ago

      Thanks for comments! The link below also has multinomial logistic regression and other regression based methods.
      ruclips.net/p/PL34t5iLfZddtKNwFNic3HWNV2qMsQ9AjD

  • @soumikchatterjee3996
    @soumikchatterjee3996 4 years ago +1

    Excellent video. Just a few things to mention. In the glm result, the residual deviance is greater than the residual degrees of freedom, which means the data has overdispersion. Better to use the quasibinomial family rather than binomial; otherwise the p-values would show false significance levels.
    Second, backward variable selection without Monte Carlo permutation is prone to type II error, so it is better to use it cautiously or to use the information-theoretic approach proposed by Burnham et al., with model weight as the criterion.
    Thanks for this beautiful video sir

    • @soumikchatterjee3996
      @soumikchatterjee3996 4 years ago +1

      Although you set a seed and resampled, which can reduce the error, it is extremely difficult to find a proper seed size without understanding the model weight (wi). Thanks

    • @bkrai
      @bkrai  4 years ago +1

      Thanks for the feedback and comments!

  • @Debashish_Chatterjee
    @Debashish_Chatterjee 5 years ago +2

    I love your videos .... concise and to the point. Superb .... keep it up

    • @bkrai
      @bkrai  5 years ago

      Thanks for comments!

  • @ezechielamoussou7409
    @ezechielamoussou7409 2 years ago +1

    Thank you for the video Sir.
    If I were running a logistic regression with categorical predictor variables, should I change them to factors?

    • @bkrai
      @bkrai  2 years ago +1

      Yes.

  • @mohamedbousarout6515
    @mohamedbousarout6515 3 years ago +2

    Thank you sir keep up the good work ;)

    • @bkrai
      @bkrai  3 years ago +1

      You are welcome!

  • @rizwanghulamhussain7309
    @rizwanghulamhussain7309 3 years ago

    Excellent video! Could you please explain how to fit a panel logistic regression in R? I want to make a confusion matrix / ROC curve using the pglm library, but I could not find fitted probabilities in it.

  • @evansumido6191
    @evansumido6191 2 years ago +1

    Hi sir. Do you have code for cross-validation? Thank you.

    • @bkrai
      @bkrai  2 years ago +1

      Refer to this for various ways to use CV:
      ruclips.net/video/GmkHvDs0GG8/видео.html

  • @shajibkumarguha234
    @shajibkumarguha234 4 years ago +1

    Hello Sir! Why did you choose rank as factor and not as ordered?

    • @bkrai
      @bkrai  4 years ago

      You are right, ordinal will be more correct.
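
To make the distinction concrete, a minimal sketch with a hypothetical `rank` vector: `factor()` treats the levels as unordered labels, while `ordered = TRUE` records that 1 < 2 < 3 < 4. One practical consequence is that R switches from 0/1 dummy columns to polynomial contrasts (linear, quadratic, cubic) for ordered factors, so the coefficients are read differently.

```r
# Hypothetical rank values, for illustration only
rank <- c(3, 1, 4, 2, 2, 3)

rank_nominal <- factor(rank)                                # unordered
rank_ordered <- factor(rank, levels = 1:4, ordered = TRUE)  # 1 < 2 < 3 < 4

print(is.ordered(rank_nominal))
print(is.ordered(rank_ordered))

# Comparisons between values are only meaningful for the ordered version
print(rank_ordered[1] > rank_ordered[2])

# Ordered factors get polynomial contrast columns (.L, .Q, .C)
print(colnames(model.matrix(~ rank_ordered)))
```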

  • @mueezwaq
    @mueezwaq 1 year ago +1

    Hi there, thanks for this. I don't like how R displays the results for factors with more than 2 levels. Is there any way to get output like SPSS (which supplies a single odds ratio, 95% CI and p-value for each variable in the model)? I have tried both the logistic.display and exp() commands but they do not provide an overall value like this. Any ideas?

    • @bkrai
      @bkrai  1 year ago

      You can use the output and customize it.
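
One possible way to customize the output, sketched on simulated data (the data frame `d`, its columns and the generating coefficients are assumptions): exponentiate the coefficients and their Wald confidence limits to get one odds ratio per factor level, and use `drop1()` with a likelihood-ratio test to get a single overall p-value per variable. This is not a reproduction of SPSS output, only base R building blocks that cover similar ground.

```r
# Simulated data, for illustration only
set.seed(42)
n <- 500
d <- data.frame(
  gpa  = runif(n, 2, 4),
  rank = factor(sample(1:4, n, replace = TRUE))
)
d$admit <- rbinom(n, 1, plogis(-4 + 1.2 * d$gpa - 0.4 * (as.integer(d$rank) - 1)))

model <- glm(admit ~ gpa + rank, data = d, family = "binomial")

# Odds ratios with Wald 95% confidence intervals, one row per coefficient
or_table <- exp(cbind(OR = coef(model), confint.default(model)))
print(round(or_table, 3))

# One likelihood-ratio test per variable: rank is tested as a whole (3 df)
overall <- drop1(model, test = "Chisq")
print(overall)
```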

  • @shivamparashar...9536
    @shivamparashar...9536 1 year ago +1

    Sir, when I upload a data set, it doesn't take all the data; it leaves out a few rows.
    Please tell me how I can upload a dataset.
    Thank you

    • @bkrai
      @bkrai  11 months ago

      How many rows does your original data have?

  • @victorhenostroza1871
    @victorhenostroza1871 5 years ago +2

    Thanks man, another amazing job. You are the teacher we all want at university.

    • @bkrai
      @bkrai  5 years ago +1

      Thanks for comments!

  • @yaweli2968
    @yaweli2968 1 year ago

    If you have two or more categorical variables which are strings, how do you decide which value to code as 0 or 1 in the factor? That is, how do you assign them specific factor levels?

  • @genevieveemefaasare8352
    @genevieveemefaasare8352 2 years ago +1

    Thanks so much. Very precise and concise explanations. Thank you Sir.

    • @bkrai
      @bkrai  2 years ago

      You are very welcome!

  • @muralitharanrap4534
    @muralitharanrap4534 4 years ago +1

    Can you please explain what xtabs does?

    • @bkrai
      @bkrai  4 years ago

      It is for cross tabulation or for making a 2-way table shown in the example.
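
A small sketch of such a two-way table, using a hypothetical eight-row data frame with `admit` and `rank` columns. Each cell counts how many rows have that combination of levels, which makes empty or very sparse cells easy to spot before fitting a model.

```r
# Hypothetical data, for illustration only
mydata <- data.frame(
  admit = factor(c(0, 1, 0, 1, 1, 0, 0, 1)),
  rank  = factor(c(1, 1, 2, 2, 3, 3, 4, 4))
)

# Rows are admit levels, columns are rank levels, cells are counts
tab <- xtabs(~ admit + rank, data = mydata)
print(tab)

# Check whether any combination has no observations
print(any(tab == 0))
```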

  • @navneetjain2507
    @navneetjain2507 4 years ago +1

    What about the case when we have a lot of independent variables that have zero as a response or missing values?

    • @bkrai
      @bkrai  4 years ago

      For missing values refer to this link:
      ruclips.net/video/An7nPLJ0fsg/видео.html

  • @femiakinmade4077
    @femiakinmade4077 4 years ago +2

    I enjoyed your video, thank you! Can I get some clarity on why you used the "train" dataset in your prediction instead of "test"? dataset: ## p1

    • @bkrai
      @bkrai  4 years ago

      After 'train', I also use 'test'. Note that if you get good results with 'train' but not with 'test', it suggests an over-fitting problem.

    • @femiakinmade4077
      @femiakinmade4077 4 years ago +1

      @@bkrai Thanks for your response. Appreciated

    • @bkrai
      @bkrai  4 years ago

      Welcome!

  • @harishnagpal21
    @harishnagpal21 6 years ago +1

    Hi Bharatendra, I saw your linear regression video also. The explanation of the results was fantastic. I got to learn new things. One query - when to use linear and when to use logistic regression? Thanks

    • @bkrai
      @bkrai  6 years ago

      When the y variable is a factor, logistic regression is used. For a numeric y, linear regression is used.

    • @harishnagpal21
      @harishnagpal21 6 years ago +1

      thanks :)

  • @MemphianSounds
    @MemphianSounds 4 years ago +1

    Great as always! What do you do when you have so many rows and variables that your computer can't compute the vector in R?

    • @bkrai
      @bkrai  4 years ago +1

      You can take a sample.

  • @valeriasanchez4910
    @valeriasanchez4910 4 years ago +1

    Excellent video Dr.! I just have one question: why is it necessary to do the data partition for the estimation?

    • @bkrai
      @bkrai  4 years ago +2

      It can help to avoid over fitting which happens when results are good with training data, but not so good on test data.
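
The comparison can be sketched as follows on simulated data (the variables, the 80/20 split and the 0.5 cutoff are assumptions for illustration; `misclass_error` is a hypothetical helper, not a function from the video). The model is fitted on the training rows only, and the misclassification error is then computed separately for both partitions.

```r
# Simulated data, for illustration only
set.seed(1234)
n <- 400
mydata <- data.frame(gpa = runif(n, 2, 4))
mydata$admit <- factor(rbinom(n, 1, plogis(-6 + 1.8 * mydata$gpa)))

# Random split: roughly 80% training, 20% test
ind <- sample(2, n, replace = TRUE, prob = c(0.8, 0.2))
train <- mydata[ind == 1, ]
test  <- mydata[ind == 2, ]

# Fit on the training data only
model <- glm(admit ~ gpa, data = train, family = "binomial")

# Hypothetical helper: confusion matrix and misclassification error
misclass_error <- function(fit, data, cutoff = 0.5) {
  p <- predict(fit, newdata = data, type = "response")
  pred <- ifelse(p > cutoff, 1, 0)
  actual <- as.integer(as.character(data$admit))
  print(table(Predicted = pred, Actual = actual))
  mean(pred != actual)
}

train_error <- misclass_error(model, train)
test_error  <- misclass_error(model, test)
print(c(train = train_error, test = test_error))
```

A test error much higher than the training error would point towards over-fitting; similar values suggest the model generalizes about as well as it fits.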

  • @alphar85
    @alphar85 4 years ago +1

    You are just amazing 👏. You made my life easier with the codes.

    • @bkrai
      @bkrai  4 years ago

      Happy to hear that!

  • @jarrelldunson
    @jarrelldunson 4 years ago +3

    Thank you for sharing, very helpful

    • @bkrai
      @bkrai  4 years ago

      You are so welcome!

  • @yousif_alyousifi
    @yousif_alyousifi 2 years ago +1

    Hi,
    Dear Dr.Bharatendra Rai
    What are the best models for fitting the binary data? I know that the logistic regression model is one of the models.
    What is the other model to make a comparison with the logistic model to find the best model?
    I would be grateful if you could assist me with this.
    I look forward to hearing from you soon
    Best regards,

    • @bkrai
      @bkrai  2 years ago

      You can use tree based methods for comparison, especially random forest and extreme gradient boosting. See this link for details:
      ruclips.net/video/hCLKMiZBTrU/видео.html

    • @yousif_alyousifi
      @yousif_alyousifi 2 years ago +1

      @@bkrai Thank you, Prof.
      Can these tree-based methods be used for regression as well as classification? My concern is to do regression (predict disease status), and I think these methods are used only for classification. Kindly confirm. Best regards

    • @bkrai
      @bkrai  2 years ago

      They do both regression and classification. I have included examples for both.

    • @yousif_alyousifi
      @yousif_alyousifi 2 years ago

      @@bkrai Thank you, Prof.

  • @AnaPTedim
    @AnaPTedim 6 years ago +2

    Great video, it really helped a lot. I have a question though: can I use the same model if one of my categorical variables has cells in the two-way table that equal zero? If not, is there any alternative? How can I solve this?

    • @bkrai
      @bkrai  6 years ago

      Let's say your categorical variable has 10 levels and the last one has frequency below 5. You can combine last two levels into one and then do the analysis.
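
Merging sparse levels can be done in base R by assigning the same new name to the levels being combined. A minimal sketch with a hypothetical four-level factor in which levels "c" and "d" are rare (the merged name `c_or_d` is also just an illustration):

```r
# Hypothetical factor with two sparse levels
x <- factor(c("a", "a", "b", "b", "b", "c", "d"))
print(table(x))

# Giving both sparse levels the same name merges them into one level
levels(x)[levels(x) %in% c("c", "d")] <- "c_or_d"
print(table(x))
```

After merging, it is worth rebuilding the two-way table to confirm that no cell is empty.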

    • @AnaPTedim
      @AnaPTedim 6 years ago +1

      Thank you very much! That might work :)

  • @priyadarsinisamal1779
    @priyadarsinisamal1779 3 years ago

    Sir, how can I use one data set for training and a different data set (with the same kind of variables as the training set) for testing?

  • @yogeshdhar5825
    @yogeshdhar5825 4 years ago +2

    Very well explained!

    • @bkrai
      @bkrai  4 years ago

      Thanks for comments!

  • @rohitkamble1737
    @rohitkamble1737 3 years ago +1

    Very clear explanation. Understood everything.

    • @bkrai
      @bkrai  3 years ago

      Thanks for comments!

    • @rohitkamble1737
      @rohitkamble1737 3 years ago

      @@bkrai Sir, I am working on a project on a real estate and banking model to predict house prices. Could you please help me with that?

  • @rohankulkarni8613
    @rohankulkarni8613 5 years ago +1

    Please let me know if there is a data visualization for this data, for example in Tableau or any other software?

    • @bkrai
      @bkrai  5 years ago

      For data visualization, you can try this link:
      ruclips.net/video/niB5A8qa88I/видео.html

  • @myakaramyakrishna4400
    @myakaramyakrishna4400 4 years ago +1

    Why do you use xtabs?
    How do we find the dependent variable in a data set?

    • @bkrai
      @bkrai  4 years ago +1

      xtabs is for cross tabulation. A dependent variable is based on the context of data. In the example I have used, it is obvious.

  • @wardhereadan1187
    @wardhereadan1187 5 years ago +1

    How do you plot a logistic regression model?

    • @bkrai
      @bkrai  5 years ago

      It has an equation; there is no plot.

  • @irfaneditzstatus9760
    @irfaneditzstatus9760 4 years ago +1

    Hi Sir, I am analyzing data from a traffic survey. I have Age, Gender, TripDist, TravelMode, TravelTime, DepartureTime and LectureTime information. What is the meaning of factor and margin in regression modelling? Can you help me with that? Thanks in advance

    • @bkrai
      @bkrai  4 years ago

      'factor' is another name for a categorical or qualitative variable.

  • @hyunjungariuka1686
    @hyunjungariuka1686 4 years ago +1

    Can anyone help? My question is about NSP: in the regression it appears as only 2 rows, suspect and pathological, but in my regression there are 4 lines like that. I think two of them are suspect and pathological; what can the other 2 be?

    • @bkrai
      @bkrai  4 years ago

      For a response with more than 2 levels, you need to apply multinomial logistic regression. Here is the link:
      ruclips.net/p/PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG

  • @akashprabhakar6353
    @akashprabhakar6353 4 years ago +1

    Thanks for this video sir...
    Kindly tell us how we can increase the accuracy of this model, as the error rate is quite high.

    • @bkrai
      @bkrai  4 years ago

      You can try other methods to improve accuracy:
      ruclips.net/p/PL34t5iLfZddsQ0NzMFszGduj3jE8UFm4O

  • @uhsay1986
    @uhsay1986 6 years ago +1

    Hi Sir, I have a retail training data set where I need to predict whether a store should be opened in a given location. I removed NAs from the training set and am trying to apply the glm function ( store~. , data=train, family='binomial' ), but even after waiting 5-10 minutes I don't get any output. The data set consists of character and int columns.

    • @bkrai
      @bkrai  4 years ago

      You will have to look at the structure of your data and make sure response variable is of factor type.

  • @vairachilai3588
    @vairachilai3588 4 years ago +1

    In logistic regression, how do you check the linear relationship between the logit of the outcome and each predictor?

    • @bkrai
      @bkrai  4 years ago

      That's not needed.

    • @vairachilai3588
      @vairachilai3588 4 years ago +1

      @@bkrai There should be a linear relationship between the logit of the outcome and each predictor.
      If this condition is not met, logistic regression is invalid:
      log(p / (1 - p)) = b0 + b1 * X
      I have read this in many articles. If possible, can you explain it for this case study?

  • @narasimhapuvalla3211
    @narasimhapuvalla3211 6 years ago +1

    1.) Suppose we have categorical fields in our data. Is it mandatory to always change to numeric factors ?
    2.) If the answer for question 1 is correct, then what if we have too many unique values in each category columns?
    Let us take for example : I have a dataset of 100,000 records. There are a few columns with categorical data in it. Each of these categorical columns may have 1000 or more unique values. So if I convert them into factors, then "labels = c(1:1000 or more)".
    Is this ok to do it this way?
    3.) Is there a way to not convert categorical data into numeric values and still use them in the machine learning model?
    4.) How do we deal with Date fields?
    5.) The conversion of categorical variables into dummy variables --> should we do this in all cases or is this something we need to consider only if the unique values in the categorical fields are limited to a lesser number?

    • @dhavalpatel1843
      @dhavalpatel1843 4 years ago +1

      1. No, it is not mandatory to change. You can set the family parameter to "binomial".
      2. Answered in no. 1.
      3. Answered in no. 1.
      4. Convert them into factor variables.
      5. Try to consider it in all cases.

  • @divyasree3261
    @divyasree3261 4 years ago +1

    Sir, I need robust regression using R. Can you please make the next video on robust regression?

    • @bkrai
      @bkrai  4 years ago

      Thanks for the suggestion, I've added it to my list.

  • @ramp2011
    @ramp2011 7 years ago +1

    Great video. rank is a factor variable and it looks like logistic regression has auto-converted it into dummy variables internally (from the summary of the model). Is there a way to find which algorithms convert categorical variables to dummy variables automatically and which ones need manual conversion? Thank you for your help

    • @bkrai
      @bkrai  7 years ago

      Many algorithms do not need conversion of categorical variables to dummy variables. However, when using regression-based methods, R does so automatically.

  • @poornalya9605
    @poornalya9605 3 years ago +1

    Sir, in my data the rank value is not displayed. What is the reason?

    • @bkrai
      @bkrai  3 years ago

      Rank should have values 1, 2, 3 and 4.

  • @adtx11
    @adtx11 4 years ago +1

    One quick question. Model 1: a is the output variable, b and c are covariates, and both have significant p-values. Model 2: same output variable, b and d are covariates, and both have significant p-values after we run the summary command. Finally, Model 3: same output variable, and all three of b, c and d are covariates. If we see here that only b and c are significant, but d doesn't have a significant p-value, how do you interpret the result? Can we say that adding covariate d doesn't add value to the model, even though it was significant in the previous two-covariate scenario? Thank you.

    • @bkrai
      @bkrai  4 years ago +2

      Check relationship between c and d, that may help clarify.

  • @InfinitesimallyInfinite
    @InfinitesimallyInfinite 5 years ago +1

    Brilliant video professor. I have a question: do you always convert categorical integer variables into factor variables before performing logistic regression? In other places, like the XGB algorithm, I haven't seen you convert the 'Admit' variable into a factor variable. Why is that? Thanks.

    • @bkrai
      @bkrai  5 years ago +1

      Different methods require data to be prepared in a certain way. For example, XGB and neural networks require the response to be in numeric format.

    • @InfinitesimallyInfinite
      @InfinitesimallyInfinite 5 years ago +1

      Thanks professor for the quick response. Really appreciate. 😀

    • @bkrai
      @bkrai  5 years ago

      Thanks!

  • @amritthapa9315
    @amritthapa9315 5 years ago +1

    Could you please explain the importance of the "xtabs" command in logistic regression? You said we should not get zero. Could you explain more on this?

    • @bkrai
      @bkrai  5 years ago +1

      The key idea is to have a sufficient number of samples in each cell. If there are too few or zero samples, then the prediction model may not be stable or consistent.

    • @JidduVillarin
      @JidduVillarin 5 years ago

      @@bkrai Thank you for this video. It is very concise and understandable. I'd like to expand on this question slightly. If you did have a zero value in the xtabs output, what would have been the appropriate course of action?

  • @kavyayd3577
    @kavyayd3577 5 years ago +2

    Great explanation, sir. Can you upload a video of logistic regression with more than 10 variables? It would be a great help.

    • @bkrai
      @bkrai  5 years ago

      The process works the same with any number of variables.

  • @s9438679525
    @s9438679525 5 years ago +1

    Hi sir,
    Please explain the use of type='response' in line number 23
    Thanks

    • @bkrai
      @bkrai  4 years ago

      The type="response" option tells R to output probabilities of the form P(Y = 1|X), as opposed to other information such as the logit.
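
The difference can be seen in a short sketch on simulated data (the variables `x` and `y` and the three new values are assumptions). `type = "link"` returns the linear predictor on the logit scale, `type = "response"` returns probabilities, and the two are connected by the logistic function, which is also how a probability can be calculated by hand from the coefficient estimates.

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 1.2 * x))
model <- glm(y ~ x, family = "binomial")

new <- data.frame(x = c(-1, 0, 1))

# Log-odds (logit scale), the default for predict() on a glm
logit <- predict(model, newdata = new, type = "link")

# Probabilities P(Y = 1 | X)
prob <- predict(model, newdata = new, type = "response")

# Manual calculation from the estimates: p = 1 / (1 + exp(-(b0 + b1 * x)))
b <- coef(model)
manual <- 1 / (1 + exp(-(b[1] + b[2] * new$x)))

print(cbind(logit = logit, prob = prob, manual = manual))
```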

  • @sayamnandy5855
    @sayamnandy5855 6 years ago +1

    I have a question sir. Should I check multicollinearity (VIF) while performing the logistic regression? If any variable's VIF value is greater than 2, I will remove that variable from my model. Can I do that?

    • @bkrai
      @bkrai  4 years ago

      Yes, you should be able to do it.

  • @YatiChoudhary
    @YatiChoudhary 4 years ago +1

    Sir, do we change integers into factors if the variables are categorical even in Multinomial Logistic Regression, or is it done only in Logistic Regression?

    • @bkrai
      @bkrai  4 years ago

      For Multinomial Logistic Regression you can refer to this:
      ruclips.net/video/S2rZp4L_nXo/видео.html

    • @YatiChoudhary
      @YatiChoudhary 4 years ago +1

      @@bkrai Sir, actually I came to this video after watching your video on Multinomial Logistic Regression. But now I am confused about whether we should always change all categorical variables into factors or whether it only applies to logistic regression, because in Multinomial Regression you changed only the response variable into a factor.

    • @bkrai
      @bkrai  4 years ago +1

      For the response variable I would say yes. But for others you can go case by case.

    • @YatiChoudhary
      @YatiChoudhary 4 years ago +1

      @@bkrai Thank you Sir

    • @bkrai
      @bkrai  4 years ago

      You are welcome!

  • @dipanjanroy589
    @dipanjanroy589 4 years ago +1

    Sir, can you please provide the code for testing the accuracy of this example? I'm a new learner and I find it pretty interesting and simple the way you teach.

    • @bkrai
      @bkrai  4 years ago

      It's in the description.

  • @shyamchaurasiya1069
    @shyamchaurasiya1069 5 years ago +2

    Love You Sir Very Useful videos

    • @bkrai
      @bkrai  5 years ago

      Thanks for comments! For recent Python video, see this link:
      ruclips.net/video/mKb5hRJmtCU/видео.html

  • @sagaranvekar856
    @sagaranvekar856 5 years ago +2

    Great explanation. Thank you Sir!

    • @bkrai
      @bkrai  5 years ago

      Thanks for comments!

  • @ZoeyLu
    @ZoeyLu 3 years ago +1

    How is it possible to train and predict using the same "train" dataset? Doesn't it defeat the purpose of training the model using the "train" dataset and then testing the model using the "test" dataset?

    • @bkrai
      @bkrai  3 years ago

      Comparing it with test results helps to assess if there is over-fitting or not.

  • @hanadmohamud1881
    @hanadmohamud1881 3 years ago +2

    Thank you, sir

    • @bkrai
      @bkrai  3 years ago

      You are welcome!

  • @marcorinaldo4139
    @marcorinaldo4139 6 years ago +1

    Hi Bharatendra, could you please share the R code and data? Thanks a lot!!

    • @bkrai
      @bkrai  6 years ago

      They are available in the description area below the video. Here are the links:
      Data: goo.gl/VEBvwa
      R File: goo.gl/PdRktk

    • @marcorinaldo4139
      @marcorinaldo4139 6 years ago

      Thanks a lot! I had overlooked them! Thanks

  • @letsbuy-pm7vc
    @letsbuy-pm7vc 5 years ago +2

    Rank 2 was not significant. Why didn't you create dummy variables for rank and remove rank 2 from your model? Please answer

    • @bkrai
      @bkrai  5 years ago +1

      Rank is only one variable. It can only be in or out as a whole.

  • @dhanielaritonang3273
    @dhanielaritonang3273 4 years ago +1

    Sir, I have a question: what if we have a categorical response variable with three levels? What 'family' should I use?

    • @bkrai
      @bkrai  4 years ago

      For 3 or more, use multinomial logistic regression:
      ruclips.net/p/PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG

  • @SachinSingh-wr5yv
    @SachinSingh-wr5yv 6 years ago +2

    Sir, please make a video on k-fold cross-validation.

    • @bkrai
      @bkrai  6 years ago

      Thanks for the suggestion, I've added it to my list.

  • @hridayborah9750
    @hridayborah9750 4 years ago +1

    Let me know where I can get this dataset for practice.

    • @bkrai
      @bkrai  4 years ago +1

      See the link in the description area below this video.

    • @hridayborah9750
      @hridayborah9750 4 years ago +1

      @@bkrai Oh yeah! Thank you so much Mr Rai

    • @bkrai
      @bkrai  4 years ago

      Welcome!

  • @ClickyKitsune
    @ClickyKitsune 6 years ago +1

    I didn't get the set.seed function?

    • @bkrai
      @bkrai  6 years ago +1

      When we split data into training and testing, it is done randomly. When
      you split data again, your training and testing data will have different
      data points due to randomness. To have the same training and testing
      data, we use set.seed() function.
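
A minimal sketch of that repeatability, assuming an 80/20 split of ten rows and an arbitrary seed value of 1234: resetting the seed before each call to `sample()` reproduces exactly the same assignment of rows to the training (1) and test (2) partitions.

```r
# Same seed before each draw gives the same random partition
set.seed(1234)
ind1 <- sample(2, 10, replace = TRUE, prob = c(0.8, 0.2))

set.seed(1234)
ind2 <- sample(2, 10, replace = TRUE, prob = c(0.8, 0.2))

print(ind1)
print(ind2)
print(identical(ind1, ind2))
```

Without the second `set.seed()` call, the second draw would continue from the current state of the random number generator and would generally differ.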

  • @merumomo
    @merumomo 6 years ago +1

    Great video! What do we do if we do have "0" (zero) in factor variables?

    • @bkrai
      @bkrai  6 years ago +1

      Do you mean missing values?

    • @merumomo
      @merumomo 6 years ago

      Bharatendra Rai yes, I meant missing values. We fill in missing values with the mean/median in numeric variables, but I guess we need to remove missing values if they are in categorical variables?

    • @bkrai
      @bkrai  6 years ago +1

      For categorical variables you can go with category with highest frequency.
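
Filling missing values with the most frequent category can be sketched in base R as follows (the `rank` vector is hypothetical). `table()` ignores `NA` by default, so `which.max()` picks the most common observed level.

```r
# Hypothetical factor with two missing values
rank <- factor(c("2", "3", NA, "2", "1", NA, "2"))

# Most frequent observed level
most_frequent <- names(which.max(table(rank)))
print(most_frequent)

# Replace the missing entries with that level
rank[is.na(rank)] <- most_frequent
print(table(rank))
```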

    • @merumomo
      @merumomo 6 years ago +1

      Bharatendra Rai thank you!!

  • @JainmiahSk
    @JainmiahSk 6 years ago +1

    Sir, if I want to learn R thoroughly like you, where should I start? Please advise.

    • @bkrai
      @bkrai  4 years ago

      I saw this today. You can start with this:
      ruclips.net/p/PL34t5iLfZddv8tJkZboegN6tmyh2-zr_T

  • @DnyaneshwarPanchaldsp
    @DnyaneshwarPanchaldsp 2 years ago +1

    Sir, the logistic regression curve is not shown. How can I show it?

    • @bkrai
      @bkrai  2 years ago +1

      Refer to this:
      10 - ROC curve with AUC, Sensitivity & Specificity | Multinomial Logistic Regression in R
      ruclips.net/video/ftjNuPkPQB4/видео.html

    • @DnyaneshwarPanchaldsp
      @DnyaneshwarPanchaldsp 2 years ago +1

      @@bkrai thank you sir

    • @bkrai
      @bkrai  2 years ago

      You are welcome!

  • @ayush612
    @ayush612 6 years ago +1

    This is an awesome video Sir...thanks for uploading this!!

    • @bkrai
      @bkrai  6 years ago

      Thanks for comments!

  • @harishnagpal21
    @harishnagpal21 6 years ago +1

    What is the use of set.seed? Thanks

    • @bkrai
      @bkrai  6 years ago

      It helps with repeatability. When you split the data with the same seed, the train and test sets will include the same samples.

    • @harishnagpal21
      @harishnagpal21 6 years ago

      thanks a lot for quick response :)

  • @arunshowri7829
    @arunshowri7829 6 years ago +1

    Hi Sir,
    I have a question. How do we predict the target variable if we have many independent variables (e.g. around 60)? What should we do if most of the values in an independent variable are NAs? Please advise, Sir.

    • @bkrai
      @bkrai  6 years ago

      60 independent variables should be fine. But before applying the method, you need to handle missing values so that your data is ready for analysis.

    • @sayamnandy5855
      @sayamnandy5855 6 years ago

      Apart from sir's suggestion, you can use the information value concept if you have plenty of independent variables.

  • @abdulazeez9863
    @abdulazeez9863 6 years ago +2

    Excellent explanation... please make a video of Boosted Regression Tree model with R. Thank you sir.

    • @bkrai
      @bkrai  6 years ago

      Thanks for comments and suggestion! I've added it to my list.

  • @kumarvarma942
    @kumarvarma942 6 years ago

    Hi Sir, great videos and easy-to-learn topics. I have a small doubt, if you don't mind. Before dividing data into train and test, we need to remove null values, find outliers, scale, do EDA, and then sample. Could you please share a video on linear or logistic regression that combines these steps? We need to check all of the above to predict the best output. I am a bit confused about finding (or removing) outliers, removing null values, and scaling (min-max or z-score). Please share any video; it will be helpful to us. Thanks in advance.

  • @surbhiagrawal3951
    @surbhiagrawal3951 4 years ago +1

    Why does candidate 4 have a probability of .129 but a classification of admitted (1)?

    • @bkrai
      @bkrai  4 years ago

      That's an incorrect classification. This candidate was in reality accepted, but the model predicts that the candidate should not be accepted.
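
      A rough sketch of how such a case shows up in a confusion matrix. Only
      the 0.129 probability for candidate 4 comes from this thread; the other
      probabilities, the actual labels for the other candidates, and the 0.5
      cutoff are assumptions made for illustration:

      ```r
      # Predicted probabilities of admission (only the 4th value is from the thread)
      p <- c(0.28, 0.62, 0.45, 0.129, 0.71)
      # Actual outcomes: candidate 4 was in reality admitted (1)
      actual <- c(0, 1, 0, 1, 1)

      # Classify as admitted when the probability exceeds the cutoff
      pred <- ifelse(p > 0.5, 1, 0)

      # Confusion matrix: off-diagonal cells are misclassifications
      tab <- table(Predicted = pred, Actual = actual)
      tab

      # Misclassification error = 1 - accuracy
      err <- 1 - sum(diag(tab)) / sum(tab)
      err
      ```

      Candidate 4 lands in the cell with Predicted = 0 and Actual = 1, which
      is exactly the kind of error described in the reply.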

  • @OrcaChess
    @OrcaChess 6 years ago +1

    Thank you very much!
    Your videos create high value.
    Kind regards from Karlsruhe
    Jonathan

    • @bkrai
      @bkrai  6 years ago

      Thanks for your comments!

  • @nidhijoshi1532
    @nidhijoshi1532 4 years ago +1

    Sir, I have a data set on food security and I want to apply a logistic regression model, but I don't understand how to apply the model.

    • @bkrai
      @bkrai  4 years ago

      Make sure you have a categorical response variable, just as I have the 'admit' variable in this video.

    • @shanicemohanlal1605
      @shanicemohanlal1605 4 years ago

      @@bkrai Hi Dr, could you kindly explain how to do this when the categorical response variable (in my case presence/absence) is modeled with other variables that are also categorical? I have converted them to factor variables; however, when the logistic regression model runs I get the warning message "glm.fit: fitted probabilities numerically 0 or 1 occurred".

  • @me3jab1
    @me3jab1 5 years ago +1

    Hello, before you removed gre the residual deviance was 369.99; when you reran the model without gre it became 371.81, I mean it increased. In this case, shouldn't we keep gre even though it's not significant? Or is the change in value negligible?

    • @bkrai
      @bkrai  5 years ago +1

      That change is negligible. When a variable is not statistically significant, we should remove it.
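
      One way to check that the change is negligible, using only the two
      residual deviance values quoted in the question. This likelihood-ratio
      check is an illustration and not something shown in the video; it
      assumes gre enters the model as a single numeric term, so dropping it
      costs one degree of freedom:

      ```r
      # Residual deviances quoted in the question
      dev_with_gre    <- 369.99
      dev_without_gre <- 371.81

      # Increase in deviance from dropping gre
      lr_stat <- dev_without_gre - dev_with_gre

      # Compare against a chi-square distribution with 1 df (assumption: one term dropped)
      p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
      lr_stat
      p_value
      ```

      The increase of 1.82 is below 3.84, the 5% critical value of a
      chi-square distribution with 1 df, so the p-value is above 0.05. With
      both fitted models available, anova(model_without, model_with, test =
      "Chisq") performs the same comparison directly.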

    • @me3jab1
      @me3jab1 5 years ago +1

      @@bkrai thank you Boss

    • @bkrai
      @bkrai  5 years ago

      welcome!

    • @me3jab1
      @me3jab1 5 years ago +1

      @@bkrai If we have only one Y result (not 2 as in this example), which family type must we choose?

    • @bkrai
      @bkrai  5 years ago +1

      If Y has only one value then that doesn't need a classification model.

  • @ketanverma7839
    @ketanverma7839 3 years ago +1

    I tried doing this on my model and split my dataset into training and testing data, but after that I'm running into an "OBJECT IS NOT A MATRIX" problem while building the model.

    • @bkrai
      @bkrai  3 years ago

      Make sure the data is in data frame format.
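
      A minimal sketch of that check, assuming the data arrived as a numeric
      matrix. The values and the column names admit and gpa are made up for
      illustration, and the exact cause of the commenter's error is not known
      from this thread:

      ```r
      # Hypothetical data stored as a matrix rather than a data frame
      m <- cbind(admit = c(0, 1, 0, 1, 1, 0, 1, 0),
                 gpa   = c(2.9, 3.6, 3.5, 3.9, 3.0, 3.0, 3.4, 3.2))
      is.data.frame(m)  # FALSE

      # Convert to a data frame and make the response a factor
      df <- as.data.frame(m)
      df$admit <- as.factor(df$admit)
      str(df)

      # The model can now be fitted with data = df
      model <- glm(admit ~ gpa, data = df, family = "binomial")
      ```

      Running str() on the training and testing sets before modeling is a
      quick way to confirm both the object type and the column types.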

  • @zelennefontainerivero5091
    @zelennefontainerivero5091 4 years ago +1

    Thanks a lot for the video, it helped me a lot. It would be great if you could plot the results from your video, or at least tell me how you would do it.

    • @bkrai
      @bkrai  4 years ago

      Let me know what exactly you are looking to plot. Here results are simply summarized in the form of a confusion matrix.

  • @lutfyabdulah1321
    @lutfyabdulah1321 4 years ago +1

    Thanks for sharing. It is very helpful.

    • @bkrai
      @bkrai  4 years ago

      You are welcome!

  • @ravindarmadishetty736
    @ravindarmadishetty736 6 years ago +1

    Sir could you please guide me on threshold tuning

    • @bkrai
      @bkrai  6 years ago +1

      The default value to use is 0.5. To get an idea of what level will work well for a particular data set, a histogram of the predicted probabilities can help. But it is mostly a trial-and-error process.
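
      A sketch of that trial-and-error process on simulated data. The data,
      the variable names gpa and admit, the coefficients used to simulate,
      and the grid of cutoffs are all assumptions for illustration; none of
      it comes from the video's data set:

      ```r
      # Simulated stand-in data (hypothetical)
      set.seed(1234)
      n <- 200
      gpa <- runif(n, 2, 4)
      admit <- rbinom(n, 1, plogis(-6 + 1.8 * gpa))

      # Fit the model and get predicted probabilities
      model <- glm(admit ~ gpa, family = "binomial")
      p <- predict(model, type = "response")

      # Histogram of predicted probabilities to see where they concentrate
      hist(p, main = "Predicted probabilities", xlab = "p")

      # Try several cutoffs and record the misclassification error for each
      thresholds <- seq(0.1, 0.9, by = 0.1)
      error <- sapply(thresholds, function(t) mean(ifelse(p > t, 1, 0) != admit))
      data.frame(thresholds, error)
      ```

      In practice the cutoff should be compared on held-out test data, and
      the choice may also depend on which type of error is more costly.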

    • @ravindarmadishetty736
      @ravindarmadishetty736 6 years ago +1

      Thank you sir