4 Reasons Non-Parametric Bootstrapped Regression (via tidymodels) is Better than Ordinary Regression

  • Published: 21 Aug 2024
  • If the assumptions of parametric models can be satisfied, parametric models are the way to go. However, there are often many assumptions, and satisfying them all is rarely possible. Data transformation and non-parametric methods are two solutions. In this post we'll learn about Non-Parametric Bootstrapped Regression as an alternative to Ordinary Linear Regression for cases when the assumptions are violated.
    If you only want the code (or want to support me), consider joining the channel (join button below any of the videos), because I provide the code upon members' requests.
    Enjoy! 🥳
    Welcome to my VLOG! My name is Yury Zablotski & I love to use R for Data Science = "yuzaR Data Science" ;)
    This channel is dedicated to data analytics, data science, statistics, machine learning and computational science! Join me as I dive into the world of data analysis, programming & coding. Whether you're interested in business analytics, data mining, data visualization, or pursuing an online degree in data analytics, I've got you covered. If you are curious about Google Data Studio, data centers & certified data analyst & data scientist programs, you'll find the necessary knowledge right here. You'll greatly increase your odds to get online master's in data science & data analytics degrees. Boost your knowledge & skills in data science and analytics with my engaging content. Subscribe to stay up-to-date with the latest & most useful data science programming tools. Let's embark on this data-driven journey together!
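The workflow in the video is built with tidymodels (rsample, purrr, broom). As a rough, dependency-free illustration of the core idea, resampling rows with replacement, refitting the model many times, and summarising the resulting coefficient distribution, here is a base-R sketch on the built-in mtcars data:

```r
# Non-parametric (case-resampling) bootstrap of a linear regression.
# Base-R sketch of the idea only; the video itself uses tidymodels.
set.seed(42)
n_boot <- 1000
slopes <- replicate(n_boot, {
  idx <- sample(nrow(mtcars), replace = TRUE)   # resample rows with replacement
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))["wt"]
})
# Point estimate and 95% percentile interval from the bootstrap
# distribution: no normality assumption on the residuals required.
estimate <- median(slopes)
ci <- quantile(slopes, c(0.025, 0.975))
```

The percentile interval comes straight from the empirical distribution of the 1000 refits, which is what relaxes the distributional assumptions of a single lm() fit.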

Comments • 51

  • @utubeleo5037
    @utubeleo5037 1 year ago +3

    This was a great watch. It was really well put together, with a good mix of visuals, code and narrative. Thank you for putting it together and sharing.

  • @jeffbenshetler
    @jeffbenshetler 2 months ago

    Excellent demonstration in R.

  • @CCL-ew7pl
    @CCL-ew7pl 1 year ago

    Great video, thanks Yury (Munchausen cartoon was an unexpected special treat :))

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago +1

      😂 I wasn't sure anyone would recognise Baron Münchausen 😁 Glad you enjoyed it!

  • @zane.walker
    @zane.walker 1 year ago +1

    I recently discovered bootstrapped prediction intervals working with mixed-effects models and was quite impressed (thank goodness for modern computing power!). You present a persuasive argument for always using bootstrapped regression when any of the linear regression assumptions are violated. Are there any situations where you would use alternative methods, such as log transforms of the data, or weighted regression, to deal with issues such as heteroscedasticity rather than bootstrapping?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago +2

      Surely there are different methods to solve problems! Many roads lead to Rome ;). Bootstrapping is one of them, if you have lots of data (> 10,000; the more the better) and you can't fix the assumptions no matter what you do. Besides, it's personal preference. Non-normality from the Shapiro test is always there when you have lots of data, even if residuals look perfectly normal. I personally don't like log-transforming data if the data itself is interpretable, like the weight of animals. I would never use log-weight. But I would use log-virus-load, because the spread is huge and the log shows the trend, while you would not see anything without it. Another thing is: I'd rather trust an averaged model from the distribution of coefficients than a single coefficient from a normal "lm". I would not use the bootstrap on small datasets. Finally, it's a question of context and how you can get the closest to the truth out there.

  • @SergioUribe
    @SergioUribe 1 year ago

    Thanks for sharing! I will start using this model

  • @ambhat3953
    @ambhat3953 1 year ago

    Thanks for this... I think now I have a direction for solving the data set at work which is not normally distributed

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago

      You are welcome 🙏 If a non-normal distribution is your only problem, look into non-parametric statistical tests, like Mann-Whitney or Kruskal-Wallis
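Both tests mentioned above ship with base R; a minimal sketch on built-in datasets:

```r
# Mann-Whitney (Wilcoxon rank-sum) test: compares two independent groups
res_w <- wilcox.test(mpg ~ am, data = mtcars)

# Kruskal-Wallis test: compares three or more groups
res_k <- kruskal.test(weight ~ group, data = PlantGrowth)

res_w$p.value
res_k$p.value
```

Both are rank-based, so neither assumes a normal distribution of the outcome.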

    • @ambhat3953
      @ambhat3953 1 year ago

      @@yuzaR-Data-Science Will do, thanks!

  • @oousmane
    @oousmane 1 year ago

    Always excellent ❤️

  • @eyadha1
    @eyadha1 1 year ago

    great video. Thank you

  • @chacmool2581
    @chacmool2581 1 year ago

    What does this resemble? Random Forests (RF). Except that RF bootstraps/samples observations as well as bootstrapping predictors. Am I seeing this correctly?
    Of course, one loses interpretability with RF.
    Great stuff as always!

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago

      Sure, you lose interpretability with RF. No coefficients. And coefficients are exactly what normal models give you. But they have assumptions. So, we bootstrap/resample the data and fit 1000 models, which relaxes most assumptions, especially distributional ones

  • @heshamkrr669
    @heshamkrr669 1 year ago

    WORKING thx bro

  • @EdoardoMarcora
    @EdoardoMarcora 5 months ago

    I don't understand how bootstrapping frees you from the distributional assumptions of the linear model (normality of residuals, etc.). What bootstrapping is doing is generating the sampling distribution free of its usual asymptotic assumptions, but the assumptions of the likelihood distribution are still there, right?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  5 months ago +2

      Certainly it can, I am 100% sure, but please don't believe some random YouTube video, there is a lot of trash out there (most likely some of my videos are partly incorrect too). Thus, please check it online or in a stats book yourself. For example, here is a reference from a stats book which might explain more, but even the first half page will do it, I think:
      www.sagepub.com/sites/default/files/upm-binaries/21122_Chapter_21.pdf

  • @joaoalexissantibanezarment4766
    @joaoalexissantibanezarment4766 2 months ago

    This is an excellent video!! I was thinking: a nonparametric alternative for linear regression could be LOESS regression, and the bootstrap could be done without problems. But because LOESS is nonparametric, could the means be used properly instead of medians, or should the medians still be used in this case?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  2 months ago

      While resampling allows for a better use of means, I am a big fan of medians, because if the distribution of anything after bootstrapping does not become normal, as in the case of p-values, I would trust the median, but not the mean. So, I would use the median as much as I can.
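A tiny illustration of the point about medians, using a simulated skewed distribution as a stand-in for a skewed set of bootstrapped values:

```r
# When a bootstrap distribution is skewed (p-values often are), the mean
# is dragged toward the long tail while the median stays central.
set.seed(1)
skewed <- rexp(1000)   # simulated stand-in for a skewed bootstrap distribution
mean(skewed)           # pulled up by the right tail
median(skewed)         # robust central summary
```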

    • @joaoalexissantibanezarment4766
      @joaoalexissantibanezarment4766 2 months ago

      @@yuzaR-Data-Science Ok, thank you very much for the answer!

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  2 months ago +1

      you are very welcome!

    • @joaoalexissantibanezarment4766
      @joaoalexissantibanezarment4766 2 months ago

      @@yuzaR-Data-Science I had another question. Although bootstrapping is not exactly an option for handling outliers, could it be the case that the more resamples used, the more robust the model is to outliers?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  2 months ago +1

      Yes, because then you would resample the most frequent cases more often, so their density would be higher, while the outliers... hmm, we would not get rid of them, but they will be resampled very rarely. Hope that helps. Cheers

  • @festusattah8612
    @festusattah8612 5 months ago

    Thanks for this insightful video. However, I have one question. If I want to use this approach in a research paper, do you know of some papers I can cite to back up my choice of this model?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  5 months ago +1

      In my opinion you only need some reasons to do that. For example, many assumptions are not met. I am sure there are papers, but I don't have any off the top of my head. But even if nobody has cited it, somebody should start. I certainly will, after I am done with my current paper on quantile regression.

  • @alelust7170
    @alelust7170 1 year ago

    Nice, Tks!

  • @jonascruz6562
    @jonascruz6562 1 year ago

    Great video! Any way to conduct a bootstrap regression but using robust (Huber) regression instead of the conventional linear model for data with many outliers?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago +1

      Sure there is a way: just exchange the "lm" with the "lmrob" function from library(robustbase). I actually did a video on robust regression. However, I don't think it will be necessary, because first, the bootstrapping will smooth out the influence of outliers; but if you still have too many, maybe they are not outliers, but the data has a weird distribution and you need some other type of model, like Poisson or similar. Thanks for your feedback and thank you for watching!
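If robustbase is not installed, MASS::rlm() (included with every standard R installation) fits the Huber M-estimator the question asks about and can be swapped in the same way; a small sketch with simulated data:

```r
library(MASS)  # rlm(): robust regression via Huber M-estimation

set.seed(7)
d <- data.frame(x = 1:50)
d$y <- 2 * d$x + rnorm(50)            # true slope = 2
d$y[c(5, 25)] <- d$y[c(5, 25)] + 100  # inject two gross outliers

fit_lm  <- lm(y ~ x, data = d)        # OLS slope gets pulled by the outliers
fit_rlm <- rlm(y ~ x, data = d)       # Huber weights shrink their influence
```

Inside a bootstrap loop, rlm() would simply replace lm() in each refit.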

    • @jonascruz6562
      @jonascruz6562 1 year ago

      Thank you for the answer. I work with environmental contaminants, so I have a lot of outliers even after log-transforming the data. I am testing some new models. I just found the boot.pval package, which is a low-code package for bootstrap regression, including rlm. By the way, I love your low-code videos. Greetings from Brazil

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago

      Hey, Jonas, thanks for the recommendation. I'll check out the boot.pval package, because I model every day with real-world data and need robust options. Thanks also for the feedback and for watching!

  • @rolfjohansen5376
    @rolfjohansen5376 1 year ago

    How do I calculate a simple maximum likelihood for a simple non-parametric regressor: y_i = b_i + e_i (number of data points = number of parameters?)? Thanks

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago +1

      Sorry, I can't really say that with certainty, because I have never needed that until today. But if you somehow figure this out, please let me know! Thanks for watching!

  • @johnsonahiamadzor7404
    @johnsonahiamadzor7404 1 year ago

    Great work. How do I get this code for practice? I'm very new to R.

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago +1

      In the description of the video there is a link to a blog post where you can get all the R code and the explanations. If you are very new to R, don't be discouraged if not everything is clear and working right away. Bootstrapping is kind of an advanced topic. Thanks for watching!

  • @Maxwaener
    @Maxwaener 1 year ago

    Can you use this approach if you have a numeric predictor (change in percent) for a categorical outcome (2-4 levels)?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago +1

      Hey, sorry for the late reply, I was on holidays. Yes, you can! The model you apply is up to you, you just need to specify it in the "map" function. It will then be run over the bootstrapped data, so you can use any model. In your case it would be multinomial, I guess. But if it is only one predictor, I would turn it upside down and use a quasibinomial model with the percentage as the outcome and the categorical variable as the predictor. It's easier to interpret than a multinomial, in my opinion. Cheers
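A hypothetical sketch of the suggested quasibinomial variant (all data and variable names invented for illustration); inside the bootstrapped workflow, this glm() call would take the place of lm() in the map() step:

```r
# Quasibinomial GLM: a proportion outcome (between 0 and 1) modelled
# with a categorical predictor. Hypothetical simulated data.
set.seed(3)
d <- data.frame(
  group = rep(c("A", "B", "C"), each = 30),
  prop  = c(rbeta(30, 2, 8), rbeta(30, 5, 5), rbeta(30, 8, 2))
)
fit <- glm(prop ~ group, family = quasibinomial, data = d)
coef(fit)  # effects of B and C relative to group A, on the logit scale
```

The quasibinomial family also avoids glm()'s non-integer-successes warning and accommodates over- or under-dispersion.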

  • @gonzalodequesada1981
    @gonzalodequesada1981 1 year ago

    Is it possible to do a bootstrap for a non-parametric multiple regression model?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago +1

      That's a great question! :) The short answer is yes, but it's not necessary, because the method I describe is non-parametric by itself. But my scientific curiosity says: let's do it! What kind of non-parametric regression do you mean? Swap in its function, like "lm()", try it out, and please post it here so everyone in the community can benefit. Thanks!

  • @desaiha
    @desaiha 1 year ago

    How do you apply this technique to temporal data which has trend and/or seasonality?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago

      "strata" argument might help in the bootstrap funktion. ask R this: ?bootstraps. Or google of people who might have done something similar in tidymodels. I still didn't

  • @ariancorrea2711
    @ariancorrea2711 1 year ago

    Hi, how can I extract the r.squared for each model?

    • @yuzaR-Data-Science
      @yuzaR-Data-Science  1 year ago

      Hey, from the "glance" function:
      library(broom) # for tidy(), glance() & augment() functions
      nested_models <- nested_models %>%
        mutate(models = map(data, ~ lm(wage ~ age, data = .)),
               coefs = map(models, tidy, conf.int = TRUE),
               quality = map(models, glance),
               preds = map(models, augment))
      I did a demo about it in a video on "many models"
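If the tidyverse stack is not available, the same per-model R² can be pulled with base R; a hypothetical bootstrap loop on built-in data for illustration:

```r
# Extract r.squared from every bootstrapped model with base R only.
set.seed(5)
fits <- lapply(1:200, function(i) {
  idx <- sample(nrow(mtcars), replace = TRUE)
  lm(mpg ~ wt, data = mtcars[idx, ])
})
r2 <- vapply(fits, function(f) summary(f)$r.squared, numeric(1))
summary(r2)  # distribution of R-squared across the 200 bootstrap fits
```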