Bootstrap Hypothesis Testing in R with Example | R Video Tutorial 4.4 | MarinStatsLectures

  • Published: Jan 5, 2025

Comments • 67

  • @marinstatlectures
    @marinstatlectures  6 years ago +11

    In this R video tutorial, we learn to use R to perform a hypothesis test using a bootstrap approach. An R package does exist for bootstrap hypothesis testing (“boot”), but the package is limited. Here we show how to build the bootstrap approach ourselves; this allows us to change the statistics/estimates we want to conduct the test for. The R script accompanying this video has all the R code used in this tutorial, plus extra R code for students to explore on their own ( statslectures.com/r-scripts-datasets ). If you would like to support us, you can Donate (bit.ly/2CWxnP2), Share our Videos, Leave us a Comment and Give us a Like! Either way, We Thank You!
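A minimal sketch of the approach this comment describes, not the channel's actual script. It assumes the data are R's built-in chickwts set restricted to the casein and meatmeal feeds (suggested by the feed names used in this thread) and that the test statistic is the absolute difference in group means. All object names are illustrative.

```r
# Assumption: built-in chickwts data, restricted to two feed types
d <- chickwts[chickwts$feed %in% c("casein", "meatmeal"), ]
d$feed <- droplevels(d$feed)

n1 <- sum(d$feed == "casein")    # size of the first group, read from the data
n  <- nrow(d)                    # total number of observations
B  <- 10000                      # number of bootstrap resamples

# Observed test statistic: absolute difference in group means
test.stat <- abs(mean(d$weight[d$feed == "casein"]) -
                 mean(d$weight[d$feed == "meatmeal"]))

set.seed(112358)                 # any fixed seed makes the run reproducible

# Under H0 the groups are pooled: resample all weights with replacement,
# storing one bootstrap sample per column
boot.samples <- matrix(sample(d$weight, size = n * B, replace = TRUE),
                       nrow = n, ncol = B)

# Treat the first n1 rows of every column as "casein", the rest as "meatmeal"
boot.stat <- abs(colMeans(boot.samples[1:n1, , drop = FALSE]) -
                 colMeans(boot.samples[(n1 + 1):n, , drop = FALSE]))

# p-value: share of resamples at least as extreme as the observed statistic
p.value <- mean(boot.stat >= test.stat)
p.value
```

To test a different statistic, only the lines computing test.stat and boot.stat change, for example by using apply(..., 2, median) in place of colMeans().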

    • @AndreaLyuu
      @AndreaLyuu 3 years ago

      Hello Mike! Thanks for all the great content!
      Please help me with this question:
      If our hypothesis test were to be a one sided (H0:mean-c>=mean-m, H1: mean-c

  • @yoyme_blumenfeld
    @yoyme_blumenfeld 1 year ago

    Lovely video, thank you! Everything is clear, and I love that you showed us what is behind those functions, so one can experiment with other libraries while having a good benchmark.

  • @auroralorenzi2957
    @auroralorenzi2957 5 years ago +3

    Hi Marin. Thank you for your videos! It's not an easy topic to teach, but your videos are very clear.

  • @niceday2015
    @niceday2015 2 years ago

    Thank you very much. Your teaching has expanded my perception of the world.

  • @kaibecker1411
    @kaibecker1411 3 years ago +3

    10:23
    Minor mistake: 48.0 and 68.2 (#16) are larger than your test statistic

  • @carlosalexandrecosta4234
    @carlosalexandrecosta4234 4 years ago +2

    Just amazing content. I'm studying Data Science and I can tell you, your videos are helping a lot! Thank you!

  • @kahuilim2122
    @kahuilim2122 4 years ago +2

    Thanks for taking the time to put these amazing videos together.

  • @cb4808
    @cb4808 3 years ago +2

    5:00 what do these numbers in the brackets mean? set.seed(112358)

    • @zjardynliera-hood5609
      @zjardynliera-hood5609 3 years ago +1

      It's just an arbitrary number that sets the seed, putting the random number generator in a fixed state. Basically, if I set that seed on my machine, I would get the same results he is getting, and if I reran the code, I would get the same results each time.
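A small sketch of the reproducibility point made in this reply. The sampled range and sample size are arbitrary choices for illustration.

```r
set.seed(112358)                      # fix the RNG state
first.run <- sample(1:100, size = 5, replace = TRUE)

set.seed(112358)                      # reset to the same RNG state
second.run <- sample(1:100, size = 5, replace = TRUE)

# Same seed, same code, same draws
identical(first.run, second.run)      # TRUE
```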

  • @JJ-wx3nd
    @JJ-wx3nd 3 years ago

    Thank you so much! Especially after I found that the R documentation for boot() is not enough to go on.

  • @TanMan1
    @TanMan1 4 years ago +1

    At 5:35 in the video: if you want to bootstrap for multiple variables, how would you adjust the code?

  • @feloria1862
    @feloria1862 3 years ago

    When you resample at 6:20 does the resample keep the feed type variables separate? Or do weight values from either feed type resample to anywhere in the 23 observations?

    • @feloria1862
      @feloria1862 3 years ago

      I figured it out: the values are resampled to any position because the null hypothesis is that the two populations are the same, so placing them in any position, even if the feed type doesn't match, is fine.

  • @yinanxue8653
    @yinanxue8653 6 years ago +6

    Thank you for the video. I have a question. How do you know that in the bootstrap data, the first 12 rows are weights of casein and the last 13 rows are weights of meatmeal?

    • @yinanxue8653
      @yinanxue8653 6 years ago

      Never mind, I figured it out. Is it because H0 says the difference between the means equals 0, so if H0 is true, we can pool the data?

    • @yekhtiari
      @yekhtiari 6 years ago

      I have the same question.

    • @marinstatlectures
      @marinstatlectures  6 years ago +1

      For bootstrapping, we resample with replacement, so really it just matters that we have 12 observations for the one group and 13 for the other. We could just as easily label the first 13 rows as meatmeal and the next 12 as casein. All that is really necessary is that we assign 12 of them to casein and 13 to meatmeal; the order we assign them in doesn't matter. Hope that clarifies it.
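A sketch of the labelling idea in this reply, assuming the built-in chickwts data for the two feeds. The group sizes are read from the data rather than hard-coded, and the object names are illustrative.

```r
# Assumption: built-in chickwts data, restricted to two feed types
d  <- chickwts[chickwts$feed %in% c("casein", "meatmeal"), ]
n1 <- sum(d$feed == "casein")
n2 <- sum(d$feed == "meatmeal")

set.seed(112358)
# One resample from the pooled weights; original feed labels are ignored
one.resample <- sample(d$weight, size = n1 + n2, replace = TRUE)

# Labels are attached by position only: first n1 rows, then n2 rows
labels <- rep(c("casein", "meatmeal"), times = c(n1, n2))
tapply(one.resample, labels, mean)

# The reverse order is equally valid under H0; only the group sizes matter
labels.rev <- rep(c("meatmeal", "casein"), times = c(n2, n1))
tapply(one.resample, labels.rev, mean)
```

The same positional labelling applies to larger groups: with n1 and n2 taken from the data, rows 1 to n1 form one group and rows n1 + 1 to n1 + n2 the other.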

    • @gillianjean2237
      @gillianjean2237 5 years ago +1

      @marinstatlectures Hi Mike, brilliant video but I'm still confused about this part. I have 45016 observations, 22049 in group A and 22967 in group B. How do I make sure that I'm sampling correctly from both groups?

    • @rhosigma2199
      @rhosigma2199 5 years ago +1

      @marinstatlectures Thank you for the video.
      I also have the same question.
      BootstrapSamples = matrix(sample(variable, size = n*B, replace = TRUE), nrow = n, ncol = B)
      Does this function ensure that BootstrapSamples has 13 rows of meatmeal and 12 rows of casein?

  • @nevertheless4504
    @nevertheless4504 1 year ago

    Hi sir. Your video really helps me a lot. However, I just have a question. When we want to do a one-sided test, we just need to delete the abs, right? And when doing the comparison, mean(Boot.test.stat1 >= test.stat1), do we only use < or > instead of >=? Am I right?

  • @MrSoumyaBanerjee
    @MrSoumyaBanerjee 4 years ago +2

    Thank you for the video, but by resampling from d$weight, won't it cause the casein and meatmeal observations to get jumbled up in most cases? Wouldn't it be better to create 2 separate dataframes for each set of weight values, and resample separately from these 2?

    • @marinstatlectures
      @marinstatlectures  4 years ago +3

      Good question. Because in a hypothesis test we begin by assuming there is no difference in weight between the two groups, we want them all mixed together, to see how often we’d observe a difference as large as or larger than the one we saw if there really is no difference between the groups.
      When building a confidence interval, however, we want to keep the groups' observations separate, to preserve any group differences observed.
      Hope that clarifies it.
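A sketch of the confidence interval case mentioned in this reply, where each group is resampled on its own. It assumes the built-in chickwts data, and the percentile interval is one common choice made here for illustration, not necessarily the method used in the video.

```r
# Assumption: built-in chickwts data, restricted to two feed types
d        <- chickwts[chickwts$feed %in% c("casein", "meatmeal"), ]
casein   <- d$weight[d$feed == "casein"]
meatmeal <- d$weight[d$feed == "meatmeal"]
B        <- 10000

set.seed(112358)
# Resample within each group so any real group difference is preserved
boot.diff <- replicate(B, {
  mean(sample(casein, replace = TRUE)) - mean(sample(meatmeal, replace = TRUE))
})

# 95% percentile interval for the difference in means
ci <- quantile(boot.diff, probs = c(0.025, 0.975))
ci
```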

    • @SinanMavruk
      @SinanMavruk 2 years ago

      @marinstatlectures Thank you for the video and explanations. You made the point very clear for me. But in this case, is the only difference between permutation and bootstrapping whether the resampling is done with replacement? And how do we decide which test is better in an application?
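The mechanical difference raised in this question can be seen directly in the replace argument of sample(). This is only an illustration, assuming the built-in chickwts data; it does not settle which test is preferable in a given application.

```r
# Assumption: built-in chickwts data, restricted to two feed types
d      <- chickwts[chickwts$feed %in% c("casein", "meatmeal"), ]
pooled <- d$weight

set.seed(112358)
perm.sample <- sample(pooled, replace = FALSE)  # permutation: a reshuffle
boot.sample <- sample(pooled, replace = TRUE)   # bootstrap: values may repeat

# A permutation contains exactly the original values, only reordered
identical(sort(perm.sample), sort(pooled))      # TRUE

# A bootstrap sample has the same length, drawn from the same values
length(boot.sample) == length(pooled)           # TRUE
```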

  • @alinevm4915
    @alinevm4915 2 years ago

    Such a didactic video, thanks a lot!

  • @sohanaryal
    @sohanaryal 2 years ago

    How can we randomly assign the first 11 to one type of feed and the remaining ones to the other type of feed in the bootstrap matrix?

  • @farahyounes2813
    @farahyounes2813 3 years ago

    Thank you for your explanation. In the case of a time series, how can we apply the bootstrap method to compare two spectral densities?

  • @Rainstorm121
    @Rainstorm121 3 years ago

    Thanks very much, Sir. Suppose I have a random distribution of scores for each variable as follows: A=7, B=13, C=23, D=19, E=15, F=30. If I want to do hypothesis testing to find out which of the variables has a statistically significant score, what is the best advice on using bootstrapping in this situation? Given that H0: the expected probability for each of the variables is equal to 0.12, and H1: it is not equal to 0.12.

  • @munafahmed725
    @munafahmed725 4 years ago +1

    In mean.default(BootstrapSamples[1:12, i]) :
    argument is not numeric or logical: returning NA
    How to solve this error?

  • @b.ambrozio
    @b.ambrozio 4 years ago

    Thanks for the video!
    Question: About your last statement: "Any time doing a hypothesis test, we should also include a confidence interval to give an idea of how big the difference would be". Does it mean I should run my t-test against the CI instead? (E.g. calculate the CI of all my 1000 arrays, and do the t-test against the means from the CI? In other words, should I use the means calculated from the range between the first and last quantile in my t-test?)

  • @forestalgarcia1506
    @forestalgarcia1506 4 years ago +1

    If I have more than two "diets", how do I calculate the absolute difference of means?

  • @Loggies89
    @Loggies89 3 years ago

    You lost me at the i=1 and i=2 bit. Is there a step you aren't showing where these are created? I'm getting an error saying object 'i' not found, so I assume I have to create it at some point before entering it into the boot test statistic.

  • @xdienn
    @xdienn 3 years ago

    Great video, thank you so much! I'm still wondering though, how to proceed if you have a 2x2 factorial design? Do you then calculate 4 test statistics, one for each group? And for interaction effects?

  • @gruppenzwangimweb20
    @gruppenzwangimweb20 5 years ago +1

    nice video! btw... this BootstrapSamples

  • @duncanrager7180
    @duncanrager7180 5 years ago

    Hi Mike, thanks for the helpful video. In this case, the first test statistic is the same as used in a two-sided, two-sample T-test. As an alternative, to use the same test statistic as a one-sided, two-sample T-test, would that be the difference of the mean weight for the two diets (not the absolute difference)?
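One possible reading of the one-sided version asked about here, as a sketch rather than a confirmed answer from the channel: drop abs() and count only resamples in the direction of the alternative. It assumes the built-in chickwts data and, purely for illustration, the alternative that the casein mean is larger than the meatmeal mean.

```r
# Assumption: built-in chickwts data, restricted to two feed types
d  <- chickwts[chickwts$feed %in% c("casein", "meatmeal"), ]
n1 <- sum(d$feed == "casein")
n  <- nrow(d)
B  <- 10000

# Signed statistic: no abs(), so the direction of the difference matters
test.stat <- mean(d$weight[d$feed == "casein"]) -
             mean(d$weight[d$feed == "meatmeal"])

set.seed(112358)
boot.samples <- matrix(sample(d$weight, size = n * B, replace = TRUE),
                       nrow = n, ncol = B)
boot.stat <- colMeans(boot.samples[1:n1, ]) -
             colMeans(boot.samples[(n1 + 1):n, ])

# One-sided: only differences at least as large in the assumed direction
p.one.sided <- mean(boot.stat >= test.stat)

# Two-sided, for comparison: extreme in either direction
p.two.sided <- mean(abs(boot.stat) >= abs(test.stat))
c(one.sided = p.one.sided, two.sided = p.two.sided)
```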

  • @gbganalyst
    @gbganalyst 6 years ago +1

    Can we then interpret the results of bootstrapping the same way we interpret the result of an independent t-test?

    • @marinstatlectures
      @marinstatlectures  6 years ago

      For the most part you can. The beauty of the bootstrap though is that you can also work with more interesting/relevant statistics, aside from just mean/median, which the classical approaches use. You can work with just about any statistic/estimate you can imagine

  • @alinepontes7360
    @alinepontes7360 4 years ago

    Thanks for the video! It was the only way of doing a hypothesis test for a complex dataset. The boot package was not enough. BTW, is there a recommended citation for this bootstrap method (e.g. a book)?

  • @parvenraj98
    @parvenraj98 4 years ago

    You are the best !!!!!

  • @mathieufen2239
    @mathieufen2239 4 years ago

    Very cool video! Thanks!
    I wonder if this approach could be used on paired data...

    • @marinstatlectures
      @marinstatlectures  4 years ago

      Definitely, the concept of bootstrapping can be used for just about any structure of data. I explained it simply here, but the concept transfers very widely

  • @MsWilliam63
    @MsWilliam63 5 years ago

    Hi Mike, great videos. Really clear and helpful. I have two questions. What is the best way to report these results in text? Is it best just to report the bootstrapped difference in means and SD and p-value (e.g., observed stat = X, mean ± SD, p-value = X)?
    Is it possible to combine a t-test or Mann-Whitney U with the bootstrapped data in order to get a t-stat as well as the p-value for your difference in means/medians?

    • @marinstatlectures
      @marinstatlectures  5 years ago

      It's hard to say what the best way to report is, as that really depends on context: what the discipline is, what the focus of the paper was, etc.
      Regarding approaches, you can certainly combine this sort of approach with a parametric one like the t-test. For example, you may wish to use a bootstrap only to estimate the SE of your estimate, and then substitute this estimate into a standard approach like a t-test.
      Ex: Confidence interval: Estimate +/- t * BootstrapSE
      This of course requires the assumptions of the standard t-test/confidence interval approach to be met.
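A sketch of the "Estimate +/- t * BootstrapSE" idea from this reply, assuming the built-in chickwts data. The degrees of freedom used for the t quantile are a conservative choice made here as an assumption; the reply does not say which to use.

```r
# Assumption: built-in chickwts data, restricted to two feed types
d        <- chickwts[chickwts$feed %in% c("casein", "meatmeal"), ]
casein   <- d$weight[d$feed == "casein"]
meatmeal <- d$weight[d$feed == "meatmeal"]
B        <- 10000

set.seed(112358)
# Bootstrap distribution of the difference in means (groups kept separate)
boot.diff <- replicate(B, {
  mean(sample(casein, replace = TRUE)) - mean(sample(meatmeal, replace = TRUE))
})

boot.se  <- sd(boot.diff)                    # bootstrap estimate of the SE
estimate <- mean(casein) - mean(meatmeal)    # observed difference in means

# Assumption: conservative degrees of freedom, min(n1, n2) - 1
df     <- min(length(casein), length(meatmeal)) - 1
t.crit <- qt(0.975, df = df)

# 95% interval: Estimate +/- t * BootstrapSE
ci <- estimate + c(-1, 1) * t.crit * boot.se
ci
```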

  • @mbellett74
    @mbellett74 4 years ago

    If the column "feed" is not ordered (with respect to meat meal and casein), how do I order it before running boot.test.stat? Many thanks.

    • @marinstatlectures
      @marinstatlectures  4 years ago +1

      You don’t necessarily need to order it, but you can do that with the sort() command. You can also use the tidyverse arrange() command.
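A base R sketch of ordering rows by group before splitting by position. It uses order() to index the data frame, since sort() operates on a single vector rather than on whole rows; the shuffled data frame is built here only to have something unordered to work with, and chickwts is an assumed stand-in for the commenter's data.

```r
# Assumption: built-in chickwts data, restricted to two feed types
d <- chickwts[chickwts$feed %in% c("casein", "meatmeal"), ]

set.seed(1)
d.shuffled <- d[sample(nrow(d)), ]            # mix the rows up for illustration

# order() returns row positions sorted by feed; index the data frame with them
d.ordered <- d.shuffled[order(d.shuffled$feed), ]

table(d.ordered$feed[1:12])                   # the first block is one feed type
```

With the tidyverse loaded, arrange(d.shuffled, feed) would give the same row ordering by group.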

    • @mbellett74
      @mbellett74 4 years ago

      @marinstatlectures Many thanks Mike, I did it using arrange(). Maybe I have to do it because in the boot.test command I have to define two groups of rows to compare: abs(median(bootstrapsamples1[1:938, i]) - median(bootstrapsamples1[939:1511, i]), ... and in my dataset the two groups of events to compare are mixed.
      Again, many thanks!!

    • @santiagomendozapaz2135
      @santiagomendozapaz2135 4 years ago

      @marinstatlectures Maybe I am misunderstanding the figure, but if the matrix from which we are going to resample contains 12 values for type 1 and 11 values for type 2, and we apply the resampling directly to the 23 values, the resulting resampled matrix is going to contain 23 values drawn randomly from both types. So why are you taking the means over [1:12] and [13:23], when in the resulting matrix we cannot be sure that type 1 is contained in [1:12] or type 2 in [13:23]?

  • @veducatube5701
    @veducatube5701 4 years ago

    Sir, I need to bootstrap spatial point data, meaning I have 10 values with lat, long and a z. I need to bootstrap pairs of xy in a defined region (shapefile). Can you help?
    Regards from India

    • @marinstatlectures
      @marinstatlectures  4 years ago

      It's difficult to answer without knowing exactly what your data looks like, but it sounds like you will want to resample entire rows of your data.
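A sketch of resampling entire rows, as suggested in this reply, using the dummy values quoted later in this thread. The column names are illustrative. Because whole rows are drawn, every resampled point is one of the original locations, so the resample cannot contain a point outside the original set; it also means no new locations are created.

```r
# Dummy values quoted in this thread; column names are illustrative
pts <- data.frame(
  lat         = c(29, 28.45, 27, 30.30, 31, 25.45),
  long        = c(79, 78.30, 77.45, 79.02, 77, 80.30),
  water.table = c(23, 21, 25, 26, 22, 32)    # depth in m
)

set.seed(1)
# Draw row indices with replacement, then take those rows as a unit,
# so each lat/long pair stays attached to its own water table value
idx      <- sample(nrow(pts), replace = TRUE)
boot.pts <- pts[idx, ]
boot.pts
```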

    • @veducatube5701
      @veducatube5701 4 years ago

      @marinstatlectures Thank you for replying, sir.
      I'm giving you some dummy data:
      lat    long   water table (depth in m)
      29     79     23
      28.45  78.30  21
      27     77.45  25
      30.30  79.02  26
      31     77     22
      25.45  80.30  32
      Assume that all these original points of latitude and longitude with water table values fall in a district (the boundary line of this district is in a map file format called .shp, or ESRI shapefile).
      Sir, I want to bootstrap these three columns so that I may have more geographic points for the water table in my district. That is possible only if the latitudes and longitudes do not fall outside the district boundary or shapefile, meaning the lat and long column values must remain contained within the shapefile's latitudes and longitudes.
      Sir, it's very crucial for me.
      Please guide me or share some code with me.
      Thank you

  • @chathuraedirisuriya6535
    @chathuraedirisuriya6535 6 years ago +2

    The Bootstrap Hypothesis Testing R script link directs to the wrong file. Please correct it.

    • @marinstatlectures
      @marinstatlectures  6 years ago

      Thanks for letting us know. It should point to the correct file now.

  • @sunayana98
    @sunayana98 5 years ago

    Hi Marin, when I'm trying to find the test stats of bootstrap samples, R is telling me 'i' is not found. What do I do?

    • @marinstatlectures
      @marinstatlectures  5 years ago

      It's difficult to tell without knowing the code you've entered, etc., but it sounds like this part of the code is not in a loop that is running from i=1,2,...,B.
      The "i" is referencing the iteration number in the loop, and R cannot see what i is, so it sounds like you either haven't initiated a loop, or that command is outside of the loop.
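A sketch of the loop this reply refers to, showing where i comes from. It assumes the built-in chickwts data; the names BootstrapSamples and Boot.test.stat1 follow those quoted elsewhere in this thread, and the rest are illustrative.

```r
# Assumption: built-in chickwts data, restricted to two feed types
d  <- chickwts[chickwts$feed %in% c("casein", "meatmeal"), ]
n  <- nrow(d)
n1 <- sum(d$feed == "casein")
B  <- 1000

set.seed(112358)
BootstrapSamples <- matrix(sample(d$weight, size = n * B, replace = TRUE),
                           nrow = n, ncol = B)

# Storage for one test statistic per bootstrap sample
Boot.test.stat1 <- rep(0, B)

# i only exists inside the loop: it takes the values 1, 2, ..., B in turn.
# Running the body on its own, outside the loop, gives "object 'i' not found".
for (i in 1:B) {
  Boot.test.stat1[i] <- abs(mean(BootstrapSamples[1:n1, i]) -
                            mean(BootstrapSamples[(n1 + 1):n, i]))
}

head(Boot.test.stat1)
```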

    • @sunayana98
      @sunayana98 5 years ago

      @marinstatlectures I've typed the command exactly how you've typed it, i.e., in the square brackets. However, it says 'i is not found'. Is there an alternate command?

    • @pilobond
      @pilobond 5 years ago

      @sunayana98 I had the same problem, but then I realized I had typed "for (i in i:B)" instead of "for (i in 1:B)" by mistake. Once this was corrected it ran fine. I wonder if you have the same problem.

  • @damianspencer
    @damianspencer 4 years ago

    Do you do consultations? Please contact me.

    • @marinstatlectures
      @marinstatlectures  4 years ago

      It depends on the work. I have no way of contacting you. You can get in touch with me if you like; my contact info is in the About section of our channel.