👋🏼 Hello there! In this statistics lecture we learn the Bootstrap method (a brute-force method), along with why one may want to use such an approach. The Bootstrap is a resampling-based approach, useful for estimating the sampling distribution and standard error of an estimate. If you'd like to support us, you can Donate (bit.ly/2CWxnP2), Share our Videos, Leave us a Comment and Give us a Like 👍🏼! Either way, We Thank You! 🦄
Thank you Mike! I have a question (I probably got it wrong...). In the lesson, you used a small sample of 5 (5 observations), so is it true that there would only be 5 to the power of 5 = 3125 different possible resamples? If so, how would it help to have B bigger than 3125?
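A minimal Python sketch of the counting argument, assuming a hypothetical sample of five distinct values (the numbers are borrowed from a later comment in this thread, not the video's data). With n = 5 there are exactly 5^5 = 3125 equally likely ordered resamples (126 if order is ignored, and those 126 are not equally likely), so the "ideal" bootstrap distribution can be enumerated outright. Random resampling with B > 3125 adds no new resamples; it only pulls the Monte Carlo estimate closer to that exact answer.

```python
import itertools
import math
import random
import statistics

# Hypothetical sample of n = 5 distinct values (illustration only).
sample = [10, 11, 20, 35, 23]
n = len(sample)

# Exact ("ideal") bootstrap: every ordered resample of size n drawn with replacement.
exact_means = [statistics.fmean(r) for r in itertools.product(sample, repeat=n)]
print(len(exact_means))         # 3125, i.e. 5**5 ordered resamples
print(math.comb(2 * n - 1, n))  # 126 distinct resamples if order is ignored

# Every ordered resample is equally likely, so the exact bootstrap SE is the
# population SD over all 3125 resample means.
exact_se = statistics.pstdev(exact_means)

# Monte Carlo bootstrap with B larger than 3125: resamples repeat, and the
# estimate simply converges toward exact_se as B grows.
random.seed(1)
B = 10_000
mc_means = [statistics.fmean(random.choices(sample, k=n)) for _ in range(B)]
mc_se = statistics.stdev(mc_means)

print(round(exact_se, 3), round(mc_se, 3))
```

For n = 5 you could skip random resampling altogether; random draws only become necessary when n is large enough that n^n resamples cannot be listed.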
Your video saved me. Thank you soo much. Really appreciated :) :)
So many YouTube videos try to explain bootstrapping, and yet this guy explains it so well you don't even have to try hard to understand! He understands it well and explains it well!
The last question is the highlight of the whole video!!! You teach much more clearly than my prof!
thanks, we appreciate that :)
Do you guys need this for your brain to come somewhere? Oh my god, the professor needed to come somewhere in your own theory. It's okay, there are some who need a guideline from a PROF. Are you the prof in your brain or not?
You mean you, not we. Follow the leader: if it is you, you will never be a leader but a follower. Like following mami to the playground.
ABSOLUTELY BRILLIANT!!! This concept of random and sequential measurement selection is so simple, but this is the only spot on the internet which manages to explain it well
What a coincidence! We just started with bootstrapping in class. I'm currently doing second year stats. I am in my second semester right now. I didn't quite get what my lecturer was trying to say in class, but you explained it so well here. Thank you so much for this video!
Great to hear! I’m teaching bootstrapping in my class this week :)
Thank you so much for this video!! RUclips university is a life saver
Incredibly clear and tangible. Rare to find in stats videos. Thank you!
the best explanation available on the internet!
thanks, we agree ;)
Hello, professor! I am learning a lot and the content is just... incredibly clear and informative!! Thank you so, so much for this content!
Your explanation is so clear! Thank you.
You guys have to understand that bootstrap is a simple idea, but there isn't any formal proof that it works... so it is difficult for a professor to explain it; it is always a hand-waving explanation of why it works.
But you put the concept very clearly, I liked it.
Just stumbled across your channel. Fantastic work! Please keep them coming
brilliant, much better than my uni days
Wow, so many great reviews! I'm going to show this to my Computer Science teacher for a project that I have to do :(
Best explanation about bootstrapping in yt
I agree ;)
Amazing and very lucid! Thanks Marin, you make life easy for grad students struggling with dense notation from their professors.
thanks, I teach grad students as well, and I'm trying to do the same for them, so glad to hear it's working ;)
It was very helpful. Thank you very much
OMG your videos have literally saved my life!!!! Thanks!!!!
You’re welcome, happy to help :)
Hello there, this session was informative and helpful. Thanks, Professor, for the incredible content.
Very well done. Really appreciate the example at the end.
A great thanks from the Netherlands
Brilliantly clear description. Thank you!
Thanks a lot, Sir... The intro you provided is the best I have ever seen... it helped a lot!!
Please make some videos regarding Bayesian methods using R whenever it is possible!!!!
Amazing and absolutely clear video. Thank you!
Great what you're doing, Marin. Thank you so much for everything.
Thank you!
Now I understand.
Great mirror-writing skills
Very nice video, helped me a lot in understanding the principle of bootstrapping!
Very helpful indeed, and writing in reverse!!!
Great video!
Very nice one
Great Video. Thank you very much.
Great lecture
Beautifully explained!!
One point w.r.t. bootstrapping: via resampling we create child samples out of the first sample. But doesn't that introduce a dependency between the first and the subsequent samples, since the child samples will always contain the same data values?
Let's say I have blood pressure data for 500 patients, and among these records there are only 200 unique BP values; then the child samples after resampling will always contain values from those 200 unique values.
So, can't we say that, just like in the parametric approach, we need an adequate number of observations in the parent sample for bootstrapping?
Very well explained!
You are such a great teacher! Thank you!
you're welcome :)
Great video, really helped!
Quite helpful, and the baby's beautiful words just made my day.
good to hear! our boy wanted to be part of the video creation, and so he's taken on that role ;)
@@marinstatlectures great
Interesting one
GOD IT IS SO HELPFUL! THANK YOU FOR MAKING THIS VIDEO!
you're welcome :)
Thanks a lot Marin, can you do some videos on factor analysis?
Thanks for the clear explanation. However, I wonder which kind of bootstrapping this is.
Nice informative video. I have a couple questions though:
1. If we randomly sample 10,000 times out of a small sample space (of 5 in our case) isn't that going to tend towards a normal distribution since it's a large collection of random sampled values?
2. Isn't the point of bootstrapping to estimate the population standard deviation when we don't have enough samples, and wouldn't a t-test be better in that case? I know that a t-test works only for normally distributed data, and bootstrapping, I believe, is especially effective when the population distribution is not normal, in which case we assume the population distribution to be the same as that of the small sample. But doesn't this get distorted when we resample 10,000 times and get a normal distribution out of it?
Thanks,
Did you find out?
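A small Python sketch bearing on point 1, assuming a hypothetical right-skewed sample of six values. B only controls how finely we approximate the bootstrap distribution; its shape is fixed by the original sample and its size n. For the exact bootstrap distribution of the mean, the skewness equals the sample's (plug-in) skewness divided by sqrt(n), so it shrinks as n grows, not as B grows.

```python
import random
import statistics

def skewness(xs):
    """Plug-in (population-formula) skewness of a list of numbers."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

# Hypothetical right-skewed sample (illustration only).
sample = [1, 1, 2, 2, 3, 15]
n = len(sample)

random.seed(0)
results = {}
for B in (1_000, 10_000, 100_000):
    boot_means = [statistics.fmean(random.choices(sample, k=n)) for _ in range(B)]
    results[B] = skewness(boot_means)
    print(B, round(results[B], 2))

# Value the bootstrap histogram's skewness settles on as B grows (about 0.70 here).
print(round(skewness(sample) / n ** 0.5, 2))
```

The printed skewness stays near 0.7 for every B instead of drifting toward 0, which is what a normal shape would require.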
Thank you! This helps a lot ❤️
You’re welcome, great to hear!
I am wondering why the standard error in your example uses sqrt(n) and not sqrt(n-1), given that the sample is small?
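Assuming the question is about the usual standard error of a sample mean: the n - 1 correction lives inside the sample SD s, while the sqrt(n) in the denominator comes from Var(x̄) = σ²/n, so the two are separate. The ideal bootstrap SE of the mean uses the plug-in SD (dividing by n), which is why it comes out slightly smaller than s/sqrt(n) in small samples:

```latex
$$
\widehat{\mathrm{SE}}(\bar{x}) = \frac{s}{\sqrt{n}}, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}
$$

$$
\widehat{\mathrm{SE}}_{\text{boot}}(\bar{x}) = \frac{\hat{\sigma}}{\sqrt{n}}, \qquad
\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{n-1}{n}}\, s
$$
```

With n = 5, for instance, the bootstrap SE of the mean is sqrt(4/5) ≈ 0.89 times s/sqrt(n).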
Very useful! Thank you!
Are you going to make a video on cross-validation and bootstrap methods (in R) used for validating regression models?
Probably at some point, but in the near term we’re focusing on building videos for intro stats, and next for regression modeling (of all sorts)
Suppose you're running a logistic regression, and your Y/N response variable is imbalanced, where the Y class makes up a minority of the overall dataset, say 12%. Should your bootstrap samples be stratified, or is it okay to do non-stratified bootstrap sampling a large number of times?
In the parametric approach we were taking samples of the same size from the population and getting a distribution, but in the case of bootstrapping we are taking data from one sample (not the population), so how are you comparing these two things?
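A minimal sketch of what "stratified" would mean here, assuming plain Python lists and a hypothetical helper name `stratified_bootstrap_indices`: rows are resampled with replacement separately within each class, so every resample keeps exactly the original 12% / 88% split, whereas an ordinary bootstrap lets that split wander from resample to resample. Which one is appropriate depends on whether the class counts were fixed by the sampling design; this sketch only shows the mechanics.

```python
import random
from collections import Counter

def stratified_bootstrap_indices(labels, rng):
    """Hypothetical helper: resample row indices with replacement within each class."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    indices = []
    for members in by_class.values():
        # Draw as many rows as the class originally had, from that class only.
        indices.extend(rng.choices(members, k=len(members)))
    return indices

# Hypothetical imbalanced response: 12 "Y" out of 100 rows.
labels = ["Y"] * 12 + ["N"] * 88
rng = random.Random(7)

strat = [labels[i] for i in stratified_bootstrap_indices(labels, rng)]
plain = [labels[i] for i in rng.choices(range(len(labels)), k=len(labels))]

print(Counter(strat))  # always 12 Y and 88 N
print(Counter(plain))  # the Y count varies from resample to resample
```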
"This bootstrapping appoach" aahaha
Awesome explanation! Thanks!
you're welcome
thank you sir for this wonderfully explained video, please make another video on how to do the sampling that generates the sample means of a sampling distribution. I don't understand how we do that in practice. Thanks in advance...
Thank you!! It has been really helpful!!
This is an excellent presentation.
I am wondering what the point of calculating the bootstrap standard error is. We used the SE to calculate a 95% CI in the parametric approach. When we bootstrap, we can obtain the 95% CI directly from the bootstrap data: if we create 10000 bootstrap samples and sort their estimates from minimum to maximum, the 251st and the 9750th are the lower and upper bounds of the 95% CI. Correct me if I am wrong.
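A minimal Python sketch of the percentile interval described in this comment, next to the SE-based normal interval, assuming a hypothetical sample and B = 10,000. The exact ranks used as cut-offs (250th vs 251st, and so on) vary by convention and barely matter at this B. One reason the bootstrap SE is still useful: it is a single summary of precision, and it feeds the familiar estimate ± 1.96·SE interval when the bootstrap distribution looks roughly normal.

```python
import random
import statistics

# Hypothetical data (illustration only).
sample = [12, 15, 9, 22, 17, 14, 30, 11, 16, 19]
n = len(sample)
B = 10_000

random.seed(3)
boot = sorted(statistics.fmean(random.choices(sample, k=n)) for _ in range(B))

# Percentile interval: with B = 10,000 sorted values, take the 250th and
# 9,750th values (1-indexed), i.e. list indices 249 and 9749.
pct_lo, pct_hi = boot[249], boot[9749]

# SE-based (normal approximation) interval for comparison.
est = statistics.fmean(sample)
se = statistics.stdev(boot)
norm_lo, norm_hi = est - 1.96 * se, est + 1.96 * se

print(f"percentile: ({pct_lo:.2f}, {pct_hi:.2f})")
print(f"normal:     ({norm_lo:.2f}, {norm_hi:.2f})")
```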
I got a question for you:
Is there any difference between bootstrapping and the central limit theorem?
If yes, I just want to know exactly when to use bootstrapping in inferential statistics?
Thanks for all the effort you are putting in.
Same question here, could you please share your feedback if you get any answer?
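One way to see the relationship, as a hedged Python sketch with a hypothetical skewed sample: for the sample mean, the CLT-based formula s/sqrt(n) and the bootstrap SE agree closely, so the bootstrap is not competing with the CLT there. Its value shows up for statistics with no simple SE formula (a median, a percentile, a ratio), or when n is too small to trust a normal approximation. For the mean it typically comes out a touch smaller, because it effectively divides by n rather than n - 1.

```python
import math
import random
import statistics

random.seed(42)
# Hypothetical right-skewed sample of size 40 (exponential draws).
sample = [random.expovariate(1.0) for _ in range(40)]
n = len(sample)

# CLT-based standard error of the mean.
clt_se = statistics.stdev(sample) / math.sqrt(n)

# Bootstrap standard error of the mean.
B = 5_000
boot_means = [statistics.fmean(random.choices(sample, k=n)) for _ in range(B)]
boot_se = statistics.stdev(boot_means)

print(round(clt_se, 3), round(boot_se, 3))
```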
You made this easier to understand. Marin, could you do a video on power analysis in RStudio?
hi, we are about to release a video explaining the concept of Power, in the context of tests for a mean. we hope to record a complementary video showing some of that stuff in R...but have many things recorded and in need of editing before we can get to that
Having to work with a low number of samples seems to be a statistician's nightmare...
All resamples should have the same data size, but does it need to equal the sample data size?
I mean, if the sample has 5 data points,
can we take 3 data points randomly from the sample for each resampling?
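A quick Python sketch of why the usual answer is "use resamples of size n", assuming the hypothetical 5-point sample quoted in another comment in this thread. The spread of resample means depends on the resample size m, roughly as 1/sqrt(m), so drawing only 3 points inflates the SE relative to what a sample of 5 would show. Variants that deliberately draw m < n points do exist for special situations, but they rescale the result; the sqrt(m/n) factor below is that rescaling for the mean.

```python
import math
import random
import statistics

# Hypothetical sample (illustration only).
sample = [10, 11, 20, 35, 23]
n = len(sample)
B = 20_000
random.seed(5)

def boot_se(m):
    """Bootstrap SE of the mean when each resample has m points drawn with replacement."""
    means = [statistics.fmean(random.choices(sample, k=m)) for _ in range(B)]
    return statistics.stdev(means)

se_5 = boot_se(5)  # standard bootstrap: m = n
se_3 = boot_se(3)  # smaller resamples: larger spread

print(round(se_5, 2), round(se_3, 2), round(se_3 * math.sqrt(3 / n), 2))
```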
Hi. I want to ask about the SE of one specific percentile. I understand that the SE is, on average, how far sample means are likely to be from the population mean. My question is: what does that have to do with a percentile? Why does a percentile have its own SE?
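A short Python sketch, assuming hypothetical data: any number computed from a sample (a mean, a median, the 90th percentile) would come out differently in a different sample, and its SE measures by how much. The bootstrap estimates it the same way for every statistic: recompute the statistic on each resample and take the SD of the results. Here `statistics.quantiles(..., n=10)[-1]` is used as the 90th percentile; other percentile definitions would give slightly different numbers.

```python
import random
import statistics

# Hypothetical sample (illustration only).
random.seed(11)
sample = [round(random.gauss(120, 15), 1) for _ in range(30)]
n = len(sample)
B = 5_000

def p90(xs):
    """90th percentile: the last of the 9 decile cut points."""
    return statistics.quantiles(xs, n=10)[-1]

boot_p90 = [p90(random.choices(sample, k=n)) for _ in range(B)]
boot_mean = [statistics.fmean(random.choices(sample, k=n)) for _ in range(B)]

print("90th percentile:", round(p90(sample), 1), "SE:", round(statistics.stdev(boot_p90), 2))
print("mean:           ", round(statistics.fmean(sample), 1), "SE:", round(statistics.stdev(boot_mean), 2))
```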
Thank you..!
You’re welcome
How can u write in reverse?? That's so cool!!
Lots of practice ;) but it’s actually using something called a “light board”, where the image is recorded and then reversed like a mirror...so I’m not actually writing backwards.
I get access to it at UBC Studios :)
You mentioned in the video that you have some more examples of bootstrapping... is there perhaps a link?
@@marinstatlectures Wait, so how do you seem to be writing on the right or the left, and it displays on the same side of the board? That knots my brain way harder than bootstrapping
@@Katurha I was pretty distracted by this apparently amazing ability at first, too! If you somehow haven't figured it out by now...you can test it out with a smartphone's front-facing camera.
Get a thin/cheap piece of paper and a thick black marker so your text will show through. Write "Test" on the piece of paper. Hold up the paper so you can read it, and turn on the front-facing camera on the phone held out in front of you. You'll see that you're able to read the text even though the camera is looking at it from the back side! Same principle, barely different application.
Sir, how small is a "small sample" for bootstrapping?
I don't understand if the sample size is the number of samples from the population or the number of elements in each sample. This gets particularly confusing later when he talks about resampling from a sample of 5 elements. What am I missing?
only question that I now have: where did you learn to write in mirrored letters... :D
7:42: Resampling for B times may result in a "B+1" at the foot of X-bar-*
Here, we are taking repeated samples 1, 2, 3, 4, ..., B to have B total samples. It really doesn't matter how many you take, though; the concept is exactly the same if you take R = B+1.
Thank you very much for this video. It was easy to understand. I do have one quick question: how did you get the standard error 5.57?
Thank you in advance if you can help me answer my question!!!
That was by calculating the SD of the bootstrap means. In reality we would do this for many more bootstrap resamples than I did in this video
@@marinstatlectures thank you, please, it is important to explain how you got 5.57, so I can correct mine and get the full knowledge
I am asking, Sir, because my calculation gives 5.244044 :) not 5.57
I am new to statistics and I have a doubt regarding the calculation of the standard error in the bootstrapping example. How did the standard error of those 3 resamples come out to be 5.57? Here's what I did; could you tell me where I am wrong:
I calculated the standard deviation of the 3 examples (84,73,86) and it was 7. The Standard Error is hence, 7/√3 which is 4.04.
the 3 resample means should be (84, 73, 80), and their SD (the bootstrap SE) is 31^0.5 = 5.57
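The arithmetic in this reply, checked in a few lines of Python using the three resample means quoted above. `statistics.stdev` divides by (number of values - 1), matching the reply; the SD of the bootstrap means already is the bootstrap SE, so there is no further division by sqrt(3).

```python
import math
import statistics

resample_means = [84, 73, 80]  # the three bootstrap-sample means from the example

# sqrt(((84-79)**2 + (73-79)**2 + (80-79)**2) / 2) = sqrt(31)
boot_se = statistics.stdev(resample_means)
print(round(boot_se, 2))  # 5.57

# Dividing by sqrt(3) again, as in the question above, would understate the SE here.
print(round(boot_se / math.sqrt(3), 2))
```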
Hello Sir, in large-sample theory, if we increase the number of observations, it will eventually lead to a normal distribution. With the bootstrap I don't think so.
Consider exam marks. If we use a large sample, we will eventually end up with a normal distribution, while with the bootstrap I use only 5 random elements, e.g. 10, 11, 20, 35, 23. If I do resampling, will the bootstrap give an answer close to the large-sample one?
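A hedged Python simulation of this question, assuming a hypothetical population of exam marks that is normal with mean 70 and SD 10. Resampling cannot add information the 5 points don't contain: the bootstrap SE computed from a single sample of 5 swings widely depending on which 5 marks you happened to draw, while the true SE of a 5-point mean is fixed at 10/sqrt(5) ≈ 4.47. With a bigger original sample, the bootstrap SEs cluster much more tightly around the truth.

```python
import random
import statistics

random.seed(2024)
POP_MEAN, POP_SD = 70, 10  # hypothetical population of exam marks (normal)

def bootstrap_se(sample, B=500):
    """Bootstrap SE of the mean from one sample."""
    n = len(sample)
    means = [statistics.fmean(random.choices(sample, k=n)) for _ in range(B)]
    return statistics.stdev(means)

def spread_of_bootstrap_se(n, reps=200):
    """Draw `reps` independent samples of size n; return (min, median, max) of their bootstrap SEs."""
    ses = []
    for _ in range(reps):
        sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        ses.append(bootstrap_se(sample))
    return min(ses), statistics.median(ses), max(ses)

results = {}
for n in (5, 50):
    true_se = POP_SD / n ** 0.5
    results[n] = spread_of_bootstrap_se(n)
    lo, med, hi = results[n]
    print(f"n={n}: true SE {true_se:.2f}; bootstrap SEs ranged {lo:.2f} to {hi:.2f} (median {med:.2f})")
```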
Thank you for explaining this !
you're welcome :)
bravo!
Thanks for your video!!! May I ask whether the bootstrap in SPSS is able to do internal validation for a predictive model?
I’m not sure about using SPSS...I know the basics, but I’m an R user...
thanks mari
you're welcome :)
Can we make a statistical model from bootstrap sample distribution?
You can use bootstrapping with statistical modeling. This video introduces the concept as it applies to a sampling distribution, but you can use a bootstrap approach as an alternative approach to most methods
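As one example of combining the bootstrap with a model, here is a minimal case-resampling sketch in Python, assuming a simple straight-line fit and hypothetical (x, y) data: resample whole rows with replacement, refit, and take the SD of the refitted slopes as the slope's bootstrap SE. The `ols_slope` helper is written by hand to keep the block dependency-free.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(8)
# Hypothetical data: y = 2 + 0.5 x + noise (illustration only).
xs = [float(i) for i in range(1, 31)]
ys = [2 + 0.5 * x + random.gauss(0, 2) for x in xs]
rows = list(zip(xs, ys))

B = 2_000
slopes = []
for _ in range(B):
    resample = random.choices(rows, k=len(rows))  # resample whole (x, y) rows
    bx, by = zip(*resample)
    if len(set(bx)) > 1:  # skip a degenerate resample with only one distinct x
        slopes.append(ols_slope(bx, by))

print("slope:", round(ols_slope(xs, ys), 3), "bootstrap SE:", round(statistics.stdev(slopes), 3))
```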
Hello, thanks for the video!!! helped a lot, but can you give me some hint for doing bootstrapping on the R software? (please, don't mind my english , i'm from another country) =)
Hi, sure, we have a few different videos for that. if you check out the following playlist (ruclips.net/p/PLqzoL9-eJTNAz0IuV1nAV7KMkGBf4QcQX) you'll see in the middle 4 videos on bootstrap hypothesis tests and confidence intervals, both explained in concept, as well as implemented in R.
@@marinstatlectures thanks so much!!!!
what technology is this? It automatically mirrors the whiteboard??? amazing!
the whole video is just mirrored ;)
Video should be called, how man wrote backwards in 17m
How did you calculate the bootstrap standard error using the 3 resamples? Is the formula (1/n^0.5) * ( Σ (resample mean - sample mean)^2 / (n - 1) )^0.5 ?
Here it would just be the SD of all of the bootstrap-sample means. Basically what you have written, but without the (1/n^0.5)... it would just be: ( Σ (resample mean - mean of all resample means)^2 / (B - 1) )^0.5, where B is the number of resamples.
Marin, can you please explain how you calculated the SE value? I'm getting "11.51/root 3 = 6.65"??? Please explain, am I putting in the wrong values?
Calculate the mean of the resample means (84, 73, 80) => 79
Now take the SD of those resample means
This gives an SE of 5.57
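The same calculation written out, with B = 3 bootstrap resamples whose means are 84, 73 and 80 as quoted above; the bootstrap SE is the SD of the resample means with a (B - 1) denominator:

```latex
$$
\bar{x}^{*}_{\cdot} = \frac{1}{B}\sum_{b=1}^{B}\bar{x}^{*}_{b} = \frac{84 + 73 + 80}{3} = 79
$$

$$
\widehat{\mathrm{SE}}_{\text{boot}}
= \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\bar{x}^{*}_{b} - \bar{x}^{*}_{\cdot}\right)^2}
= \sqrt{\frac{(84-79)^2 + (73-79)^2 + (80-79)^2}{3-1}}
= \sqrt{\frac{25 + 36 + 1}{2}} = \sqrt{31} \approx 5.57
$$
```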
Why does the screen keep flipping? It's making me dizzy!
@@The_Real_Goodboy_Link I'm talking about the annoying transition effect…
@@alfcnz AHHHHHH, thought you meant the reverse writing screen. Watching again I see what you mean. That's some old school screen transitioning right there!
Maybe some editing was done by him to skip some irrelevant part of the video.
Can you provide this stats example in R with some sample files?
we have another video in editing showing how to use this to construct a confidence interval...and we will also record an R complement to that, showing that example with R.
we also have recorded videos explaining the use of Bootstrap (and Permutation/Re-Shuffling Tests) in the context of comparing 2 groups...we won't get to editing that one for about a month or more, and plan to also record the R complement for that, showing how to implement the concepts in R.
it will take a bit of time for those to get up, as we have many others ahead in the editing queue...but we do plan to have those up in time...
Wow? I thought the word i see there above the scatter plot was "PORN" 0.o
If your small sample is limited to data points observed within a short-term trend, there is no way to account for this. If you are trying to predict sea level rise but you just bootstrap data taken as the tide rolls in one evening, you will never account for the cycles of high and low tide. It seems disingenuous to claim outliers affect large-sample statistics just the same as bootstrapped samples.
But I get it. I did bootstrapping all the time in the 80s and 90s when I did my science fair projects at the last minute.
What you are describing is a poor sampling design. If you collect data in the way you describe, any analysis will lead to incorrect conclusions.
I must say it’s very impressive that as a high school kid you knew bootstrapping! The first paper on the topic was published in 1979, and it didn’t become commonly used until high powered computers. Very impressive!
@@marinstatlectures I just mean I reused the inadequate sample I had to pretend I had collected a sufficient sampling of the population.
I need to do a bootstrap in Gretl, please, for estimating a NARDL model.
no
didnt help me
We live in some kind of new world; this is so boring and old theory.
Incorrect. Classical approaches to statistical inference are old theory, this is a much more modern approach, made possible by high computing power
Hi, thank you! I'm completely lost on how many times I should bootstrap my sample... I'm making a regression model, but my errors are not normally distributed, so I'm considering bootstrapping. Question: how many times should I bootstrap the original data set? I have 21 participants, each with 2 observations of 2 tests in 2 different years (4 per participant in total)
@MarinStatsLectures-R Programming & Statistics
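A hedged sketch for the data structure described here, in Python with hypothetical participant IDs and values: because each participant contributes 4 correlated observations, one common option is to resample participants (clusters) with replacement and keep all of each chosen participant's rows, rather than resampling the 84 rows individually. The number of resamples B mainly controls Monte Carlo noise; a few thousand is a common choice for SEs and percentile intervals, and you can check stability by rerunning with a different seed. The `statistic` function below (an overall mean) is a hypothetical placeholder for refitting the regression model on each resample.

```python
import random
import statistics

random.seed(21)
# Hypothetical data: 21 participants x 4 observations each (2 tests x 2 years).
data = {pid: [random.gauss(50, 10) for _ in range(4)] for pid in range(1, 22)}
participants = list(data)

def statistic(rows):
    """Placeholder: in practice, refit the regression on `rows` and return a coefficient."""
    return statistics.fmean(rows)

B = 2_000
boot_stats = []
for _ in range(B):
    # Resample participants (clusters) with replacement...
    chosen = random.choices(participants, k=len(participants))
    # ...and keep all 4 observations of every participant that was drawn.
    rows = [obs for pid in chosen for obs in data[pid]]
    boot_stats.append(statistic(rows))

all_rows = [obs for obs_list in data.values() for obs in obs_list]
print("estimate:", round(statistic(all_rows), 2))
print("cluster-bootstrap SE:", round(statistics.stdev(boot_stats), 2))
```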
This is great. Thank you.