NOTE: A lot of people ask "What happens when the original collection of measurements is not representative of the underlying distribution?" It's important to remember that a confidence interval is not guaranteed to overlap the true population mean. A 95% CI means that if we make a ton of CIs using the same method, 95% of them will overlap the true mean. This tells us that 5% of the time we'll be off. So yes, a sample that is totally bonkers is possible, but rare. Understanding this risk of making the wrong decision, and managing it, is what statistics is all about.
Also, at 5:55 I say there are up to 8^8 combinations of observed values and possible means, but this assumes that order matters, and it doesn't. So 8^8 overcounts the total number of useful combinations, and the true number is 15 choose 8, which is 6435 (for details on this math, see: en.wikipedia.org/wiki/Multiset#Counting_multisets).
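For anyone who wants to check the counting in the note above, here is a minimal sketch. It assumes the 8 observed values are all distinct (if some repeat, the number of distinct resamples is smaller), and the variable names are just for illustration.

```python
from itertools import combinations_with_replacement
from math import comb

n = 8  # number of observed values, and also the size of each bootstrap sample

# Treating order as meaningful: each of the 8 slots can hold any of the 8 values.
ordered = n ** n

# Ignoring order: multisets of size 8 drawn from 8 values, i.e. C(8 + 8 - 1, 8).
unordered = comb(2 * n - 1, n)

# Brute-force check: enumerate every multiset of value indices directly.
enumerated = sum(1 for _ in combinations_with_replacement(range(n), n))

print(ordered)     # 16777216
print(unordered)   # 6435
print(enumerated)  # 6435
```

The brute-force count agrees with the 15 choose 8 formula, which is why 8^8 is an overcount.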
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
We take for granted all that went behind that idea of a 95% CI that you stated - it was Jerzy Neyman who came up with that definition. Have you read "The Lady Tasting Tea"? A bit of a history of some incredible mathematicians, including Ronald Fisher and Jerzy Neyman. The 95% comes up on page 123. Thanks for all your valuable statistics videos!
@natasgestel6873 Yes, I've read the book. Those dudes were pretty smart.
Thank you for explaining that order doesn't matter. I was looking for the clarification on this everywhere.
So, if we take our sample of 8 observations, and we calculate a 95% confidence interval around the sample mean by bootstrapping, and then a genie appears and tells us that the true population mean lies outside of that confidence interval, that's the same as saying that our original 8-observation sample's mean actually wouldn't appear 95% of the time if we repeated the experiment infinitely many times, each experiment being an 8-observation sampling of the population?
@alexandersmith6140 The definition of a 95% CI is that if we repeated the process of creating the 95% CI a ton of times, 95% of the CIs created that way would overlap the true mean. Thus, if we collected 8 measurements and used bootstrapping to calculate a 95% CI, and then repeated that whole process a ton of times (collected 8 measurements, then calculated the CI with bootstrapping), then 95% of those CIs would overlap the true mean.
In other words, it doesn't matter if we use bootstrapping, or some formula to calculate the CI, in both cases we have to collect 8 measurements a ton of times.
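The "repeat the whole process a ton of times" idea can be sketched as a small simulation. This is only an illustration under stated assumptions: the population is a made-up normal distribution (mean 10, SD 2), the CI is the simple percentile bootstrap, and all names are hypothetical. With only 8 measurements per experiment, the observed fraction often lands somewhat below the nominal 95%, so treat the printed number as approximate.

```python
import random
import statistics

rng = random.Random(0)            # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD = 10.0, 2.0    # hypothetical population (an assumption)
N_MEASUREMENTS = 8
N_BOOT = 1000                     # bootstrap resamples per CI
N_EXPERIMENTS = 400               # how many times we repeat the whole process


def bootstrap_ci(sample):
    """95% percentile bootstrap CI for the mean of `sample`."""
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(N_BOOT)
    )
    low = means[round(0.025 * N_BOOT)]         # roughly the 2.5th percentile
    high = means[round(0.975 * N_BOOT) - 1]    # roughly the 97.5th percentile
    return low, high


hits = 0
for _ in range(N_EXPERIMENTS):
    # Collect 8 brand-new measurements, then build a CI from them.
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N_MEASUREMENTS)]
    low, high = bootstrap_ci(sample)
    hits += low <= TRUE_MEAN <= high

coverage = hits / N_EXPERIMENTS
print(f"{coverage:.1%} of the CIs overlap the true mean")
```

Note that every experiment draws a fresh set of 8 measurements; resampling one fixed sample over and over would not tell us how often the method itself succeeds.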
I have done a master's in stats and a course in data analysis, and the only reason I've passed these things is that after a long and confusing lecture I can just come and watch you explain it in simple terms. Bam!
Thank you so much!!
Thanks! I'm glad my videos are helpful! :)
I am presently in your shoes, taking a Data Science course, but thanks to @statquest. Giving him a Double Bam!!
That's cool! How are your studies/career going?
There is nobody on YouTube that explains statistics better or in a more entertaining way than you! Keep it up!
Wow, thanks!
What I love about you is that you explain the big picture first. You help me understand why we should care in the first place, or the motivation behind the concept. Then you dive into the details afterwards, you make the information more accessible without compromising the technical integrity of the information. A very rare skill indeed, I'm reading Introduction To Statistical Learning in R ( ISLR ) and some chapters aren't intuitive, whenever I read a chapter that doesn't make sense I just watch your videos. That's how I know you're not compromising the technical integrity of the information, because what you say doesn't contradict what I read in academic papers, it's just easier to understand than what I read in academic papers. You are one of a kind!
Thank you very much!
@statquest No, thank YOU Josh!
I think you summed up the value of these videos really well. Starting with the big picture and then zooming into the details is so much more beneficial for learning and I think this is one of the things Josh nails!
@CaptainFeatherSwordz It's worlds apart from what the education system has conditioned us to, right?
Still floored that this works as a method
I know - it's so easy, yet so effective.
@statquest is it really so effective? We really can only be as confident -- that bootstrapping produces characteristic data -- as we are that the sample is representative of the distribution -- right? Unconfident extrapolation seems like a good way to pollute datasets.
@patrickjdarrow Just like any statistical method, you have to have a reasonable sample size. n = 8 as a minimum is a good starting point.
Passed all my stats courses already (thanks to your videos for a major part), but I'm still watching these as they come out, lol. Keep it up Josh, this channel is so good.
Thank you very much! :)
And just like that, bam!! I was stuck for the last six hours rewatching what my instructor posted on the portal, but this explanation made so much sense and made the concept easier to grasp. Thank you so much Josh!
Bam! Glad it helped!
I read a section on bootstrapping countless times and only understood it finally after watching your video! All I have to say to that is: BAM! (and thanks a bunch)
Hooray!!! :)
You sir are an absolute legend. Really helping me get through my course, because my professor explains the same concept in a way that is 100 times harder to understand.
Happy to help!
You're probably the best guy for this job. Even though I don't know where I'm gonna apply all these. I just keep going through all of your videos. After finishing up this playlist I'll watch the ML playlist. Keep amazing us. Thank you JOSH
Thanks!
I am just speechless - how can you make something so complicated so simple? Hats off to you and thanks a ton.
Thank you!
I can never get over how your videos make me love statistics when all my professors and recommended texts made me run away from it. Super grateful!! Also, I think I asked when this video was coming about a year ago.
Glad it finally came out! :) Sorry it takes me so long to make videos.
I love SQ, because I finally "get" bootstrapping, despite having used it for years!
BAM! :)
@Synthanicmusic I do as a data scientist. Honestly, if you know that you don't know what you're doing, then you are going to be better positioned than most; it means you will be questioning why you are applying certain tests/methods, rather than just doing so blindly. Especially in the workforce you will see a lot of badly reasoned statistics!
All semester long I have been floundering through my statistics class, no thanks to my professor's boring and quite difficult-to-follow lectures on the materials. I've felt so dumb all semester, so when the next section called for "bootstrapping" I finally decided to throw her lecture videos aside and see if someone could explain the concepts better on YouTube. Boy am I glad I stumbled upon this. The visuals are straight to the point and the way you talk through everything very slowly and clearly is SOOO helpful. The enthusiasm and goofiness help me keep my attention, which is a pain for me with ADHD. I could rewatch my prof's videos 5 times and retain nothing. Makes me wanna just burst into tears from frustration. But I felt like I could actually keep up with this video and _understand_ it!
TL;DR thank you for making this, it was a HUGE improvement over my professor's teaching style and I will DEFINITELY be consulting you for future topics. You're a peach
Hooray! Thank you very much. Just for reference, here's a list of all of my videos: statquest.org/video-index/
@statquest thank you very much
I wonder at and feel so much regard for the institution and teachers who taught you... No doubt, you are doing an incredible job... Stay blessed always.
Thank you! :)
BEST explanation EVER of bootstrap. Thanks for your dedication!
Glad it was helpful!
I am learning machine learning and came across this term. This video explains it very clearly, thank you.
Thanks!
Thanks for the videos, embarrassingly I'm relearning a lot of these concepts even though I graduated with a Statistics major. It's coming a lot easier now.
Glad to help!
Don't be embarrassed, it's not your fault, but the education system's
Happens a lot more often than you think. I graduated with a physics major not long ago and I can say I still cannot consider myself a physicist. I constantly keep finding myself learning things from awesome channels like Josh's that I'm supposed to know by now.
I've always grasped well enough to get a good grade but not well enough to embed, so I have to go back a lot.
Watched Stanford's and other lectures on similar topics, but you made it really simple and easier to understand. You teach good!! BIG BOOM BAMM!! Thanks man
Thank you! :)
You made this concept so much easier to understand than what I was supposed to be learning it from. Thank you so much!!
Glad it was helpful!
You are a legend my friend! A legend. I am doing my masters in Data Science this fall and this is amazing
You can do it!
Another great video. This video explains how to do the bootstrap, which is the easy part. The more difficult part is to understand why the bootstrap works. The conceptual challenge is that bootstrapping assumes that if we were to repeat an experiment, it would produce one of the outcomes we had observed. This could be a huge assumption, depending on the application. Bootstrapping does not add any new information to what has been observed.
Noted
"The reason why this works is because the histogram of the sample tends to look very similar to the histogram of the population. That's really the key idea behind the bootstrap, and we will see how this idea can be used in all kinds of complicated situations."
Taking an online course on bootstrap regression and came here to try to understand why bootstrap works when it does not generate any new information.
@sgpleasure When you sample from a population, it’s unsurprising that the distribution of the sample resembles the distribution of the population. So, you’re not really obtaining any new information. In essence, we’re only pretending it’s new information, when in fact, it’s just reconfirming existing information.
Wow! this is the first time I learned this. awesome!
BAM! :)
This is better than a college-level advanced course!!! Thank you
Wow, thanks!
The triple BAM was amazing, thank you!
Thank you!
I'm studying at a top 10 research university in the States and every professor has a PhD from Harvard/Stanford, but none of them teach stats as well as StatQuest 🙃
Thanks! :)
@bellahuang8522 They already have your money, so they don’t care. College is such a scam
BRO YOU ARE THE BEST, CLEAR VISUAL AND FAST JUST WHAT I NEED NEW SUB!!!!
Thank you!
A professor at the university I studied at was apparently a key contributor to Bootstrapping. Excellent job at explaining it in such an easy-to-understand way!
What's the University and Prof's name?
bam!
blud just dropped one of the best explanatory videos out there and thought we wouldnt notice☠☠☠
bam! :)
So smoothly explained.
Thank you sir.
Thank you!
My man sounds excited and bored at the same time and I'm here for it 😂 Great explanation, something my Prof couldn’t manage. Elite university my ass lol
bam!
One of the most useful videos on this topic on YouTube, thanks!
Wow, thanks!
Always glad to see a new statquest! BAM!
BAM! :)
The intro was gold 🔥
bam!
What a comprehensive and fun discussion! I really had trouble understanding the concept of bootstrapping by myself but your lecture helped me a great deal :> Kudos!
Glad it was helpful!
Not the information I was looking for but i couldn't stop myself from watching it to the end. It was quite entertaining :)
*BAM
That's awesome! BAM! :)
I'm going to recommend this channel to a bunch of my machine learning nerds. This guy deserves every hype possible!
Thank you! :)
Thanks for uploading these videos. It takes a lot of time and effort to make such quality content. Thank you, Sir.
Glad you like them!
I can only say one thing: BAM!!! you are the best teacher BAM!!!
Thank you!
Great way to break bootstrapping into common language.
Thanks!
Mr. Josh - u are amazing. World needs more ppl like u. Its like education on another level. Thank you
Thanks! :)
i loved your bams and the illustrations for the steps and your explanation helped a lot
Thank you!
bruh... this explanation is simply awesome!
Glad you liked it
Wow, that was super easy to understand. Thank you very much
double bam! :)
This is the most amazing video I've seen on bootstrapping, thank you! Quadruple Bam!
Wow, thanks!
Very clear explanation. Well done!
Thank you! :)
The work you do is awesome!! Love it.
Thank you!
Excellent explanation as always by StatQuest!!! Thx a lot!!!
BAM! :)
Thanks a lot, your video helps me and hopefully it will help my paper too
Thanks!
Are there even comments that you do not comment on?
Very good video, thank you!
Sometimes there are, but it's rare.
Thank you so much for your wonderful videos. I have a small request to provide a lecture on FLDA, GMM, EM Algorithm, MLE estimation, MAP estimation. Also, there are some lectures which are not in the book, please also include those lectures too. Thank you so much again!!!. I want to learn more and more from your lectures.
Thanks! I'll keep those topics in mind.
I read 30 pages of a book, almost get something, watch 10 min of StatQuest, fully understand the subject. You're the best bro
Thanks!
Statquest is the netflix for data science concepts.
bam!
OMG, it was a truly easy-to-understand video! Both the animation, narration, and explanation!!!! I wanna give a billion likes!!!
Wow, thanks!
Best intro ever
Pixar would envy you
bam!
Josh, you're sent to us from heaven, thanks
:)
Amazing videos, simple and well explained.
Many thanks!
Big BAM for so much statistic knowledge in such little time
Hooray!
This kinda feels illegal xD Really nicely explained!
Thank you! :)
At 8:23 the label on the x-axis should be median values, not mean values, since we are using the median as the statistic for bootstrapping in this case... please look into it
Yep. That's a typo.
Awesome! The proofs about it seem to be nice
Thanks!
easy to understand....thanks josh!
Thank you!
9 th wonder I learned bootstrapping and confidence intervals! hurray!
double bam! :)
Thank you for the teaching 🎉
Any time!
Love the explanation ..... Thank uh soo much❣️
Thanks!
Dear Josh I bought a few study guides :) Thanks so much for your videos
Awesome! Thank you so much for your support!!
Thank you for explaining... Eat tomato and stay healthy....
bam! :)
Thank you so much, your videos are always so helpful to me
Glad you like them!
Josh, if you need someone who cleans your room or does the dishes, just give me a call. I owe you that
Wow! :)
The best stats teachings out there! Kudos!!!! Question: Do we need to know/estimate the distribution (normal/gamma/exponential/etc.) of the bootstrapping histogram to determine the 95% confidence interval in cases where the central limit theorem doesn’t apply (such as the median)?
No
@statquest Thanks! :)
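To illustrate the "No" above: one common way to get the interval without naming any distribution is the percentile method, which reads the 2.5th and 97.5th percentiles straight off the sorted bootstrapped medians. The sketch below is only an illustration; the 8 measurements are made up, and the percentile method is one option among several, not necessarily the one used in the video.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical measurements, invented for this example.
measurements = [1.2, 0.4, -0.7, 2.9, 1.8, 0.1, -1.3, 2.2]
n_boot = 10_000

# Resample with replacement, compute the median each time, then sort.
boot_medians = sorted(
    statistics.median(rng.choices(measurements, k=len(measurements)))
    for _ in range(n_boot)
)

# Read the interval off the sorted values: no distribution is assumed.
ci_low = boot_medians[249]     # 250th smallest of 10,000 (2.5th percentile)
ci_high = boot_medians[9749]   # 9,750th smallest of 10,000 (97.5th percentile)

print(f"95% CI for the median: ({ci_low}, {ci_high})")
```

The same recipe works for other statistics by swapping `statistics.median` for a different function.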
Bootstrapping? More like "Bro, it's awesome knowledge you're dropping!" 👍
Bam! :)
@statquest Boot! 🥾
You rock Josh. Thanks for making this video!
Thanks!
Nice way of explanation!! BAM!!!
Thanks!
University professor explained it in a confused and insufficient way (to put it politely), then I came to StatQuest.
bam!
Love all of your videos!! Thanks a lot!
Glad you like them!
Thanks Josh, you are the one!
Thank you and congratulations again. I'm so glad I was helpful. BAM! :)
Wow this is so good. The intro made me laugh so hard, it wasn't even that funny I just didn't expect it.
Thanks!
Nice explanation...awesome
Thank you!
Thank you JOSH!
bam!
Thank you for this informative video; it really clarified my understanding of bootstrapping! However, I'm curious about the choice between sampling with replacement versus without replacement in the bootstrap method. How do we determine the most appropriate method for our specific dataset? Are there particular scenarios or types of data where one method is preferred over the other? Any additional insights would be greatly appreciated. Thanks a lot!
Bootstrap always uses sampling with replacement.
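A quick way to see why replacement is required: drawing all n values without replacement only reshuffles the original data, so the statistic never changes. A minimal sketch with made-up measurements:

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical measurements, invented for this example.
data = [1.2, 0.4, -0.7, 2.9, 1.8, 0.1, -1.3, 2.2]

# With replacement: values can repeat and others can be left out,
# so the mean varies from one resample to the next.
with_replacement = rng.choices(data, k=len(data))

# Without replacement: drawing all 8 values just shuffles the original data,
# so the resample always contains exactly the same values.
without_replacement = rng.sample(data, k=len(data))

print(sorted(with_replacement))
print(sorted(without_replacement) == sorted(data))  # True
print(statistics.fmean(with_replacement))
```

Since a reshuffled sample always has the same mean (or median) as the original, it cannot produce the histogram of bootstrapped values that the confidence interval is built from.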
Great video! I learn so much with you!
Awesome! Thank you!
BAMM! It will be great if you can compare and contrast cross-validation and bootstrapping for us.
One huge difference is that Bootstrapping uses sampling with replacement.
I was thinking the same! I usually use bootstrapping when the "test data" in folds is too small, but I really don't know the advantage of using CV instead of bootstrapping or vice versa
Great video!
Btw, you could probably do a really good Solid Snake voice. Would love to get an Easter egg in one of the next videos!!
That would be funny. :)
Nice work , man
Thanks!
Great video! One small flaw I would like to point out: the lingo should be "fail to reject the null hypothesis" instead of "cannot reject the hypothesis".
Noted
thank you, this was very helpful
Glad it was helpful!
Nice - “Part 1”, gonna have another nice solid multi-part set.
At least 2 parts. Part 2 will be out soon.
Thank you so much, very clear again! Not planning to make some videos about Fisher information, Jackknife, and Delta method, by any chance? 😬
No time soon, but I'll keep them in mind.
@statquest Jackknife would be cool!
really clear, thanks
Thank you!
Okay you are the best thank you for doing this video !
Thank you!
Super clear, thanks!
Thank you!
Explained better than my professor! Amazing work Josh
Wow, thanks!
Loved it... Big BAM!
Thanks!
Love it!!! It's really helpful
Thanks!
You are amazing! Thank you!
Thanks!
Please upload videos on monte carlo simulation and integration
I'll keep that in mind.
I love the terminology alert😂
quadruple bam !😂
bam!
At 6:40, when you start to discuss the 95% CI, I think there will be a lot of people who won't understand the subtlety of this distribution. You have created a distribution of 'statistics'; in this case the mean. So, as you would appreciate, you have derived the 'sampling distribution' of the mean, from which the standard deviation = the standard error of the mean and the 95% CI calculation is trivial. The uninitiated might not appreciate how this is different from a distribution of a single data set, whereby the standard error = the standard deviation / sqrt(n).
noted
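The point in the comment above can be checked numerically: the standard deviation of the bootstrapped means should land close to the formula-based standard error, s / sqrt(n). A small sketch with made-up measurements follows. The bootstrap value tends to come out slightly smaller, because resampling effectively uses a divisor of n rather than n - 1 for the variance.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical measurements, invented for this example.
data = [1.2, 0.4, -0.7, 2.9, 1.8, 0.1, -1.3, 2.2]
n = len(data)

# Sampling distribution of the mean, approximated by bootstrapping.
boot_means = [
    statistics.fmean(rng.choices(data, k=n)) for _ in range(20_000)
]

# Standard error two ways: spread of the bootstrapped means vs. the formula.
se_bootstrap = statistics.stdev(boot_means)
se_formula = statistics.stdev(data) / math.sqrt(n)

print(f"bootstrap SE: {se_bootstrap:.3f}")
print(f"formula SE:   {se_formula:.3f}")
```

The two numbers describe the same thing, the spread of the mean across repeated samples, which is different from the spread of the 8 raw measurements themselves.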
Josh is on Spotify! BAM
bam!
Ur videos are just so cool, tnx a lot
Glad you like them!