NOTE: A lot of people ask "What happens when the original collection of measurements is not representative of the underlying distribution?" It's important to remember that a confidence interval is not guaranteed to overlap the true, population mean. A 95% CI means that if we make a ton of CIs using the same method, 95% of them will overlap the true mean. This tells us that 5% of the time we'll be off. So yes, a sample that is totally bonkers is possible, but rare. Understanding this risk of making the wrong decision, and managing it, is what statistics is all about.
Also, at 5:55 I say there are up to 8^8 combinations of observed values and possible means, but this assumes that order matters, and it doesn't. So 8^8 overcounts the total number of useful combinations, and the true number is 15 choose 8, which is 6435 (for details on this math, see: en.wikipedia.org/wiki/Multiset#Counting_multisets )
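As a quick check of that count (with n = 8 observations, multisets of size n drawn from n distinct values number C(2n − 1, n) = C(15, 8)):

```python
import math
from itertools import combinations_with_replacement

n = 8  # number of observed values in the original sample

# Multisets of size n drawn from n distinct values: C(2n - 1, n)
print(math.comb(2 * n - 1, n))  # 6435

# Brute-force check: enumerate every multiset directly
count = sum(1 for _ in combinations_with_replacement(range(n), n))
print(count)  # 6435
```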
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
We take for granted all that went behind that idea of 95% CI that you stated - it was Jerzy Neyman's who came up with that definition. Have you read "The Lady Tasting Tea"? A bit of a history of some incredible mathematicians, including Ronald Fisher and Jerzy Neyman. The 95% comes up on page 123. Thanks for all your valuable statistics videos!
@@natasgestel6873 Yes, I've read the book. Those dudes were pretty smart.
Thank you for explaining that order doesn't matter. I was looking for the clarification on this everywhere.
So, if we take our sample of 8 observations, and we calculate a 95% confidence interval around the sample mean by bootstrapping, and then a genie appears and tells us that the true population mean lies outside of that confidence interval, that's the same as saying that our original 8-observation sample's mean actually wouldn't appear 95% of the time if we repeated the experiment infinitely many times, each experiment being an 8-observation sampling of the population?
@@alexandersmith6140 The definition of a 95% CI is that if we repeated the process of creating the 95% CI a ton of times, 95% of the CIs created that way would overlap the true mean. Thus, if we collected 8 measurements and used bootstrapping to calculate a 95% CI, and then repeated that whole process a ton of times (collect 8 measurements, then calculate the CI with bootstrapping), then 95% of those CIs would overlap the true mean.
In other words, it doesn't matter if we use bootstrapping, or some formula to calculate the CI, in both cases we have to collect 8 measurements a ton of times.
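That repeated process can be sketched in a short simulation (assuming, purely for illustration, a normal population with true mean 0.5 and the percentile method for the CI):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_boot, n_trials = 8, 2000, 500
true_mean = 0.5

covered = 0
for _ in range(n_trials):
    # One "experiment": collect 8 measurements from the population
    sample = rng.normal(true_mean, 1.0, size=n)
    # Bootstrap: resample WITH replacement, one mean per bootstrapped dataset
    resamples = rng.choice(sample, size=(n_boot, n), replace=True)
    boot_means = resamples.mean(axis=1)
    # Percentile-method 95% CI
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    covered += (lo <= true_mean <= hi)

# Fraction of CIs that overlap the true mean; for tiny n = 8 the
# percentile method tends to come in somewhat under the nominal 95%
print(covered / n_trials)
```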
There is nobody on RUclips that explains statistics better or in a more entertaining way than you! Keep it up!
Wow, thanks!
What I love about you is that you explain the big picture first. You help me understand why we should care in the first place, or the motivation behind the concept. Then you dive into the details afterwards, you make the information more accessible without compromising the technical integrity of the information. A very rare skill indeed, I'm reading Introduction To Statistical Learning in R ( ISLR ) and some chapters aren't intuitive, whenever I read a chapter that doesn't make sense I just watch your videos. That's how I know you're not compromising the technical integrity of the information, because what you say doesn't contradict what I read in academic papers, it's just easier to understand than what I read in academic papers. You are one of a kind!
Thank you very much!
@@statquest No, thank YOU Josh!
I think you summed up the value of these videos really well. Starting with the big picture and then zooming into the details is so much more beneficial for learning and I think this is one of the things Josh nails!
@@CaptainFeatherSwordz It's worlds apart from what the education system has conditioned us to, right?
I have done a master's in stats and a course in data analysis, and the only reason I've passed these things is that after a long and confusing lecture I can just come and watch you explain it in simple terms. Bam!
Thank you so much!!
Thanks! I'm glad my videos are helpful! :)
I am presently in your shoes, taking a Data Science course, thanks to @statquest. Giving him a Double Bam!!
That's cool! How are your studies/career going?
You're probably the best guy for this job. Even though I don't know where I'm gonna apply all these. I just keep going through all of your videos. After finishing up this playlist I'll watch the ML playlist. Keep amazing us. Thank you JOSH
Thanks!
I can never get over how your videos make me love statistics when all my professors and recommended texts made me run away from it. Super grateful!! Also, I think I asked when this video was coming about a year ago.
Glad it finally came out! :) Sorry it takes me so long to make videos.
Another great video. This video explains how to do the bootstrap, which is the easy part. The more difficult part is to understand why the bootstrap works. The conceptual challenge is that bootstrapping assumes that if we were to repeat an experiment, it would produce one of the outcomes we had already observed. This could be a huge assumption, depending on the application. Bootstrapping does not add any new information to what has been observed.
Noted
"The reason why this works is because the histogram of the sample tends to look very similar to the histogram of the population. That's really the key idea behind the bootstrap, and we will see how this idea can be used in all kinds of complicated situations. "
Taking an online course on bootstrap regression and came here to try to understand why bootstrap works when it does not generate any new information.
@@sgpleasure When you sample from a population, it’s unsurprising that the distribution of the sample resembles the distribution of the population. So, you’re not really obtaining any new information. In essence, we’re only pretending it’s new information, when in fact, it’s just reconfirming existing information.
I wonder about and feel so much regard for the institutions and teachers who taught you... no doubt, you are doing an incredible job... stay blessed always
Thank you! :)
All semester long I have been floundering through my statistics class, no thanks to my professors' boring and quite difficult-to-follow lectures on the materials. I've felt so dumb all semester, so when the next section called for "bootstrapping" I finally decided to throw her lecture videos aside and see if someone could explain the concepts better on RUclips. Boy am I glad I stumbled upon this. The visuals are straight to the point and the way you talk through everything very slowly and clearly is SOOO helpful. The enthusiasm and goofiness helps me keep my attention, which is a pain for me with ADHD. I could rewatch my prof's videos 5 times and retain nothing. Makes me wanna just burst into tears from frustration. But I felt like I could actually keep up with this video and _understand_ it!
TL;DR thank you for making this, it was a HUGE improvement over my professor's teaching style and I will DEFINITELY be consulting you for future topics. You're a peach
Hooray! Thank you very much. Just for reference, here's a list of all of my videos: statquest.org/video-index/
@@statquest thank you very much
What a comprehensive and fun discussion! I really had trouble understanding the concept of bootstrapping by myself but your lecture helped me a great deal :> Kudos!
Glad it was helpful!
You sir are an absolute legend. Really helping me getting through my course, because my professor explains the same concept in a method that is 100 times harder to understand
Happy to help!
I read a section on bootstrapping countless times and only understood it finally after watching your video! All I have to say to that is: BAM! (and thanks a bunch)
Hooray!!! :)
I am just speechless - how can you make something so complicated so simple? Hats off to you, and thanks a ton
Thank you!
This kinda feels illegal xD Really nice explained!
Thank you! :)
BEST explanation EVER of bootstrap. Thanks for your dedication!
Glad it was helpful!
You made this concept so much easier to understand than what I was supposed to be learning it from. Thank you so much!!
Glad it was helpful!
Wow! this is the first time I learned this. awesome!
BAM! :)
I am learning machine learning and came to this term , this videos explain it very clear, thank you.
Thanks!
You are a legend my friend! A legend. I am doing my masters in Data Science this fall and this is amazing
You can do it!
This is the most amazing video I've seen on bootstrapping, thank you! Quadruple Bam!
Wow, thanks!
Mr. Josh - u are amazing. World needs more ppl like u. Its like education on another level. Thank you
Thanks! :)
Not the information I was looking for, but I couldn't stop myself from watching it to the end. It was quite entertaining :)
*BAM
That's awesome! BAM! :)
One of the most useful video on this topic on youtube, thanks!
Wow, thanks!
Wow this is so good. The intro made me laugh so hard, it wasn't even that funny I just didn't expect it.
Thanks!
OMG, it was a truly easy-to-understand video! Both the animation, narration, and explanation!!!! I wanna give a billion likes!!!
Wow, thanks!
Bootstrapping? More like "Bro, it's awesome knowledge you're dropping!" 👍
Bam! :)
@@statquest Boot! 🥾
The triple BAM was amazing, thank you!
Thank you!
The work you do is awesome!! Love it.
Thank you!
Josh, you're sent to us from heaven, thanks
:)
blud just dropped one of the best explanatory videos out there and thought we wouldn't notice☠☠☠
bam! :)
this is better than college-level advanced course !!! thank you
Wow, thanks!
So smoothly explained.
Thank you sir.
Thank you!
Thank you so much for your wonderful videos. I have a small request to provide a lecture on FLDA, GMM, EM Algorithm, MLE estimation, MAP estimation. Also, there are some lectures which are not in the book, please also include those lectures too. Thank you so much again!!!. I want to learn more and more from your lectures.
Thanks! I'll keep those topics in mind.
i loved your bams and the illustrations for the steps and your explanation helped a lot
Thank you!
Excellent explanation as always by StatQuest!!! Thx a lot!!!
BAM! :)
I can only say one thing: BAM!!! you are the best teacher BAM!!!
Thank you!
I read 30 pages of a book and almost get something, then watch 10 minutes of StatQuest and fully understand the subject. You're the best, bro
Thanks!
I love the terminology alert😂
quadruple bam !😂
bam!
Wow, that was super easy to understand. Thank you very much
double bam! :)
Great video!
Btw, you could probably do a really good Solid Snake voice. Would love to get an Easter egg in one of the next videos!!
That would be funny. :)
Statquest is the netflix for data science concepts.
bam!
Big BAM for so much statistic knowledge in such little time
Hooray!
Awesome work! Thanks Josh! It seems weird to me that this creates useful/meaningful results. You're just reusing the same data again and again, so even though we can generate new means by bootstrapping, are those means actually correct? In other words, has anyone compared bootstrapping (and calculating the mean) to repeating the experiment a bunch of times (and calculating the mean)?
Yes. Bootstrapping has been around for about 40 years and has tons of theoretical justification etc. It's the real deal.
You can simulate bootstrapping techniques easily.
The bootstrapped distribution has the same mean as the mean of the sample, and the more points you have in your original sample, the closer its width will be to the spread of the means you would get if you redid the experiment a bunch of times.
Bootstrapping is not the same, however, as doing the experiment a bunch of times, as the latter would give a bigger dataset and a narrower confidence interval. This is a confidence-interval estimation method that does not make assumptions about the underlying data distribution.
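That comparison is easy to simulate (a sketch assuming a hypothetical normal population; the spread of repeated-experiment means should land near pop_sd / sqrt(n)):

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 8, 5000
pop_mean, pop_sd = 0.5, 1.0

# Spread of means if we really repeated the experiment many times
repeat_means = rng.normal(pop_mean, pop_sd, size=(reps, n)).mean(axis=1)

# Spread of means from bootstrapping a single 8-point sample
sample = rng.normal(pop_mean, pop_sd, size=n)
boot_means = rng.choice(sample, size=(reps, n), replace=True).mean(axis=1)

print(repeat_means.std())  # close to pop_sd / sqrt(n), about 0.35 here
print(boot_means.std())    # similar order of magnitude, but tied to this one sample
```

The two spreads roughly agree, though the bootstrap one depends on how lucky the single original sample was.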
Thank you for the teaching 🎉
Any time!
In 8:23 the notation on the x-axis should be median values not mean values since we are using median as statistic measurement for bootstrapping in this case...pls look into it
Yep. That's a typo.
Dear Josh I bought a few study guides :) Thanks so much for your videos
Awesome! Thank you so much for your support!!
9th wonder! I learned bootstrapping and confidence intervals! Hurray!
double bam! :)
thank you, this was very helpful
Glad it was helpful!
I loved the idea of shameless self promotion idea lol. Thanks for your time and effort.
Thank you! :)
Thank you so much, your videos are always so helpful to me
Glad you like them!
Nice way of explanation!! BAM!!!
Thanks!
Okay you are the best thank you for doing this video !
Thank you!
Great video! One small flaw I would like to point out: the lingo should be "fail to reject the null hypothesis" instead of "cannot reject the hypothesis".
Noted
Nice work , man
Thanks!
Great video! I learn so much with you!
Awesome! Thank you!
You are amazing! Thank you!
Thanks!
Loved it... Big BAM!
Thanks!
easy to understand....thanks josh!
Thank you!
University professor explained it in a confused and insufficient way (to put it politely), then I came to StatQuest.
bam!
You rock Josh. Thanks for making this video!
Thanks!
Love all of your videos!! Thanks a lot!
Glad you like them!
Josh is on Spotify! BAM
bam!
Ur videos are just so cool, tnx a lot
Glad you like them!
Please upload videos on monte carlo simulation and integration
I'll keep that in mind.
This guy is great!
Thanks!
Nice explanation...awesome
Thank you!
Thank you
TRIPLE BAM!!! Thank you so much for supporting StatQuest!!! :)
Thank you so much, very clear again! Not planning to make some videos about Fisher information, Jackknife, and Delta method, by any chance? 😬
No time soon, but I'll keep them in mind.
@@statquest Jackknife would be cool!
Thank you for this informative video; it really clarified my understanding of bootstrapping! However, I'm curious about the choice between sampling with replacement versus without replacement in the bootstrap method. How do we determine the most appropriate method for our specific dataset? Are there particular scenarios or types of data where one method is preferred over the other? Any additional insights would be greatly appreciated. Thanks a lot!
Bootstrap always uses sampling with replacement.
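For instance, a minimal sketch with made-up measurements (the eight values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
sample = np.array([-2.1, -0.5, 0.1, 0.4, 0.7, 1.1, 1.6, 2.7])  # hypothetical measurements

# A bootstrapped dataset: same size as the original, sampled WITH replacement,
# so individual values can appear more than once, or not at all
boot = rng.choice(sample, size=len(sample), replace=True)
print(boot)
```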
Thank you 🙏
You’re welcome 😊!
Awesome song!
Thanks!
🎵
Boot World
Workin', playin'
Boot World
It's number one in boots.
🎵
:)
You are the best!
Thanks!
Thanks Josh, you are the one!
Thank you and congratulations again. I'm so glad I was helpful. BAM! :)
I love how cute was 'small bam'
:)
@@statquest give me a small bam 👊🏼☺️
Thanks!
Thank you so much for supporting StatQuest!!! BAM! :)
¡Gracias!
Hooray!!! Muchas Gracias for supporting StatQuest!!! BAM! :)
Thank you JOSH!
bam!
Josh if you need someone who cleans your room or makes the dishes, just give me a call. I own you that
Wow! :)
At 6:40, when you start to discuss the 95% CI, I think there will be a lot of people who won't understand the subtlety of this distribution. You have created a distribution of 'statistics', in this case the mean. So, as you would appreciate, you have derived the 'sampling distribution' of the mean, from which the standard deviation equals the standard error of the mean, and the 95% CI calculation is trivial. The uninitiated might not appreciate how this is different from the distribution of a single data set, where the standard error = the standard deviation / sqrt(n).
noted
wow, great content!
Thanks!
Thank you very much!! One question: when I need to calculate the standard error, do I just calculate the standard deviation of the resamples? Or do I need to calculate the standard deviation divided by the square root of n?
Remember what the standard error is - it's the standard deviation of the means we would get from collecting a lot of different samples and calculating the mean for each one. So, if we use bootstrapping to create a bunch of means, all we need to do is calculate the standard deviation of those means.
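As a sketch (hypothetical data; the comparison with s / sqrt(n) is only a rough sanity check, since the classic formula assumes the mean as the statistic):

```python
import numpy as np

rng = np.random.default_rng(7)
sample = np.array([-2.1, -0.5, 0.1, 0.4, 0.7, 1.1, 1.6, 2.7])  # hypothetical data

# One mean per bootstrapped dataset
boot_means = rng.choice(sample, size=(10_000, len(sample)), replace=True).mean(axis=1)

# The bootstrap standard error is just the standard deviation of these means:
# no extra division by sqrt(n)
se_boot = boot_means.std(ddof=1)

# It should roughly agree with the classic formula s / sqrt(n)
se_formula = sample.std(ddof=1) / np.sqrt(len(sample))
print(se_boot, se_formula)
```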
@@statquest Ooh I got it! Thank you very much for your answer and for being generous enough to explain! :)
Nice - “Part 1”, gonna have another nice solid multi-part set.
At least 2 parts. Part 2 will be out soon.
great video
Thanks!
Instantly subscribed!!
Thanks!
Explained better than my professor! Amazing work Josh
Wow, thanks!
Thank you for the great explanation, but just a question: the bootstrap histogram will always be similar to the original data distribution if you repeat the bootstrap a lot of times, so I'm not sure I get the added value. Also, the mean of 0.5 is a bit misleading since the standard deviation is very high. Based on the bootstrap and confidence interval we cannot reject the null hypothesis, as the interval includes 0, but isn't that a bit odd? Of course statistically it's correct, but in reality the drug has a very significant effect (either positive or negative); since both occur, the mean implies that the drug has no effect, and so do the bootstrapped histograms. Thanks again.
I'm not sure I fully understand your question. The idea is that the bootstrap represents what would happen if we could repeat the experiment a lot more times.
Can you also use bootstrapping method to calculate standard error for medians?
Yes, see 8:13
Thanks! Couple of questions - could someone please clarify this for me, please:
1) At 8:40 we should see "median values" at the bottom distribution instead of "mean"? 2) also, at the same time mark, why confidence levels moved to the left this far? they cover mostly "feeling worse" data points.
More general question - is Bootstrapping theoretically or conceptually linked to the Central Limit Theorem?
1) Oops! That's a typo. It should say "median".
2) The CI was found by identifying the 2.5% and 97.5% quantiles, which were shifted as seen in the video.
3) I do not think so.
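A sketch of that quantile calculation with hypothetical data (the interval comes straight from the 2.5% and 97.5% quantiles of the bootstrapped medians, so nothing forces it to sit symmetrically around the observed median):

```python
import numpy as np

rng = np.random.default_rng(3)
sample = np.array([-2.1, -0.5, 0.1, 0.4, 0.7, 1.1, 1.6, 2.7])  # hypothetical data

# One median per bootstrapped dataset
boot_medians = np.median(
    rng.choice(sample, size=(10_000, len(sample)), replace=True), axis=1)

# 95% CI = the 2.5% and 97.5% quantiles of the bootstrapped medians
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(lo, hi)
```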
@@statquest Thanks, Josh! Could you please elaborate on the CI for medians? _Why_ is it so shifted to the left, compared to the CI for mean values? I'm so sorry to bother, but it seemed that I _get_ it, while in reality I cannot understand why the CI for median values is so, so different from the CI for means.
I've purchased your PCA guide. Pure awesomeness!
@@SwapperTheFirst Thank you for supporting StatQuest!!! As I wrote earlier, the CI was found by identifying the 2.5% and 97.5% quantiles (95% of the quantiles are between 2.5 and 97.5). If that doesn't make sense to you, consider watching the StatQuest on quantiles: ruclips.net/video/IFKQLDmRK0Y/видео.html
I was thinking about the Central Limit Theorem. The sample data comes from some unknown distribution, so if we generate a new dataset and calculate the mean over and over again, the histogram of these means will look like a normal distribution? If I'm not wrong, that's what the central limit theorem is about, right? Unless it doesn't work when you repeat the bootstrap like 1,000 or 10,000 times... I don't know, I'm confused
@@phelipe2587 This is my thought exactly. Using bootstrapping (a random process) we get a normal-looking distribution (for example, of means), even when the initial distribution is not normal.
I want to make a small experiment, though. I will get data from Josh's deck (8 datapoints) and run the bootstrap, say 10K times, using highly random data (say, from random.org). Then I will get 8 datapoints from some other distribution which is not normal (say, the wealth distribution in the US) and, again, compare with the bootstrap distribution after 10K.
Also want to check the median CI in the bootstrapped distribution, since I (alas) still don't get it.
But when you play with actual data, instead of endless theories - sometimes you may have an insight.
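That experiment is easy to sketch (here a lognormal distribution stands in, hypothetically, for a skewed wealth-like distribution):

```python
import numpy as np

rng = np.random.default_rng(5)

# 8 points from a very skewed (non-normal) distribution
sample = rng.lognormal(mean=0.0, sigma=1.0, size=8)

# 10K bootstrap means from that one tiny sample
boot_means = rng.choice(sample, size=(10_000, 8), replace=True).mean(axis=1)

# With only n = 8 the CLT has just partially kicked in: the histogram of
# bootstrap means is much more symmetric than the raw data, but can still
# show some skew
print(boot_means.mean(), boot_means.std())
```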
How do you calculate the confidence interval in the bootstrap?
This is explained at 6:40
Can you please do a video of the Monte carlo simulation?
I'll keep that in mind.
Hello i had a question when you said that the confidence interval contain 0 in it shouldn't it be 0.5 since that is the mean ?
The purpose of the 95% CI is to tell us whether or not the observed mean, 0.5, is statistically different from 0. In this context, when the 95% CI contains 0, we fail to reject the null hypothesis that the true mean is 0 (in other words, we cannot conclude there is a statistically significant difference).
@@statquest Hello Josh, thank you for replying. Just one more question: so whenever the CI contains 0 (or the mean we are trying to differentiate from), we will fail to reject the null hypothesis, correct?
@@ishangrotra7265 That's the idea, however, I believe the null specifically refers to 0.
@@statquest Thank you Josh, please keep up the good work, you have been a really great help!
0:40 I bet this is the official thought process of pharmaceutical companies
;)
Bam!! Thank you!
Thanks!
Thank you so much for this video!! Are there any cases where bootstrapping must NOT be used?
Not that I know of off the top of my head. Maybe if you only have 3 or 4 measurements, maybe then bootstrapping will not be very useful.
Nice video! But can I understand this method as similar to permutation? How does it differ from permutation, which creates a new dataset by blending the data together? Does bootstrapping involve any blending of the sample?
To be honest, I'm not sure if "blending" is a technical term or not. Bootstrapping is related to permutation methods, but fundamentally different since it allows, and actually requires, sampling with replacement.
Before this video, I thought bootstrapping was a way of tying your shoe laces, lol
Ha! :)
@@statquest yeah lol :)
How is it different from drawing a normal distribution and getting the probability from that??
That question is answered at 8:13
This indifferent voice is damn attractive
:)
this video is so great :3 bam!
Thank you!