Statistical Inception: The Bootstrap

  • Published: 24 Nov 2024

Comments • 73

  • @swaree
    @swaree 1 year ago +25

    here from the SoME judging page, nice video that I hope gets recommended to more people. you earned a new subscriber. good luck with the contest!

    • @very-normal
      @very-normal  1 year ago +2

      Thanks for your time! Glad you enjoyed it

  • @MultiWilliam15
    @MultiWilliam15 1 year ago +140

    As a mathematics graduate student, it’s interesting to me how a lot of upper-level statistical tools are derived from real analysis. It’s like one of those courses that no mathematician wants to touch unless they’re working with PDEs or statistics.

    • @איילתדמור
      @איילתדמור 1 year ago +4

      Why do mathematicians want to avoid this?

    • @micayahritchie7158
      @micayahritchie7158 1 year ago

      @@איילתדמור it's hard

    • @pavlevilotijevic7266
      @pavlevilotijevic7266 1 year ago

      @@איילתדמור It is a really technical course: lots of theorems, proofs and abstract thinking. This isn't unusual for mathematics, but it is usually taught in the first year, so for a lot of people it takes some getting used to. Also, it is really big; basically every year a third of what you study is just different kinds of analysis (real, complex, functional, differential equations, etc.). Still, it is objectively really useful and, if you can navigate the mathematical rigor and detail, it is also really interesting and satisfying once you grasp the meaning.

    • @ivanjorromedina4010
      @ivanjorromedina4010 1 year ago +5

      For me, real analysis is like the pretty popular girl: everyone wants to go with her. I know because of the flatmates (I'm from Europe) that I had during my undergraduate that there is also this type of subject in other degrees. For biology someone told me it is genetics, for medicine it is neurology, and for physics it is astrophysics.
      Of course people don't want to mess around with real analysis and end up doing descriptive set theory or other difficult stuff, but a lot of people like it because it is somehow intuitive at a very basic level (calculus, not foundational analysis).
      I want to mention that there are other places where you end up messing with real analysis, but they are more advanced, like harmonic analysis or measure theory.

  • @CsyroxXx
    @CsyroxXx 5 months ago +2

    Finally someone who guides you in the right direction of WHY it works

  • @fiona8081
    @fiona8081 1 year ago +2

    oh man, one of my thesis committee members just literally gave the comment "you should consider bootstrapping" with no further detail (for a project with several types of analyses), and all I knew was that when we learned how to run mediation in R using PROCESS in our first year stats class it used bootstrapping. So, this vid randomly getting suggested to me was pretty funny. Thank you for explaining the actual basic concept, bc like you said, mostly I had just seen high level math-heavy explanations, or instructions on how to do it in specific contexts.

  • @Camstraction
    @Camstraction 1 year ago +30

    I’m super glad this video was recommended to me, I really enjoyed it! Thanks for putting in the time and effort to make it. After subscribing I took a look at your channel to see if you had any other similar videos, but they’re mostly very short form. I hope you make more videos like this that are of a mid/long format. Good luck and I hope more people see this video and subscribe (:

    • @very-normal
      @very-normal  1 year ago +3

      Thanks! More long form videos are definitely coming in the future!

  • @movax20h
    @movax20h 1 year ago +22

    There is a restricted variant of the bootstrap where, instead of drawing independent resamples via Monte Carlo, you use correlated resamples built iteratively. In each iteration you replace a few random elements of the resample with elements drawn from the full sample, making a new resample, instead of replacing everything. This is called the jackknife method. I learned this when I was studying physics; it is a rarely used method, but it is very powerful, because with proper coding (all the usual statistics like mean, variance and moments can be implemented using incremental and reversible updates that do not depend on the resample size, only on the number of replaced elements) it can speed things up A LOT. Nuclear physicists and econophysics people use this quite a bit, because it also gives you all kinds of estimates and error estimates for free.

    • @Singularity24601
      @Singularity24601 1 year ago

      Huh? Don't know if we're talking about the same thing, but your description doesn't sound like jackknife resampling to me. Jackknife is rather oldschool and pre-dates bootstrapping by several decades. Any chance you're thinking of Gibbs sampling?
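      For comparison, the textbook leave-one-out jackknife this reply refers to can be sketched in a few lines; Python with NumPy is assumed here, and the sample data and choice of statistic are purely illustrative:

      ```python
      import numpy as np

      def jackknife_std_error(data, statistic):
          """Leave-one-out jackknife standard error of a statistic.

          Each pseudo-sample drops exactly one observation, so the n
          replicates are deterministic and correlated, unlike the
          independent random resamples drawn by the bootstrap.
          """
          data = np.asarray(data)
          n = len(data)
          # Statistic recomputed on each leave-one-out sample
          replicates = np.array([
              statistic(np.delete(data, i)) for i in range(n)
          ])
          # Jackknife variance: (n-1)/n * sum of squared deviations
          var = (n - 1) / n * np.sum((replicates - replicates.mean()) ** 2)
          return np.sqrt(var)

      rng = np.random.default_rng(0)
      sample = rng.normal(loc=5.0, scale=2.0, size=30)
      se = jackknife_std_error(sample, np.mean)
      ```

      For the mean, the jackknife standard error reduces exactly to the familiar s/√n, which makes it a handy sanity check before applying it to statistics with no closed-form error.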

  • @cosmic_Robot
    @cosmic_Robot 2 months ago

    What a well made presentation of an idea that feels very opaque to less-seasoned statistics professionals (talking about myself). You did a wonderful job with this explanation. Subscribed.

  • @patrickmoehrke4982
    @patrickmoehrke4982 1 year ago +16

    Thanks for the great video! One thing that would have tied it up is bringing it back to the original example and briefly illustrating why bootstrap could work in that case

  • @twandekorte2077
    @twandekorte2077 7 months ago

    Great video!

  • @asdf56790
    @asdf56790 1 year ago +15

    What I don't get is that we wanted to avoid using the CLT because of our small sample size. However, our reasoning for the bootstrap still seems to rely on asymptotics, i.e. the assumption of a large n. Also: where exactly do we use Monte Carlo methods here? I understand them as an approximation method used for numerical estimates, but not for proofs?
    I guess it would've helped me a lot to apply the reasoning to our specific example.
    But this is a great topic and I'm super happy to see someone mathematically justifying it on YouTube

    • @andrewhubbard4043
      @andrewhubbard4043 1 year ago +2

      I think the idea is that while you only have a sample size n of 30, you can draw a much larger number of bootstrap resamples, say 200.

    • @icodestuff6241
      @icodestuff6241 1 year ago +1

      the CLT only gives info about the mean, not the median, since we can see a visible skew
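      A minimal sketch of the Monte Carlo bootstrap being discussed in this thread, assuming Python with NumPy; the skewed sample, the number of resamples B = 2000, and the seeds are all illustrative choices:

      ```python
      import numpy as np

      def bootstrap_median_ci(data, n_boot=2000, alpha=0.05, seed=0):
          """Percentile bootstrap confidence interval for the median.

          Resampling with replacement n_boot times is the Monte Carlo
          approximation to the ideal (n^n-term) bootstrap distribution.
          """
          rng = np.random.default_rng(seed)
          data = np.asarray(data)
          medians = np.array([
              np.median(rng.choice(data, size=len(data), replace=True))
              for _ in range(n_boot)
          ])
          lo, hi = np.quantile(medians, [alpha / 2, 1 - alpha / 2])
          return lo, hi

      rng = np.random.default_rng(1)
      skewed = rng.exponential(scale=2.0, size=30)  # small, skewed sample
      lo, hi = bootstrap_median_ci(skewed)
      ```

      This is the kind of case the thread raises: with a visibly skewed n = 30 sample, the CLT gives you a handle on the mean but not directly on the median, while the bootstrap interval above needs nothing beyond resampling.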

  • @Danylux
    @Danylux 1 year ago +1

    I've heard a lot about bootstrapping from my physicist friends and didn't know the details. I'm taking probability rn with a bit of measure theory, and seeing how analysis and measure theory make this idea useful is a really good insight!

  • @Mutual_Information
    @Mutual_Information 1 year ago +1

    Very well done! Keep going!

  • @BleachWizz
    @BleachWizz 1 year ago +3

    4:38 - ok, before I get to the explanation here: when you stated the concept, the idea is just brilliant! I know it's unsatisfying not to know a proof, but it feels like it just has to work unless you're dealing with an exceptional case.

  • @the_math_behind
    @the_math_behind 1 year ago +1

    Great video! I had always wondered what was going on under the hood with bootstrapping. Can't wait to check out some of your other vids!

  • @davidmoore5846
    @davidmoore5846 1 year ago

    Woohoo, I saw 999 subscribers and had to hit subscribe!
    Excellent explanation. I always introduce it by starting from an algorithm ("you just resample the data") and then moving forward to the more complicated terminology (bootstrap estimator, Monte Carlo estimate of the bootstrap estimator). It's nice to see the historical and mathematical details being discussed!

    • @very-normal
      @very-normal  1 year ago

      I will remember you forever
      (thank you for subscribing!)

  • @bin4ry_d3struct0r
    @bin4ry_d3struct0r 3 months ago

    The implementational concept sounds so simple at first, but the underlying math ... 😬💀😱

  • @SimGunther
    @SimGunther 1 year ago +3

    Was gonna respond to another thread on this, but it's more convenient to have it here:
    The Monte Carlo method is named the way it is because of Stanislaw Ulam's uncle, who would borrow money from his relatives to support his gambling in Monaco. Knowing this, Nicholas Metropolis suggested the name, based on the Monte Carlo Casino, when Ulam and John von Neumann needed a code name for a neutron simulation program.
    Yep, you can thank Metropolis for giving the name to the methods used in these simulations for the Manhattan Project. That's some dark lore for you for some innocent non-deterministic math.

  • @AVeryLargeSon
    @AVeryLargeSon 1 year ago +1

    Hey, you probably used LaTeX to generate the text in this. Just FYI, a double backtick `` compiles to an opening (forward-facing) quotation mark.

  • @sjoerdglaser2794
    @sjoerdglaser2794 1 year ago +9

    Maybe I don't really understand it, but as you mention in the beginning (and as I heard in my education), the bootstrap is mostly used when datasets are too small to provide meaningful estimates on their own. However, the final part on why bootstrapping works still hinges mostly on asymptotics: the limit of infinite datasets.
    So maybe I understand it wrong, but don't the asymptotics defeat the point of bootstrapping?

    • @very-normal
      @very-normal  1 year ago +1

      I also learned early on that it could be used with smaller datasets, but it seems this isn’t the case. After some thinking, I think it was a mistake on my part to say that bootstrap can be used with smaller datasets. I don’t think there’s anything stopping us from using it with small datasets, but then we aren’t assured that the bootstrap estimate will converge to the underlying CDF. But from a practical perspective, it can still tell us a little bit about the distribution which can still be helpful.
      Thanks for watching the video and the feedback!

    • @trinalmage9446
      @trinalmage9446 1 year ago +3

      This 'flaw' of the bootstrap is well-known: all justifications of the bootstrap are also asymptotic (there are some great articles by Le Cam on this). However, if you delve deep into these asymptotics, the bootstrap approximates something known as the Edgeworth expansion, and as such is expected to be a "faster", more reliable asymptotic than the usual methods.

  • @nelson6814
    @nelson6814 1 year ago

    This video is art, so good 😊🎉

  • @nujabraska
    @nujabraska 1 year ago

    It would’ve been nice to return to your original example to see how a bootstrap method would transform your data, but other than that, good video. I have taken a couple stats classes at uni and that helped me understand, so someone less statistically inclined may have trouble understanding this video.

  • @papetoast
    @papetoast 1 year ago +6

    I like the topic. A small criticism is that the video doesn't seem to be made with a clear target audience in mind, and as a result the math difficulty jumps around too much. For example, PMF and CDF probably don't need any explanation here. If someone doesn't know those, they likely won't understand everything else anyways.

    • @very-normal
      @very-normal  1 year ago +5

      Thanks for watching and for the feedback! After watching 3b1b’s video on the SoME3 winners, I definitely see the importance of thinking of a target audience more. I’ll do my best to do better in future videos!

  • @FlamenLion
    @FlamenLion 1 year ago

    Delightfully interesting!

  • @ollie-d
    @ollie-d 10 months ago

    Would’ve been helpful to walk through an example of applying the bootstrap to the data you framed the narrative around.

  • @kaiserruhsam
    @kaiserruhsam 1 year ago

    specifically the phrase was meant as an example of something impossible to do, not simply lacking outside help.

  • @kventinho
    @kventinho 3 months ago

    Okay idk why but this is so cute hahahaha

  • @alexoprea1000
    @alexoprea1000 1 year ago

    Really nice explanation

  • @SebWilkes
    @SebWilkes 1 year ago

    Thanks for the video! I just wonder if there could have been a demonstration on the clinical trial? Perhaps your point is that it's still a lot of work, but if it was doable to show, then I would be curious to see how the ideas behind bootstrapping apply to "the real world". This is to say, some people learn better from examples!

    • @Singularity24601
      @Singularity24601 1 year ago

      We generally design clinical trials in such a way that bootstrapping is not relevant. Bootstrapping is not in my starting lineup of methodological tools, it's more like a fallback for when all else fails.

  • @yorailevi6747
    @yorailevi6747 1 year ago

    Can you show an example of the bootstrap method?

  • @trinalmage9446
    @trinalmage9446 1 year ago

    This video displays some wonderful intentions and craftsmanship, especially when it comes to animation and the early parts! However, I hope you will accept some feedback:
    - 5:17 This phrase can cause confusion: I know you know that the PDF is not P(X=x), but "the likelihood of seeing different values" makes it sound like that for someone who doesn't know better. More importantly, the discussion can be had entirely without densities, and a quick comment on how the CDF describes distributions most generally since it encompasses both the discrete and continuous cases would probably have been clearer.
    - There are a lot of unclear phrases that hint at depth but are also just a tad confusing. "They (PDF and CDF) both describe the randomness in a random variable" doesn't really help me understand anything that's going on.
    - Talking about "dependence on n" wrt the cdfs and bootstraps is fine, but isn't the bigger, aching question the fact that you are approximating two **random** functions? This discussion seems to have been ignored, but kind of underlines the whole thing. Then again, seems like a hard task for a 10-minute video. Maybe it would've been better to leave the Monte Carlo explanation for a second part, discussing implementation?
    - The infamous middle step of the proof is a little confusingly laid out to me, still. But, seems like a frankly herculean task to try to explain all the baggage to properly grasp at why that proof-idea makes complete sense.
    - Given that you cite the Wasserman notes in the video, probably would've been nice to include them in the description?
    - On a minor note, these videos could do with a little more focus. You seem to go off on a few tangents related to you, your experience and how you learned the subject a fair few times, which are of course valid, but could be a little sparser, given the objective of the video is teaching others.
    Best of luck in your present and future academic and YouTube career!

  • @PierreCasas
    @PierreCasas 1 year ago

    Not sure I understand what the benefit in precision from the bootstrap is, as we could directly compute the CDF of the sampling distribution?

    • @very-normal
      @very-normal  1 year ago

      It can be used as a general tool since we may not always be able to compute the CDF directly. For example, if we’re dealing with functional data, we can use it to construct bootstrap intervals to understand the variability in the coefficient function.

  • @imotvoksim
    @imotvoksim 1 year ago

    Unless you've gone through a graduate statistics course, I don't think you have a chance of taking away much from this video. But since I went through a few such courses, I feel like you left me hanging with a teaser for an actual proof. Nice explanation overall, but I feel like the target audience could handle more depth. Also, it's funny that the intro finishes with "you can't always ask asymptotics to save you" and the video proceeds to prove the bootstrap through asymptotics.

  • @treelight1707
    @treelight1707 1 year ago +2

    Although I kind of understand what the bootstrap is about, I didn't get to the part where I can actually implement it. Nevertheless, I enjoyed the video very much, and I really like the channel name :) Looking forward to part 2.

    • @very-normal
      @very-normal  1 year ago

      Thanks! I’ll always be kicking myself for not including that last analysis part lol, I thought people wouldn’t care about it but I was wrong

  • @AllemandInstable
    @AllemandInstable 1 year ago

    nice one

  • @tomasalvim1022
    @tomasalvim1022 1 year ago

    Really liked the video, but the music is a little loud.

  • @htomerif
    @htomerif 1 year ago

    I didn't like the part where we didn't get to see some actual analysis of the original data set presented to us at the beginning of the video. What I got out of it was that this bootstrapping "works" but the degree to which it works depends on the size of the data set.
    I don't get why the number of unique bootstrap data sets is relevant compared to n^n. I'm guessing it's not relevant and computationally constructing only unique data sets would be very bad for accurately embiggening the original data set.
    I also don't get if this is supposed to lend some continuity to the data set that "works"? My guess is "no". For example, in the real world there's the TAS2R38 gene that lets people taste certain chemicals. There are (so far as I can tell) 8 genotypes for this gene. If I sampled 200 people for sensitivity to tasting a chemical relevant to TAS2R38, all 200 of them would fall into one of exactly 8 bins (I don't think it's partially dominant). If I didn't know this already, any statistical analysis that assumed the continuity of the represented total population and generated so much as one extra bin (much less a continuity of bins) would be really very broken.
    So that's what I'm worried about with this bootstrapping. There are some situations where I would want a nice curve and some where the discreteness is pretty critical information, but I would have to know which I was looking for in advance for the bootstrapping to work, which kind of unstraps the bootstrapping. Does that make sense?
    Anyway, it's not that I didn't like the video; I did. It just felt like instead of being handed a tool, I was handed an advertisement for a tool. I'm obviously not a statistician, but I'm also not an idiot, so I guess I'll have to do some more looking into this on my own.
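    On the discreteness worry raised above: the plain bootstrap only ever reuses observed values, so it cannot invent new bins between the ones in the data. A quick check, assuming Python with NumPy; the simulated genotype data stands in for the TAS2R38 example and is purely illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    # Stand-in for the comment's scenario: 200 people, each falling
    # into one of exactly 8 discrete genotype bins
    genotypes = rng.integers(0, 8, size=200)

    # One bootstrap resample: draw with replacement from the data itself
    resample = rng.choice(genotypes, size=len(genotypes), replace=True)

    # A resample can only contain values already present in the data,
    # so no "extra bins" or artificial continuity are introduced
    assert set(resample) <= set(genotypes)
    ```

    The bin *proportions* will vary across resamples, which is exactly the sampling variability the bootstrap is meant to quantify; whether that variability, rather than the discrete structure itself, is the quantity of interest is the judgment call the commenter describes.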

  • @Possumman93
    @Possumman93 1 year ago

    So, what was the treatment effect?

  • @TheOneMaddin
    @TheOneMaddin 1 year ago +5

    "It is called Monte Carlo because we are picking multiple boot strap samples at random" ... okay, cool ... but why is it called Monte Carlo?

    • @very-normal
      @very-normal  1 year ago +1

      Great question! That’s worth talking about in a video

    • @tomasalvim1022
      @tomasalvim1022 1 year ago +1

      Casino from Monaco

  • @lorenzoplaserrano8734
    @lorenzoplaserrano8734 1 year ago

    ❤❤❤

  • @gingeral253
    @gingeral253 1 year ago

    Interesting

  • @sherifffruitfly
    @sherifffruitfly 1 year ago

    YOU CAN'T JUST USE ASYMPTOTICS TO SOLVE ALL YOUR PROBLEMS! (dies laughing)

    • @sherifffruitfly
      @sherifffruitfly 1 year ago

      and naturally he immediately moves on to use asymptotics with his bootstrap estimators :P
      love the video tho - thank you! :)

  • @wanfuse
    @wanfuse 1 year ago

    can I get permission to use this content in an open source project I am making?

  • @daniellebalouise9596
    @daniellebalouise9596 1 year ago

    What are the prereqs for me to watch this video?

  • @jestingrabbit
    @jestingrabbit 1 year ago +2

    My brother in christ, the content is good, but the repetitive flashing of a horizontal white bar about a third from the bottom and left of the screen is incredibly distracting. The 'this is a video' conceit is not worth the distraction.

  • @Singularity24601
    @Singularity24601 1 year ago +1

    I would argue that medians are overrated and rarely a good idea. High school tells us that medians are like means but better, because they're robust to outliers, while neglecting to tell us that such robustness comes with costs. Besides, "robustness" to outliers is just another way of saying "ignorance" of outliers. Yes, many clinical trials do report medians, but these are generally for illustrative purposes only and have nothing to do with the actual hypothesis testing. I think you could have done the same presentation without going into medians.

    • @Hamiltonianmcmc
      @Hamiltonianmcmc 1 year ago

      Robustness is very much not overrated. The reason why it is not overrated has to do with the fact that we aren't really ever actually interested in the average treatment effect itself; we have to settle for it. A doctor is truly interested in knowing how the medicine he prescribes will impact the patient right in front of him, not how it will impact patients on average. Looking for average/median treatment effects is a compromise with reality; we are truly interested in how medicine will TEND to impact people so that we can talk about how we think it will impact someone in front of us. For this reason, robustness can become important; especially, for example, when we are comparing two medicines against each other.
      What if most people get a small effect from medicine A with huge effects for some others, while medicine B tends to give much better effects for most people but with no outliers so that it looks like there's not much of a difference between the two? Consider this!
      People who discount robustness or say "never throw away data unless you can absolutely show that it was a mistake" are showing that they are not focused on the problem at hand and the needs of the end users of their analyses, but on dogma. Every single choice must be informed by the problem domain you are working in and the relative cost/benefit of being right or wrong when it comes to the decisions or actions that the analysis informs.

    • @Singularity24601
      @Singularity24601 1 year ago +1

      ​@@Hamiltonianmcmc Robustness is indeed overrated. I am a doctor (and a biostatistician). Are you? Allow me to address some of your points individually:
      1. "A doctor is truly interested in knowing how the medicine he prescribes will impact the patient right in front of him, not how it will impact patients on average..."
      Firstly, the hard reality is that no one can predict the future with perfect accuracy. Hence, we cannot know how the medicine will impact the patient, we can only estimate a distribution of possible outcomes (preferably based on the data of other patients with similar details to the patient in front of us). We can summarize this distribution with a mean or a median. Secondly, you claim to value the details of the individual patient in front of us... this strengthens my argument, not yours, for means incorporate all the details whereas medians throw away details.
      2. "What if most people get a small effect from medicine A... Consider this!"
      Consider what? I think it'll be more effective if you state your point rather than just a rhetorical question. The scenario you paint is not that unusual for a day at my work, yet I still can't answer your question because it's not clear what question you're trying to ask.
      3. "People who discount robustness or say 'never throw away data'... [are focused] on dogma."
      That's a rather sweeping statement to claim... dogmatic even. I am a cancer doctor. When we are trying to cure someone with a small chance of cure, outliers can be everything. Two treatments can have the same median overall survival, but one gives a 49% chance of long-term survival and the other gives a 0% chance of long-term survival. The median dogmatist claims that these two treatments are equal. I claim they are not.

  • @mv3845
    @mv3845 7 months ago

    The music is so distracting:(

    • @very-normal
      @very-normal  7 months ago

      Sorry about that, there is a version without the music in the description available

  • @nickdinenis9883
    @nickdinenis9883 6 months ago

    Great video!!