Statistical Gold Nuggets (Hierarchical Models)

  • Published: Jan 24, 2025

Comments • 64

  • @bigclout0056 • 15 days ago +64

    “Babe quick, Very Normal just uploaded” - genuinely me to my statistician girlfriend

    • @bjorntorlarsson • 15 days ago +19

      "- Are you in love with a statistician nerd?"
      "- Probably."

  • @Imperial_Squid • 14 days ago +9

    Just wanted to say thanks for all the work you do!
    I had a job interview a couple of months ago and knew they'd ask about stats tests as part of it. Having all your videos in one place made it really easy to just skim through and give myself a refresher on a bunch of topics.
    I ended up passing that interview too!

    • @very-normal • 14 days ago +2

      That’s great! I’m glad I was helpful in that process! Hopefully it helps with my own interviews as well lol

    • @Imperial_Squid • 14 days ago +2

      @very-normal I'll keep my fingers crossed for you! Given your excellent teaching style and attention to detail I'm sure you'd do great!

  • @gn7586 • 10 days ago +2

    My intuition for this problem would be that, in the real world, we would not know how many nuggets there are, if any. So I would set up a two-component mixture model and compare it with a one-component model to see if there are any nuggets. Mixture models can, of course, be viewed as hierarchical models.

    • @very-normal • 10 days ago +1

      Yeah that’s exactly what I had in mind! This is known as the EX-NEX model in the clinical trial space
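    A minimal sketch of the mixture-model comparison described in this thread, in R. The data are made up (36 groups of 20 trials with one planted "nugget"), the two-component fit uses a small hand-rolled EM rather than any particular package, and none of it is the video's actual code:

      set.seed(1)
      n <- rep(20, 36)                          # trials per group (made up)
      y <- rbinom(36, n, 0.1)                   # baseline responders
      y[36] <- rbinom(1, 20, 0.5)               # one planted "nugget" group

      # one-component model: a single shared response rate
      ll1 <- sum(dbinom(y, n, sum(y) / sum(n), log = TRUE))

      # two-component binomial mixture, fit by EM
      w <- 0.5; p1 <- 0.05; p2 <- 0.3           # crude starting values
      for (it in 1:200) {
        f1 <- dbinom(y, n, p1); f2 <- dbinom(y, n, p2)
        r  <- w * f2 / (w * f2 + (1 - w) * f1)  # E-step: nugget responsibility
        w  <- mean(r)                           # M-step: update weight and rates
        p1 <- sum((1 - r) * y) / sum((1 - r) * n)
        p2 <- sum(r * y) / sum(r * n)
      }
      ll2 <- sum(log((1 - w) * dbinom(y, n, p1) + w * dbinom(y, n, p2)))

      # compare the fits via BIC (1 vs 3 parameters); lower is better
      c(bic1 = -2 * ll1 + log(36), bic2 = -2 * ll2 + 3 * log(36))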

  • @hamatsg9988 • 14 days ago +1

    Huge thanks for this amazing masterpiece of a video. I know it's very time-consuming to make these videos, but this channel will grow a lot. One of the best.

  • @nolan_meyer • 13 days ago +4

    12:22 makes me think of a mixture model

  • @-NguyenDuyTanA-mh1db • 15 days ago +7

    You're my motivation to keep deep-diving into statistics

  • @5eurosenelsuelo • 14 days ago +5

    Statistics is so hard... This video definitely went over my head. Still, thank you for making the effort of trying to explain it.

    • @very-normal • 14 days ago +1

      It’ll come with time! Just keep at it and you’ll be surprised how far you get

  • @statisticiann • 14 days ago +5

    Looks like you were going to try a Mixture Model at the end there.

    • @very-normal • 14 days ago +3

      👌 you got the idea of it

    • @someonespotatohmm9513 • 11 days ago

      @very-normal Can't wait for the video about it. Hope you get better soon regardless!

  • @DudeWhoSaysDeez • 14 days ago

    I really appreciate your videos and the way you present topics!

  • @thomasrebotier1741 • 14 days ago +6

    My go-to as a data analyst would have been leave-one-out comparisons: compare each group mean to the mean of all the other groups. The variance of the left-out group of course does not change, but the variance of the pool drops by a factor of roughly 35 (so its standard error by ~6), so the CIs no longer overlap as easily. They still can, though, and I'm doing 36 tests instead of one, which demands more separation for the same p-value. So I'm curious how much overlap reduction would be required, and since my theory doesn't scale up, I'll just do a Monte Carlo to estimate p. How would this approach compare to the hierarchical one? Which has the best odds of securing the nugget?

    • @very-normal • 14 days ago +6

      That’s definitely an interesting approach. One reason I chose to do the analysis in a Bayesian way was so that I could sidestep any talk of multiple testing problems, which many frequentist approaches would have to contend with. 36 tests is a lot, so it’d be interesting to figure out the best way to do it for getting that nugget
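    A rough Monte Carlo sketch of the leave-one-out idea in this thread, in R: under the null that all 36 groups share one response rate, simulate how extreme the largest "group vs pooled rest" z-statistic gets, which gives a cutoff that already accounts for all 36 correlated looks. Every setting below is made up:

      set.seed(1)
      G <- 36; m <- 20; p0 <- 0.1                    # groups, trials, null rate
      max_z <- replicate(10000, {
        y <- rbinom(G, m, p0)
        z <- sapply(seq_len(G), function(i) {
          p_rest <- sum(y[-i]) / (m * (G - 1))       # pooled rate of the other 35
          se <- sqrt(p_rest * (1 - p_rest) * (1 / m + 1 / (m * (G - 1))))
          (y[i] / m - p_rest) / se
        })
        max(z, na.rm = TRUE)
      })
      quantile(max_z, 0.95)                          # critical value for the max test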

  • @drdca8263 • 12 days ago

    My guess was something like: "Assume that each cancer type is in one of two classes, 'non-nugget' and 'nugget', each class has its own mean, and the mean of the latter is greater than that of the former. Assume each cancer type has some probability p of being in the nugget class (I don't know what prior to use for p). Then, given an assignment of types to classes, the data follow a binomial distribution."
    Though that sounds like it could be pretty hard to compute the posterior for?
    I guess you would need both a confidence interval for the probability of good outcomes in the nugget group and a probability that "there is at least one type in the nugget group"?
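    For what it's worth, the two-class model sketched in this comment is exactly computable for a small number of groups by enumerating all assignments, if the ordering constraint (nugget rate above baseline) is dropped so the Beta integrals stay closed-form. A toy R version with made-up data and Beta(1, 1) priors on each class rate:

      y <- c(2, 1, 3, 9, 2); n <- rep(20, 5)    # responders/trials (made up)
      K <- length(y); p <- 0.1                  # prior P(a type is a nugget)

      # log marginal likelihood of one class under a shared Beta(1, 1) rate
      # (binomial coefficients are constant across assignments, so omitted)
      log_marg <- function(y, n) lbeta(1 + sum(y), 1 + sum(n - y)) - lbeta(1, 1)

      assign_grid <- expand.grid(rep(list(c(FALSE, TRUE)), K))
      log_post <- apply(assign_grid, 1, function(z) {
        z <- as.logical(z)
        lp <- sum(z) * log(p) + sum(!z) * log(1 - p)   # prior on the assignment
        if (any(z))  lp <- lp + log_marg(y[z],  n[z])  # "nugget" class
        if (any(!z)) lp <- lp + log_marg(y[!z], n[!z]) # "non-nugget" class
        lp
      })
      post <- exp(log_post - max(log_post)); post <- post / sum(post)

      colSums(post * assign_grid)               # P(each type is a nugget | data)
      sum(post[rowSums(assign_grid) > 0])       # P(at least one nugget | data)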

  • @colin4349 • 14 days ago +1

    The scales of the axes at 6:08 and 6:28 should be the same for the comparison.
    Without a similar scale on the x-axis, it takes a lot of computational brain power to actually compare the two distributions. My brain goes: oh, those are similar sizes; but wait, the axes are different; OK, 25-20=5 and 7-2=5, so they are the same; wait, decimals; OK, 0.05 and 0.5-ish.

  • @aleksjabonski6560 • 14 days ago +4

    My question is: where do you learn this kind of applied stuff? Any book recommendations with case studies to carry out these type of analyses yourself in R?

  • @rafaelwendel1400 • 7 days ago

    Could you also use a Bayesian approach to create subsamples of your total sample and update the variation of your estimates using the best candidate of each round, to improve how you distribute your sample sizes? E.g., start with two groups holding half the total sample and group the most distinct averages into new groups as you add more data in, say, binary cuts of the data (like a binary search).

  • @Shkib0y • 15 days ago +6

    Why Stan over a Bayesian package in R? (I’ve used Bolstad for Bayesian work)

    • @very-normal • 15 days ago +6

      Just for speed; I know Stan the best, so it was the fastest to implement. It also helped me point out differences between the models

  • @MrKopara123 • 13 days ago

    Would you ever consider making a video about factor analysis? It would be dope to have it explained by you!

    • @very-normal • 13 days ago +1

      Yeah! I only know vaguely about it, but it’d definitely be worth a video!

  • @DudeWhoSaysDeez • 14 days ago

    Bro deserves more subscribers

  • @Pedritox0953 • 8 days ago

    Great video! Peace out

  • @behrad9712 • 14 days ago

    Thank you so much 🙏

  • @MrHjld • 14 days ago

    Go over Stein's example and the JS estimator next!

  • @nkartzzgermany • 12 days ago

    Great video! If you‘re looking for new video ideas, I‘d love to see a visualization of the different types of convergence and their connections :) Couldn’t find any good videos on YT yet…

  • @BigRedHeadd • 12 days ago +1

    Couldn't you use a leave-one-out approach where you do the pooling each time with one group not included? You then know the nugget is the one with the biggest distance and least variance overlap with the pooled groups. Does that make sense?

    • @very-normal • 12 days ago +1

      Yeah it makes sense! What you’ve described is actually very similar to one of the more sophisticated approaches to this problem, sort of testing out different combinations of “sameness” and trying to find the most probable one

    • @duckboatsdotnet • 10 days ago

      That was my thought as well. I.e. perform a jackknife analysis.
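    A minimal jackknife-style sketch of what this thread describes, with made-up data: pool everything except one group, measure how far the left-out group sits from its pool, and flag the largest standardized distance (which could then be compared against a Monte Carlo cutoff like the one sketched earlier):

      set.seed(2)
      n <- rep(20, 36)
      y <- rbinom(36, n, 0.1); y[7] <- 11       # hypothetical nugget in group 7
      d <- sapply(seq_along(y), function(i) {
        p_pool <- sum(y[-i]) / sum(n[-i])       # pooled rate without group i
        se <- sqrt(p_pool * (1 - p_pool) * (1 / n[i] + 1 / sum(n[-i])))
        (y[i] / n[i] - p_pool) / se
      })
      which.max(d)                              # candidate nugget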

  • @neils3ji • 14 days ago +1

    Bias-variance tradeoff is one of the most important concepts in data science, ML/AI, stats. Respect for bringing it up!

  • @1.4142 • 14 days ago

    I just read the tipping point book on the plane lol

  • @jjmyt13 • 11 days ago

    A video about hierarchical models and not a single Simpsons (paradox) gif is criminal! Jk, loving the series

  • @cornagojar • 13 days ago

    Pleased to see Stan mentioned

  • @justasaiyanfromearth5252 • 14 days ago

    What's really funny is that yesterday I had my final exam on Bayesian statistics, and it was on hierarchical models.

  • @nicholasviscomi7588 • 14 days ago +2

    Gaussian mixture model?

    • @landsgevaer • 12 days ago

      Not sure what you would apply that to, since that is a method for fitting multiple additive Gaussians to a histogram (if that is what you mean). But the responses here are binary, yielding probabilities rather than the general numeric data Gaussians work on; and you know which group each sample belongs to, information a Gaussian mixture doesn't use.

  • @ridwanwase7444 • 3 days ago

    An off-topic problem that intrigues me:
    Say we are observing the number of cattle we see in one hour. In the countryside there will be many cattle, so we might have data like 30, 31, 28, 29, 30, ... In an urban area there will be fewer cattle, but occasionally a truckload of cattle may appear, so our data might be 2, 3, 30, 4, 25, 2, ... If we model these with a Poisson, the first case will have the higher variance since lambda is higher, but in real life we can sense that the second process has the higher variance. How do we interpret that?

    • @very-normal • 3 days ago +1

      One way to model this hierarchically is to place a distribution on the lambda parameter, similar to how a distribution was assumed for the binomial parameters here. City and countryside would have different lambdas, which lets them have different variances. This encodes a belief that the distributions of the number of cattle observed differ between city and countryside, while acknowledging that common factors should make them similar.
      Unrelated to this video, but you can also try negative binomial regression to account for the overdispersion and add city as a regressor (a sketch follows this thread).
      In the end, the model you choose should hopefully be motivated by your knowledge about the data

    • @ridwanwase7444 • 2 days ago

      @very-normal Thanks!
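    A minimal sketch of the negative binomial suggestion in the reply above (equivalently, a Poisson whose lambda gets a gamma distribution), using the commenter's own counts and MASS::glm.nb:

      library(MASS)
      d <- data.frame(
        count = c(30, 31, 28, 29, 30, 2, 3, 30, 4, 25, 2),
        area  = c(rep("countryside", 5), rep("urban", 6))
      )
      fit <- glm.nb(count ~ area, data = d)     # theta absorbs the overdispersion
      summary(fit)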

  • @federicomagnani1954 • 14 days ago

    There is no way a real statistician looks like the one at 6:42

  • @Inexorablehorror • 15 days ago

    Maybe use a Student-t distribution instead of a normal distribution? I am not sure, but I would assume that, due to the normality parameter nu of the t-distribution, the shrinkage would be smaller than with a normal one. But the "outlier" would be more compatible, and the mean wouldn't need to be shifted so much. Likewise, the sigma parameter could be smaller, which could decrease the credible intervals. I would love to see the results from it.
    Thanks for the videos; "brms" would be great to show ;-) I know, Stan is much more flexible...

    • @jeffreychandler8418 • 14 days ago

      Yeah, a t posterior would probably be a better choice. I'm not super well versed in BHMs, but I've seen t posteriors far, far more often than normal posteriors

    • @Inexorablehorror • 14 days ago

      @jeffreychandler8418 You mean a t-dist. as a prior, right? The posterior does not need to follow any specific distribution.
      I have not seen t-distributions as priors for the random effect (as far as I remember), but for typical fixed-effect models mimicking frequentist ANOVAs and t-tests, as proposed by Kruschke or McElreath.

    • @jeffreychandler8418 • 14 days ago

      @Inexorablehorror Oh yeah, I did. Brain crosses wires when I'm tired haha.
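    Since brms came up in this thread: a sketch of the Student-t random-effect idea on made-up data. My understanding is that gr(dist = "student") is how brms swaps the Gaussian group-level distribution for a t, but treat that as an assumption to check against the brms docs:

      library(brms)
      set.seed(1)
      d <- data.frame(group = factor(1:36), n = rep(20, 36))
      d$y <- rbinom(36, d$n, 0.1)               # made-up responders per group

      # normal vs Student-t distribution on the group-level effects
      fit_n <- brm(y | trials(n) ~ 1 + (1 | group),
                   family = binomial(), data = d)
      fit_t <- brm(y | trials(n) ~ 1 + (1 | gr(group, dist = "student")),
                   family = binomial(), data = d)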

  • @bjorntorlarsson • 15 days ago +1

    I'd like a comment on traditional statistical methods, which assume a distribution function, estimate central and spread measures, and adjust for kurtosis, heteroscedasticity, etc., compared to machine learning ("AI"), which uses all the data with brute force to "find patterns". Overfitting is the traditional objection, but ChatGPT works darn well, doesn't it?
    Over a decade ago I had a look at Support Vector Machines (SVMs), which is a mathematically analytical approach, not black-boxed pattern fitting by (guided) brute force. The math of this stuff is a bit beyond my current pay grade and I have no use for it professionally, so it is only out of personal curiosity that I ask. I hear that SVMs only have specialized applications nowadays, since neural networks and such outcompete them, so if you haven't familiarized yourself with SVMs, it might not be worth the effort to do so. I just wonder if anyone reading this has, and wants to compare them to the traditional statistical methods. It seemed pretty nifty as far as I could tell, although I never applied it.

    • @very-normal • 15 days ago +3

      learning how to code an SVM: 😃
      learning the theory of an SVM: 💀

    • @colin4349 • 14 days ago +1

      From a certain perspective, the Bayesian approach in the video is a machine learning method. A model is fit to some data. The first model has one parameter, the "overall response rate". The second model, approach #2, has 36 parameters, one for each group. The final model, approach #3, has 3 parameters. Given the size of the data in the video, ie small, the 3 parameter approach makes a lot of sense. The 36 parameter approach just has too many knobs to turn, too many degrees of freedom.
      ChatGPT 4 probably has around 1.8 trillion parameters.

    • @bjorntorlarsson • 14 days ago +1

      @colin4349 Good point there!
      I really have to get a grip on machine learning, somehow. But those node layers turn me off; it's so stupidly simple in detail. Not exactly Euler math. And that emulates human conversation? It is humiliating, but it is what it is.

    • @colin4349 • 14 days ago +2

      @bjorntorlarsson Don't be afraid of just ignoring some details. For example, I know how to do simple calculations by hand and with a handheld calculator. However, I do not understand how the electrical signals make a calculator work.
      At a high level, ChatGPT works by learning and then sampling from a distribution. The distribution is learned from the data (how? don't care, ignored), conditioned on a "context window", which is the recent text of the conversation. ChatGPT samples the next word from the distribution conditioned on the context window, updates the context window, samples the next word, and so on (a toy sketch follows this thread).

    • @bjorntorlarsson • 14 days ago

      @colin4349 ChatGPT is bad at math! But still, it surprisingly often generates the correct answer. If one then asks it what process it used, it just generates a new answer relating to that new question, a generation that is of course completely unrelated to whatever it did to produce the previous correct math answer. It doesn't "know" anything about that, and doesn't "understand" my question the way I meant it.
      It takes some getting used to. It never uses any logic! I've asked it, and it says so. I suppose it could be used as a convenient user interface that in turn actually uses math software like Wolfram or Matlab. But the standard version that I pay $20 a month for doesn't do that yet. Perhaps I could "prompt" it to do so?
      For real:
      "- But, without using any logic, how come you can generate code that actually works?"

  • @zerotwo7319 • 13 days ago

    This is why in ML they are called 'models'...

    • @very-normal • 13 days ago +2

      i think that’s the name for them in statistics too

  • @Trizzer89 • 15 days ago

    Can't a chi-square test do this?

    • @very-normal • 15 days ago +3

      It’s one way to do this. It’ll tell you that one of the groups has a different mean from the rest, but you’d also have to do secondary pairwise tests to identify the nugget itself.
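    A minimal sketch of the two-step procedure in this reply, with made-up data: an overall chi-square test of homogeneity across the groups, then adjusted pairwise comparisons to locate the nugget (expect warnings with counts this small):

      set.seed(4)
      n <- rep(20, 36)
      y <- rbinom(36, n, 0.1); y[7] <- 11                  # hypothetical nugget

      chisq.test(cbind(y, n - y), simulate.p.value = TRUE) # any group different?
      pairwise.prop.test(y, n, p.adjust.method = "holm")   # which one(s)?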

  • @wapsyed • 13 days ago

    love it! but that background is awful for reading :(

    • @very-normal • 13 days ago

      oops sorry I’ll keep that in mind for future videos

  • @bokehbeauty • 14 days ago

    I learn best by implementing the examples myself and playing with the data that comes with them. Hence, for me it would be far more educational if you used either Python or R Bayesian stats libraries and explained/justified the data set used. Alternatively, could you add a video on how to run Stan from Python (or R)?
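    On the "how to run Stan from R" request: a minimal rstan sketch with a made-up Bernoulli model, not the model from the video:

      library(rstan)
      code <- "
      data {
        int<lower=1> N;
        int<lower=0, upper=1> y[N];
      }
      parameters {
        real<lower=0, upper=1> theta;
      }
      model {
        theta ~ beta(1, 1);
        y ~ bernoulli(theta);
      }
      "
      fit <- stan(model_code = code,
                  data = list(N = 10, y = c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)),
                  chains = 2, iter = 1000)
      print(fit)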

  • @AkshayKumar-vd5wn • 15 days ago

    First.