Metropolis-Hastings - VISUALLY EXPLAINED!

  • Published: 8 Aug 2021
  • In this tutorial, I explain the Metropolis and Metropolis-Hastings algorithms, the first MCMC methods, using an example.
    I also celebrate Arianna Rosenbluth, who was the first person to implement this algorithm on a primitive computer called MANIAC.
    Some links:
    Arianna Rosenbluth - Flash of Genius
    www.radcliffe.harvard.edu/new...
    Kris Hauser's brilliant explanation of why the Metropolis-Hastings procedure satisfies the detailed balance criterion
    Original location [seems to be inaccessible now]
    people.duke.edu/~kh269/teachi...
    My copy of his notes
    drive.google.com/file/d/1_pGh...
    Colin Carroll's Blog
    colindcarroll.com/
    Colin Carroll's Example Notebook
    colindcarroll.com/2018/11/24/...
    #mcmc
    #bayesianstatistics
    #markovchains
  • Science

Comments • 113

  • @hiclh4128
    @hiclh4128 1 year ago +15

    This is the best MCMC video I have seen. I also like the history part. Please continue making such high quality videos.

  • @chazhovnanian6897
    @chazhovnanian6897 4 months ago +4

    Absolute BEST MCMC video, this helped me develop a true intuition of the topic

  • @est9949
    @est9949 1 year ago +5

    Wow, this is a rare hidden gem. Best video on this subject. I also enjoy the history of the field and the researchers who had worked on this almost a century before me. Quite chilling to think about it.

  • @dhinas9444
    @dhinas9444 2 years ago +3

    Thank you, sir Sachdeva; I love your witty way of explaining and also the visual representation!

  • @SuperCaptain4
    @SuperCaptain4 2 years ago +2

    Wow, I wish I had found this video earlier. Your channel in general actually! So far it is a hidden gem but I am sure it will not remain that for long

  • @MarkSimithraaratchy
    @MarkSimithraaratchy 9 months ago +1

    I watched this video to clean up some of my understanding around MH and get some visual grounding. Excellently explained, with nice cuts of humor to keep it engaging!
    What I didn't expect was this touching tribute to Arianna Rosenbluth. Women certainly did not get proper recognition despite the incredible contributions they made to the field. Thanks for sharing this important slice of history and putting emphasis on her contributions.

  • @proxyme3628
    @proxyme3628 1 year ago +5

    Thanks so much for creating the video. Your videos are always concrete, visual, and intuitive. I wish other videos that dump math formulas from textbooks would learn from you how to teach and explain complex things in simple, visual, intuitive ways.

    • @KapilSachdeva
      @KapilSachdeva  1 year ago +1

      🙏 thanks for the kind words. Much appreciated.

  • @swapnilamale9073
    @swapnilamale9073 2 years ago +7

    One of the best explanations for one of the most challenging topics to understand from a book... your knowledge, research, and experience are reflected in the video. Personally, I find you underrated... best wishes for more subscribers!

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +1

      Thanks 🙏… please share it with your colleagues.

    • @swapnilamale9073
      @swapnilamale9073 2 years ago

      Please, could you make a video on subset simulation (SS)!

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      Have not used Subset Simulation in practice (I generally do not like to make tutorials on subjects that I have not used in practice); maybe I will make a tutorial sometime later, after I have tried it myself. No promise. Sorry!

  • @yli6050
    @yli6050 1 month ago

    I am grateful for your lectures ❤ what a wonderful service you've done for all the learners

  • @user-hv9ze2qd8f
    @user-hv9ze2qd8f 9 months ago +2

    Simply excellent; it shows how much hard work you have put into making this video.

  • @chiaosun3505
    @chiaosun3505 2 years ago +1

    The only explanation I didn't fall asleep listening to. Thank you very much.

  • @johnnykessler
    @johnnykessler 1 year ago +2

    This is exactly what I have been looking for, thank you so much!

  • @SimoneIovane
    @SimoneIovane 7 months ago +1

    This video is a gem on this subject. Really, congrats on your teaching skills.

  • @ivanpastukhov8664
    @ivanpastukhov8664 8 months ago +2

    Outstanding explanation! Thank you so much Sir!

  • @TheJuwailes
    @TheJuwailes 2 years ago +1

    Quite incredible honestly. I would pay for those videos, they're that good.

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +2

      🙏 Just this appreciation is enough.
      Haven't got any cycles in the last few months to work on the videos but will make them for sure.

    • @TheJuwailes
      @TheJuwailes 2 years ago

      @@KapilSachdeva I hope you make a video on the convergence criteria for MCMCs, especially Metropolis-Hastings, and on the thinning phenomenon

  • @rafas4307
    @rafas4307 10 months ago +2

    this is the best explanation I have ever seen

  • @tsilikitrikis
    @tsilikitrikis 2 years ago +2

    Excellent explanation; I spent some hours searching about this algorithm, and you explain it perfectly.
    Actually, the whole point is the acceptance probability, which is related to the shape of our final curve/distribution. For example, in Bayes, we don't need to compute likelihood * prior analytically but just take a number out of it.
    The normal is used as a distribution to search around the point x_t, so we search around each point on the x-axis and accept it as "part" of our distribution according to the acceptance probability.
    Perfetto! Thank you

  • @MisterDives
    @MisterDives 7 months ago +1

    Excellent video, very clear explanation -- thank you so much!

  • @krgonline
    @krgonline 2 years ago +2

    Thank you very much for the excellent content and presentation. Stay blessed

  • @mohsenvazirizade6334
    @mohsenvazirizade6334 8 months ago +3

    Thank you for such a brilliant tutorial. Any videos on Gibbs sampling?

  • @hongkyulee9724
    @hongkyulee9724 1 year ago +1

    This video is so intuitive! Really, thank you. You saved me 😊😊!!!

  • @josemariapereziquierdo5939
    @josemariapereziquierdo5939 3 months ago +1

    Thanks a lot for this clarifying video.

  • @user-px9zz3fo9o
    @user-px9zz3fo9o 2 months ago +1

    best MCMC video I have seen

  • @greysky1786
    @greysky1786 5 months ago +1

    Thank you very much for this video, it was really helpful.

  • @tsoofbo
    @tsoofbo 1 year ago +1

    Amazing video. Thank you so much!

  • @mathewjones8891
    @mathewjones8891 4 months ago +1

    Dr. Sachdeva, I am riveted by your tutorials. So clear. Thank you for the historical perspective, especially Arianna Rosenbluth. It is so important to remember whom these ideas/methods came from. In my world (Markov modeling of ion channels, and other stuff), I think we use the phrase "detailed balance" interchangeably with "microscopic reversibility" (a phrase due to Tolman, I think, from the 1930s?). The idea is that a closed physical Markov system (say, an ion channel protein transitioning between different conformational states), at thermodynamic equilibrium, should be completely symmetrical and reversible. If there is a loop in the Markov chain (e.g., S1→S2→S3→S1, where Sn is a discrete state), then the *product* of transition rates should be the same clockwise vs. counter-clockwise. Thermodynamically, this is equivalent to conservation of free energy, similar to the idea that traversing any landscape and returning to the same point will result in the same final energy. Is this the same meaning that you intend for "detailed balance"? If so, why is this mathematically necessary? In real life, ion channels are not in a closed system and are rarely at thermodynamic equilibrium. There are ionic gradients and voltage gradients. A biological cell (e.g., a neuron) is not a closed thermodynamic system; it exchanges energy with its environment (the "bath"). So the assumptions about "microscopic reversibility" may not (do not) strictly apply. Nevertheless, I have encountered a lot of resistance about "violating" microscopic reversibility in some of my published models of ion channel function. So, my questions are: a) is "microscopic reversibility" the same as what you mean by "detailed balance"; b) is detailed balance a mathematical necessity, or can it depend on conditions; c) should I retract all of my previously published papers :-}? Again, thank you. These tutorials are wonderful.

    • @KapilSachdeva
      @KapilSachdeva  4 months ago

      Will have to spend time pondering over it :) … it’s been a while now. Thanks for your kind words.

  • @user-pq2er8fh8k
    @user-pq2er8fh8k 5 months ago +1

    Nice and simple explanation ☺️

  • @johnphillips4887
    @johnphillips4887 2 years ago +1

    Very good, very helpful!

  • @parvillner5721
    @parvillner5721 4 months ago +1

    amazing explanation

  • @ZinzinsIA
    @ZinzinsIA 1 year ago +1

    Very nice video thank you very much

  • @jagadeesanmuthuvel1006
    @jagadeesanmuthuvel1006 10 months ago +1

    Thanks a lot! Awesome explanations! please extend it for HMC and NUTS as well :)

  • @shantanudixit4197
    @shantanudixit4197 1 year ago +1

    Excellent as always!
    Can you make a video on Langevin Monte Carlo as well?

  • @vivekren3463
    @vivekren3463 9 months ago +2

    Thank you so much. People like yourself are really creating a great impact in the education industry! I paid probably $xxxx.xx to study this concept, which you taught in ~20 min. Can you help with Gibbs sampling as well?

  • @sheeta2726
    @sheeta2726 1 year ago +1

    Thank you!

  • @vaibhavwanere3034
    @vaibhavwanere3034 1 year ago +1

    Excellent!!!!!!!!!!!!!!!!!!

  • @surnamnarendra4729
    @surnamnarendra4729 1 year ago +1

    thank you so much sir

  • @gumimi385
    @gumimi385 9 months ago +1

    thank you!

  • @suppareskrittikulsittichai6812
    @suppareskrittikulsittichai6812 2 years ago +1

    Superb explanation

  • @masoudsakha
    @masoudsakha 7 months ago

    I have a question: if we know f(x), since we need it to calculate alpha, what is the point of the Metropolis-Hastings algorithm? To guess the distribution of X, which we already use in our algorithm?

    • @KapilSachdeva
      @KapilSachdeva  7 months ago +1

      You need to consider a few things:
      Even if you know the functional form of your distribution function, you still cannot directly sample from it. Computing the log prob and sampling are two different things.
      Secondly, most of the time you do not know the functional form. All you have is the product of the likelihood and the prior. This is where Metropolis-Hastings is useful.
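
This exchange captures the crux of the method: you only ever need to *evaluate* the target function (possibly unnormalized), never to sample from it directly. A minimal sketch in Python, with a toy two-bump target and a proposal width invented purely for illustration:

```python
import numpy as np

def unnormalized_target(x):
    # Toy target (invented): a two-bump density known only up to a
    # constant -- exactly the situation described in the reply above.
    return np.exp(-x**2 / 2) + 0.5 * np.exp(-(x - 3) ** 2 / 2)

def metropolis(n_samples, x0=0.0, step=0.4, seed=0):
    """Metropolis sampler with a symmetric Normal proposal."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        x_prop = rng.normal(x, step)                 # propose near current x
        alpha = unnormalized_target(x_prop) / unnormalized_target(x)
        if rng.uniform() < alpha:                    # accept w.p. min(1, alpha)
            x = x_prop
        samples[i] = x                               # a rejection repeats x
    return samples

samples = metropolis(50_000)
```

A histogram of `samples` recovers the shape of the target without ever computing its normalizing constant, since the constant cancels in the ratio `alpha`.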

  • @abdullahalsulieman2096
    @abdullahalsulieman2096 1 year ago

    When you say states, do you mean the random variable or the row vector of probabilities? In previous videos, the states (s1, s2, s3, and s4) were referred to as the probabilities within the random variable x.
    Also, when you say parameter, do you mean the states defined above or the random variables?

    • @KapilSachdeva
      @KapilSachdeva  1 year ago

      States do not mean probabilities. States are the values a random variable can take. With each state is "associated" its probability.
      Parameters in Bayesian statistics are random variables. This means a parameter can have various states.

    • @abdullahalsulieman2096
      @abdullahalsulieman2096 1 year ago

      @@KapilSachdeva Very clear. I carried on the notations from your previous video. By the way, you are very good at teaching.

  • @chronicillnessrevolution5079
    @chronicillnessrevolution5079 1 year ago

    Thanks for the video! Kris Hauser's notes are not accessible anymore! Could you perhaps upload them and give us a new link, if you have the PDF saved? I would really appreciate that. Thanks!!

    • @KapilSachdeva
      @KapilSachdeva  1 year ago +1

      Indeed, the link is invalid now.
      But I do have a copy of it :) Let me know if this link is accessible to you:
      drive.google.com/file/d/1_pGhAdG2q58H-T6NaFuFGF_ut6JJhH7i/view?usp=share_link

    • @chronicillnessrevolution5079
      @chronicillnessrevolution5079 1 year ago +1

      @@KapilSachdeva Awesome! Thanks a lot

  • @jayflaggs
    @jayflaggs 1 year ago +1

    Maybe it's a silly question...but why not just sample over the entire known range [-6, 6]? What is the need for sampling using the uniform distribution?

    • @KapilSachdeva
      @KapilSachdeva  1 year ago +1

      When we say we sample, it means those samples should come in the same proportion as the distribution function. In other words, for a given function, samples between [-6, 6] that have a high probability of occurring should occur more often.
      Now the problem is: how do you sample from an arbitrary function? We can only obtain samples from the uniform distribution, which gives samples that are all equally likely. The sampling algorithms (rejection, importance, etc.) are about transforming samples from the uniform distribution into samples from the target distribution.
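
That idea of transforming equally-likely uniform draws can be made concrete with the simplest such method, rejection sampling. A sketch with an invented target; the bound `target_max` must dominate the function on the interval:

```python
import numpy as np

def rejection_sample(target, n, lo=-6.0, hi=6.0, target_max=1.0, seed=0):
    """Turn equally-likely uniform draws on [lo, hi] into draws from `target`."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        x = rng.uniform(lo, hi)            # every x in [lo, hi] equally likely
        u = rng.uniform(0.0, target_max)   # uniform height under the bound
        if u < target(x):                  # keep x in proportion to target(x)
            out.append(x)
    return np.array(out)

f = lambda x: np.exp(-x**2 / 2)            # unnormalized standard Normal
xs = rejection_sample(f, 10_000)           # high-density regions occur more often
```

The accepted points are no longer uniform: regions where `target` is tall survive the rejection step more often, which is exactly "samples in the same proportion as the distribution function".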

  • @andreaqui1653
    @andreaqui1653 1 year ago

    Question: if a sample is rejected, then how does it get accounted for in the final histogram? Intuitively, it seems like once a sample is rejected, it will always be rejected, and the final histogram would be missing quite a bit. Also, where we accept samples, it seems the acceptance count would grow indefinitely at that value and the histogram would overextend above the distribution we want to approximate in that specific area.

    • @KapilSachdeva
      @KapilSachdeva  1 year ago

      "once a sample is rejected it will always be rejected"
      Above conclusion is not necessarily correct. Once you have the "stationary markov chain" then the sample will be accepted (in a longer run) in accordance with its plausibility. In other words, a less likely sample will be accepted less number of times. It is shown in the histogram as well. See the links in the description of the video to run the code yourself.
      I have feeling that in you are not including/considering the notion of stationary markov chains.
      There is another concept that I have not gone over - "burn in" (or discarding the initial samples). That "burn in" (or the samples we throw away in the initial quite many steps) would help prevent some of the corner cases that you are worried about.
      This all being said, the quality of the reproducibility of the target function using samples depends on how complex the target function is . Complexity would equate to higher dimensional functions and shape of it. This is where more advanced MCMC algorithms play a role.
      Hope this helps in reducing the concerns.
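
The burn-in idea mentioned in this reply can be seen in a few lines. Here a chain (same toy Metropolis scheme as above, all numbers invented for illustration) is deliberately started far from the target's mass, and the early samples that still remember that starting point are discarded:

```python
import numpy as np

def target(x):
    return np.exp(-x**2 / 2)        # unnormalized standard Normal

rng = np.random.default_rng(1)
x, chain = 20.0, []                 # deliberately terrible starting point
for _ in range(20_000):
    prop = rng.normal(x, 0.5)
    if rng.uniform() < target(prop) / target(x):
        x = prop
    chain.append(x)
chain = np.array(chain)

kept = chain[2_000:]                # burn-in: drop the samples that are still
                                    # drifting in from the starting point
```

The first few hundred samples trace the walk down from x = 20 and would distort the histogram; after discarding them, `kept` behaves like draws from the standard Normal.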

    • @andreaqui1653
      @andreaqui1653 1 year ago

      @@KapilSachdeva Ah ha, thanks for nudging my assumptions in the correct direction. So in layman's terms... a stationary Markov chain means the probability at f(xt) is equal to the probability at f(xt+1), or in other words the histogram at f(xt) is of the same height as the histogram at f(xt+1)... in which case we would start rejecting the sample at f(xt) and begin to accept the nearby samples that were once rejected, as that once-rejected sample would be more plausible. Visually speaking, it's almost as if the peak of our Gaussian distribution rises with the histogram height to the point where, once the Gaussian rises above the underlying distribution, the once-rejected samples begin to get accepted.

    • @KapilSachdeva
      @KapilSachdeva  1 year ago

      It does not mean that.
      A stationary Markov chain means that the transition probabilities between the various states are "fixed". The transition probability to move from state 1 to state 2 will be different from state 1 to state 3, etc.
      It might be a good idea to watch the Markov chain tutorial in this series, which is required to understand MCMC algorithms.
      ruclips.net/video/CIe869Rce2k/видео.html

  • @abdullahalsulieman2096
    @abdullahalsulieman2096 1 year ago

    I have an algorithm for MCMC and would like your interpretation for it. Is this possible?

    • @KapilSachdeva
      @KapilSachdeva  1 year ago

      I am not sure if I understand your question. Are you asking to explain another MCMC algorithm?

    • @abdullahalsulieman2096
      @abdullahalsulieman2096 1 year ago

      @@KapilSachdeva I have an MCMC problem that is solved, but I don't understand which is which in it: what is f(x), what is the proposal distribution, etc.

  • @user-lr6xs8dn8k
    @user-lr6xs8dn8k 1 month ago

    In real life we don't know the target distribution f(x). How did you calculate alpha = f(x_{t+1})/f(x_t) for the various sample points?

  • @Maciek17PL
    @Maciek17PL 1 year ago

    At 20:37 you say that f = likelihood * prior, but shouldn't it be f = likelihood * prior / N? Then, when calculating alpha, the N's will cancel each other out, and that's how we avoid the integral in N. Also, in my opinion you missed the most important part: how will the ESTIMATED distribution look, and how do I get it from these sample points? What is q? Is it the pdf of N(x_t, 0.4) in this example?

    • @KapilSachdeva
      @KapilSachdeva  1 year ago

      The (normalized) target function does include the normalizing constant.
      I do show the approximated (estimated) function using the histogram in the tutorial. It was constructed using the samples that were collected.
      Q is the Normal distribution in this example. I also show that using the yellow function 😳

    • @Maciek17PL
      @Maciek17PL 1 year ago

      @@KapilSachdeva Well, at 20:42 there is literally f(x) = likelihood * prior on the screen, and it got me confused. In my opinion it should be clarified that f(x) = likelihood * prior / N, and then, when calculating f(x_{t+1})/f(x_t), the N's will cancel out. It is already a hard topic, and when I saw f(x) = likelihood * prior, at first I thought we were trying to approximate just the integral in the denominator, not the whole thing.

    • @KapilSachdeva
      @KapilSachdeva  1 year ago +1

      Apologies for the inconvenience it caused you.
      As you correctly figured out we are "not" trying to approximate the integral in the denominator.
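
The cancellation being discussed in this thread can be written out explicitly. With the normalized posterior f(x) = likelihood × prior / N, the unknown normalizer N drops out of the acceptance ratio, which is why the algorithm never needs the integral:

```latex
\alpha
  = \min\!\left(1,\ \frac{f(x_{t+1})}{f(x_t)}\right)
  = \min\!\left(1,\ \frac{\mathcal{L}(x_{t+1})\,p(x_{t+1})/N}{\mathcal{L}(x_t)\,p(x_t)/N}\right)
  = \min\!\left(1,\ \frac{\mathcal{L}(x_{t+1})\,p(x_{t+1})}{\mathcal{L}(x_t)\,p(x_t)}\right)
```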

  • @subasurf
    @subasurf 2 years ago

    What if you want to sample from a distribution without knowing what the target distribution is?

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      That is the primary purpose of MCMC methods, i.e., to sample from an unknown target distribution function.
      This is why you create multiple Markov chains and compare their distributions. If they are similar, it is a clue that you have succeeded in sampling from the unknown target distribution function. I talk about it around 12:45 in the tutorial.

    • @nikolaobradovicpi
      @nikolaobradovicpi 2 years ago

      @@KapilSachdeva How do you compute alpha at 9:29 if you don't know f(x), since you say MCMC samples from an unknown target distribution function?

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +2

      @@nikolaobradovicpi Let's answer your question considering 2 scenarios:
      1) You know the target function but you cannot sample from it. Knowing the function here refers to the formula of the function (pass in the input and get the output). This is the case with our toy example. Sampling to recreate a function vs. evaluating a function are two different things.
      2) The second situation to consider is that you do not even know the functional form, and then your question is valid, i.e., how do we even evaluate it? This is where you rely on computing/evaluating it "indirectly". As an example, in the (Bayesian statistics) posterior distribution, the numerator consists of 2 components: likelihood and prior. You have access to their functional forms but not to the functional form of the product of these components. However, for the evaluation you evaluate the likelihood and prior separately and multiply the results. This is how you will get alpha!
      I talk about this later in the tutorial when explaining the "Hastings" part and show that your target function is the product of likelihood and prior.
      Go to 20:00 and you will see the explanation in the tutorial.
      Hope this makes sense.
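
Scenario 2 can be sketched in code. Everything here (the data, the Normal likelihood, the wide Normal prior) is invented for illustration; the point is only that the acceptance ratio needs the product likelihood × prior, never the evidence:

```python
import numpy as np

data = np.array([1.2, 0.8, 1.5, 1.1, 0.9])   # toy observations (invented)

def log_unnorm_posterior(mu):
    # Likelihood: data ~ Normal(mu, 1); prior: mu ~ Normal(0, 10).
    # Constant terms and the evidence are omitted: they cancel in alpha.
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    log_prior = -0.5 * mu**2 / 10.0**2
    return log_lik + log_prior

def log_alpha(mu_prop, mu_curr):
    # Log of the Metropolis acceptance probability, computed without
    # ever knowing the normalizing integral p(data).
    return min(0.0, log_unnorm_posterior(mu_prop) - log_unnorm_posterior(mu_curr))
```

Working in log space, as here, is the usual practice: the products of many likelihood terms underflow quickly, while their logs add safely.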

    • @nikolaobradovicpi
      @nikolaobradovicpi 2 years ago +1

      @@KapilSachdeva Thank you very much sir for the response. Your tutorials are great!

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +1

      🙏

  • @othsaj1467
    @othsaj1467 1 year ago +1

    When you don't know how to name something, you simply call it a kernel, haha. Good one!

  • @ssshukla26
    @ssshukla26 2 years ago +1

    Hah, never knew anything about this topic. Thanks. But there was a lot of audio cutting out in between. Idk.

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      Thanks Sunny. Audio cutting out... :( ... sorry if that is the case, but I am not able to observe it. At what time does it happen?

    • @ssshukla26
      @ssshukla26 2 years ago +1

      @@KapilSachdeva Ah, my bad. My Bluetooth headphones were running out of battery. Didn't realize it was the headphones 🤦‍♂️

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +1

      :) … u made me nervous … but thanks for verifying. 🙏