Metropolis-Hastings - VISUALLY EXPLAINED!

  • Published: 8 Aug 2021
  • In this tutorial, I explain the Metropolis and Metropolis-Hastings algorithms, the first MCMC methods, using an example.
    I also celebrate Arianna Rosenbluth, who was the first person to implement this algorithm on a primitive computer called MANIAC.
    Some links:
    Arianna Rosenbluth - Flash of Genius
    www.radcliffe.harvard.edu/new...
    Kris Hauser's brilliant explanation of why the Metropolis-Hastings procedure satisfies the detailed balance criterion
    Original location [seems to be inaccessible now]
    people.duke.edu/~kh269/teachi...
    My copy of his notes
    drive.google.com/file/d/1_pGh...
    Colin Carroll's Blog
    colindcarroll.com/
    Colin Carroll's Example Notebook
    colindcarroll.com/2018/11/24/...
    #mcmc
    #bayesianstatistics
    #markovchains
  • Science

Comments • 113

  • @hiclh4128
    @hiclh4128 1 year ago +15

    This is the best MCMC video I have seen. I also like the history part. Please continue making such high quality videos.

  • @chazhovnanian6897
    @chazhovnanian6897 4 months ago +4

    Absolute BEST MCMC video, this helped me develop a true intuition of the topic

  • @est9949
    @est9949 1 year ago +5

    Wow, this is a rare hidden gem. Best video on this subject. I also enjoy the history of the field and the researchers who had worked on this almost a century before me. Quite chilling to think about it.

  • @dhinas9444
    @dhinas9444 2 years ago +3

    Thank you, sir Sachdeva; I love your witty way of explaining and also the visual representation!

  • @SuperCaptain4
    @SuperCaptain4 2 years ago +2

    Wow, I wish I had found this video earlier. Your channel in general actually! So far it is a hidden gem but I am sure it will not remain that for long

  • @MarkSimithraaratchy
    @MarkSimithraaratchy 9 months ago +1

    I watched this video to clean up some of my understanding around MH and get some visual grounding. Excellently explained, with nice cuts of humor to keep it engaging!
    What I didn't expect was this touching tribute to Arianna Rosenbluth. Women certainly did not get proper recognition despite the incredible contributions they made to the field. Thanks for sharing this important slice of history and putting emphasis on her contributions.

  • @proxyme3628
    @proxyme3628 1 year ago +5

    Thanks so much for creating the video. Your videos are always concrete, visual, and intuitive. I wish other videos that dump math formulas from textbooks would learn from you how to teach and explain complex things in simple, visual, intuitive ways.

    • @KapilSachdeva
      @KapilSachdeva  1 year ago +1

      🙏 thanks for the kind words. Much appreciated.

  • @swapnilamale9073
    @swapnilamale9073 2 years ago +7

    One of the best explanations for one of the most challenging topics to understand from a book... your knowledge, research, and experience are reflected in the video. Personally, I find you underrated... best wishes for more subscribers!

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +1

      Thanks 🙏… please share it with your colleagues.

    • @swapnilamale9073
      @swapnilamale9073 2 years ago

      Please, could you make a video on subset simulation (SS)!

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      Have not used Subset Simulation in practice (I generally do not like to make tutorials on subjects that I have not used in practice); maybe I will make a tutorial sometime later, after I have tried it myself. No promise. Sorry!

  • @yli6050
    @yli6050 1 month ago

    I am grateful for your lectures ❤ what a wonderful service you've done for all the learners

  • @user-hv9ze2qd8f
    @user-hv9ze2qd8f 9 months ago +2

    Simply excellent; it shows how much hard work you have put into making this video.

  • @chiaosun3505
    @chiaosun3505 2 years ago +1

    The only explanation I didn't fall asleep listening to. Thank you very much.

  • @johnnykessler
    @johnnykessler 1 year ago +2

    This is exactly what I have been looking for, thank you so much!

  • @SimoneIovane
    @SimoneIovane 7 months ago +1

    This video is a gem on this subject. Really, congrats on your teaching skills.

  • @ivanpastukhov8664
    @ivanpastukhov8664 8 months ago +2

    Outstanding explanation! Thank you so much Sir!

  • @TheJuwailes
    @TheJuwailes 2 years ago +1

    Quite incredible honestly. I would pay for those videos, they're that good.

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +2

      🙏 Just this appreciation is enough.
      Haven't got any cycles in the last few months to work on the videos but will make them for sure.

    • @TheJuwailes
      @TheJuwailes 2 years ago

      @@KapilSachdeva I hope you make a video on the convergence criteria for MCMCs, especially Metropolis-Hastings, and on the thinning phenomenon

  • @rafas4307
    @rafas4307 10 months ago +2

    this is the best explanation I have ever seen

  • @tsilikitrikis
    @tsilikitrikis 2 years ago +2

    Excellent explanation; I spent some hours searching about this algorithm, and you explain it perfectly.
    Actually, the whole point is the acceptance probability, which is related to the shape of our final curve/distribution. For example, in Bayes, we don't need to compute likelihood * prior analytically but just take a number out of it.
    The normal is used as a distribution to search around the point x_t, so we search around each point on the x-axis and accept it as "part" of our distribution according to the acceptance probability.
    Perfetto! Thank you

  • @MisterDives
    @MisterDives 7 months ago +1

    Excellent video, very clear explanation -- thank you so much!

  • @krgonline
    @krgonline 2 years ago +2

    Thank you very much for the excellent content and presentation. Stay blessed

  • @mohsenvazirizade6334
    @mohsenvazirizade6334 8 months ago +3

    Thank you for such a brilliant tutorial. Any videos on Gibbs sampling?

  • @hongkyulee9724
    @hongkyulee9724 1 year ago +1

    This video is so intuitive! Really, thank you. You saved me 😊😊!!!

  • @josemariapereziquierdo5939
    @josemariapereziquierdo5939 3 months ago +1

    Thanks a lot for this clarifying video.

  • @user-px9zz3fo9o
    @user-px9zz3fo9o 2 months ago +1

    best MCMC video I have seen

  • @greysky1786
    @greysky1786 5 months ago +1

    Thank you very much for this video, it was really helpful.

  • @tsoofbo
    @tsoofbo 1 year ago +1

    Amazing video. Thank you so much!

  • @mathewjones8891
    @mathewjones8891 4 months ago +1

    Dr. Sachdeva, I am riveted by your tutorials. So clear. Thank you for the historical perspective, especially Arianna Rosenbluth. It is so important to remember whom these ideas/methods came from. In my world (Markov modeling of ion channels, and other stuff), I think we use the phrase "detailed balance" interchangeably with "microscopic reversibility" (a phrase due to Tolman, I think, from the 1930s?). The idea is that a closed physical Markov system (say, an ion channel protein transitioning between different conformational states), at thermodynamic equilibrium, should be completely symmetrical and reversible. If there is a loop in the Markov chain (e.g., S1→S2→S3→S1, where Sn is a discrete state), then the *product* of transition rates should be the same clockwise vs. counter-clockwise. Thermodynamically, this is equivalent to conservation of free energy, similar to the idea that traversing any landscape and returning to the same point will result in the same final energy. Is this the same meaning that you intend for "detailed balance"? If so, why is this mathematically necessary? In real life, ion channels are not in a closed system and are rarely at thermodynamic equilibrium. There are ionic gradients and voltage gradients. A biological cell (e.g., a neuron) is not a closed thermodynamic system; it exchanges energy with its environment (the "bath"). So the assumptions about "microscopic reversibility" may not (do not) strictly apply. Nevertheless, I have encountered a lot of resistance about "violating" microscopic reversibility in some of my published models of ion channel function. So, my questions are: a) is "microscopic reversibility" the same as what you mean by "detailed balance"; b) is detailed balance a mathematical necessity, or can it depend on conditions; c) should I retract all of my previously published papers :-}? Again, thank you. These tutorials are wonderful.

    • @KapilSachdeva
      @KapilSachdeva  4 months ago

      Will have to spend time pondering over it :) … it’s been a while now. Thanks for your kind words.

  • @user-pq2er8fh8k
    @user-pq2er8fh8k 5 months ago +1

    Nice and simple explanation ☺️

  • @johnphillips4887
    @johnphillips4887 2 years ago +1

    Very good, very helpful!

  • @parvillner5721
    @parvillner5721 4 months ago +1

    amazing explanation

  • @ZinzinsIA
    @ZinzinsIA 1 year ago +1

    Very nice video thank you very much

  • @jagadeesanmuthuvel1006
    @jagadeesanmuthuvel1006 10 months ago +1

    Thanks a lot! Awesome explanations! please extend it for HMC and NUTS as well :)

  • @shantanudixit4197
    @shantanudixit4197 1 year ago +1

    Excellent as always!
    Can you make a video on Langevin Monte Carlo as well?

  • @vivekren3463
    @vivekren3463 9 months ago +2

    Thank you so much. People like yourself are really creating a great impact in the education industry! I paid probably $xxxx.xx to study this concept, which you taught in ~20 min. Can you help with Gibbs sampling as well?

  • @sheeta2726
    @sheeta2726 1 year ago +1

    Thank you!

  • @vaibhavwanere3034
    @vaibhavwanere3034 1 year ago +1

    Excellent!!!!!!!!!!!!!!!!!!

  • @surnamnarendra4729
    @surnamnarendra4729 1 year ago +1

    thank you so much sir

  • @gumimi385
    @gumimi385 9 months ago +1

    thank you!

  • @suppareskrittikulsittichai6812
    @suppareskrittikulsittichai6812 2 years ago +1

    Superb explanation

  • @masoudsakha
    @masoudsakha 7 months ago

    I have a question: if we know f(x), since we need it to calculate alpha, what is the point of the Metropolis-Hastings algorithm? To guess the distribution of X, which we already use in our algorithm?

    • @KapilSachdeva
      @KapilSachdeva  7 months ago +1

      You need to consider a few things:
      Even if you know the functional form of your distribution function, you still cannot directly sample from it. Computing the log prob and sampling are two different things.
      Secondly, most of the time you do not know the functional form. All you have is the product of the likelihood and the prior. This is where Metropolis-Hastings is useful.
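
This exchange captures the crux of the method: you only ever need to *evaluate* the target function (possibly unnormalized), never to sample from it directly. A minimal sketch in Python, with a toy two-bump target and a proposal width invented purely for illustration:

```python
import numpy as np

def unnormalized_target(x):
    # Toy target (invented): a two-bump density known only up to a
    # constant -- exactly the situation described in the reply above.
    return np.exp(-x**2 / 2) + 0.5 * np.exp(-(x - 3) ** 2 / 2)

def metropolis(n_samples, x0=0.0, step=0.4, seed=0):
    """Metropolis sampler with a symmetric Normal proposal."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        x_prop = rng.normal(x, step)                 # propose near current x
        alpha = unnormalized_target(x_prop) / unnormalized_target(x)
        if rng.uniform() < alpha:                    # accept w.p. min(1, alpha)
            x = x_prop
        samples[i] = x                               # a rejection repeats x
    return samples

samples = metropolis(50_000)
```

A histogram of `samples` recovers the shape of the target without ever computing its normalizing constant, since the constant cancels in the ratio `alpha`.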

  • @abdullahalsulieman2096
    @abdullahalsulieman2096 1 year ago

    When you say states, do you mean the random variable or the row vector of probabilities? In previous videos, the states (s1, s2, s3, and s4) were referred to as the probabilities within the random variable x.
    Also, when you say parameter, do you mean the states defined above or the random variables?

    • @KapilSachdeva
      @KapilSachdeva  1 year ago

      States do not mean probabilities. States are the values a random variable can take. With each state is "associated" its probability.
      Parameters in Bayesian statistics are random variables. This means a parameter can have various states.

    • @abdullahalsulieman2096
      @abdullahalsulieman2096 1 year ago

      @@KapilSachdeva Very clear. I carried on the notations from your previous video. By the way, you are very good at teaching.

  • @chronicillnessrevolution5079
    @chronicillnessrevolution5079 1 year ago

    Thanks for the video! Kris Hauser's notes are not accessible anymore! Could you perhaps upload them and give us a new link, if you have the PDF saved? I would really appreciate that. Thanks!!

    • @KapilSachdeva
      @KapilSachdeva  1 year ago +1

      Indeed, the link is invalid now.
      But I do have a copy of it :) Let me know if this link is accessible to you:
      drive.google.com/file/d/1_pGhAdG2q58H-T6NaFuFGF_ut6JJhH7i/view?usp=share_link

    • @chronicillnessrevolution5079
      @chronicillnessrevolution5079 1 year ago +1

      @@KapilSachdeva Awesome! Thanks a lot

  • @jayflaggs
    @jayflaggs 1 year ago +1

    Maybe it's a silly question...but why not just sample over the entire known range [-6, 6]? What is the need for sampling using the uniform distribution?

    • @KapilSachdeva
      @KapilSachdeva  1 year ago +1

      When we say we sample, it means those samples should come in the same proportion as the distribution function. In other words, for a given function, samples between [-6, 6] that have a high probability of occurring should occur more often.
      Now the problem is: how do you sample from an arbitrary function? We can only obtain samples from the uniform distribution, which gives samples that are all equally likely. The sampling algorithms (rejection, importance, etc.) are about transforming samples from the uniform distribution into samples from the target distribution.
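
That idea of transforming equally-likely uniform draws can be made concrete with the simplest such method, rejection sampling. A sketch with an invented target; the bound `target_max` must dominate the function on the interval:

```python
import numpy as np

def rejection_sample(target, n, lo=-6.0, hi=6.0, target_max=1.0, seed=0):
    """Turn equally-likely uniform draws on [lo, hi] into draws from `target`."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        x = rng.uniform(lo, hi)            # every x in [lo, hi] equally likely
        u = rng.uniform(0.0, target_max)   # uniform height under the bound
        if u < target(x):                  # keep x in proportion to target(x)
            out.append(x)
    return np.array(out)

f = lambda x: np.exp(-x**2 / 2)            # unnormalized standard Normal
xs = rejection_sample(f, 10_000)           # high-density regions occur more often
```

The accepted points are no longer uniform: regions where `target` is tall survive the rejection step more often, which is exactly "samples in the same proportion as the distribution function".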

  • @andreaqui1653
    @andreaqui1653 1 year ago

    Question: if a sample is rejected, then how does it get accounted for in the final histogram? Intuitively, it seems like once a sample is rejected, it will always be rejected, and the final histogram would be missing quite a bit. Also, where we accept samples, it seems the acceptance count would grow indefinitely at that value and the histogram would overextend above the distribution we want to approximate in that specific area.

    • @KapilSachdeva
      @KapilSachdeva  1 year ago

      "once a sample is rejected it will always be rejected"
      Above conclusion is not necessarily correct. Once you have the "stationary markov chain" then the sample will be accepted (in a longer run) in accordance with its plausibility. In other words, a less likely sample will be accepted less number of times. It is shown in the histogram as well. See the links in the description of the video to run the code yourself.
      I have feeling that in you are not including/considering the notion of stationary markov chains.
      There is another concept that I have not gone over - "burn in" (or discarding the initial samples). That "burn in" (or the samples we throw away in the initial quite many steps) would help prevent some of the corner cases that you are worried about.
      This all being said, the quality of the reproducibility of the target function using samples depends on how complex the target function is . Complexity would equate to higher dimensional functions and shape of it. This is where more advanced MCMC algorithms play a role.
      Hope this helps in reducing the concerns.
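
The burn-in idea mentioned in this reply can be seen in a few lines. Here a chain (same toy Metropolis scheme as above, all numbers invented for illustration) is deliberately started far from the target's mass, and the early samples that still remember that starting point are discarded:

```python
import numpy as np

def target(x):
    return np.exp(-x**2 / 2)        # unnormalized standard Normal

rng = np.random.default_rng(1)
x, chain = 20.0, []                 # deliberately terrible starting point
for _ in range(20_000):
    prop = rng.normal(x, 0.5)
    if rng.uniform() < target(prop) / target(x):
        x = prop
    chain.append(x)
chain = np.array(chain)

kept = chain[2_000:]                # burn-in: drop the samples that are still
                                    # drifting in from the starting point
```

The first few hundred samples trace the walk down from x = 20 and would distort the histogram; after discarding them, `kept` behaves like draws from the standard Normal.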

    • @andreaqui1653
      @andreaqui1653 1 year ago

      @@KapilSachdeva Ah ha, thanks for nudging my assumptions in the correct direction. So in layman's terms... a stationary Markov chain means the probability at f(xt) is equal to the probability at f(xt+1), or in other words the histogram at f(xt) is of the same height as the histogram at f(xt+1)... in which case we would start rejecting the sample at f(xt) and begin to accept the nearby samples that were once rejected, as that once-rejected sample would be more plausible. Visually speaking, it's almost as if the peak of our Gaussian distribution rises with the histogram height to the point where, once the Gaussian rises above the underlying distribution, the once-rejected samples begin to get accepted.

    • @KapilSachdeva
      @KapilSachdeva  1 year ago

      It does not mean that.
      A stationary Markov chain means that the transition probabilities between the various states are "fixed". The transition probability to move from state 1 to state 2 will be different from state 1 to state 3, etc.
      It might be a good idea to watch the Markov chain tutorial in this series, which is required to understand MCMC algorithms.
      ruclips.net/video/CIe869Rce2k/видео.html

  • @abdullahalsulieman2096
    @abdullahalsulieman2096 1 year ago

    I have an algorithm for MCMC and would like your interpretation for it. Is this possible?

    • @KapilSachdeva
      @KapilSachdeva  1 year ago

      I am not sure if I understand your question. Are you asking to explain another MCMC algorithm?

    • @abdullahalsulieman2096
      @abdullahalsulieman2096 1 year ago

      @@KapilSachdeva I have an MCMC problem that is solved, but I don't understand which is which in it: what is f(x), what is the proposal distribution, etc.

  • @user-lr6xs8dn8k
    @user-lr6xs8dn8k 1 month ago

    In real life we don't know the target distribution f(x). How did you calculate alpha = f(x_{t+1})/f(x_t) for the various sample points?

  • @Maciek17PL
    @Maciek17PL 1 year ago

    At 20:37 you say that f = likelihood * prior, but shouldn't it be f = likelihood * prior / N? Then, when calculating alpha, the N's will cancel each other out, and that's how we avoid the integral in N. Also, in my opinion you missed the most important part: how will the ESTIMATED distribution look, and how do I get it from these sample points? What is q? Is it the pdf of N(x_t, 0.4) in this example?

    • @KapilSachdeva
      @KapilSachdeva  1 year ago

      The (normalized) target function does include the normalizing constant.
      I do show the approximated (estimated) function using the histogram in the tutorial. It was constructed using the samples that were collected.
      Q is the Normal distribution in this example. I also show that using the yellow function 😳

    • @Maciek17PL
      @Maciek17PL 1 year ago

      @@KapilSachdeva Well, at 20:42 there is literally f(x) = likelihood * prior on the screen, and it got me confused. In my opinion it should be clarified that f(x) = likelihood * prior / N, and then, when calculating f(x_{t+1})/f(x_t), the N's will cancel out. It is already a hard topic, and when I saw f(x) = likelihood * prior, at first I thought we were trying to approximate just the integral in the denominator, not the whole thing.

    • @KapilSachdeva
      @KapilSachdeva  1 year ago +1

      Apologies for the inconvenience it caused you.
      As you correctly figured out we are "not" trying to approximate the integral in the denominator.
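
The cancellation being discussed in this thread can be written out explicitly. With the normalized posterior f(x) = likelihood × prior / N, the unknown normalizer N drops out of the acceptance ratio, which is why the algorithm never needs the integral:

```latex
\alpha
  = \min\!\left(1,\ \frac{f(x_{t+1})}{f(x_t)}\right)
  = \min\!\left(1,\ \frac{\mathcal{L}(x_{t+1})\,p(x_{t+1})/N}{\mathcal{L}(x_t)\,p(x_t)/N}\right)
  = \min\!\left(1,\ \frac{\mathcal{L}(x_{t+1})\,p(x_{t+1})}{\mathcal{L}(x_t)\,p(x_t)}\right)
```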

  • @subasurf
    @subasurf 2 years ago

    What if you want to sample from a distribution without knowing what the target distribution is?

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      That is the primary purpose of MCMC methods, i.e., to sample from an unknown target distribution function.
      This is why you create multiple Markov chains and compare their distributions. If they are similar, it is a clue that you have succeeded in sampling from the unknown target distribution function. I talk about it around 12:45 in the tutorial.

    • @nikolaobradovicpi
      @nikolaobradovicpi 2 years ago

      @@KapilSachdeva How do you compute alpha at 9:29 if you don't know f(x), since you say MCMC samples from an unknown target distribution function?

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +2

      @@nikolaobradovicpi Let's answer your question considering 2 scenarios:
      1) You know the target function but you cannot sample from it. Knowing the function here refers to the formula of the function (pass in the input and get the output). This is the case with our toy example. Sampling to recreate a function vs. evaluating a function are two different things.
      2) The second situation to consider is that you do not even know the functional form, and then your question is valid, i.e., how do we even evaluate it? This is where you rely on computing/evaluating it "indirectly". As an example, in the (Bayesian statistics) posterior distribution, the numerator consists of 2 components: likelihood and prior. You have access to their functional forms but not to the functional form of the product of these components. However, for the evaluation you evaluate the likelihood and prior separately and multiply the results. This is how you will get alpha!
      I talk about this later in the tutorial when explaining the "Hastings" part and show that your target function is the product of likelihood and prior.
      Go to 20:00 and you will see the explanation in the tutorial.
      Hope this makes sense.
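
Scenario 2 can be sketched in code. Everything here (the data, the Normal likelihood, the wide Normal prior) is invented for illustration; the point is only that the acceptance ratio needs the product likelihood × prior, never the evidence:

```python
import numpy as np

data = np.array([1.2, 0.8, 1.5, 1.1, 0.9])   # toy observations (invented)

def log_unnorm_posterior(mu):
    # Likelihood: data ~ Normal(mu, 1); prior: mu ~ Normal(0, 10).
    # Constant terms and the evidence are omitted: they cancel in alpha.
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    log_prior = -0.5 * mu**2 / 10.0**2
    return log_lik + log_prior

def log_alpha(mu_prop, mu_curr):
    # Log of the Metropolis acceptance probability, computed without
    # ever knowing the normalizing integral p(data).
    return min(0.0, log_unnorm_posterior(mu_prop) - log_unnorm_posterior(mu_curr))
```

Working in log space, as here, is the usual practice: the products of many likelihood terms underflow quickly, while their logs add safely.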

    • @nikolaobradovicpi
      @nikolaobradovicpi 2 years ago +1

      @@KapilSachdeva Thank you very much sir for the response. Your tutorials are great!

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +1

      🙏

  • @othsaj1467
    @othsaj1467 1 year ago +1

    When you don't know how to name something, you simply call it a kernel, haha. Good one!

  • @ssshukla26
    @ssshukla26 2 years ago +1

    Hah, never knew anything about this topic. Thanks. But there was a lot of audio cutting out in between. Idk.

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      Thanks Sunny. Audio cutting out... :( ... sorry if that is the case, but I am not able to observe it. At what time does it happen?

    • @ssshukla26
      @ssshukla26 2 years ago +1

      @@KapilSachdeva Ah, my bad. My Bluetooth headphones were running out of battery. Didn't realize it was the headphones 🤦‍♂️

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +1

      :) … u made me nervous … but thanks for verifying. 🙏