Man you blow these videos out of the park, it’s like surreal how good these videos are at tying the big picture together. Thank you for the content!!!
Thanks!
These MCMC videos (and of course others too) are just brilliant, can't thank you enough!
You're very welcome!
Love your videos! Great balance between simple explanations and giving a good overview of the topic.
How do you get this normalizing constant? At 1:32 the integration was from 0 to inf and from -inf to 0, while the function was defined for x >= 1?
I was wondering that too. Should it be x >= 0?
For the wolfram integral, I think it should be "for x from 1 to infinity".
> if np.random">
Why is the sample accepted when "if np.random.random() < prob" (at 10:43)? What is the role of np.random.random() here?
Thank you!
Basically, the np.random.random() step is used to make sure that samples are accepted with the probability "prob" calculated in the previous step. It's confusing for beginners though.
np.random.random() generates a random number from the interval [0, 1). Here we have a probability "prob" of moving from a sample 'a' to 'b'. Our sampler will accept the next sample 'b' with probability "prob".
Imagine if "prob" were large (near 1): then most of the time np.random.random() will result in acceptance. But if "prob" were small (near 0), then most of the time it will result in rejection. Basically, this is how you incorporate selecting something with a given probability into an algorithm.
At 10:43 in cell 48, why use f(candidate)/f(samples) as the acceptance ratio? Where are the transition terms that MCMC requires?
This is the meat and potatoes of Metropolis-Hastings. If the candidate probability Pr(c) is higher than the previous Pr(prev), the candidate will always be accepted. If Pr(c) is lower than Pr(prev), it will be accepted Pr(c)/Pr(prev) of the time and rejected 1 - Pr(c)/Pr(prev) of the time. What this does is prevent the chain from getting stuck in local maxima of the probability density (if there are any).
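On the transition terms: with a symmetric proposal (such as a Gaussian random walk), the proposal densities cancel in the Metropolis-Hastings ratio, which is why only the target density ratio appears. A hedged, self-contained sketch of that step (not the notebook's code; the target here is just an illustrative unnormalized normal density):

```python
import numpy as np

def metropolis_sample(f, n_samples=100_000, x0=1.0, step=0.5):
    """Minimal random-walk Metropolis sampler for an unnormalized density f."""
    x = x0
    samples = []
    for _ in range(n_samples):
        candidate = x + np.random.normal(0, step)  # symmetric proposal
        # Accept with probability min(1, f(candidate) / f(x)): always accept
        # a more probable candidate, otherwise accept with the density ratio.
        prob = min(1.0, f(candidate) / f(x))
        if np.random.random() < prob:
            x = candidate
        samples.append(x)  # the current value is recorded either way
    return np.array(samples)

# Illustrative target: unnormalized standard normal density.
draws = metropolis_sample(lambda x: np.exp(-0.5 * x ** 2))
print(draws.mean(), draws.std())  # roughly 0 and 1
```

Note that a rejected candidate still appends the current value, so the chain length is fixed while the number of accepted proposals is not.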
In practice, you have observed data. Could you make a video on how to use data with MCMC?
Really cool, it would be great if you could cover Sequential Importance Sampling (SIS) too.
The code in cell #49 will always give 99.9% accuracy. Is it not better to subtract 1000 from n_accept and report that instead? The size of retained_sample will always be 1 million, since you append a new value whether you accept or not.
You do a great job with your Code With Me videos. I'd like to refer students to them; do you plan to make more of these?
For the likelihood we have a Pareto distribution, and the prior on theta follows a Gamma distribution. We are given 20 samples. Now my posterior is complicated, involving the sum of log(x_i). What proposal density should I use? And since the sum of log(x_i) appears with theta in the posterior, should I sum over all the given data? How should I proceed? Could you help?
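Not something covered in the video, but a hedged sketch of one way this can be set up, assuming the Pareto scale is known to be 1 and the Gamma prior uses shape alpha and rate beta. Up to a constant, the log-posterior is (n + alpha - 1) * log(theta) - theta * (beta + sum of log x_i), so the data enter only through n and the sum of log x_i over all given observations, and a simple Gaussian random-walk proposal on theta is enough:

```python
import numpy as np

# Hypothetical setup: 20 observations from a Pareto(scale = 1, shape = theta)
# likelihood with a Gamma(alpha, beta) prior (shape/rate) on theta.
rng = np.random.default_rng(0)
alpha, beta = 2.0, 1.0
x = (1 - rng.random(20)) ** (-1 / 3.0)       # synthetic data with true theta = 3
n, sum_log_x = len(x), np.sum(np.log(x))     # the data enter only via n and sum(log x_i)

def log_post(theta):
    # log of theta^(n + alpha - 1) * exp(-theta * (beta + sum_log_x)), up to a constant
    if theta <= 0:
        return -np.inf
    return (n + alpha - 1) * np.log(theta) - theta * (beta + sum_log_x)

# Random-walk Metropolis on theta with a Gaussian proposal.
theta, chain = 1.0, []
for _ in range(50_000):
    cand = theta + rng.normal(0, 0.5)
    if np.log(rng.random()) < log_post(cand) - log_post(theta):
        theta = cand
    chain.append(theta)

print(np.mean(chain[5_000:]))  # should be close to (n + alpha) / (beta + sum_log_x)
```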
Is f(x) always greater than p(x) given the normalizing constant?
Please can you do a video on Hamiltonian Monte Carlo?
These vids are excellent, thanks a lot
Why is that a drawback, that samples are correlated? Isn’t that the entire point behind MCMC?
I'd assume it's like cheating. You have a sample, and rather than drawing another random sample, it's as if you copy the same sample. Not quite, but nearly.
The fact that the sample draws are correlated in the Metropolis (and especially in the more efficient Hamiltonian Monte Carlo) algorithm looks like a feature to me rather than an unfortunate necessity. It's what allows the algorithm to be efficient. As long as the final proportion of sample draws closely matches the posterior density, how the samples were obtained seems to be much less important. The samples are correlated because these algorithms are actually built to explore the sample space in a smart manner.
Generally you want independent samples, or completely random sampling. This is usually handled by thinning the samples (for example, keeping only every 100th sample and discarding the rest) and then checking to make sure that you still have an approximation of the target distribution.
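A small illustration of that kind of thinning check (my own toy example; an AR(1) series stands in for correlated MCMC output):

```python
import numpy as np

def thin(chain, k=100):
    """Keep every k-th draw to reduce autocorrelation between retained samples."""
    return np.asarray(chain)[::k]

def lag1_autocorr(x):
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

# Toy correlated chain (AR(1)) standing in for MCMC output.
rng = np.random.default_rng(0)
chain = [0.0]
for _ in range(100_000):
    chain.append(0.95 * chain[-1] + rng.normal())

print(lag1_autocorr(chain), lag1_autocorr(thin(chain, 100)))  # high vs. near zero
```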
@@DanMyers-z4t HMC aims to reproduce the posterior density distribution.
The way it does this is basically by treating the (unknown) posterior as a scalar potential energy field, randomly selecting a starting position, and then integrating the phase-space trajectory of a point mass that is randomly kicked and subjected to that potential.
Once "the point mass stops", its "position" is recorded before it is kicked again, and the process is iterated as many times as necessary.
Since the position of the mass at the start of the next step is its position at the end of the previous step, it's inevitable that the samples are correlated to an extent. But it's exactly for that reason that the exploration of the unknown posterior density is much faster than in "classical" Metropolis-Hastings (simplifying the whole idea: the potential makes sure the mass visits regions of low potential energy / high probability density more often than other regions).
The algorithm goes through an initialization phase that's meant to tune the parameters in a way that maximizes the speed of convergence while minimizing the inaccuracy.
If the "energy" was actually conserved like in a time-independent Hamiltonian, the transition probability would be one (that is, no samples would ever be rejected) but, as far as I remember, the convergence speed would not be the best.
So, while completely random samples would obviously be ideal, it would take so many samples in order to thoroughly explore the posterior that obtaining an accurate result would be less practical.
As long as we can be reasonably sure that the frequency of samples in each small parameter space subset is proportional to the actual unknown posterior density, it matters little how those samples were actually obtained and if they were random or correlated.
The issue is devising a method that assuages our fear of obtaining a result that doesn't actually match the unknown posterior density.
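For the curious, here is a bare-bones sketch of that picture for a 1-D standard normal target (my own toy illustration, not production HMC; real samplers such as Stan's NUTS also tune the step size and mass matrix during warm-up):

```python
import numpy as np

def hmc_sample(log_prob, grad_log_prob, n_samples=5_000, step=0.2, n_leapfrog=20, x0=0.0):
    """Toy 1-D Hamiltonian Monte Carlo sampler."""
    rng = np.random.default_rng(1)
    x = x0
    samples = []
    for _ in range(n_samples):
        p = rng.normal()                      # random "kick": fresh momentum
        x_new, p_new = x, p
        # Leapfrog integration of the trajectory in the potential -log_prob.
        p_new += 0.5 * step * grad_log_prob(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += step * p_new
            p_new += step * grad_log_prob(x_new)
        x_new += step * p_new
        p_new += 0.5 * step * grad_log_prob(x_new)
        # Accept/reject only corrects for numerical error in the integration;
        # with exact energy conservation this probability would be 1.
        current_H = -log_prob(x) + 0.5 * p ** 2
        proposed_H = -log_prob(x_new) + 0.5 * p_new ** 2
        if rng.random() < np.exp(current_H - proposed_H):
            x = x_new
        samples.append(x)
    return np.array(samples)

# Standard normal target: log p(x) = -x^2/2 (up to a constant), gradient = -x.
draws = hmc_sample(lambda x: -0.5 * x ** 2, lambda x: -x)
print(draws.mean(), draws.std())  # roughly 0 and 1
```

The accept/reject step at the end echoes the point above: with perfect energy conservation every proposal would be accepted, and rejections only compensate for the integrator's numerical error.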
how can MCMC be used in the realm of stocks and finance? I’ve been looking into making a stockbot as a personal project and landed on MCMC as a viable option
What is the difference between the tuning parameter and the standard deviation, and when should you use which?
ABSOLUTE KING
What's so bad about the correlation in the Metropolis-Hastings method?
You can insert a lag (thin the chain) to make the samples approximately independent.
Is it possible for you to comment on my MCMC notebook as to why it's so slow? That would be greatly appreciated.
Awesome video. No lag though?
Small suggestion ... you could try cropping some part of your video frame (mostly from the LHS), which would make the code more visible ... having said that, that's just about the video itself; I was looking for exactly something like this!
Hey thanks for the suggestion. I'll try and remember to increase the font size as well for future videos!
@@ritvikmath the font size seems perfect ... I meant your video ... the one you have at the top right ... which shows you sitting and coding ... there is some extra space which you could remove
@@teegnas ahh I see, good suggestion! thanks.
Please do transitional MCMC
Awesome
Hi.
Thank you so much for all your effort and substance in the videos that you've released on YouTube. I'm a big fan, and you've really helped me in my Data Science/Analysis career to date 😊.
I'm currently working on a research project that somewhat relates to MCMC - Variational Inference, which is an alternative approach to MCMC (to be very brief). If you're familiar with this machine learning algorithm, would you be able to help me understand a niche branch of research related to this algorithm? That is, "Variational Inference augmented with Copulas".
If you are able to help, please let me know how I could return the favour ❤.
What is MCMC?
Markov Chain Monte Carlo