What a helpful overview. Thank you infinitely!
🙏
Great content. Please keep making videos
🙏
Vote for TFP, and I agree with your comments, especially the Python-zen-related one: many ways to do the same thing is not always a good idea ("There should be one, and preferably only one, obvious way to do it!").
I tried to learn TFP, but it looks like it needs more effort than my time budget allowed, so I had to finish a small project using PyMC. But seriously, we really need TFP.
Industry applications rely heavily on Python & TF (the servers cannot 'conda' at all), but tutorials and standards for TFP are not well established at the moment.
It would be great if you could walk us through TFP properly (and show us what the preferred only one way is XD)...
At 18:00 you say 2 chains on each parameter - are you sure it's on each parameter? How do you run MCMC on each parameter on its own? Even if you focus on a, you have to know the values of b and sigma to calculate the joint, which is needed for MCMC...
Thanks for pointing this out. Indeed, saying "you are running or creating 2 chains per parameter" can be confusing. Let me try to clarify.
As you figured out yourself, you do need the other parameters to compute the joint probability, which means the MCMC algorithm (of your choice) is "sampling" from a multidimensional space.
Now, you do not typically run the sampling procedure on the parameter space only once; rather, you run multiple processes (with different starting points). Let's say you ran the sampling procedure 4 times; then we say you have 4 chains per parameter.
But as mentioned above, and as you rightly pointed out, what you actually have is a trace of samples per parameter, which you plot "separately" for visual analysis and diagnostics.
Hope this helps!
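To make this concrete, here is a minimal sketch without any PPL framework (the toy data, model, step size, and chain count are all illustrative assumptions): two Metropolis-Hastings chains explore the full 2-D parameter space of (mu, log_sigma) together, and each chain then yields one trace per parameter for separate inspection.

```python
import math
import random

data = [4.8, 5.1, 5.3, 4.9, 5.0]  # toy observations (made up for illustration)

def log_joint(mu, log_sigma):
    """Unnormalized log posterior: weak Gaussian priors + Normal likelihood."""
    sigma = math.exp(log_sigma)
    log_prior = -0.5 * (mu / 10.0) ** 2 - 0.5 * (log_sigma / 2.0) ** 2
    log_lik = sum(-math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_lik

def run_chain(start, n_steps=3000, step=0.3, seed=0):
    rng = random.Random(seed)
    mu, ls = start
    trace = {"mu": [], "log_sigma": []}
    cur = log_joint(mu, ls)
    for _ in range(n_steps):
        # Propose a move in the FULL parameter space (both parameters at once);
        # evaluating the joint requires every parameter, as noted above.
        mu_p = mu + rng.gauss(0, step)
        ls_p = ls + rng.gauss(0, step)
        prop = log_joint(mu_p, ls_p)
        if rng.random() < math.exp(min(0.0, prop - cur)):
            mu, ls, cur = mu_p, ls_p, prop
        trace["mu"].append(mu)
        trace["log_sigma"].append(ls)
    return trace

# Two chains with different starting points. Each chain produces one trace
# PER parameter, which is what you plot separately for diagnostics.
chains = [run_chain((0.0, 0.0), seed=1), run_chain((10.0, 1.0), seed=2)]
for i, t in enumerate(chains):
    post = t["mu"][1500:]  # discard burn-in
    print(f"chain {i}: posterior mean of mu ~= {sum(post) / len(post):.2f}")
```

If the two per-parameter traces overlap and look alike after burn-in, that is visual evidence the chains have converged to the same posterior.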
A future video idea on Numpyro?
🙏 yup!
Great stuff ! Can you do a video on your journey as developer, books , experiences, self-learning to inspire those on similar paths ?
Thanks Amit.
Interesting question. I have never given it much thought, but maybe one day! 🙏
Your tutorials are amazing! Complex things made clear and simple! Can I ask a question? Which of the packages handles it best when everything is not conjugate? Say the likelihood is non-Gaussian and the prior is non-Gaussian?
Thanks Arvind.
All of these packages implement many MCMC methods with auto-tuning of their various aspects.
Which one is best? "Best" here could mean:
A) Is the package (PPL) easy to use?
B) The speed and accuracy of the inference algorithm implementations.
Unfortunately, there is not much data. There is an effort from Facebook to create a benchmark: ai.facebook.com/blog/ppl-bench-creating-a-standard-for-benchmarking-probabilistic-programming-languages/
That said, IMO, in terms of the expressiveness of your model, PyMC3 and NumPyro are quite good.
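One way to see why conjugacy is not a requirement for these samplers: MCMC methods like Metropolis-Hastings or NUTS only ever need the unnormalized log joint (log prior + log likelihood), and you can write that down for any pair of distributions. A small sketch with a deliberately non-conjugate pair (Student-t likelihood, Laplace prior; the data and constants are made-up assumptions):

```python
import math

data = [1.2, -0.4, 0.8, 2.1, 0.3]  # toy observations (made up)
NU = 4.0   # Student-t degrees of freedom (assumed)
B = 1.0    # Laplace prior scale (assumed)

def log_prior(theta):
    # Laplace(0, b): log p(theta) = -log(2b) - |theta| / b
    return -math.log(2 * B) - abs(theta) / B

def log_lik(theta):
    # Student-t likelihood centred at theta (constant terms dropped)
    return sum(-(NU + 1) / 2 * math.log(1 + (x - theta) ** 2 / NU) for x in data)

def log_joint(theta):
    # This unnormalized quantity is all an MCMC sampler needs to evaluate;
    # no conjugate closed form for the posterior is required.
    return log_prior(theta) + log_lik(theta)

print(log_joint(0.0), log_joint(0.8))
```

In PyMC3 or NumPyro you would just declare the Laplace prior and Student-t likelihood in the model, and the sampler builds this quantity for you.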
@13:44 - "The next thing to worry about is..." Kapil, why should we have to worry about anything when you are at the other end simplifying all the esoterics? This is cool.
😀😀
Once I have the posterior distributions of the parameters, how do I use them for predictions?
You would use them to get the "posterior predictive distribution". See ruclips.net/video/Kz7YbxHkVI0/видео.html
@@KapilSachdeva Yeah, I saw that video, but there was only one distribution over W; now there are 3.
I'm not sure if I fully understood your concern, but based on a gut feeling of what may be bothering you:
When we say posterior distribution, it is not only about "one" parameter. It could be many parameters, in which case we have a multivariate posterior distribution.
If the above is not what concerns you, maybe rephrase or elaborate and I will try to answer.
@@KapilSachdeva OK, so the posterior distribution is multivariate; I think I get it. But how do I deal with the integral in the posterior predictive? When approximating the posterior with Metropolis-Hastings, the integrals cancelled each other out, but in the posterior predictive there is only an integral, so I'm unable to calculate alpha.
Good question.
For the posterior predictive, you would "estimate" it using the law of large numbers.
An expected value (expectation) can be estimated via the law of large numbers: you sample from the posterior distribution instead of computing the integral.
The above assumes that you have a good understanding of expected values and the law of large numbers.
PS: Perhaps I should create a concrete but simple (code) example (without using any framework) that illustrates the workflow end to end.
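In that spirit, here is a deliberately simple, framework-free sketch of the idea. For illustration, the "posterior samples" of (mu, sigma) are faked as draws around (5.0, 1.0) rather than taken from a real MCMC trace; everything else is the genuine Monte Carlo estimate of the posterior predictive.

```python
import math
import random

rng = random.Random(0)

# Stand-in for posterior samples of (mu, sigma). In practice these come
# from your MCMC trace; here they are faked as draws around (5.0, 1.0).
posterior_samples = [(rng.gauss(5.0, 0.1), abs(rng.gauss(1.0, 0.05)))
                     for _ in range(5000)]

def predictive_density(y_star):
    # p(y*|data) = integral of p(y*|theta) p(theta|data) dtheta
    #           ~= (1/N) * sum over i of p(y*|theta_i),  theta_i ~ posterior.
    # The law of large numbers replaces the integral with this average.
    total = 0.0
    for mu, sigma in posterior_samples:
        total += (math.exp(-0.5 * ((y_star - mu) / sigma) ** 2)
                  / (sigma * math.sqrt(2 * math.pi)))
    return total / len(posterior_samples)

# Equivalently, draw predictive samples directly: one y* per posterior draw.
y_draws = [rng.gauss(mu, sigma) for mu, sigma in posterior_samples]
print(f"predictive mean ~= {sum(y_draws) / len(y_draws):.2f}")
```

No alpha (acceptance ratio) is needed here at all: once you have posterior samples, the predictive integral is just an average over them.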
Great topic
🙏
Hi C, I saw a comment notification from you but it does not appear here in the feed.
I can see a portion of your question, and I believe you were asking whether it is possible to integrate neural networks when using NumPyro.
The answer is yes. NumPyro uses JAX as a backend, and you can use a few different neural network frameworks that are built on top of JAX.
Not sure if you deleted the comment or YouTube did. I have seen it happen a few times now. Possibly a bug in YouTube!
I suppose some of your videos will let me crack ML interviews. Tysm.
Ha ha... glad you find these tutorials helpful. Thanks for always giving a word of encouragement. It means a lot to me.