Probabilistic Programming - FOUNDATIONS & COMPREHENSIVE REVIEW!

  • Published: Sep 9, 2024

Comments • 27

  • @MeshRoun · 2 years ago · +3

    What a helpful overview. Thank you infinitely!

  • @eugeneL_N1E104 · 1 year ago · +3

    I vote for TFP and agree with your comments... especially the Python-zen one: many ways to do the same thing is not always a good idea ("There should be one, and preferably only one, obvious way to do it!").
    I tried to learn TFP, but it looked like it needed more effort than my time budget allowed, so I had to finish a small project using PyMC. But seriously, we really need TFP.
    Industry applications rely heavily on Python & TF (the servers cannot 'conda' at all), but tutorials and standards for TFP are not well established at the moment.
    It would be great if you could walk us through TFP properly (and show us what the one preferred way is XD)...
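
    As a rough taste of what a "preferred way" could look like, here is a minimal TFP sketch (one of the several styles TFP supports, with a made-up toy model; this is not an official pattern) that defines and samples a small joint model:

    import tensorflow_probability as tfp

    tfd = tfp.distributions

    # One joint model, one obvious place to read it: p(sigma) p(mu) p(x | mu, sigma)
    model = tfd.JointDistributionNamed(dict(
        sigma=tfd.HalfNormal(scale=1.0),
        mu=tfd.Normal(loc=0.0, scale=10.0),
        x=lambda mu, sigma: tfd.Normal(loc=mu, scale=sigma),
    ))

    sample = model.sample()        # a dict with keys 'sigma', 'mu', 'x'
    print(model.log_prob(sample))  # joint log-density at that sample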

  • @anuragkumar1015 · 2 years ago · +3

    Great content. Please keep making videos

  • @amitk3474 · 2 years ago · +2

    Great stuff! Can you do a video on your journey as a developer (books, experiences, self-learning) to inspire those on similar paths?

    • @KapilSachdeva · 2 years ago · +1

      Thanks Amit.
      Interesting question. I have never given it much thought, but maybe one day! 🙏

  • @shankar2chari · 3 years ago · +1

    @13:44 - "The next thing to worry about is..." Kapil, why should we have to worry about anything when you are at the other end simplifying all the esoterica? This is cool.

  • @user-wr4yl7tx3w · 2 years ago · +2

    A future video idea on NumPyro?

  • @arvinddhupkar5158 · 2 years ago · +1

    Your tutorials are amazing! Complexities made clear and simple! Can I ask a doubt? Which of the packages handles things best when nothing is conjugate? Say the likelihood is non-Gaussian and the prior is non-Gaussian?

    • @KapilSachdeva · 2 years ago · +1

      Thanks Arvind.
      All of these packages implement many MCMC methods, with auto-tuning of their various aspects.
      Which one is best? The notion of "best" here could be:
      A) whether the package (PPL) is easy to use
      B) the speed and accuracy of the inference algorithm implementations
      Unfortunately there is not much data. There is an effort from Facebook to create a benchmark: ai.facebook.com/blog/ppl-bench-creating-a-standard-for-benchmarking-probabilistic-programming-languages/
      That said, IMO, in terms of the expressiveness of your model, PyMC3 and NumPyro are quite good.
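
      To make the "no conjugacy needed" point concrete, here is a minimal sketch (toy data and a made-up model, shown in NumPyro only because it is compact): a non-Gaussian prior paired with a non-Gaussian likelihood, sampled with auto-tuned NUTS.

      import jax.numpy as jnp
      from jax import random
      import numpyro
      import numpyro.distributions as dist
      from numpyro.infer import MCMC, NUTS

      y = jnp.array([1.3, -0.4, 2.1, 0.7, 1.9, 0.2])  # toy observations

      def model(y):
          loc = numpyro.sample("loc", dist.Laplace(0.0, 1.0))    # non-Gaussian prior
          scale = numpyro.sample("scale", dist.HalfCauchy(1.0))  # non-Gaussian prior
          numpyro.sample("obs", dist.StudentT(4.0, loc, scale), obs=y)  # non-Gaussian likelihood

      # NUTS only needs the joint log-density; conjugacy never enters the picture.
      mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
      mcmc.run(random.PRNGKey(0), y)
      mcmc.print_summary()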

  • @user-or7ji5hv8y · 3 years ago · +1

    Great topic

    • @KapilSachdeva · 3 years ago

      🙏

    • @KapilSachdeva · 3 years ago

      Hi C, I saw a comment notification from you but it does not appear here in the feed.
      I can see a portion of your question, and I believe you were asking whether it is possible to integrate neural networks when using NumPyro.
      The answer is yes. NumPyro uses JAX as a backend, and you can use a few different neural network frameworks that are built on top of JAX.
      Not sure if you deleted the comment or YouTube did. I have seen it happen a few times now; possibly a bug on YouTube's side!
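
      For instance (a hand-rolled sketch with made-up shapes and priors; real projects would typically plug in Flax or Haiku modules instead), a tiny Bayesian neural network can be written directly in NumPyro, with the forward pass in plain JAX:

      import jax.numpy as jnp
      from jax import random
      import numpyro
      import numpyro.distributions as dist
      from numpyro.infer import MCMC, NUTS

      def bnn(x, y=None, hidden=8):
          n_in = x.shape[-1]
          # Priors over the weights are what make the network "Bayesian".
          w1 = numpyro.sample("w1", dist.Normal(0.0, 1.0).expand([n_in, hidden]).to_event(2))
          b1 = numpyro.sample("b1", dist.Normal(0.0, 1.0).expand([hidden]).to_event(1))
          w2 = numpyro.sample("w2", dist.Normal(0.0, 1.0).expand([hidden, 1]).to_event(2))
          sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
          mean = jnp.tanh(x @ w1 + b1) @ w2            # plain JAX forward pass
          numpyro.sample("obs", dist.Normal(mean.squeeze(-1), sigma), obs=y)

      x = random.normal(random.PRNGKey(0), (50, 3))    # toy inputs
      y = jnp.sin(x[:, 0])                             # toy targets
      mcmc = MCMC(NUTS(bnn), num_warmup=300, num_samples=600)
      mcmc.run(random.PRNGKey(1), x, y)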

  • @RealMcDudu · 2 years ago · +1

    At 18:00 you say 2 chains on each parameter - are you sure it's on each parameter? How do you run MCMC on each parameter on its own? Even if you focus on a, you have to know the values of b and sigma to calculate the joint, which is needed for MCMC...

    • @KapilSachdeva · 2 years ago · +1

      Thanks for pointing this out. Indeed, saying "you are running or creating 2 chains per parameter" can be confusing. Let me try to clarify.
      As you figured out yourself, you do need the other parameters to compute the joint probability, which means the MCMC algorithm (of your choice) is "sampling" from a multidimensional space.
      Now, you do not typically run the sampling procedure over the parameter space only once; rather, you run it multiple times (with different starting points). Let's say you ran the sampling procedure 4 times; then we say you have "4 chains per parameter".
      But, as mentioned above and as you rightly pointed out, what you actually have is a trace of samples per parameter, which you plot "separately" for visual analysis and diagnostics.
      Hope this helps!
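
      As a toy illustration of this bookkeeping (made-up model; the point is num_chains and the per-parameter, per-chain traces), in NumPyro it looks roughly like this:

      import numpyro
      numpyro.set_host_device_count(4)  # let the 4 chains run in parallel on CPU

      import jax.numpy as jnp
      from jax import random
      import numpyro.distributions as dist
      from numpyro.infer import MCMC, NUTS

      y = jnp.array([1.2, 0.8, 1.5, 0.9, 1.1])

      def model(y):
          a = numpyro.sample("a", dist.Normal(0.0, 10.0))
          sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
          numpyro.sample("obs", dist.Normal(a, sigma), obs=y)

      # Every chain samples the full joint (a, sigma) space from a different start.
      mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000, num_chains=4)
      mcmc.run(random.PRNGKey(0), y)

      traces = mcmc.get_samples(group_by_chain=True)
      print(traces["a"].shape)      # (4, 1000): 4 chains x 1000 draws for 'a' alone
      print(traces["sigma"].shape)  # (4, 1000): same trace layout for 'sigma'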

  • @ssshukla26 · 3 years ago · +1

    I suppose some of your videos will let me crack ML interviews. Tysm.

    • @KapilSachdeva · 3 years ago · +1

      Ha ha …glad u find these tutorials helpful. Thanks for always giving a word of encouragement. Means a lot to me.

  • @Maciek17PL · 1 year ago

    Once I have posterior distributions of the parameters, how do I use them for predictions?

    • @KapilSachdeva · 1 year ago · +1

      You would use them to get the "posterior predictive distribution". See ruclips.net/video/Kz7YbxHkVI0/видео.html
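
      In NumPyro, for example, this step is short. A sketch, assuming a model function `model`, a finished `mcmc` run, and new inputs `x_new` (all hypothetical names):

      from jax import random
      from numpyro.infer import Predictive

      # Push posterior samples back through the model's likelihood to get
      # draws from the posterior predictive distribution at new inputs.
      # Assumes the observed sample site in `model` is named "obs".
      predictive = Predictive(model, mcmc.get_samples())
      y_new_draws = predictive(random.PRNGKey(1), x_new)["obs"]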

    • @Maciek17PL · 1 year ago

      @@KapilSachdeva Yeah, I saw that video, but there was only one distribution on W; now there are 3.

    • @KapilSachdeva · 1 year ago

      I am not sure I fully understood your concern, but here is a guess at what may be bothering you:
      When we say "posterior distribution", it is not only about one parameter. It can be over many parameters, in which case we have a multivariate posterior distribution.
      If the above is not what concerns you, maybe rephrase or elaborate and I will try to answer.

    • @Maciek17PL · 1 year ago

      @@KapilSachdeva OK, so the posterior distribution is multivariate; I think I get it. But how do I deal with the integral in the posterior predictive? When approximating the posterior with Metropolis-Hastings, the integrals canceled each other out, but the posterior predictive is only an integral, so I'm unable to calculate alpha.

    • @KapilSachdeva · 1 year ago · +2

      Good question.
      For the posterior predictive, you would "estimate" it using the law of large numbers.
      An expected value (expectation) can be estimated via the law of large numbers: you draw samples from the posterior distribution instead of computing the integral.
      The above assumes a good understanding of expected values and the law of large numbers.
      PS:
      Perhaps I should create a concrete but simple (code) example (without using any framework) that illustrates the workflow end to end.
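
      In that spirit, here is one possible end-to-end sketch in plain NumPy (toy normal-mean model with known noise; all names illustrative): Metropolis-Hastings for the posterior, then the law-of-large-numbers estimate of the posterior predictive, with no integral computed anywhere.

      import numpy as np

      rng = np.random.default_rng(0)
      data = rng.normal(loc=2.0, scale=1.0, size=30)    # toy data, known sigma = 1

      def log_post(mu):
          # log p(mu | data) up to a constant: N(0, 10) prior x N(mu, 1) likelihood
          return -0.5 * (mu / 10.0) ** 2 - 0.5 * np.sum((data - mu) ** 2)

      # --- Metropolis-Hastings: the normalizing constant cancels in alpha ---
      samples, mu = [], 0.0
      for _ in range(5000):
          prop = mu + rng.normal(scale=0.5)             # symmetric proposal
          if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
              mu = prop                                 # accept
          samples.append(mu)
      posterior = np.array(samples[1000:])              # drop burn-in

      # --- Posterior predictive: instead of evaluating
      #     p(y_new | data) = ∫ p(y_new | mu) p(mu | data) dmu,
      #     draw mu from the posterior, then y_new from the likelihood ---
      y_new = rng.normal(loc=posterior, scale=1.0)      # one y_new per posterior draw
      print("posterior mean of mu:", posterior.mean())
      print("predictive mean/std:", y_new.mean(), y_new.std())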