Importance Sampling - VISUALLY EXPLAINED with EXAMPLES!

  • Published: 6 Feb 2021
  • This tutorial explains the Importance Sampling technique and its variant for unnormalized distribution functions, called Self Normalized Importance Sampling.
    Notebook to go with this tutorial:
    colab.research.google.com/dri...
    #sampling
    #statistics
    #montecarlo
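
    As a rough illustration of the basic technique the video covers, here is a minimal sketch of the importance-sampling estimator in Python; the target p, the proposal q, and the function h below are illustrative choices, not necessarily the video's exact example:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def h(x):
        # Illustrative function whose expectation under p we want.
        return np.exp(-x ** 2)

    # Target density p = N(0, 1); proposal q = N(0, 1.5) that we actually sample from.
    N = 100_000
    ys = rng.normal(0.0, 1.5, size=N)                                # y_i ~ q
    w = stats.norm.pdf(ys, 0.0, 1.0) / stats.norm.pdf(ys, 0.0, 1.5)  # p(y_i) / q(y_i)

    # E_p[h(X)] ≈ (1/N) * sum of h(y_i) * p(y_i)/q(y_i)
    print(np.mean(h(ys) * w))  # ≈ 1/sqrt(3) ≈ 0.577 for this particular p and h
    ```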

Comments • 53

  • @sudhagarraghavan9696
    @sudhagarraghavan9696 3 years ago +13

    Outstanding tutorial... probably the best on YouTube on this topic. Your knowledge is very impressive and your presentation of the subject is excellent! Thank you!

  • @malcolmyang5662
    @malcolmyang5662 3 years ago +5

    One of the best explanations I have found so far on the internet. GJ

  • @atomsmasher
    @atomsmasher 2 years ago +1

    Incredibly done tutorial, I looked at 4 videos before landing on this one and it is superb. Thank you for taking the time to enrich humanity by sharing knowledge clearly and intuitively

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      🙏 many thanks for these kind words. I am happy you found the tutorial helpful.

  • @proxyme3628
    @proxyme3628 1 year ago +1

    Finally I can understand importance sampling. Great way to start with a uniform distribution p(x) and then introduce sampling from the importance region with a normal distribution q(x); a great way to get the idea across. Thank you so much.
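
    A rough sketch of the progression this comment describes, assuming an integral over [0, 10] and an illustrative integrand peaked in a narrow region (the video's exact h is a guess here):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def h(x):
        # Illustrative integrand peaked near x = 5; the video's h may differ.
        return np.exp(-0.5 * (x - 5.0) ** 2)

    N = 10_000

    # Step 1: plain Monte Carlo with p(x) = Uniform(0, 10), density 1/10.
    xs = rng.uniform(0.0, 10.0, size=N)
    mc = 10.0 * np.mean(h(xs))  # estimates the integral of h over [0, 10]

    # Step 2: importance sampling from q = N(5, 1), concentrated where h is large.
    # (q's support extends beyond [0, 10], but h is negligible out there.)
    ys = rng.normal(5.0, 1.0, size=N)
    is_est = np.mean(h(ys) / stats.norm.pdf(ys, 5.0, 1.0))

    print(mc, is_est)  # both ≈ sqrt(2*pi) ≈ 2.5066
    ```

    With this particular choice q is proportional to h, so the ratio h(y)/q(y) is constant and the variance of the estimate collapses; that is the ideal that importance sampling tries to approach.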

  • @schrodingerac
    @schrodingerac 3 years ago +2

    This is a great tutorial... I really appreciate the effort that went into making it look like this

  • @akapo98
    @akapo98 1 year ago +1

    Very high quality video! Thank you very much!

  • @avishkarsaha8506
    @avishkarsaha8506 2 years ago +1

    Best video on importance sampling I've seen.

  • @jonasschmitt3548
    @jonasschmitt3548 3 years ago +2

    Great video, thanks a lot :)

  • @prajwalchauhan6440
    @prajwalchauhan6440 8 months ago +1

    This series is the best.

  • @Matteo-uq7gc
    @Matteo-uq7gc 3 years ago +1

    Another awesome video! Sequential Importance Sampling would be amazing also!!!

    • @KapilSachdeva
      @KapilSachdeva  3 years ago +2

      🙏… will do it; most likely in conjunction with particle filtering.

    • @Matteo-uq7gc
      @Matteo-uq7gc 3 years ago

      @@KapilSachdeva I will be on the lookout!!! Thank you

  • @sanjaykrish8719
    @sanjaykrish8719 4 months ago +2

    This is pure gold ❤

  • @mohsenvazirizade6334
    @mohsenvazirizade6334 9 months ago +2

    Fantastic tutorial. Way to go! What other resources do you recommend for other importance sampling methods?

  • @eceserin
    @eceserin 2 years ago

    Very nice, thank you.

  • @vi5hnupradeep
    @vi5hnupradeep 2 years ago +1

    Thank you so much sir !

  • @yogeshmathpati1472
    @yogeshmathpati1472 2 years ago +1

    Keep up the good work sir. Tysm!

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      🙏

    • @yogeshmathpati1472
      @yogeshmathpati1472 2 years ago

      Can I get the code for importance sampling on some problem, to see how it's done? Actually, I'm new to coding and I'm not able to find it anywhere. It would be a huge favour if you could help. And something about Latin Hypercube Sampling too. Thanks

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      Here is a notebook in which I have added the example that I showed in the tutorial.
      colab.research.google.com/drive/1Xm2e0MDVsoxgPIKIl-u60iu4fNARz3Kw?usp=sharing
      Hope this helps.

  • @chandreshmaurya8102
    @chandreshmaurya8102 2 years ago +1

    Nice info on importance sampling. Can you tell me which software you used to make the videos?

  • @bodwiser100
    @bodwiser100 7 months ago

    @KapilSachdeva, very nice video! Super helpful as always. Can you please shed light on one sticky point that has always eluded me when studying sampling methods (including MCMC): what did you mean when you said that one of the problems in computing the average is that "we can't sample from p"? I thought it meant that we do not know the PDF p(x). But as I continued to watch the video, it seemed that was not the case: we _can_ have access to p(x). But then I wondered: if we know p(x), why can we not sample from it? In the same vein, I have always wondered that if we knew p(x), there would never be any need for the proposal function, the transforms, and all that machinery. Isn't the unavailability of p(x) the whole reason why we are doing all this?

    • @KapilSachdeva
      @KapilSachdeva  7 months ago +1

      Quite a few people have this confusion. Here are two things that should help:
      - Just because you know the functional form of a PDF does not mean you can sample from it. Most sampling algorithms are transformations of uniform sampling.
      - In the example I used a function whose analytical form I knew, but when you are doing Bayesian statistics and working with a posterior probability, most of the time you do not even know the functional form.
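
      A small sketch of the first point, using inverse-CDF sampling for an exponential density (an illustrative choice): knowing p(x) in closed form is not the same as being able to sample from it; here sampling works only because this particular CDF happens to be invertible in closed form.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)

      # Known density: p(x) = lam * exp(-lam * x) for x >= 0.
      # Sampling from it still goes through a transformation of uniform draws.
      lam = 2.0
      u = rng.uniform(0.0, 1.0, size=100_000)  # U(0, 1) samples
      x = -np.log(1.0 - u) / lam               # inverse CDF applied to u

      print(x.mean())  # ≈ 1/lam = 0.5
      ```

      For densities whose inverse CDF has no closed form, this route is unavailable even though p(x) is fully known, which is where proposal distributions come in.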

  • @ajazhussain2919
    @ajazhussain2919 4 months ago +1

    thanks

  • @ArashSadr
    @ArashSadr 2 years ago

    Thanks for the great video. I was wondering whether this is a typo or not: at 13:17, at the bottom of the right-column text, shouldn't the last formulation be 10/N...? Is the 1/N missing, or am I wrong?

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +1

      🙏 No, you are correct. Thanks for catching it … someone is really paying attention :)

    • @medomed1105
      @medomed1105 2 years ago

      @@KapilSachdeva I think the 10 should not be there. It was for the uniform case.
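
      For reference, a hedged reconstruction of the two estimators at issue, assuming the setup of integrating h over [0, 10] suggested elsewhere in these comments:

      ```latex
      % Uniform proposal: p(x) = 1/10 on [0, 10], so a factor of 10 appears.
      \int_0^{10} h(x)\,dx = 10\,\mathbb{E}_p[h(X)]
        \approx \frac{10}{N}\sum_{i=1}^{N} h(x_i), \qquad x_i \sim \mathrm{Uniform}(0, 10)

      % Importance sampling: the weights 1/q(y_i) replace the factor of 10.
      \int_0^{10} h(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} \frac{h(y_i)}{q(y_i)}, \qquad y_i \sim q
      ```

      On this reading both corrections are compatible: the 1/N belongs in the estimator, and the standalone factor of 10 is specific to the uniform case.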

  • @100rabhkr_India
    @100rabhkr_India 2 years ago +1

    Dear Kapil Sir, thanks for this great video. At 4:10, you mention that P_theta is not normalized. I want to ask: why do we need our P_theta to be normalized? What problem would it create if it's not normalized?

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      Thanks Saurabh.
      A function can only be considered a probability density function (PDF) if it is normalized. You use a PDF to compute the probability of a state of your random variable, and probability is a number between 0 and 1.
      In the expectation equation, the rightmost term is the PDF p(x). Therefore, your distribution had better be normalized to use it.
      In the importance sampling case, you see that we need to compute the ratio p(x)/q(x). This is a ratio of PDFs, and hence you need the normalizing constants for both of them.
      Now, in most real-world probabilistic modeling, computing this normalizing constant is the most difficult (computationally costly) part. (A short numerical sketch follows this thread.)

    • @100rabhkr_India
      @100rabhkr_India 2 years ago

      @@KapilSachdeva Thanks for the quick response. Can you please provide an example of a PDF which is not normalized?

    • @KapilSachdeva
      @KapilSachdeva  2 years ago +1

      Unnormalized PDFs arise when you multiply/mix/convolve two high-dimensional probability functions. Most of the time you see this happening when doing Bayesian statistics. Read this page - en.wikipedia.org/wiki/Normalizing_constant .... The posterior distribution is the result of combining the likelihood and the prior, divided by a normalizing constant. You also saw that in the Kalman Filter tutorial. If you use non-linear functions for transforms, you end up with a resulting function for which computing the normalizing constant is extremely costly.
      Also look at the first 5-10 min of this video, where I explain this problem of the normalizing constant in another (but related) context (ruclips.net/video/68dJCDyzB0A/видео.html)

    • @100rabhkr_India
      @100rabhkr_India 2 years ago +1

      @@KapilSachdeva Understood. Thank you very much. Looking forward to your videos on the Particle filter, Bootstrap filter, etc. Loved your videos!

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      🙏
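
    A minimal sketch of self-normalized importance sampling, the workaround discussed in this thread for targets whose normalizing constant is unknown; the unnormalized density p_tilde and the proposal below are illustrative:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def p_tilde(x):
        # Unnormalized target: a Gaussian bump with its normalizing constant
        # deliberately dropped (a stand-in for, e.g., an unnormalized posterior).
        return np.exp(-0.5 * (x - 1.0) ** 2)

    def h(x):
        return x ** 2  # function whose expectation under the target we want

    # Proposal q = N(0, 2): easy to sample from and to evaluate.
    N = 100_000
    ys = rng.normal(0.0, 2.0, size=N)
    w = p_tilde(ys) / stats.norm.pdf(ys, 0.0, 2.0)

    # Dividing by the sum of the weights cancels the unknown normalizing constant.
    print(np.sum(w * h(ys)) / np.sum(w))  # ≈ 2.0, since the target here is N(1, 1)
    ```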

  • @yiweikwok8735
    @yiweikwok8735 1 year ago +1

    Thanks for the great video!
    I wanted to ask about the "variance" you plotted (timestamp 10:10) in Monte Carlo sampling. At first I didn't know what variance you were referring to, but by looking at the colab notebook, I realized you were referring to the variance of the random variable $10*h(X)$, instead of the variance of the Monte Carlo estimate $10*\sum_{i=1}^N h(X_i) / N$.
    So here I want to share some of my thoughts from when I was learning it. If we are talking about the variance of $10*h(X)$, then it's always 4, and the plot only shows an estimate of this ground-truth variance as $N$ gets larger and larger. For a small $N$, the estimate of the variance could itself have high variance, resulting in a very inaccurate estimate, i.e. it could be much larger or smaller than 4. With a sufficiently large $N$, we finally obtain an accurate estimate: 4. Running the program multiple times, one can observe the large fluctuations at the start of the plot.
    It's another thing when it comes to the variance of the Monte Carlo estimate $10*\sum_{i=1}^N h(X_i) / N$. For this random variable, the variance is $4/N$, depending on the number $N$. Hence Monte Carlo actually reduces the variance of its estimate of the mean by increasing the number of samples $N$.
    However, the more important thing is that, with the large variance of $10*h(X)$, one would require a very large $N$ to get the law of large numbers really working well. By Chebyshev's inequality, if we expect the Monte Carlo estimate to have an error less than $\epsilon$, the number of samples is inversely proportional to the "risk" (the probability that we fail to accomplish the goal). With a large variance of $10*h(X)$, the number of samples $N$ needs to be even larger. That's why we need importance sampling to reduce the sample variance, so that we don't need so many samples to perform Monte Carlo.
    That's my own opinion, hope it helps 😄
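
    A quick simulation of the distinction drawn above, with a hypothetical h standing in for the video's integrand: the variance of 10*h(X) is a fixed number, while the variance of the Monte Carlo estimate shrinks like that number divided by N.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def h(x):
        return np.sin(x) ** 2  # hypothetical integrand, not the video's exact h

    def mc_estimate(n):
        xs = rng.uniform(0.0, 10.0, size=n)
        return 10.0 * np.mean(h(xs))

    # Variance of the random variable 10*h(X): fixed, independent of n.
    xs = rng.uniform(0.0, 10.0, size=1_000_000)
    var_integrand = np.var(10.0 * h(xs))

    # Variance of the Monte Carlo estimate itself: ≈ var_integrand / n.
    n = 1_000
    estimates = np.array([mc_estimate(n) for _ in range(2_000)])
    print(var_integrand / n, np.var(estimates))  # the two numbers should be close
    ```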

  • @animatedmath4307
    @animatedmath4307 2 years ago +1

    @14.41 you state you are interested in computing P(X\geq 2), but this has nothing to do with h. Do you mean P(h(X)\geq 2)? In that case, the rest makes sense.

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      Based on your comment, in hindsight I agree it is not proper to say we are interested in the probability of X >= 2. Indeed my sentence creates confusion, as it has done for you. Apologies.
      Here is how I should have explained it:
      In this scenario, we are interested in calculating the expectation of a function of a random variable X (i.e. h(X)), where X is normally distributed __but__ we only want to consider situations when X >= 2.
      I hope it makes sense now.

    • @animatedmath4307
      @animatedmath4307 2 years ago +1

      @@KapilSachdeva I see: then it is not even what I thought.
      $P(X \geq 2) = \int \mathbb{1}_{[2,\infty)}(x)\,p(x)\,dx$, where $\mathbb{1}$ is the indicator function and $p(x)$ is the PDF of $X$.
      $P(h(X) \geq 2) = \int \mathbb{1}_{[2,\infty)}(h(x))\,p(x)\,dx$
      $E[h(X)\,\mathbb{1}_{[2,\infty)}(X)] = \int h(x)\,\mathbb{1}_{[2,\infty)}(x)\,p(x)\,dx$ (the restriction of $h(X)$ to the domain $[2,\infty)$),
      which is the second integral @15.50. (A numerical sketch of this distinction follows this thread.)
      Now it is clear.

    • @KapilSachdeva
      @KapilSachdeva  2 years ago

      🙏

    • @lakex24
      @lakex24 7 months ago

      Thank you for raising this question. It's indeed quite confusing for me as well. IMO the indicator function should be applied to h(x): set the integrand to 1 for h(x) >= 2 and to 0 for h(x) < 2, since we are interested in the probability of h(X) >= 2, NOT the expectation of h(X)...?
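
    To make the thread concrete, a small sketch contrasting the probability P(X >= 2) with the truncated expectation E[h(X) * 1{X >= 2}], assuming X ~ N(0, 1) and an illustrative h; the proposal N(3, 1) is shifted into the tail, the usual importance-sampling move for rare regions:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def h(x):
        return x ** 2  # illustrative h; the video's h may differ

    N = 100_000
    ys = rng.normal(3.0, 1.0, size=N)  # proposal q = N(3, 1), covering the tail
    w = stats.norm.pdf(ys, 0.0, 1.0) / stats.norm.pdf(ys, 3.0, 1.0)  # p/q weights
    ind = (ys >= 2.0)

    prob = np.mean(ind * w)           # P(X >= 2): the indicator alone
    trunc = np.mean(h(ys) * ind * w)  # E[h(X) * 1{X >= 2}]: the tail weighted by h

    print(prob, trunc)  # prob ≈ 0.0228; the two are different quantities
    ```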