Marginalizing the value conveyed by *your* playlist, to *my* understanding of this subject, is intractable. And if what I said is even remotely sensible, per the rules of probability, the whole credit goes to you.
Thanks so much :).
The analogy to probability theory is also highly appreciated 😂
I've been following your channel for a while and you've really helped me understand complicated probability concepts, thank you! One question: I didn't understand how the z variable is a latent one. Why can't it just be a parameter?
First of all: Thanks for the feedback :). I am super glad I could help!
I think, here, it wouldn't make sense to have it as a parameter. A parameter, at least in my understanding, is an adjustable value that defines the distribution of a random variable. Each data point you observe (and that you want to cluster) consists, at least under the assumptions of a Gaussian Mixture Model, of a class and a position. Both are random variables, meaning that a data point does not deterministically belong to one class or one position. Instead, there is a probability associated with the potential classes and the potential positions in the observed space. The class variable is considered latent because, in the task of clustering, we do not know to which class a certain point belongs. Certainly, if we did a scatter plot, we could use our "eye-norm" to figure this out. But we would rather have a more probabilistic/mathematical treatment, because a point could also belong to a not-so-obvious class and just be an unlikely draw far from that cluster's center.
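To make that concrete, here is a minimal sketch (the mixture weights, means, and standard deviations below are made-up placeholders, not values from the video) of how one observed position gets a probability over the latent classes instead of a hard assignment:

import numpy as np
from scipy.stats import norm

# Made-up two-component GMM parameters (placeholders, not from the video)
pi = np.array([0.4, 0.6])      # mixture weights
mu = np.array([-2.0, 3.0])     # component means
sigma = np.array([1.0, 1.5])   # component standard deviations

x = 0.5  # one observed position

# pi_k * N(x; mu_k, sigma_k) for both classes at once
joint = pi * norm.pdf(x, loc=mu, scale=sigma)

# Posterior over the latent class, p(Z=k | X=x): a soft assignment that sums to 1
posterior = joint / joint.sum()
print(posterior)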
I hope that pointed you in the right direction. Please ask a follow-up question if something remains unclear.
@@MachineLearningSimulation That makes sense, thank you! I got confused by the nature of the latent variable, because we infer which class each point belongs to from the data, similar to how we model the distribution.
Thanks a thousand!!
You're welcome 🤗
Is there a connection between using mixture coefficients to take a linear combination of two Gaussians and the DGM approach, where the parameters of the Gaussian change conditioned on the category? They seem like two very different approaches to arrive at the probability. (Let me know if my question is not clear.) Thanks.
Thanks for the question :)
I would view it from two perspectives:
1) Calculating the likelihood/probability density of one sample: Imagine you have the GMM with two classes from the video, and you have a sample, let's say at X=3.0. Now, in order to get the probability density of X, we have to marginalize over the latent class, because we don't know which class the sample belongs to. Hence
p(X=3.0) = pi_0 * N(X=3.0, mu_0, sigma_0) + pi_1 * N(X=3.0, mu_1, sigma_1)
In general, we would of course have the summation symbol, but since there are only two contributions in the sum, I wrote it out explicitly. This is of course a mixture, and you could also call it a linear combination of Normal distributions (see the short code sketch further down in this reply).
2) Sampling the GMM: Here you would first sample a latent class, then use the corresponding Normal to sample a point, and then "throw away the latent class because it is not observed". Also take a look at this video after 14:10 ruclips.net/video/kMGjXVb8OzM/видео.html where I first do this process manually and then use TensorFlow Probability's built-in Mixture Distribution.
A remark: If you have a special case of a Mixture Model in which you do observe the class (i.e., it is not latent), then you would evaluate the joint p(Z, X) instead of the marginal: "If you have more information, then you should of course also use it." I think this could refer to the second case you mentioned. However, since we commonly use GMMs for clustering, where we of course don't know the class, this case is not seen much in applications.
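To make the two perspectives (and the remark) concrete, here is a minimal NumPy/SciPy sketch; the weights, means, and standard deviations are made-up placeholders, not the values from the video:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Made-up two-component GMM parameters (placeholders)
pi = np.array([0.4, 0.6])      # mixture coefficients pi_0, pi_1
mu = np.array([-2.0, 3.0])     # means mu_0, mu_1
sigma = np.array([1.0, 1.5])   # standard deviations sigma_0, sigma_1

# 1) Density of one sample: marginalize over the latent class
x = 3.0
p_x = pi[0] * norm.pdf(x, mu[0], sigma[0]) + pi[1] * norm.pdf(x, mu[1], sigma[1])

# 2) Ancestral sampling: draw the latent class first, then the position, then discard the class
z = rng.choice(2, p=pi)              # latent class (not kept as an observation)
x_sample = rng.normal(mu[z], sigma[z])

# Remark: if the class were observed (say z_obs = 1), evaluate the joint instead of the marginal
z_obs = 1
p_joint = pi[z_obs] * norm.pdf(x, mu[z_obs], sigma[z_obs])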
I hope that helped :) Let me know if something was unclear.
Hey, another great video! Is the GMM PDF you show at the end normalized? Thanks!
Hey, thanks again :)
Are you referring to what is shown in TensorFlow Probability? If so, then yes. Since it is implemented as a Mixture distribution in TFP, the probability of getting any value from the domain of possible values is 1, which is exactly the normalization condition.
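If you want to check this numerically, here is a quick sketch (with made-up parameters, not the exact ones from the video) that sums the pdf over a fine, wide grid and verifies that the approximate integral is close to 1:

import numpy as np
import tensorflow_probability as tfp

tfd = tfp.distributions

# A made-up two-component mixture in TFP (placeholder parameters)
gmm = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.4, 0.6]),
    components_distribution=tfd.Normal(loc=[-2.0, 3.0], scale=[1.0, 1.5]),
)

# Riemann-sum approximation of the integral of the pdf over a grid
# that covers essentially all of the probability mass
xs = np.linspace(-15.0, 15.0, 3001, dtype=np.float32)
dx = xs[1] - xs[0]
area = float(np.sum(gmm.prob(xs).numpy()) * dx)
print(area)  # should come out very close to 1.0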
Is that what you were asking?
Thanks for this video. Can you make a video about the multivariate case?
Hey, thanks for the feedback ☺️
Yes, that's already planned. I think it will go online in 3 to 4 weeks.
Unfortunately, I overestimated my video output :D So it took me a little longer, but here is the continuation for the Multivariate Case: ruclips.net/video/iqCfZEsNehQ/видео.html
The videos on the EM derivation and its implementation in Python will follow.
Sorry, why is the categorical distribution's P(Z) = Cat(Pi) equal to the product of all the pi[0], pi[1]?
Hi, thanks for the question. Do you have a timestamp in the video to which you are referring?