3:44 Intro, Gaussian Distribution, Probability Density Function (PDF)
7:38 GMM Intro
9:08 Covariance matrix
10:15 GMM Definition, K Gaussians
11:30 How to apply GMM for classification
12:30 Problem statement, Fitting a GMM model, Maximum Likelihood Estimate (MLE)
13:58 Similarity to K-means clustering algorithm
16:13 Expectation maximization (EM) algorithm and difference to Gradient Descent
18:15 When to apply GMM, anomaly detection, clustering, object tracking
19:30 Coding example with Python
25:10 EM algorithm workflow in practice, Log Likelihood
27:54 EM algorithm visual / walkthrough
36:30 Summary
Great video, many thanks :)
From a muddy blur to crystal clear in 30 min, thank you very much for this video Siraj
In case you have bad results using Gaussian mixtures, keep in mind the EM optimization only has local convergence properties, just like gradient descent: it can get stuck. Restarting the density estimation with other initial parameters might solve it! :)
thanks Jason!
I love how passionate you are about this
Suggestion, at 6:45: the y values aren't the probabilities of the x values; intuitively, the probability of a single point on the Gaussian will be 0.
Siraj. The depth and range of your knowledge still continue to amaze me.
thanks Antony!
The butt kissing ends at 3:40
Thanks. Haha
@Siraj, why do you change the formula at 29:54? Instead of sigma^2 you are using abs(sigma).
Very energetic presentation. Kept me attentive throughout the video. Hit the sub button 2 minutes in.
warning: when he finger styles his hair, get ready for hardcore info dump.
PS: 3blue1brown's series on linear algebra has THE BEST video on eigenvector/eigenvalue pairs, no joking.
I watch 4-5 videos of yours per day. I'm learning generative models for drug design, Siraj. Watching your videos not only motivates me, it also makes my life & study fun and cool.
Siraj, I think it would have been helpful if you had shown the resulting clusters that you get from the Gaussian mixture model approach on your data. You showed how to model your data using the Gaussian mixture, but I am unclear on how we get the specific clusters (say, 2 clusters) from that.
Thank you! Your videos helped me a lot... I was so lost and confused about this topic that I was on the verge of giving up. Checked out your tutorials that gave a lot of useful information and insights. Thanks a tonne! :) :D Keep up the good stuff
At 4:35, it appears that the score is nonnegative. Although a Gaussian distribution is a close approximation in this case, could a log-normal distribution also be used in a Gaussian Mixture Model? Are there advantages to selecting a Gaussian distribution instead?
The iteration function is empty, which makes the current code completely random; it should have "mix.Mstep(mix.Estep())" inside that function.
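A minimal sketch of that fix, assuming the notebook's GaussianMixture class with its Estep/Mstep methods (the N and verbose parameters are guesses for illustration). The empty iterate is likely also why several commenters here hit the 'loglike' AttributeError: Estep, which sets self.loglike, never runs.

```python
# Hedged sketch of the fix described above; assumes the notebook's
# GaussianMixture class exposes Estep (which yields responsibilities and
# updates self.loglike) and Mstep (which re-estimates the parameters).
def iterate(self, N=1, verbose=False):
    """Run N rounds of EM: feed the E-step output straight into the M-step."""
    for i in range(1, N + 1):
        self.Mstep(self.Estep())  # the missing line the comment points out
        if verbose:
            print('iteration %d: %s' % (i, self))
```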
Like he understands that
Very well explained..... I was lost while our college professor was explaining GMM and EM...
Wow! Finally I got my head around this subject. Well done and amazing teaching skills 👏🏻
Andre
We love you Siraj
Great video! I tried running your code on my terminal and it's giving the error that 'GaussianMixture' object has no attribute 'loglike'. Would you happen to know why an error like this would occur, or anyone else for that matter? Thank you so much.
I have a problem with Gaussian mixture models: I don't know how to generate outliers uniformly in the p-parallelotope defined by the coordinate-wise maxima and minima of the 'regular' observations in R.
3:45 Siraj, in my information theory class I was told the Gaussian distribution is the one that assumes the least about the data (it maximizes differential entropy for a given variance), so maybe you can include that in your explanation when someone asks why we assume a Gaussian distribution, apart from the central limit theorem.
Where do I get the dataset? It is not mentioned anywhere and is not in Github repository either
Dataset can be found at: raw.githubusercontent.com/brianspiering/gaussian_mixture_models/master/bimodal_example.csv
Hey Siraj! EM is a heuristic with no guarantees of global convergence. There have been recent algorithms based on the method of moments, random projections, etc., which provably recover the GMM under some assumptions.
25:22 EM model
Hey Siraj!
Just found your channel and it doesn't cease to amaze. I am learning a lot about AI and ML with your vibrant and enthusiastic expression. My 2 cents would be to talk a tiny bit slower but it is up to you. Congrats and Keep up the Good Work!
thanks Kashyap!
Siraj's desktop background has the Sierra mountains, but doesn't macOS Sierra have problems with TensorFlow, OpenAI, and other machine learning stuff?
Great video! Really helpful for data science students.
Awesome work Siraj
I have some questions:
1. In the end, what have we achieved: a probability distribution over whether people keep playing the game?
2. Could it cause overfitting if we set too many Gaussian distributions?
Regards.
Hi Siraj, wonderful video! I am wondering: what is the difference between a Gaussian mixture model and the least squares method from a data-fitting point of view?
Hey @siraj, where are you going to be in India? Would love to catch up.
Here, are x1, x2, ... the vectors, or are they the data points of a vector x?
Love the motivation at the start, preach!
Loved the explanation. If I have to model 6 features instead of 2, and use a sliding-window approach on my dataframe (I need to find the anomalous windows), how can I modify the weights and the rest of the code? Just looking for direction.
Hey Siraj, I have vectors with 10 components, thus 10 features. I labeled the vectors with 4 classes. I want to use GMMs to calculate the probabilities of a new incoming vector belonging to each one of the classes. What do I use? Do I have to create a GMM for every class? If yes, how do I fit a GMM to a 10-feature vector? Or could, or even should, I use multivariate Gaussian distributions instead?
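One common recipe for that setup, sketched here under stated assumptions (X of shape (n_samples, 10), labels y with 4 classes; fit_class_gmms and predict_class_proba are illustrative names, not from the video): fit one multivariate GMM per class and combine the class densities with Bayes' rule.

```python
# Hedged sketch: class-conditional GMMs as a classifier. Assumes X has
# shape (n_samples, 10) and y holds 4 class labels; names are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_gmms(X, y, n_components=3):
    """Fit one multivariate GMM per class; full covariance handles 10-D input."""
    models, priors = {}, {}
    for c in np.unique(y):
        gmm = GaussianMixture(n_components=n_components, covariance_type='full')
        models[c] = gmm.fit(X[y == c])
        priors[c] = np.mean(y == c)  # class prior P(c)
    return models, priors

def predict_class_proba(x, models, priors):
    """Posterior P(c | x) via Bayes' rule on each class's GMM density."""
    # score_samples returns log p(x | c); add the log prior, then normalize.
    log_post = {c: m.score_samples(x.reshape(1, -1))[0] + np.log(priors[c])
                for c, m in models.items()}
    mx = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {c: np.exp(v - mx) for c, v in log_post.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}
```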
This is very helpful for my machine learning exam! Stay awesome, Siraj!
@Siraj Raval Where can I see when and where the meetups are?
Hello Siraj, I am working on a project to extract the total bill from restaurant receipts. Is there any way I could use a CNN or any other deep learning technique to achieve this? I am new to ML and would greatly appreciate your suggestions.
I got pretty confused around 33:33 with the E step. You've computed wp1 and wp2, which is cool, and then normalised them so their sum is 1 [wp1/(wp1+wp2) + wp2/(wp1+wp2) = (wp1+wp2)/(wp1+wp2) = 1], which makes sense. You then add the log of this sum to self.loglike. But the log of 1 is 0... Which is where you lost me.
You are right! Siraj should check and fix that with YouTube annotations.
Agree
You are getting better and better at explaining these things, Siraj! Keep up the great work; you are helping a lot of people.
How do I use this for spectra (wavelength, flux, flux_error) instead of a histogram?
33:30
wp1/(wp1+wp2) + wp2/(wp1+wp2) = 1
log(wp1 + wp2) = log(1) = 0
How is his model being trained?
You guess a theta (model params); that gives you a probability distribution over the hidden variables. With that known, you maximize the joint probability distribution of X and the hidden variables, which gives you a new theta. Repeat the two steps above, using the new theta instead of your guess.
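A minimal sketch of that alternation for a two-component, 1-D mixture (a hedged illustration; the variable names are not from the video's notebook):

```python
# Hedged sketch of the EM alternation described above, for a 1-D,
# two-component Gaussian mixture; all names are illustrative.
import numpy as np
from scipy.stats import norm

def em(data, mu1, s1, mu2, s2, pi, n_iter=50):
    for _ in range(n_iter):
        # E-step: with theta fixed, compute P(hidden = component 1 | x).
        p1 = pi * norm.pdf(data, mu1, s1)
        p2 = (1.0 - pi) * norm.pdf(data, mu2, s2)
        r = p1 / (p1 + p2)                     # responsibilities
        # M-step: maximize the expected joint likelihood -> new theta.
        pi = r.mean()
        mu1 = np.average(data, weights=r)
        mu2 = np.average(data, weights=1 - r)
        s1 = np.sqrt(np.average((data - mu1) ** 2, weights=r))
        s2 = np.sqrt(np.average((data - mu2) ** 2, weights=1 - r))
    return mu1, s1, mu2, s2, pi
```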
We actually want log(wp1 + wp2) computed from the unnormalized wp1 and wp2, not from the normalized ones whose sum is 1.
Could you please show an example on 3D data (XYZ points)?
Where can I get the blog he is following?
Isn't wp1 + wp2 = 1 always, so self.loglike += log(wp1 + wp2) will be zero?
Is that true, or is my assumption wrong?
Kindly explain...
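What this thread converges on: in the notebook, log(wp1 + wp2) is taken after wp1 and wp2 have been normalized, so it is always log(1) = 0; the fix is to accumulate the log of the unnormalized sum first. A hedged sketch of the corrected loop body, with attribute names assumed from the notebook:

```python
# Hedged sketch of the corrected E-step body; self.mix, self.one and
# self.two are assumed to be the notebook's mixing weight and components.
wp1 = self.mix * self.one.pdf(datum)          # weighted density, component 1
wp2 = (1.0 - self.mix) * self.two.pdf(datum)  # weighted density, component 2
den = wp1 + wp2                               # p(datum) under the mixture
self.loglike += log(den)                      # take the log BEFORE normalizing
wp1 /= den                                    # responsibilities: wp1 + wp2 == 1
wp2 /= den
```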
He makes mistakes... If only that were the only one... Referring to variance as "variation"... Doesn't know how a standard deviation is calculated... omg.
Great presentation about GMMs!! Thanks
Wondering if you would post the lecture notes/slides somewhere?
Siraj, I have a question/problem. I have two data inputs which are to be comparatively trained by a learning model. It's not multiple sets of data but only one: a set of pairs of inputs. I have been reading about pairwise SVMs. How do I do that? Is there a better model?
Bruh you’re helping me pass my class. Thanks
Really, thanks man. Your video helped me a lot in my hyperspectral image classification projects.
If we add the gradient descent update of the covariance matrix to the covariance matrix, will the result stay positive definite?
Clearly explained the concept!!! Great presentation
Hi, how do I change the variance and mean of the Gaussian function in MATLAB? Can you show an example of what the code looks like?
Is there a guide on how to set up Jupyter Notebook?
If you aren't already using Python, use the Anaconda distribution: www.continuum.io. It will also include the most useful libraries.
19:15 Where are the links to those repositories?
You can find them in the notebook Siraj made for this video github.com/llSourcell/Gaussian_Mixture_Models/blob/master/intro_to_gmm_%26_em.ipynb
Hi Siraj, I appreciate your videos and I love your content. I am working on a project on cross-matching using active learning; what advice would you have for me? I am trying to build something scalable but not too computationally intense.
I am trying to use your notebook and getting this error -- any ideas??
I am getting an error for #checking the fitting process
AttributeError: 'GaussianMixture' object has no attribute 'loglike'
You are amazing, Siraj!
Thank you, Siraj, for such amazing videos... you really are the best.
Hey, thanks for the video! However, I noticed that your solution is rather hardcoded for a mixture of 2 distributions. What if we are dealing with a more complicated data set and we do not know how many distributions are mixed? Is there any deterministic approach to find this number?
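One common answer (not from the video) is model selection: fit candidate mixtures with different component counts and pick the one with the lowest information criterion, such as BIC. A hedged sketch with scikit-learn (pick_n_components is an illustrative name):

```python
# Hedged sketch: choose the number of mixture components by minimizing BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

def pick_n_components(X, max_k=10):
    bics = []
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, n_init=5).fit(X)
        bics.append(gmm.bic(X))  # lower BIC = better fit/complexity trade-off
    return int(np.argmin(bics)) + 1  # +1 because k starts at 1
```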
Love this video. It's presented so clearly.
When I run your code, I get a "could not find data file" error. Why? How can I find the file?
Thank you very much! Your explanation is very good and educational! I'm recommending your channel to my friends too.
Love the lecture style! Wish the topic covered the multivariate case as well.
You are the best source for ML... thanks for your attention and love for AI!!!!!
Your accent reminds me of Mitchell from Modern Family(fav character) :')
Also great video thanks!!
The quality of the audience reflects the quality of the content :) Thank you for sharing and helping us understand complex subjects in an approachable way (and not dumbing them down :)
6:45 "the y values are the probabilities for those x values"
Aren't the y values the probability density of the x values, since in a continuous range of x values the probability for a single value x is 0? Or did I miss something?
Technically speaking you are indeed correct; the probability of any single point occurring on a continuous distribution such as the Gaussian is 0. The y-axis for a normal distribution is density, not probability. I think Siraj just mentioned "probability" as an intuitive way to think about it. We can still use the area under the Gaussian to compute the probability of getting a point in a small neighbourhood of x.
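To make that concrete, a small check with scipy (printed values are approximate):

```python
# Density vs. probability for a standard normal: f(x) is a density;
# probabilities come from integrating it (here via the CDF).
from scipy.stats import norm

g = norm(loc=0.0, scale=1.0)
print(g.pdf(0.0))                  # density at x=0, ~0.3989 -- not a probability
print(g.cdf(0.05) - g.cdf(-0.05))  # P(-0.05 <= X <= 0.05), ~0.0399
print(g.cdf(0.0) - g.cdf(0.0))     # P(X == 0) is exactly 0
```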
I was wondering when someone would comment about that... Technically speaking it's a probability density; we don't get a probability at a single point, only after integrating over an infinitesimally small interval around it.
You can use gradient descent; it's a standard maximization problem (of the likelihood). The variable here is denoted by theta, where theta (for a GMM) comprises the means, the variances (covariance matrices), and the mixing probabilities for every Gaussian. Nothing is stochastic once you have the given data points, and it's no more complex a function than the loss of a network.
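A hedged sketch of that idea: reparameterize theta so the constraints hold automatically (log-scales for the standard deviations, a logit for the mixing weight), then hand the negative log-likelihood to a generic gradient-based optimizer. All names and the synthetic data are illustrative:

```python
# Hedged sketch: fit a 1-D two-component GMM by direct likelihood
# maximization with a gradient-based optimizer instead of EM.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # sigmoid, keeps the mixing weight in (0, 1)
from scipy.stats import norm

def neg_loglike(theta, data):
    mu1, log_s1, mu2, log_s2, logit_pi = theta
    pi = expit(logit_pi)
    p = (pi * norm.pdf(data, mu1, np.exp(log_s1))
         + (1.0 - pi) * norm.pdf(data, mu2, np.exp(log_s2)))
    return -np.sum(np.log(p + 1e-300))  # guard against log(0)

# Illustrative synthetic data: two overlapping Gaussians.
data = np.concatenate([np.random.normal(-2.0, 1.0, 300),
                       np.random.normal(3.0, 0.5, 200)])
x0 = np.array([-1.0, 0.0, 1.0, 0.0, 0.0])  # asymmetric start helps convergence
res = minimize(neg_loglike, x0, args=(data,), method='BFGS')
print(res.x)  # (mu1, log s1, mu2, log s2, logit pi) at the optimum found
```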
Great series!!!! It even helps me with my AI learning curve at Udacity. Thanks for it. Regards, Tibor
Please where can I get the data you used in the video?
Are you coming to Brussels?
Those are really strong motivating words in the beginning :). Thanks.
Hi! How can we select the number of components?
I keep getting this error:
AttributeError Traceback (most recent call last)
     10 try:
     11     mix.iterate()
---> 12     if mix.loglike > best_loglike:
     13         best_loglike = mix.loglike
     14         best_mix = mix
AttributeError: 'GaussianMixture' object has no attribute 'loglike'
I am not sure what to do in this case. Any ideas? Thank you
Thank you. Very helpful video. :)
Would be nice to have timestamps, since it is quite impossible to find the bit of information about Gaussian mixture models that I was actually looking for...
Super tutorial! Thank you so much!
Siraj, this is awesome, brother! Man, you gave awesome reference links; exploring them gave me full knowledge of the concept, and rewatching the video after that made complete sense. Hope I find a job in ML and DL and can support you on Patreon.
Hey, I am trying to build a feature subset selection project using GMM clustering. Can you help me out with that?
So the probability density function looks more intimidating than it really is. Thanks for explaining it. If you had to choose between a semester of linear algebra or statistics, which would you choose?
Thank you very much for the great video!! Siraj is the god of explanation.
You're the real man! Why didn't you come to Indonesia? We also have ML/DL community here. :) Anyway, thanks for your elaboration of GMM, it is indeed helpful and easy to understand. Cheers!
I just loved the energy :D
You're the best! You've helped turn this 19-year-old from a lazy kid into an inspired workaholic.
so amazing! Keep it up
Same! Although I am only 15.
It's always great and informative to watch and learn from your videos. But my question is non-technical; please do provide a solution...
Question: I saw your GitHub profile, and I'm curious what filters you applied to your profile pic (dp)? :p
PS: I already told you this question was going to be a non-technical one. And yes!!! You have been on my YouTube subscription list from the very beginning. Cheers!!!
Thanks Siraj, good one!!
Hi. Great again, Siraj. You're apparently the best at this online. Could we have a video about non-parametric estimation or higher-order statistics, perhaps ICA?
Hey Siraj, where will you be meeting folks in India?
Siraj, any plans on coming to Germany in the future?
Siraj, my guy... this is so 🔥. Will you be in Amsterdam Sept 4-16?
Hello, can anyone tell me where I can find the dataset, i.e., the CSV data file used in the code? Thanks.
Hi Siraj Raval, we love you from Tunisia
Such a good video that I clicked the like button 10 times :)
And ended up with "no thumbs up" :P
Hi, your videos are great! Please cover VGG, AlexNet, and others sometime.
thanks Aamir!
Can you please control your moving hands' data points? Too much distraction.
What is the best way to handle nonlinear data in a recommender system? Can anyone please help me with this?
omg. I just discovered your channel..... sOOOOOOOOOOOO gOOOOOOOOOOOd
5:45 +siraj "whether it's a car or roller coaster that's increasing in velocity reaches a peak then decreases or a soundwave... very likely a Gaussian distribution would be a great model"...????? Isn't the bell curve representative of the frequency of the data, not the data itself??
Thanks for reading theory to me. Couldn't do that by myself
I know you're being sarcastic, but honestly, I'm looking for people to do just that for me, I HATE reading technical material.