This could be by far the best explanation I have seen for the EM algorithm. The way you have connected the intuition to the mathematical explanation is so, so commendable!!!! Thank you so much for your efforts
Glad it was helpful!
@@ritvikmath I confirm, thank you for helping lost students
The world needs to see this. Thanks Ritvik, I honestly have utmost respect and love for the amount of hard work you put in your videos. Cheers :)
It would take a lot of time to develop these intuitions on your own.
Thank you Ritvik for simplifying EM algorithm like this. This is the best video I have seen so far.
Your channel and way of teaching is so amazing!! Very inviting, inclusive, and friendly. Thank you so much for such good vibes 💕
I have an exam tomorrow and this video was the thing I needed. I can't thank you enough dude.
thank you ritvik the best videos are in this channel.
Very interesting way of teaching, thank you from TUNISIA
Most welcome!
Brilliant explanation. I especially appreciate you first providing the intuition of the method in the verbal explanation of the E and M steps. I struggled with seeing the math first in other lectures until seeing your video. Thanks for posting this.
Incredible explanation! Was trying to understand the intuition behind EM for a long time! Thanks for the video! Keep Going!!
Glad it helped!
Thank you for the high-quality content that you have produced over the past few years. Most of the time, it really did help me get the intuition and understanding of what was going on with the theoretical concepts I was seeing in my courses.
Once again, thank you !
It would take me two more lives to be able to explain it this well to someone, kudos! Great job buddy!
Wow, thanks!
YES! I have quiz on this NEXT WEEK!
Awesome explanation. I'd like to extend yours with my intuition regarding the E-Step: the first term p(x|m0) shows the probability of x happening for the chosen m0, and the second term LogLikelihood shows the probability of x happening for the computed m, and we want to maximize both. Because we want a choice with high probability from every aspect. That's why we multiply them together. Because the multiplication can weight between them. If one of them is small then the result will be small. It can be high only if both are high.
thanks for the additional inputs!
That's a really great way to look at EM. I'm an engineering graduate but new to ML and the workup explanation before dropping into the maths is excellent. thanks
It's 4am and I saw this video and had to watch....... really great explanation bro.....you're a natural teacher.....thanks for this......subscribed
your understanding and explanation of such a complicated concept is impeccable
Thanks! You explained such a complicated subject so clearly!!!!
thanks!
Thank you! By far the best channel for providing clear explanations to fairly complex problems.
Your videos are unreal, simple explanations of complex problems, it's insane.
Holy, i can't believe how good this video was :) thank you so much
By far the best explanation, amazing.
Wow, thank you for your work here. I finally feel confident in a subject in my masters classes and that means the world
The only explanation you need for understanding EM algorithm, proper chad explanation!
can't express how happy I am after seeing your videos. thanks a lot!
this is the best lecture for em algo
you are just amazing! What would be super useful would be an EM video based on your "Maximum likelihood" one.
Yes please go on with the proof, that will be an interesting topic. I went through Andrew Ng's video a couple of times, but I couldn't understand it better than here!! You're a rock star in simplifying complex concepts!!
Much better explanation than what I normally see. I would also be interested in seeing you go through the proof.
Thanks for the very clear explanation! A follow up video on how the EM algorithm can be used in gaussian mixture models or bayesian networks would be awesome!
Although there is more for fully understanding, I was able to gain the concept because of your video!
Ngl my favorite rapper-turned-algorithm
Absolutely fantastic. I agree w/ other comments... The DS world needs to see this. Thank you.
Glad you enjoyed it!
THANK YOU. You're literally saving my ML undergraduate course
awesome!
Thanks a lot! That is a great explanation!!! I was struggling with EM for a long time!! :))
I'd be grateful if you also talk about the proof of convergence!
Your explanations are soooo clear! really appreciate the effort you put into your videos. Thank youu!!
What an amazing channel, honestly
Awesome! Best explanation of the EM algorithm for beginners!
Glad it was helpful!
I had a jolt of excitement when I saw you had decided to do a video on this topic. It's something I've had to revisit time and time again, always understanding the intuition, but always getting lost in the formulas. Your post did a great job at helping to explain the intuition. I did struggle a bit with your non-conventional likelihood notation, though. That did throw me off a little bit but I understand why you had to have it that way and quickly adjusted. The care you took in explaining why there is mu and mu0 just shows why you are a fantastic teacher.
Very well explained
i thank GOD i found your channel. A big thanks to youtube and to you!!
Broke down the most complicated algorithm in the simplest terms. Wow!
Ritvik, you are doing a great job, thanks
Thank you for all the work you put in your videos to make lives like mine easier. Cheers man!
Genius man, genius , Wonderful explanation !!
This explanation is amazing in order to get the concept
A real master can explain the most complex problem in an understandable way.
Thanks for your explanation.
I think my main mental knot was wondering why you alter N instead of looking for the best guess for x to locate the value of the unknown value. To realize that x doesn't change and the power of the algorithm lies in finding the optimal solution for the learner without caring for the actual value of x was what I needed for it all to make sense.
Loved it. Thanks for the efforts.
Great video !
Got this crystal clear. Thanks a lot!
Would love to see a proof video! Keep up the great work!
Thanks so much for this great explanation! I would definitely be interested in the proof. It would be great if you could do a video on Gaussian mixture models as well and how they are solved using the EM algorithm.
Wonderful. Definitely helps my understanding. When I find time I want to see what you're doing w/ the stock predictions. If I remember lectures from business school, you should not be able to "generate alpha" unless you possess information the market does not. In this case you could say you've found some new idea that has real predictive value, but either they will a) already have found this and put much more compute + their proximity to the actual place where the trades happen towards getting the answer first and beating you to the trades or b) didn't know it before but will immediately steal it and then see a) haha. But hey, I'll still watch to see what you've got going on.
Can't wait to see the proof
Lovely, that's very intuitive. Thank you so much.
Very cool! Thank you for teaching!
The best Thanks man
Excellent. Thank you so much! 👍
thanks!
Thanks for the great lecture. One question if I may: 2:20, why you put best guess 1 here instead of a random draw from your known distribution?
Amazing explanation!
Could you please do the derivation or intuition for EM for clustering? I observe that it is described in many textbooks, but not in such a cool way. 😅
Amazing, thank you for that !
Glad you liked it!
Excellent explanation!!
Really nice explanation! Thank you!
Glad it was helpful!
Very compelling ... Brilliant
Excellent explanations!
You are a gem
Thank you for explaining very clearly
Thank you so much for your explanation, helps me a lot
yeah this guy is the fucking goat
So clear -- wow!
Thank you so much for these videos!
One question: how do you estimate and maximize the integral in practice? That was the elephant for me...
this helped so much, thank you a lot!!
This is explained so well
great man, ultra great
Thanks for the video! What was not clear to me is whether we calculate E(LL|M) for all Ms, so that we can then calculate the argmax in step 3?
Hi Ritvik, thank you very much for awesome videos. Could you please make some videos on SQL?
thanks! and please check out my full SQL playlist here:
ruclips.net/p/PLvcbYUQ5t0UFAZGthysGOAtZqLl3otZ-k
@@ritvikmath Awesome! Thanks a lot.. Could you please add sql with window function to the playlist, if possible?
very good explanation!
Glad you think so!
Thanks for the great video! One question: if you have (1+2+x)/3 = x, then you have a closed-form solution, so why do you still need a numerical approach?
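To make the relation between the closed form and the iteration concrete, here is a minimal sketch. It assumes the toy setup is data [1, 2, x] drawn from N(mu, 1) with x missing; the function name `em_missing_mean` and the starting guess are hypothetical, chosen only for illustration. Under that assumption the E-step replaces x by its expected value mu, and the M-step averages the completed data, so each pass computes mu <- (1 + 2 + mu) / 3. Solving (1 + 2 + x)/3 = x directly gives 1.5, which is the fixed point the loop approaches.

```python
def em_missing_mean(observed, mu0, n_iter=50):
    """EM for the mean of N(mu, 1) when one data point is missing.

    Assumed toy setup: unit variance, exactly one missing value.
    """
    mu = mu0
    n = len(observed) + 1  # observed points plus the one missing point
    for _ in range(n_iter):
        # E-step: under the current guess, the missing value has mean mu.
        expected_x = mu
        # M-step: the expected log-likelihood is maximized by the average
        # of the observed values and the expected missing value.
        mu = (sum(observed) + expected_x) / n
    return mu


if __name__ == "__main__":
    # Two very different starting guesses end up at the same place.
    print(em_missing_mean([1.0, 2.0], mu0=10.0))
    print(em_missing_mean([1.0, 2.0], mu0=-5.0))
```

In this toy case the loop is unnecessary because the fixed point can be solved by hand; the iterative form matters in models where no such closed form exists.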
Thanks so much Ritvik! Your videos are amazing... do you have a playlist for machine learning to connect the dots between ML concepts? I see a playlist for data science but not for machine learning.
Thanks
Great video! Thank you so much
Glad it was helpful!
I'm interested in the proof!
amazing, thanks for such a clear explanation :)
Great teacher❤
Example with python coming anytime?
A worked example of the final process would be invaluable.
oh my god. this was so helpful
Awesome!
On step 2, what does the dx do at the end of that equation?
Nice to see the theorem guaranteeing convergence for sequences that are increasing and bounded being used to prove this. I do have a more pragmatic question which is how somebody would go about finding the argmax in the M step. Would gradient descent be used on the expectation of log-likelihood function (I would imagine in this case the expectation of log-likelihood would have to be convex for this to work) to find the argmax?
Yep, you can use any optimization method. For Gaussian mixture models there are explicit formulas for the M step which are obtained in the usual way by setting the gradient of the expected log-likelihood to zero.
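As a rough illustration of those explicit M-step formulas, here is a sketch of one EM pass for a one-dimensional Gaussian mixture in plain Python. The function names (`normal_pdf`, `gmm_em_step`), the example data, the initial values, and the small variance floor are all assumptions made for this sketch, not something taken from the video. The M-step lines are the standard closed-form updates obtained by setting the gradient of the expected log-likelihood to zero: responsibility-weighted averages for the means and variances, and average responsibility for the mixing weights.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_em_step(data, weights, means, variances):
    """One EM pass for a 1D Gaussian mixture (illustrative sketch)."""
    k = len(weights)
    # E-step: responsibility of each component for each data point.
    resp = []
    for x in data:
        joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        resp.append([p / total for p in joint])
    # M-step: closed-form maximizers of the expected log-likelihood.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)  # effective number of points in component j
        mean_j = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var_j = sum(r[j] * (x - mean_j) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean_j)
        new_v.append(max(var_j, 1e-6))  # assumed floor to avoid zero variance
    return new_w, new_m, new_v


if __name__ == "__main__":
    data = [-2.1, -1.9, -2.0, -1.8, 3.0, 3.2, 2.9, 3.1]  # made-up example data
    w, m, v = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
    for _ in range(30):
        w, m, v = gmm_em_step(data, w, m, v)
    print(w, m, v)
```

No gradient-based optimizer appears anywhere in the loop; each M-step is a handful of weighted averages. For models without such formulas, a generic optimizer would take the place of those lines.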
Great explanation. However, the way you have written it, there is no difference between the likelihood function and the probability function. I think for clarity you should swap x,1,2 and \mu. Also you should use ; instead of | so that the likelihood function is not confused with conditional probability.
Thank you Ritvik for your explanation! does it work only for normal distribution or we can apply it for other kind of distributions
Is the EM algorithm the best algorithm to use in some specific problem (compared for example to the gradient descent algorithm)?
really great explanation! thank you :-)
Thanks a lot for the easy-to-understand intuition of the EM algorithm. Would you like to explain the coin-flip example along with your formulation of steps ② and ③?
Thank you so much BRO
The expression for Expectation seems similar to Bayesian theory where we have prior belief (P(x|u)) and likelihood and we are multiplying both to get posterior. Is this the same concept?
Can you do the proof too please
I’m very confused as to why we don't just maximize the log-likelihood of all the currently observed data given some mu?
I think it applies not only for estimation of mu, but any arbitrary parameter. Then it would not be as simple as taking average of all observed data. I could be wrong though :P
Love this question, thanks for asking. Indeed with this toy example the EM algorithm is overkill and it was mostly meant for instructive purposes. Of course, when we use things for instructive purposes we can miss the more interesting applications. Thinking about this, an interesting variation would be what if you have the data [1,2,x,y] drawn from a N(0,sigma) where now it’s the standard deviation sigma as well as two missing values you want to predict. This is interesting because it’s important to consider the values of the non missing data *and* the potential values of the missing data which are consistent with some estimate for sigma (since standard deviation is inherently a measure between data points)
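A small sketch of that variation, under stated assumptions: the data are [1, 2, x, y] drawn from a zero-mean normal with unknown standard deviation sigma, and x and y are missing. The function name `em_missing_sigma` and the starting guess are hypothetical. With mean zero, the E-step needs E[x^2] = sigma^2 under the current guess, and the M-step sets the new variance to the average of the squared values, using that expectation for each missing point.

```python
import math


def em_missing_sigma(observed, n_missing, sigma0, n_iter=200):
    """EM for sigma of N(0, sigma) with some values missing (sketch)."""
    var = sigma0 ** 2
    n = len(observed) + n_missing
    sum_sq = sum(v * v for v in observed)
    for _ in range(n_iter):
        # E-step: each missing value contributes E[x^2] = current variance.
        expected_sq = var
        # M-step: the variance maximizing the expected log-likelihood is
        # the mean of observed squares and expected missing squares.
        var = (sum_sq + n_missing * expected_sq) / n
    return math.sqrt(var)


if __name__ == "__main__":
    sigma_hat = em_missing_sigma([1.0, 2.0], n_missing=2, sigma0=5.0)
    print(sigma_hat, sigma_hat ** 2)
```

For this particular setup the update is var <- (5 + 2 * var) / 4, whose fixed point is var = 2.5, the same value as (1^2 + 2^2) / 2 computed from the observed points alone. So the sketch shows the mechanics of the E and M steps for sigma rather than a case where EM changes the answer.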
Incredible
Great videos. Got it in one go! Could you do Gaussian Mixture Models? Thanks.
Sir, I know that in the E-step we estimate the unknown x, but you are calculating the likelihood. How are these connected?