I'm failing my test today lol.
For those who may be confused about the underlying logic flow, the prerequisites are Bayes' theorem, conditional probability, conditional independence, the D-separation algorithm, and Bayes networks.
Your good teaching got me hooked on watching more of your videos.
Good that you described everything slowly and in detail; bad that you described everything slowly and in detail ;)
Great video and explanation; unfortunately, the variables m and n are not well defined, and that's why people get confused.
n is the number of z variables (or x variables), i.e. the number of time steps in the HMM.
m is the number of states each z variable (the hidden variable) can take.
This is the fifth time I've listened to the lecture; after a year, I finally understand the equation!
Thank you for your video. I have read the paper by Lawrence R. Rabiner explaining HMMs and the forward/backward algorithms, but it was not as clear as your explanation! Very helpful!
The paper by Rabiner explains it quite clearly.
Nothing new. Just wanted to say you deserve every bit of praise you are getting here and more. Cheers.
Can't imagine how some comments call this crystal-clear tutorial "confusing"...
Watch the previous videos on HMMs to understand m, n, and all the other variables explained there. Great explanation!
For the complexity calculation: I understand where the first m comes from, but then it looks like there are only k values of z_k instead of m. How do you get \Theta(m^2) for each k? After checking out the backward algorithm, I understand that the complexity of the whole forward-backward algorithm is \Theta(nm^2), because in the backward algorithm we have (m-k) values of z_k. So the sum of both algorithms should give m values of z_k, the complexity for each k is \Theta(m^2), and the final complexity should be \Theta(nm^2). Could you explain a little more why the complexity of each individual algorithm is also \Theta(nm^2)?
Yes, it was not very clear, but see Jason Wu's note below: when we write \alpha(z_k), we really mean the joint probabilities p(z_k, x_{1:k}) for all possible outcomes of z_k, of which there are m. So there are a total of m values of \alpha(z_k), one for each outcome of z_k.
Next, in the computation of EACH \alpha(z_k), there is a summation over all possible outcomes of z_{k-1}, of which there are also m.
That's how we get a total of \Theta(m^2) for computing all m values of \alpha(z_k) at step k.
And since k takes a total of n values, the (recursive!) computation of all the alphas is m * m * n; see the sketch below.
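To make the Θ(nm^2) count concrete, here is a minimal sketch of the forward recursion for a discrete-emission HMM. It is not the video's code; the array names pi, A, B and the NumPy layout are illustrative assumptions. Each of the n time steps does an m x m matrix-vector product.

```python
import numpy as np

def forward(pi, A, B, x):
    """Forward pass of a discrete-emission HMM.

    pi : (m,)   initial distribution, pi[i] = p(z_1 = i)
    A  : (m, m) transition matrix, A[i, j] = p(z_k = j | z_{k-1} = i)
    B  : (m, s) emission matrix,   B[j, o] = p(x_k = o | z_k = j)
    x  : (n,)   observed sequence of symbol indices

    Returns alpha, whose row for time step k holds p(z_k = j, x_{1:k}) for each state j.
    """
    n, m = len(x), len(pi)
    alpha = np.zeros((n, m))
    alpha[0] = pi * B[:, x[0]]            # base case: p(z_1, x_1) = p(x_1 | z_1) p(z_1)
    for k in range(1, n):                 # n - 1 recursive steps ...
        # ... each an m x m matrix-vector product: Theta(m^2) work per step
        alpha[k] = B[:, x[k]] * (alpha[k - 1] @ A)
    return alpha
```

The outer loop runs Θ(n) times and each iteration touches all m·m transition entries, giving Θ(nm^2) for the full forward pass (and the same again for the backward pass).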
very clear and intuitive explanation. Thanks a lot!!
Can you explain why you are summing (at 2:15)? It seems to me that it should be a product there, not a sum. How did you come up with that formula for p(z_k, x_{1:k})? Thanks.
barabum2 It's the relation between joint probability and marginal probability (see the derivation sketched below):
www.quora.com/What-is-marginalization-in-probability
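For reference, the sum at 2:15 comes from marginalizing over z_{k-1} and then applying the chain rule together with the HMM's conditional independences (D-separation); a sketch in the video's notation:

```latex
\begin{aligned}
\alpha(z_k) = p(z_k, x_{1:k})
  &= \sum_{z_{k-1}} p(z_{k-1}, z_k, x_{1:k})
     && \text{(marginalize out } z_{k-1}\text{)} \\
  &= \sum_{z_{k-1}} p(x_k \mid z_k)\, p(z_k \mid z_{k-1})\, p(z_{k-1}, x_{1:k-1})
     && \text{(chain rule + D-separation)} \\
  &= p(x_k \mid z_k) \sum_{z_{k-1}} p(z_k \mid z_{k-1})\, \alpha(z_{k-1})
\end{aligned}
```

The products do appear, but they sit inside the sum; the sum itself is just the marginalization over the m possible values of z_{k-1}.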
At minute 10:55, shouldn't P(z_1, x_1) be equal to P(z_1 | x_1) * P(x_1)? The emission matrix gives us the likelihood of an observation (x) given a hidden state (z), not the other way around.
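Both factorizations of the joint are valid; in the video's notation z is the hidden state and x the observation, so the base case is usually written with the quantities the model actually provides (the initial and emission distributions). A sketch:

```latex
p(z_1, x_1) \;=\; p(x_1 \mid z_1)\, p(z_1) \;=\; p(z_1 \mid x_1)\, p(x_1)
```

The first form is the one used to initialize \alpha, because p(z_1) and p(x_1 | z_1) are HMM parameters, whereas p(z_1 | x_1) and p(x_1) are not directly available.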
Looks nice. I am looking for a worked example, preferably from natural language processing. Any ideas?
Great explanation! You make it so easy to understand!
Hello. Could you tell me what to do when the emission probability equals zero at some step? Then all further alphas equal zero, and the re-estimation formulas (Baum-Welch) don't make any sense. I'm trying to implement an HMM with Gaussian mixtures, so I can't use smoothing techniques, since those are only for discrete distributions. How do I deal with such a problem?
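A common workaround, not from the video, is to run the recursion in log space (or use the scaling described in Rabiner's paper): a Gaussian-mixture density is never exactly zero mathematically, it only underflows numerically, and log-space arithmetic avoids that. A minimal sketch assuming precomputed log emission densities; all names are illustrative:

```python
import numpy as np
from scipy.special import logsumexp

def forward_log(log_pi, log_A, log_B):
    """Forward pass in log space, robust to tiny emission densities.

    log_pi : (m,)   log initial distribution, log p(z_1)
    log_A  : (m, m) log transition matrix, log p(z_k = j | z_{k-1} = i)
    log_B  : (n, m) log emission densities, log p(x_k | z_k = j)
             (for a Gaussian mixture, compute these with logsumexp over
             the component log-densities rather than summing raw densities)

    Returns log_alpha with log_alpha[k, j] = log p(z_k = j, x_{1:k}).
    """
    n, m = log_B.shape
    log_alpha = np.empty((n, m))
    log_alpha[0] = log_pi + log_B[0]
    for k in range(1, n):
        # logsumexp over z_{k-1}: log sum_i exp(log_alpha[k-1, i] + log_A[i, j])
        log_alpha[k] = log_B[k] + logsumexp(log_alpha[k - 1][:, None] + log_A, axis=0)
    return log_alpha
```

The Baum-Welch statistics can then be formed from normalized exponentials of these log values, so no alpha ever collapses to an exact zero.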
Do you have any videos that work through example problems?
A lot in this and the previous videos, you refer to a "separation rule" and to "conditioning on something". I don't understand what those mean. Can you give a link to any video where you have already explained them in detail? I cannot follow these videos without a proper understanding. Thanks.
Actually he was talking about D-separation; you can get a better understanding of this concept at this link: www.andrew.cmu.edu/user/scheines/tutor/d-sep.html :)
I am writing code for this and I can't understand the 'm' variable.
A bit lost at this point after watching the previous 6 videos... suddenly the rate of new material ramped up!
I wish you were my professor.
How can I get p(z_1)? Any help?
You're the best 🥳
You are a great man :D
very well explained
18 people do not know how to prove independence and do not know the probability chain rule :'(
11:26: worst-written "known" ever.
I wish there were code for the forward algorithm in MATLAB.
I wish you would code it.
I believe \Theta(m) is for all the possible values of z_{k-1}, and \Theta(m^2) is for all the possible pairs (z_{k-1}, z_k).
Hard to follow. A lot of new things that are not clearly explained.
I don't understand the summation.
Very complicated to understand.
m=n
ffs
No, m is the number of hidden states and n is the number of time steps.
Try to relate your explanations to real-life applications.
Please define things more properly...