For anyone who has trouble wrapping their head around why variable elimination is more efficient: writing out the explicit for-loops to compute P(Y) was really helpful for me.
If we assume W, X, Y, Z each have K possible values, then for the chain W -> X -> Y -> Z we have P(y) = sum_w sum_x sum_z P(w) P(x|w) P(y|x) P(z|y). This naive triple sum has K^3 terms, and we need to compute it for each of the K possible values of Y, so filling out the K-entry table for P(Y) takes K^4 terms in total.
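A minimal Python sketch of that brute-force sum, using hypothetical K = 2 tables (the numbers are made up for illustration, and p_y_by_enumeration is an illustrative name). It visits all K^4 = 16 combinations of (y, w, x, z).

```python
from itertools import product

# Hypothetical CPTs for a chain W -> X -> Y -> Z with K = 2 (numbers made up).
# Tables are indexed [parent value][child value], e.g. p_x_w[w][x] = P(x|w).
p_w = [0.3, 0.7]
p_x_w = [[0.9, 0.1], [0.2, 0.8]]
p_y_x = [[0.6, 0.4], [0.3, 0.7]]
p_z_y = [[0.5, 0.5], [0.1, 0.9]]

def p_y_by_enumeration(p_w, p_x_w, p_y_x, p_z_y):
    """P(y) = sum_w sum_x sum_z P(w) P(x|w) P(y|x) P(z|y), one full product per term."""
    k = len(p_w)
    p_y = [0.0] * k
    terms = 0
    for y, w, x, z in product(range(k), repeat=4):   # K^4 combinations
        p_y[y] += p_w[w] * p_x_w[w][x] * p_y_x[x][y] * p_z_y[y][z]
        terms += 1
    return p_y, terms

p_y, terms = p_y_by_enumeration(p_w, p_x_w, p_y_x, p_z_y)
print([round(v, 6) for v in p_y], terms)
```

For these tables it should print [0.423, 0.577] and 16; doubling K would multiply the number of terms by 16.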
If we do variable elimination then we first compute f_W(x):
for each value of X, call this x:
    f_W(x) = 0
    for each value of W, call this w:
        f_W(x) += P(w) * P(x|w)
Note:
- capital letters denote random variables; lowercase letters denote a value of the corresponding random variable.
- f_W(x) is a table containing K numbers, one for each value of X
- the innermost operation is "constant time" because we are just looking up these values in a table.
- in total it takes K^2 operations to compute the f_W(x) table and then we store it away.
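The same step as a Python sketch, with hypothetical K = 2 numbers (made up for illustration; eliminate_w is an illustrative name). Because sum_w P(w) P(x|w) = sum_w P(w, x) = P(x), the stored table f_W is simply the marginal distribution of X, so it should sum to 1.

```python
# Hypothetical tables for K = 2 (values made up for illustration).
p_w = [0.3, 0.7]                 # p_w[w] = P(w)
p_x_w = [[0.9, 0.1],             # p_x_w[w][x] = P(x|w)
         [0.2, 0.8]]

def eliminate_w(p_w, p_x_w):
    """f_W(x) = sum_w P(w) * P(x|w): K * K multiply-adds in total."""
    k = len(p_w)
    f_w = [0.0] * k
    for x in range(k):           # outer loop over values of X
        for w in range(k):       # inner loop sums W out
            f_w[x] += p_w[w] * p_x_w[w][x]
    return f_w

f_w = eliminate_w(p_w, p_x_w)
print([round(v, 6) for v in f_w])
```

With these numbers it prints [0.41, 0.59].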
Next we compute f_X(y):
for each value of Y, call this y:
    f_X(y) = 0
    for each value of X, call this x:
        f_X(y) += P(y|x) * f_W(x)
Note:
- f_X(y) is a table containing K numbers, one for each value of Y
- the innermost operation is "constant time" because we are just looking up these values in a table!
- in total it takes K^2 operations to compute the f_X(y) table and then we store it away.
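The same reasoning applies one level down: plugging in f_W(x) = P(x) shows that this second table is already the marginal of Y (a short derivation using only the product rule and the sum rule):

```latex
f_X(y) = \sum_x P(y \mid x)\, f_W(x) = \sum_x P(y \mid x)\, P(x) = \sum_x P(x, y) = P(y)
```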
Lastly, we compute P(Y):
for each value of Y, call this y:
    P(y) = 0
    for each value of Z, call this z:
        P(y) += P(z|y) * f_X(y)
Note:
- P(Y) is a table containing K numbers, one for each value of Y
- the innermost operation is "constant time" because we are just looking up these values in a table.
- in total it takes K^2 operations to compute this last table.
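One further observation about this last step: f_X(y) does not depend on z, so it can be pulled out of the sum, and what remains sums a full conditional distribution to 1:

```latex
P(y) = \sum_z P(z \mid y)\, f_X(y) = f_X(y) \sum_z P(z \mid y) = f_X(y) \cdot 1 = f_X(y)
```

So the third table equals f_X(y), and summing out Z, which sits below Y in the chain, changes nothing; in practice this last K^2 step could be skipped.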
Computing P(Y) by variable elimination takes 3 * K^2 operations, which is much less than K^4 for large K!
Basically, by computing each of these tables in the right order we avoid repeating work that we already did.
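Putting the pieces together, here is a minimal Python sketch that checks both claims on random tables: enumeration and elimination give the same P(Y), and their step counts are K^4 and 3 * K^2. The random CPTs, the seed, and the function names (random_chain, enumerate_p_y, eliminate_p_y) are hypothetical choices for illustration.

```python
import random
from itertools import product

def random_distribution(k, rng):
    """Return k non-negative numbers that sum to 1."""
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [v / total for v in weights]

def random_chain(k, seed=0):
    """Hypothetical random CPTs for W -> X -> Y -> Z, each indexed [parent][child]."""
    rng = random.Random(seed)
    p_w = random_distribution(k, rng)
    p_x_w = [random_distribution(k, rng) for _ in range(k)]
    p_y_x = [random_distribution(k, rng) for _ in range(k)]
    p_z_y = [random_distribution(k, rng) for _ in range(k)]
    return p_w, p_x_w, p_y_x, p_z_y

def enumerate_p_y(p_w, p_x_w, p_y_x, p_z_y):
    """Brute-force triple sum for every y: K^4 innermost steps."""
    k = len(p_w)
    p_y, steps = [0.0] * k, 0
    for y, w, x, z in product(range(k), repeat=4):
        p_y[y] += p_w[w] * p_x_w[w][x] * p_y_x[x][y] * p_z_y[y][z]
        steps += 1
    return p_y, steps

def eliminate_p_y(p_w, p_x_w, p_y_x, p_z_y):
    """Variable elimination mirroring the three loops: 3 * K^2 innermost steps."""
    k = len(p_w)
    steps = 0
    f_w = [0.0] * k                                  # f_W(x)
    for x, w in product(range(k), repeat=2):
        f_w[x] += p_w[w] * p_x_w[w][x]
        steps += 1
    f_x = [0.0] * k                                  # f_X(y)
    for y, x in product(range(k), repeat=2):
        f_x[y] += p_y_x[x][y] * f_w[x]
        steps += 1
    p_y = [0.0] * k                                  # P(y)
    for y, z in product(range(k), repeat=2):
        p_y[y] += p_z_y[y][z] * f_x[y]
        steps += 1
    return p_y, steps

cpts = random_chain(k=10)
naive, naive_steps = enumerate_p_y(*cpts)
ve, ve_steps = eliminate_p_y(*cpts)
print(naive_steps, ve_steps)                                  # 10000 300
print(max(abs(a - b) for a, b in zip(naive, ve)) < 1e-12)    # True
```

The elimination loops follow the pseudocode in the comment one-to-one; the saving comes from storing f_W and f_X once instead of recomputing them inside every term of the triple sum.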
Thanks!
By far the best explanation of variable elimination; thanks for motivating it via brute force/enumeration. For the longest time, it wasn't clear to me that VE was about computational cost, not about being the only possible mathematical solution to a problem.
Best explanation of probability I've received in my whole academic career, thank you
Completely agree. My current professor makes it way too hard to understand, and I never understood the use of making things so abstract that students don't understand. What, then, is the point of education?
My professor for AI explained this so badly that I had no idea what was going on. Thanks for this in-depth and logical explanation of these topics
Excellent video. You brought up a lot of small things that I was confused about and explained them
Your explanation is brilliant, it gives a very good intuition for the theory. Thanks a ton
This was great, please do more! 🙏🏼
I was struggling to understand this in my class. Glad I came here.
Very impressive; you make the model crystal clear. Now I know that computing with a Bayesian network is nothing more than calculating a probability (for discrete variables) or a probability distribution (for continuous variables) efficiently.
For the slip node, can we say that it is conditionally independent of rain? Or is it independent? Or is it still related indirectly?
Does the order of summations in variable elimination matter?
Also, what are observed and unobserved variables? I.e., are ancestor variables observed variables? Or are they the marginalized variables? Or something else?
Great video. Would love to see the code for that assignment.
At 12:23, doesn't c,r mean car wash AND (not OR) rain, as mentioned in the lecture?
This is a great video on Bayesian networks. Other people creating videos should take note of this one.
I have an assignment on this that I need to deliver in two hours, and this video is saving me right now!
Good lecture; it was a big help for me in understanding Bayesian networks and the formula.
Does "variable elimination" imply that the overall network's functionality gets changed? Thanks.
Great video, extremely clear and helpful. :)
This was a literal saviour! Thanks a ton!
What's the difference between enumeration and variable elimination anyway? I still think it's only a difference in notation.
How come the condition is "Rain or Carwash" and not "Rain and Carwash"?
Around 9:43 you simply say that P(S|W,R) is reduced to P(S|W) but you never give a more formal explanation of why. I know it's because of conditional independence.
You could have easily added clarity by stating that you started with the chain rule of probability and then applied the conditional independence assumption. That would save anyone who has learned basic probability theory a few minutes of their time, instead of making them pause to think through what just happened there.
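A sketch of that derivation, assuming (as the comments suggest) that wet grass W is the only parent of slip S, so S is conditionally independent of R given W. Apply the chain rule in the order R, W, S, then drop R from the last factor:

```latex
\begin{aligned}
P(R, W, S) &= P(R)\, P(W \mid R)\, P(S \mid W, R) && \text{chain rule} \\
           &= P(R)\, P(W \mid R)\, P(S \mid W)    && \text{since } S \perp R \mid W
\end{aligned}
```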
Thanks! I'll definitely try to clarify that better next time I teach this topic.
Thanks for that note. To make it even more explicit for people who still had to think about it (like me): if A and B are conditionally independent given C, then P(A|B,C) = P(A|C). Here, S and R are conditionally independent given W (which may feel counterintuitive, as mentioned in the video). Therefore, P(S|W,R) = P(S|W).
Which book is he using as a reference?
Thanks for the great video! Helped me a lot in understanding this stuff for my uni course :)
Thanks for sharing this lecture video!
Thanks! Very nice explanation!
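To see the difference between conditional and marginal independence numerically, here is a minimal Python sketch. It assumes the network has car wash C and rain R as independent root nodes, both parents of wet grass W, and W as the only parent of slip S; the CPT numbers and the helpers bern and prob are hypothetical, made up for illustration.

```python
from itertools import product

# Hypothetical CPTs (made-up numbers), binary variables, 1 = true.
p_c = {1: 0.2, 0: 0.8}                       # P(C), car wash
p_r = {1: 0.3, 0: 0.7}                       # P(R), rain
p_w1 = {(0, 0): 0.05, (0, 1): 0.9,           # P(W=1 | C=c, R=r), keyed by (c, r)
        (1, 0): 0.8, (1, 1): 0.95}
p_s1 = {0: 0.1, 1: 0.6}                      # P(S=1 | W=w)

def bern(p_true, value):
    """P(var = value) for a binary variable with P(var = 1) = p_true."""
    return p_true if value == 1 else 1.0 - p_true

# Joint P(c, r, w, s) from the assumed factorization P(C) P(R) P(W|C,R) P(S|W).
joint = {
    (c, r, w, s): p_c[c] * p_r[r] * bern(p_w1[(c, r)], w) * bern(p_s1[w], s)
    for c, r, w, s in product((0, 1), repeat=4)
}

def prob(**fixed):
    """Marginal probability that the named variables take the given values."""
    names = ("c", "r", "w", "s")
    return sum(p for key, p in joint.items()
               if all(key[names.index(n)] == v for n, v in fixed.items()))

# Conditional independence: P(S=1 | W=1, R=1) versus P(S=1 | W=1).
print(round(prob(s=1, w=1, r=1) / prob(w=1, r=1), 4), round(prob(s=1, w=1) / prob(w=1), 4))
# No marginal independence: P(S=1 | R=1) versus P(S=1).
print(round(prob(s=1, r=1) / prob(r=1), 4), round(prob(s=1), 4))
```

With these numbers the first line shows 0.6 twice, while the second shows about 0.555 versus 0.3065: S is conditionally independent of R given W, but not independent of R on its own, because rain still influences slipping through the wet grass.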
Damn, what a voice. Thanks for this
When you eliminate C, you get f(w), but where does R go?
Really useful, thanks!
Top tier video without a doubt.
Great video. Thanks a lot!
Very simple explanation, thanks!
Wait. What do the commas actually denote? It seems confusing that they're being used to denote both "AND" and "OR" (Union and Intersection) like at 14:49.
Can someone explain what's going on?
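The commas denote AND (a joint condition): P(W | C, R) is the probability of W given that both C and R hold. It is a single lookup in the table P(W | C, R) and does not in general factor as P(W | C) P(W | R).
Good explanation!
Other than this, can you tell me what else we need to know about this method of data mining, please?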
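A small numeric sketch of why the comma means AND rather than a product of separate conditionals, assuming (hypothetically) that car wash C and rain R are independent root nodes with the made-up probabilities below.

```python
# Hypothetical numbers (made up), binary variables with 1 = true.
p_r1 = 0.3                                   # P(R=1), rain
p_c1 = 0.2                                   # P(C=1), car wash
p_w1 = {(0, 0): 0.05, (0, 1): 0.9,           # P(W=1 | C=c, R=r), keyed by (c, r)
        (1, 0): 0.8, (1, 1): 0.95}

# The comma means AND: P(W=1 | C=1, R=1) is a single entry of the table.
both = p_w1[(1, 1)]

# Conditioning on one parent averages over the other (C, R assumed independent).
given_c = (1 - p_r1) * p_w1[(1, 0)] + p_r1 * p_w1[(1, 1)]   # P(W=1 | C=1)
given_r = (1 - p_c1) * p_w1[(0, 1)] + p_c1 * p_w1[(1, 1)]   # P(W=1 | R=1)

print(both, round(given_c * given_r, 5))
```

Here P(W=1 | C=1, R=1) is 0.95, while P(W=1 | C=1) P(W=1 | R=1) comes out to about 0.769, so the two expressions are not interchangeable.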
Thank you for the video!! :)
Great video!
Came here searching for coal , found Gold ✌🏻✌🏻✌🏻✌🏻✌🏻
So far, I understand Bayesian networks and the notation.
wtf, how is it so simple? Has it always been this simple? Thanks.
love your voice bro!
your voice doesn't sound like your photo
great
Very good
awesome
Very unclear and confusing. Using Venn diagrams to represent some of the probabilities, and giving a detailed example of the math with numbers to show how it works, would be of great help for people discovering the subject. I am fairly sure this is a great video for people who already understand the subject or have some grasp of it, but for newcomers it is very confusing. Not to mention the jump in difficulty between the first part, which is quite easy to understand (although Venn diagrams would help), and the second part, which looks like Elvish.
from Bihar (INDIA)
Did in 5 minutes what my teacher tried to do in 1 hour, and did it better.
Great video, but for the slipping bit your intuition isn't always true. It could be, but the ground being wet doesn't necessarily mean it was raining, as you said; it might not be raining and you could slip on dew-covered grass. Loving this video though, as I don't know probability or Bayesian classifiers, which are in my literature for NNs. Okay, you crossed out the intuition, lol; I had paused the video, my bad.
absolutely useless.