The problem with most math lectures, especially stochastics and statistics, is that they are too abstract to grasp the concept easily.
Just put in a few concrete examples and it will be so much easier to understand.
Exactly. That's the problem I find with most lectures. Most of the concepts are vague and abstract with nothing to solidify them.
This is literally what he does in the next video. The problem with most commentators is that they are too impatient/lazy to even check the next video.
As a practical example (a small sketch follows below):
Theta: the input to a communication channel
D: the observed value, i.e. the output
MLE: the value of the input that maximizes the likelihood of the particular output
MAP: the most probable value of the input given the observed value
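To make this channel example concrete, here is a minimal Python sketch. The numbers are my own assumptions, not from the video: a binary symmetric channel with an assumed flip probability of 0.2 and an assumed prior that strongly favours sending 0, so that the MLE and MAP estimates of the input disagree.

```python
# Minimal sketch of the channel example above (flip probability and prior are made-up).
# theta: the channel input (0 or 1); D: the observed output.

flip_prob = 0.2  # binary symmetric channel: output differs from input with prob 0.2

def likelihood(d, theta):
    """P(D = d | input = theta) for the binary symmetric channel."""
    return 1 - flip_prob if d == theta else flip_prob

prior = {0: 0.9, 1: 0.1}  # assumed prior: the transmitter sends 0 far more often

d_observed = 1  # the value seen at the channel output

# MLE: the input that maximizes P(D | theta) alone.
theta_mle = max(prior, key=lambda t: likelihood(d_observed, t))
# MAP: the input that maximizes P(theta | D), proportional to P(D | theta) * P(theta).
theta_map = max(prior, key=lambda t: likelihood(d_observed, t) * prior[t])

print("MLE:", theta_mle)  # 1 -> the observation alone favours input 1 (0.8 vs 0.2)
print("MAP:", theta_map)  # 0 -> the strong prior outweighs the likelihood (0.18 vs 0.08)
```

With these assumed numbers the two estimators give different answers, which is exactly the difference the definitions above describe.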
Who else is watching this at the night before exam?
Doing a take home exam now. probably going to fail this portion...
30 minutes before viva XD
Excellent video in terms of definition. This can give you a good overview of what exactly you are trying to compute.
You should have given a good example of how MAP works.
Also like it, but often I find it helpful to get a simple, but concrete example. It can be very overwhelming, working only with abstract variables and functions, and in order not to drown in this abstractness, it's nice to relate things to the real world from time to time. But of course, this is just my own observation. Probably, a lot of people prefer to stay abstract.
I'm trying to find a single concrete example on the web of how to determine the MLE and MAP, but I haven't found anything really illuminating. It looks like most texts are written by and for math students.
This concept is used in the Naive Bayes model, which is used to classify emails as spam or not spam.
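For a rough idea of how that looks, here is a tiny Python sketch with made-up class priors and word probabilities (my own example, not from the video): Naive Bayes spam classification is a MAP decision, picking the class that maximizes P(class) times the product of P(word | class).

```python
# Tiny sketch (made-up numbers): Naive Bayes spam classification as a MAP decision.
priors = {"spam": 0.4, "ham": 0.6}                      # assumed class priors P(class)
word_probs = {                                          # assumed per-class word probabilities
    "spam": {"free": 0.30, "meeting": 0.01, "winner": 0.20},
    "ham":  {"free": 0.02, "meeting": 0.25, "winner": 0.01},
}

def map_class(words):
    """Return the MAP class for a bag of words under the naive independence assumption."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for w in words:
            score *= word_probs[c].get(w, 1e-6)         # small floor for unseen words
        scores[c] = score                                # proportional to P(class | words)
    return max(scores, key=scores.get)

print(map_class(["free", "winner"]))   # spam
print(map_class(["meeting"]))          # ham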
Great lectures for reviewing old concepts. The only thing I would say is that he constantly goes into nearly irrelevant caveats, like 'a' vs. 'the' MAP, which are vanishingly unlikely to matter in the real optimization problems you would need to worry about.
I have a question. So what is the purpose of the prior once we observe the data?
Where would the theta MLE be in the case of the graph at 11:15?
(nice video :))
Is MAP estimator a recursive or batch estimator?
Do you listen to what you are saying?
who else fell asleep watching this?
I guess if you can understand the ideas presented here, you can easily use MATLAB or Python to implement all of these terms. Without reviewing the basic terms in probability and linear algebra, things may become abstract.
I can't watch tutorials like this. Keep the cross still; it makes me so uncomfortable that it's moving all the time.
You wouldn't happen to know if this is available in English anywhere, would you?
booooooooh
Sorry, the way you pronounce theta and data really confuses me 😶🌫
At 5:38 it says that MLE is to maximize P(D|theta), but I learned elsewhere that inversely the MLE is defined as P(theta|D) = P(D|theta), so it tries to maximize P(theta|D). Am I mistaken? Please point me in the right direction. Thanks.
I found this: www.probabilitycourse.com/chapter9/9_1_3_comparison_to_ML_estimation.php
exactly.
??????
MLE tries to maximize P(D|theta). MAP tries to maximize P(theta|D). The relationship between these two probabilities is given by P(theta|D) = P(D|theta)*P(theta)/P(D), where P(D) is a constant with respect to theta. So maximizing P(theta|D) is equivalent to maximizing the numerator, P(D|theta)*P(theta). (A small numerical check follows this thread.)
Xin Jing Thanks very much for your explanation. But what do you mean by "MAP"? And since P(theta) is not a constant, would it affect the relationship (equivalence) in the likelihood?
@@LeoAdamszhang For ML you maximize the likelihood based on D: argmax L(theta|D), not P(theta|D).
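To make the equivalence described above concrete, here is a small numerical sketch. It is my own example, not from the video: simulated coin flips with an assumed Beta(2, 2) prior, evaluated on a grid of theta values, showing that the theta maximizing the posterior P(theta|D) is the same one that maximizes P(D|theta)*P(theta).

```python
# Numerical check (own example, assumed Beta(2, 2) prior and simulated coin flips):
# the theta that maximizes the posterior P(theta | D) is the same theta that
# maximizes P(D | theta) * P(theta), because P(D) does not depend on theta.
import numpy as np

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.7, size=20)      # 20 Bernoulli trials with true parameter 0.7
heads, n = int(data.sum()), data.size

thetas = np.linspace(0.001, 0.999, 999)   # grid of candidate theta values

likelihood = thetas**heads * (1 - thetas)**(n - heads)   # P(D | theta)
prior = thetas * (1 - thetas)                            # Beta(2, 2) density, up to a constant
numerator = likelihood * prior                           # P(D | theta) * P(theta)
posterior = numerator / numerator.sum()                  # dividing by P(D): a constant in theta

theta_mle = thetas[np.argmax(likelihood)]                # maximizes the likelihood alone
theta_map = thetas[np.argmax(posterior)]                 # maximizes the posterior

print("MLE: ", round(float(theta_mle), 3))               # close to heads / n
print("MAP: ", round(float(theta_map), 3))               # pulled slightly toward the prior mean 0.5
print("same argmax:", np.argmax(posterior) == np.argmax(numerator))  # True
```

Normalizing by numerator.sum() plays the role of dividing by P(D); since it is the same positive constant for every theta, it cannot change which theta wins the argmax.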
Sorry, but at 6:00 the correct formula is
p(D, theta) = p(theta, D)p(theta)
Am I right?
No, this is wrong; the chain rule gives p(D, theta) = p(D|theta)p(theta). Consult Bayes' rule.