What a nice explaining! Before watching this lecture, I just memorized expectation and variance of binomial distribution. Now I definitely understood them. Thank you professor!
In that Hat problem, why is P(X_i = 1) = 1/n? Shouldn't it depend on the value of i. What I mean is for the first person it is 1/n but for the second it is 1/n-1 if the first one did not get his hat and 0 if the first one picked his hat.
aisi taisi is right, but it will result in n*1/n: consider 3 people - the first has 1/3 chance to get his because there are 3 choices he can make - the 2nd can get his had out of 2 if the first didn't take his (conditional probabilty): probability the first didn't take his*probability of getting his: 2/3*1/2=1/3 - the 3rd can't choose anymore, but there is a 2/3 chance the others took his, so he has a 1/3 chance of getting his all of them have 1/3 chance with n: first has 1/n chance 2nd has (1-1/n)*1/(n-1)=(n-1)/n*1/(n-1)=1/n 3rd has (1-1/n-1/n)*1/(n-2)=(n-2)/n*1/(n-2)=1/n and so on ... you can observe as well if drawing a decision tree
Τhink of it as follows: There are n! ways of distributing the hats to the people as you rightly pointed out. So that is the cardinality of the sample space Ω. Now, let the event A where the k-th person gets his own hat. There are n-1 hats to distribute freely as one has to have his hat. So the cardinality of the event A turns out to be (n-1)! Thus the probability of the k-th person to get his own hat is: (n-1)!/n! = 1/n
Actually, the probability that the second person get his hat is also 1/n. Let A the event that the second person gets his particular hat and let B be the event that the first person gets the hat that belongs to the second person. By the law of total probability, P(A) = P(B*)P(A|B*) + P(B)P(A|B) = [(n-1)/n]*[1/(n-1)] + [1/n]*0 = 1/n
An explanation for 44:50 Consider, (a+b)^2=(a+b)*(a+b) =a^2+ab+ba+b^2 =a^2+b^2+ab+ba Similarly, (a+b+c)^2=a^2+b^2+c^+ab+ba+bc+cb+ac+ca i.e. the square of the sum of n terms is equal to sum of squares of all individual terms plus the sum of all of the possible products of 2 different terms. Imagine a, b, and c,... to be the individual X_i. Now, the sum we need to evaluate is X^2= (X_1 + X_2 + ..... + X_n)^2 From our previous conclusion, this sum should just be: {(X_1)^2+(X_2)^2+.......+(X_n)^2} + {(X_1 * X_2) + (X_1 * X_3) + (X_1 * X_4) + (X_1 * X_5) +....... + (X_2 * X_1) + (X_2 * X_3) + (X_2 * X_4) + ..........} Thus, X^2 = Sum over all i of (X_i)^2 + Sum over all i,j such that i=/=j of (X_i * X_j) Hope this helps!
Where does n^2 - n come from and how is it obtained? Is it just the initial nuber of hats mutliplied by th remaining number after 1 is correctly selected?
Think of this like a n*n matrix. Cut the main diagonal off, how many entries are still there? Well, to answer this let's get back to the start. We have n * n entries, or n^2. The main diagonal consists of n entries (you can see they "represent" one row or one column). So, if you take take the main diagonal of you get n^2 - n.
Think about it this way, the number of cross terms you get when you square a sum with n terms is the number of ways you can pair 2 terms out of the total of n individual terms (order matters, ie. pairing the 2nd term with the third term is counted separately from pairing the third term with the second term). Well, this is just the permutation n(n-1) which is equal to n^2-n.
Sir, it is meant to be equal to that, it doesn't mean that is the formulae for it. E[x] = sum(xpx), and here x has only two values 1 and 0. Here we don't care about P(X1X2 = 0) cause X1X2 = 0 so that's not required, then E[X1X2] = 1* P(X1X2= 1) + 0*P(X1X2=0) gives you the answer
Because the terms with symmetric X’s (for example X1*X2 and X2*X1) are treated as two terms. That’s why the total number of terms is n*(n-1). You can certainly use the 2*Sum formula; but then the total number of terms would become “n choose 2”, which is half of before.
Tossing the coin each time is basically like doing a binary decision, which can be described by an indicator function. The total number of heads is the sum of the results of each toss, which is the sum of of indicator functions. The summation is linear so you have E[\sum_i X_i] = \sum_i E[X_i].
Thanks Liang!!! This formula should work only in the case when we use a linear function such as the indicator function. It was confusing because John stressed that it was always true but I believe he meant, always true in the case of indicator functions! Thank you again! :D Isn't John just amazing?
I think there is sth wrong here, although the final result is exactly the same. X^2=xichma(Xi^2)+2xichmaXiXj and use arithmetic progression we have n(n-1)/2 XiXj,so the result will also be 1.
Pr. Tsitsiklis finds two envelopes with money notes, in every student's hat. Changes the notes to coins, and tosses each coin. What's the expectation that : 1- Pr. Tsitsiklis does something more exciting than tossing coins? 2- At most one hat, is available for Pr. Tsitsiklis by the Final Lecture (L=25)?
Nice lecture, but you should also add the lecture contents in the description so that people don't have to go through the entire lecture to find out whether it's relevant to their course or not.
If you repeat the experiment many times (that's the idea of expected value) it does makes sense. In about 1/2 of the experiments both will get theirs hats right and in the other 1/2 no one will get it right. On average 1 person gets the hat right although that result by itself is not possible. Similarly, the expected value of rolling a 6 face fair dice is 3.5, however there is no face with 3.5 dots. 😉
Very well explained. Really big fan of prof.John Tsitsiklis
This vedio has thousands of fundamentals in those 50 minutes!!!!! Amazing Proff.
Thanks MIT. Views of (video 7.)/ Views of (video 1.) is about 10%.
What a nice explaining! Before watching this lecture, I just memorized expectation and variance of binomial distribution. Now I definitely understood them. Thank you professor!
in greece, applied probability is a standard course in order to get accept to any economically oriented university.
fun fact, proffesor John Tsitsiklis is Greek :)
In that Hat problem, why is P(X_i = 1) = 1/n?
Shouldn't it depend on the value of i. What I mean is for the first person it is 1/n but for the second it is 1/n-1 if the first one did not get his hat and 0 if the first one picked his hat.
Consider all pick one hat at the same time.
aisi taisi is right, but it will result in n*1/n: consider 3 people
- the first has 1/3 chance to get his because there are 3 choices he can make
- the 2nd can get his had out of 2 if the first didn't take his (conditional probabilty): probability the first didn't take his*probability of getting his: 2/3*1/2=1/3
- the 3rd can't choose anymore, but there is a 2/3 chance the others took his, so he has a 1/3 chance of getting his
all of them have 1/3 chance
with n:
first has 1/n chance
2nd has (1-1/n)*1/(n-1)=(n-1)/n*1/(n-1)=1/n
3rd has (1-1/n-1/n)*1/(n-2)=(n-2)/n*1/(n-2)=1/n
and so on ...
you can observe as well if drawing a decision tree
Τhink of it as follows:
There are n! ways of distributing the hats to the people as you rightly pointed out.
So that is the cardinality of the sample space Ω.
Now, let the event A where the k-th person gets his own hat. There are n-1 hats to distribute freely as one has to have his hat. So the cardinality of the event A turns out to be (n-1)!
Thus the probability of the k-th person to get his own hat is:
(n-1)!/n! = 1/n
thank you!!
Actually, the probability that the second person get his hat is also 1/n. Let A the event that the second person gets his particular hat and let B be the event that the first person gets the hat that belongs to the second person. By the law of total probability,
P(A) = P(B*)P(A|B*) + P(B)P(A|B) = [(n-1)/n]*[1/(n-1)] + [1/n]*0 = 1/n
brilliant 👌this is getting really interesting
Awesome style of teaching...Thanks a lot !!!
Wow, the best explanation I have heard on discrete random vectors and the hat problem!!!!!
Thx,absolutely extraordinary lecture.
The hat problem deserved an all lecture on its own...
An explanation for 44:50
Consider,
(a+b)^2=(a+b)*(a+b)
=a^2+ab+ba+b^2
=a^2+b^2+ab+ba
Similarly, (a+b+c)^2=a^2+b^2+c^+ab+ba+bc+cb+ac+ca
i.e. the square of the sum of n terms is equal to sum of squares of all individual terms plus the sum of all of the possible products of 2 different terms.
Imagine a, b, and c,... to be the individual X_i.
Now, the sum we need to evaluate is X^2= (X_1 + X_2 + ..... + X_n)^2
From our previous conclusion, this sum should just be: {(X_1)^2+(X_2)^2+.......+(X_n)^2} + {(X_1 * X_2) + (X_1 * X_3) + (X_1 * X_4) + (X_1 * X_5) +....... + (X_2 * X_1) + (X_2 * X_3) + (X_2 * X_4) + ..........}
Thus, X^2 = Sum over all i of (X_i)^2 + Sum over all i,j such that i=/=j of (X_i * X_j)
Hope this helps!
Does anyone know the deeper reason for why mean=variance=1 in the hat problem at the end?
It's extremely helpful , thank you sir.
Where does n^2 - n come from and how is it obtained? Is it just the initial nuber of hats mutliplied by th remaining number after 1 is correctly selected?
Think of this like a n*n matrix. Cut the main diagonal off, how many entries are still there?
Well, to answer this let's get back to the start. We have n * n entries, or n^2. The main diagonal consists of n entries (you can see they "represent" one row or one column). So, if you take take the main diagonal of you get n^2 - n.
@@HackionSTx Clever and yet clear answer, thank you
@@loumierex you're welcome, I'm glad it helped someone. These videos and comments helped so much, it's the least I could do.
Think about it this way, the number of cross terms you get when you square a sum with n terms is the number of ways you can pair 2 terms out of the total of n individual terms (order matters, ie. pairing the 2nd term with the third term is counted separately from pairing the third term with the second term). Well, this is just the permutation n(n-1) which is equal to n^2-n.
excellent lecture.
how could he derive X^2=sum (Xi)^2+ sum(XiXj) ? please someone explains to me
it's just generalize of Newton binomial. Ex: (x+y+z)^2 = x^2 + y^2 + z^2 + 2xy + 2xz + 2yz
I don't understand why E[X_i, X_j] = P(X_1 X_2 = 1). Where does that come from? Also, don't we need to know also P(X_1 X_2 = 0)?
+haha I don't know wether I can answer your question, but could you maybe post a timestamp when Mr. Tsitsikilis talked about that?
Thanks, I figured it out already
Sir, it is meant to be equal to that, it doesn't mean that is the formulae for it. E[x] = sum(xpx), and here x has only two values 1 and 0. Here we don't care about P(X1X2 = 0) cause X1X2 = 0 so that's not required, then E[X1X2] = 1* P(X1X2= 1) + 0*P(X1X2=0) gives you the answer
Thanks for this wonderful lecture.
Very clearly explained!
is this how the conditional pmf at 16:39 is computed?
X=3
P{Y=3}=(2/20)+(4/20)=6/20
P{Y=4}=(1/20)+(2/20)=3/20
P{Y>=3}=P{Y=3}+P{Y=4}=6/20+3/20=9/20
P{X=1}=(1/20)+(2/20)=3/20
P{X=2}=(2/20)+(4/20)=6/20
P{X=3}
=P{X=1,Y>=3}/P{Y>=3}=P{X=1,Y=3}/P{Y>=3} + P{X=1,Y=4}/P{Y>=3}
=[(2/20)+(1/20)]/[(6/20)+(3/20)]
=(2/20)/(9/20) + (1/20)/(9/20)
=2/9 + 1/9
=3/9
P{X=2|Y>=3}
=P{X=2,Y>=3}/P{Y>=3}=(P{X=2,Y=3}+P{X=2,Y=4})/P{Y>=3}
=[(4/20)+(2/20)]/[(6/20)+(3/20)]
=4/9 + 2/9
=6/9
P{Y=3|X
why does the sum part have Sum(XiXj) instead of 2*Sum(XiXj) which is the correct formula of square of a sum?
Because the terms with symmetric X’s (for example X1*X2 and X2*X1) are treated as two terms. That’s why the total number of terms is n*(n-1). You can certainly use the 2*Sum formula; but then the total number of terms would become “n choose 2”, which is half of before.
i was good until the last 5 minutes of the video. Where did XiXj come from?
Those are cross terms when you expand the square. (a + b)**2 = a**2 + 2ab + c**2.
why is E[X^2] equal to the sum of all the X^2 plus the sum of all X_i and X_j (at minute 45)
at minute 49 I also do not understand how x_1 and X_2 both being one can mean that Xi and Xj are different?
@@angiolg2 algebra, you multiply two really long lists together
The hat problem is soooooooooooooo great
SOMEONE BE AN EPIC GAMER AND PUT TIMESTAMPS ON THESE VIDEOS FOR JOHN
who's still here in 2020 XDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
me :)))))))))))))))
me study probability for my semesters.
Why is the expected value of sum = sum of expected values in case of the indicator function
Tossing the coin each time is basically like doing a binary decision, which can be described by an indicator function. The total number of heads is the sum of the results of each toss, which is the sum of of indicator functions. The summation is linear so you have E[\sum_i X_i] = \sum_i E[X_i].
Thanks Liang!!! This formula should work only in the case when we use a linear function such as the indicator function. It was confusing because John stressed that it was always true but I believe he meant, always true in the case of indicator functions! Thank you again! :D Isn't John just amazing?
can someone explain to me the hat problem? esp for the variance. thx so much!
I think there is sth wrong here, although the final result is exactly the same.
X^2=xichma(Xi^2)+2xichmaXiXj and use arithmetic progression we have n(n-1)/2 XiXj,so the result will also be 1.
Roku Vo agreed
Thank you for your detailed explaination :)
+Victor Laszlo Thanks for the explanation. I had the same doubt as Roku and your explanation helped.
i don't get what is he trying to say at 43:28
What do u mean, is pretty intuitive
@@leogaussbell1622 helpful response.
The response was as helpful, as the question was concrete:)
feels like too many things a squished into one lecture
Pr. Tsitsiklis finds two envelopes with money notes, in every student's hat.
Changes the notes to coins, and tosses each coin.
What's the expectation that :
1- Pr. Tsitsiklis does something more exciting than tossing coins?
2- At most one hat, is available for Pr. Tsitsiklis by the Final Lecture (L=25)?
17:30
Nice lecture, but you should also add the lecture contents in the description so that people don't have to go through the entire lecture to find out whether it's relevant to their course or not.
What if number of people are 2 ,then the expectation that 1 person gets his own hat won't make sense 😓
consider this : two people either bot can get their hats back or none of them do. that is 1 person getting their hat back on average ¯\_(ツ)_/¯
If you repeat the experiment many times (that's the idea of expected value) it does makes sense.
In about 1/2 of the experiments both will get theirs hats right and in the other 1/2 no one will get it right. On average 1 person gets the hat right although that result by itself is not possible.
Similarly, the expected value of rolling a 6 face fair dice is 3.5, however there is no face with 3.5 dots. 😉
These tutorials arent congruent with the course.