*My takeaways:*
1. Generating normally distributed data in code 0:45
2. Probability density function for distribution 8:10
3. Not everything is normally distributed 20:29
4. The central limit theorem 22:50
5. Pi is calculated using Monte Carlo Simulation 32:21
- Standard deviation gets better with more samples 42:00
Thank you!
"This is really amazing that this is true, and dramatically useful."
This statement, at 25:19, is not only true of the CLT ... but also of this course: Thank you!
Statistics is basic indeed, but this course really helps me learn to do this stuff in Python. Thank you, MIT.
Hmm, of the 43k people who watched "6. Monte Carlo Simulation", only 6k bothered to watch confidence intervals.
Estimate the amount of the 43k who are gamblers trying to beat the system.
Hahahahahah
**commences needle dropping**
Which is a shame since CI is wrongly explained in that lecture
How can one use the Archimedes method for calculating pi in a Monte Carlo simulation and find a CI? It seems like this method is a more straightforward way of finding pi.
Thank you Dr. Guttag.
Just thought you might be interested - the example that you gave from the book of Kings that seems to estimate pi as 3 has an interesting tradition associated with it. There is a concept of 'written' and 'read' in the reading of the scriptures and in this case 'line' is read and 'the line' is written. Hebrew has values associated with each letter and if we use the value of 'the line' you get 111 and 'line' is 106. If you use this as a factor to multiply the apparent 3 - you get a pretty good estimate of pi...
הקו = 111
קו = 106
111/106*3= 3.1415
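The arithmetic in the comment above can be checked directly. This is only a quick sketch; the variable names are made up for illustration, and the two values 111 and 106 are taken from the comment as given.

```python
import math

# Values quoted in the comment above (names are illustrative only).
written_value = 111  # 'the line'
read_value = 106     # 'line'

# Scale the apparent value 3 by the ratio of the two values.
estimate = written_value / read_value * 3
error = abs(estimate - math.pi)

print(f"estimate = {estimate:.6f}")
print(f"math.pi  = {math.pi:.6f}")
print(f"error    = {error:.6f}")
```

The product is the fraction 333/106, which agrees with pi to four decimal places.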
The weights in that Python code will be 1.0 always. From the description, it ought to be the count of each 'x' in a bin, divided by the number of values in the bin (which could be zero...). Discard the weights!
Thanks man, I thought I was going crazy.
If you are talking about this line of code:
weights = [1/numSamples]*len(dist)
weights is actually equal to a list 1,000,000 items long of 1/1,000,000
ie: [1e-06, 1e-06, 1e-06, . . . ,1e-06]
As in python [5]*5 == [5, 5, 5, 5, 5], not [25]
you can test it by adding a print statement for part of the weights list after it is defined, like this:
print(weights[:10]) #Prints first 10 items in weights list.
(Don't try to print the whole list, it's 1 million items long!)
If you were talking about something else, I'm sorry!
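A small sketch of the point made in this reply, assuming Python 3 (where `/` is true division). The `dist` list here is just a placeholder of the right length, not the lecture's data, and `num_samples` stands in for `numSamples`.

```python
import math

num_samples = 1_000_000
dist = [0.0] * num_samples  # placeholder: only the length matters here

# [value] * n repeats the one-element list n times; it does not multiply the value.
weights = [1 / num_samples] * len(dist)

print(len(weights))   # one weight per sample
print(weights[:3])    # every weight is 1/num_samples
print(math.isclose(sum(weights), 1.0, rel_tol=1e-6))  # the weights add up to about 1
```

Because each sample carries weight 1/num_samples, the bar heights in a weighted histogram are fractions of the samples rather than raw counts.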
Great lecture! I like the professor's humor, particularly at 34:35 :)
I didn't get the joke. Could you tell me the context?
Mike Pence is a fundamentalist
@andrei-un3yr Well, Mike Pence voted against recognizing Pi back in 2009, that is why
Thank you! This course teaches soooo much better than the lecture provided in my university!
I would like to know when doing a monte carlo simulation why do we use Normal Inverse function in Excel?
37:07 I am confused by the equation needles in circle / needles in square = area of circle / area of square
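One way to see why the equation holds: if needles land uniformly at random inside the square, the chance that a needle lands in any region is proportional to that region's area, so the fraction inside the circle approaches area of circle / area of square. The sketch below is not the lecture's code; it assumes a quarter circle of radius 1 inside the unit square, where the area ratio is pi/4, and `estimate_pi` is a name chosen for illustration.

```python
import random

def estimate_pi(num_needles, rng):
    """Estimate pi from the fraction of uniform points inside a quarter circle."""
    in_circle = 0
    for _ in range(num_needles):
        x = rng.random()
        y = rng.random()
        # The point lies inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            in_circle += 1
    # in_circle / num_needles approximates pi / 4, the ratio of the two areas.
    return 4 * in_circle / num_needles

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (1_000, 10_000, 100_000):
    print(n, estimate_pi(n, rng))
```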
46:31 The slide says "both are factually correct". But I don't understand how the 2nd statement is true. Is it correct to say that the value of pi is between X and Y with probability 0.95, when in fact we know that the value of pi is between X and Y with a probability of 1? The 2nd statement implies that the value of pi is not between X and Y with a probability of 0.05, which is false.
Once the confidence interval for an unknown parameter is constructed, the probability that the confidence interval contains the true value of the parameter is either 0 or 1. It cannot be 0.95.
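The reply above reflects the usual frequentist reading: the 95% describes the procedure, not any single interval. A rough sketch of that idea follows, under assumptions chosen only for illustration (normally distributed data with true mean 10 and standard deviation 2, samples of size 100, 2000 repetitions). Each interval either contains the true mean or it does not, but about 95% of them do.

```python
import random

def interval_covers(true_mean, true_sd, sample_size, rng):
    """Build one 95% interval from a fresh sample and report whether it contains true_mean."""
    sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
    mean = sum(sample) / sample_size
    variance = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
    std_error = (variance / sample_size) ** 0.5
    return mean - 1.96 * std_error <= true_mean <= mean + 1.96 * std_error

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 2000
hits = sum(interval_covers(10.0, 2.0, 100, rng) for _ in range(trials))
coverage = hits / trials
print(f"{hits} of {trials} intervals contain the true mean ({coverage:.3f})")
```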
8:55
PDF formula in red rectangle is missing /
in code, factor2 is correct
Didn't you get a better estimate going from 1000 needles to 2000? Isn't 3.139 closer to Pi than 3.148, so it's an improvement, isn't it? But it still looks to be true from your samples that the simulations are not monotonically getting better.
Yes, 3.139 is closer to Pi than 3.148. He was probably just truncating at the last correct digit and that is why he thought it was the other way around, because 3.139 has "two correct digits" and 3.148 has "three correct digits". Of course that is not the same as being closer to pi, as this very example demonstrates
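A quick check of the two estimates from the question, 3.148 (1000 needles) and 3.139 (2000 needles), comparing absolute error rather than matching leading digits:

```python
import math

# Absolute distance from pi for each estimate quoted in the question.
errors = {estimate: abs(estimate - math.pi) for estimate in (3.148, 3.139)}

for estimate, error in errors.items():
    print(f"{estimate}: off by {error:.5f}")

closest = min(errors, key=errors.get)
print("closest to pi:", closest)
```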
what is a bin? 3:15
Can someone tell me what the code v[0][30:70] means? 6:14
The total area of 40 bins is what I have concluded, but why is that the "fraction within ~200 of mean"??
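A sketch of what that slice adds up, written without pylab so it runs on its own. It assumes, as in the thread, 1,000,000 samples, 100 bins and weights of 1/numSamples, and it further assumes the samples come from a Gaussian with mean 0 and standard deviation 100, which is not stated in these comments. If `v` is the value returned by `pylab.hist`, then `v[0]` holds the bar heights, so `v[0][30:70]` is the heights of the 40 middle bins; the `heights` list below plays that role.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
num_samples = 1_000_000
dist = [rng.gauss(0, 100) for _ in range(num_samples)]  # assumed mean 0, sd 100

# Hand-rolled histogram: 100 equal-width bins from min to max, each sample weighted 1/num_samples.
num_bins = 100
lo, hi = min(dist), max(dist)
width = (hi - lo) / num_bins
heights = [0.0] * num_bins
for x in dist:
    index = min(int((x - lo) / width), num_bins - 1)  # the maximum goes in the last bin
    heights[index] += 1 / num_samples

middle = sum(heights[30:70])  # the 40 middle bins
print(f"bin width is about {width:.1f}")
print(f"bins 30..69 cover roughly {lo + 30 * width:.0f} to {lo + 70 * width:.0f}")
print(f"fraction of samples in those bins: {middle:.3f}")
```

With these assumptions the data span roughly 1000 units, so each bin is about 10 wide and the 40 middle bins cover about 200 on either side of the mean, which is two standard deviations.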
Great lecture. Pretty sure
11:44 could you expand on "the probability of any particular point is 0"?
(Finally, my time to shine has arrived :) ) In a continuous variable, any real value inside an interval is possible. For example, between 0 and 1 we have infinite real numbers. The probability of sampling any of those particular values is 0 because there is an infinity of them.
I hope it was clear.
Because there are an infinite number of possibilities
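To make the replies above concrete, here is a small sketch using a standard normal distribution (an assumption chosen for illustration). Probabilities for a continuous variable belong to intervals, and the probability of an interval around a point shrinks toward 0 as the interval narrows. The helper names are hypothetical.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def prob_within(center, half_width):
    """Probability that a standard normal value falls within half_width of center."""
    return normal_cdf(center + half_width) - normal_cdf(center - half_width)

# Narrower and narrower intervals around 0 carry less and less probability.
for half_width in (0.5, 0.05, 0.005, 0.0005, 0.0):
    print(half_width, prob_within(0.0, half_width))
```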
It would be great, sir, if you could also show the plots for a larger number of trials, so that we could observe the trends becoming Gaussian :)
I did not get the weight parameter in the formula shown at the beginning. It says [1/numSamples]*len(dist). However, numSamples is 1000000 and dist has always a length of 1000000 as well, so the weight will end up as 1. Am I missing something?
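What may be missing in the formula is a float. Under Python 2, 1/numSamples is integer division, so you need 1.0 instead of just 1 in the expression [1.0/numSamples]*len(dist); otherwise you will get zeros for all weights list members. In Python 3, 1/numSamples is already a float.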
mind the square brackets around [1/numSamples] - this is a list of length =1
Multiplying this by len(dist) gives you a list of length = len(dist). Example:
[.2] * 5 = [.2, .2, .2, .2, .2]
absolutelyharmlesss Ah, got you! Thank you!
Could somebody please explain why the precision is chosen to be .005 for the estimation of Pi? And what did he mean by saying "should probably use 1.96 instead of 2"? There are two "2" in the code, which one he meant? The whole lecture is titled "Confidence Intervals", but the actual topic is just skimmed in a couple of sentences 😳
You should probably watch the 6th lecture. All the questions you have are answered there.
The Empirical Rule states that :
68% of the data is within 1 stdev of the mean
95% of the data is within 1.96 stdev of the mean (he used 2 instead of 1.96 for simplicity)
99.7% of the data is within 3 stdev of the mean
0.005 is the number he chose as an acceptable range of error
(Since the exact value of pi = 3.141~ , we want estimates to lie between 3.136 ~ 3.146 with high confidence)
Consider one simulation result where
Estimate = 3.141556
Std.dev. = 0.0021
by the empirical rule
there is a 95% chance that the actual value of pi will lie between
3.141556 - 2*0.0021 ~ 3.141556 + 2*0.0021
(there is a 95% chance the estimate is correct within 0.0042)
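The numbers in this reply can be reproduced with a short sketch. It assumes the estimates are normally distributed, which is what the empirical rule relies on; `fraction_within` is a name chosen here for illustration. It shows why 1.96 is the more exact multiplier for 95% and what interval the example estimate gives.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"within {k} std devs: {fraction_within(k):.4f}")

# The example result from the reply above.
estimate, std_dev = 3.141556, 0.0021
low = estimate - 1.96 * std_dev
high = estimate + 1.96 * std_dev
print(f"95% interval: {low:.6f} to {high:.6f}")
print("half-width below the 0.005 precision target:", 1.96 * std_dev <= 0.005)
```

Using 2 instead of 1.96 makes the interval slightly wider, so it covers a little more than 95%.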
24:23 in conclusion the subset is in the set
Thank you.
Did anyone call Pence and verify?
That comment he made about Pence was savage 😂.
34:40
„3, and I‘m sure that‘s what Mike Pence thinks it is...“
...statistics can be fun too!
(Religion as a Question of Precision, nice...)
This lecture is not about confidence intervals
I'm going to have my tutoring students watch these videos. You, sir, are an amazing teacher. And you are wrong about Mike Pence thinking pi is 3. He would never defile his mind with thinking of the value of pi. He knows this kind of unnatural fiddling with numbers is the devil's work and would never participate in knowing any part of it.
Lies again? Cock it
20:23
A lot more knowledge can be transmitted about the subject, and much better explained, using only the chalk and the blackboard. We are upgrading computers and software, but we are downgrading our mind and intellect.
#POW
Ohh gosh, why do all statistics teachers look and act the same boring way with a hint of attitude? It was the same at my university: I never could follow the lecture because of complete boredom. I know it's my fault, not the teacher's, but does anyone agree? I watched lectures on analysis 1, 2 and complex analysis for many hours with no breaks and passed the exams no problem. I can never focus on this lecture; it's torture. However, I'm very thankful because it's free, and I appreciate that.
++
“....named after the astronomer Carl Gauss...”.
Carl Gauss was a major mathematician and physicist, as significant as Isaac Newton. This MIT professor clearly does not know who Carl Gauss is. Get your facts straight MIT.
minor mistake though
Gauss was an astronomer. He's a great mathematician, but worked as a professor of astronomy and was the director of an astronomical observatory. Do you seriously think John Guttag, former head of MIT EECS, doesn't know who Gauss is?
These guys were Polymaths...
wow this is so basic the MIT should be ashamed to post this!
Jonathan, the problem with most schools is that they give way too much theory and not enough practical application. I went through an entire master's program in probability and statistics, and out of school couldn’t analyze a simple data set. I’m not deemphasizing the theory part, but I wish schools would teach more like this and have separate academic tracks for those who want to focus solely on theory.
It's that kind of arrogance that leads to those situations where an entire class of "master" students can be asked "What is a confidence interval? How do we calculate it?" and not a single one of them raises a hand. A university should NEVER be afraid to review the basics. The time that is "wasted" on basics pays off exponentially when you finally get to the advanced stuff.
come on.. Jonathan