After reviewing probability courses, and this one itself, many times, I found this lecture to be a great one. Some passing knowledge of real analysis will help. But I still need to come back later, after building a more solid foundation elsewhere. Thanks to MIT and the professor.
This is very cryptic, especially without the first part.
So then watch the first part, you spoiled bozo.
29:51 that’s why electron clouds are drawn as balloons with really solid surface areas. We can draw the surface of the solid that the electron is in 95% of the time (or whatever P is if it is not 95%) because of “convergence in probability”.
🎯 Key Takeaways for quick navigation:
00:00 📊 *The speaker discusses the central limit theorem and its application to estimating proportions using averages.*
02:19 🔄 *The central limit theorem implies that square root of n times the standardized average converges in distribution to a standard normal random variable.*
03:16 🔄 *The probability that a standard Gaussian random variable exceeds q alpha over 2 is alpha over 2, so the two-sided tail probability is alpha.*
05:08 📉 *Confidence intervals are built by transforming the estimate in a way that results in a pivotal distribution, not dependent on unknown parameters.*
07:28 🧠 *The concept of pivotal distributions is introduced to create estimates that are asymptotically independent of unknown parameters.*
13:33 📉 *Hoeffding's inequality provides a useful tool for bounding the probability that the sample average deviates from its expectation for any n.*
16:47 📉 *Solving Hoeffding's inequality for a specific case shows a method for constructing confidence intervals without the need for large sample sizes.*
19:43 📈 *The Hoeffding inequality's worst-case scenario is discussed.*
20:12 🔄 *Combining intervals to determine the probability of a variable being outside a specified range.*
21:34 🤔 *Comparing the Hoeffding margin, square root of log(2/alpha) over 2n, to the CLT margin q alpha over 2 divided by 2 root n.*
22:55 🎲 *The role of assumptions in statistics, and the importance of balancing assumptions for confident statements.*
23:25 🔄 *Different types of convergence in statistics, emphasizing convergence in distribution as crucial.*
24:51 🔄 *Convergence in distribution explained through the concept of probability computations on random variables.*
26:17 🧐 *Stronger conditions needed for convergence in distribution to allow combining random variables effectively.*
27:42 🔄 *Almost sure convergence and convergence in probability explained, highlighting their differences.*
29:06 📉 *Convergence in Lp and its relation to the weakening of convergence conditions.*
31:27 🔄 *The importance of the characteristic function in proving convergence in distribution, especially in the central limit theorem.*
33:19 🔄 *The implications among modes of convergence: almost sure convergence implies convergence in probability, which implies convergence in distribution.*
38:35 🔄 *The continuous mapping theorem: if Tn goes to T, then f(Tn) goes to f(T) for continuous functions f.*
39:32 📊 *Convergence in L^q implies convergence in L^p for smaller p, which in turn implies convergence in probability and, eventually, convergence in distribution.*
40:29 🔄 *Operations and limits differ between convergence almost surely and convergence in probability, allowing various manipulations for the former.*
41:55 📈 *Convergence in distribution criteria includes the convergence of characteristic functions or bounded continuous functions of the random variable.*
42:24 🔄 *Slutsky's theorem states that if one random variable converges in probability and another in distribution, certain operations are still valid.*
43:23 🚇 *Inter-arrival times in queuing theory, modeled by the exponential distribution, are useful in systems with memoryless properties.*
45:16 📉 *Exponential distribution is often employed in modeling positive random variables, such as inter-arrival times.*
48:39 🔄 *The average of inter-arrival times in queuing theory, denoted Tn bar, serves as an estimator of 1 over lambda, the reciprocal of the rate parameter.*
50:32 🔄 *Strong and weak laws of large numbers support the convergence of Tn bar to 1 over lambda.*
52:49 📚 *The variance of the exponential distribution with parameter lambda is 1 over lambda squared, impacting the central limit theorem.*
56:53 📊 *Construction of a confidence interval for 1 over lambda involves manipulation of inequalities and dependency on Tn bar.*
01:00:11 📈 *First-order Taylor expansion: there is a theta bar between Zn and theta at which the expansion holds exactly (mean value form).*
01:00:38 📊 *Multiplying by root n in the Taylor expansion gives root n times (Zn minus theta) times g prime of theta bar.*
01:01:33 📉 *The law of large numbers implies that theta bar converges to theta as Zn approaches theta.*
01:02:29 🍔 *The "sandwich theorem" visualizes the convergence of theta bar to theta, since theta bar is squeezed between Zn and theta.*
01:03:55 📜 *Slutsky's theorem allows combining the convergence in distribution of Xn with the convergence in probability of Yn when the limit of Yn is a constant.*
01:05:48 🔄 *Slutsky's theorem facilitates combining sequences of random variables converging in distribution and in probability under specific conditions.*
01:07:16 🔍 *The delta method extends the central limit theorem to functions of averages, via the derivative of the function.*
01:08:11 📊 *Applying the delta method to the function g(x) = 1/x yields a confidence interval for lambda.*
01:10:59 🔧 *Replacing lambda by lambda hat in the confidence interval is justified by Slutsky's theorem, allowing easier computation.*
01:12:22 🧮 *Slutsky's theorem is crucial for replacing parameters with their estimates, simplifying computations in statistical applications.*
01:15:14 🎓 *Slutsky's theorem makes replacing parameters with their estimates legitimate, preserving convergence properties.*
Made with HARPA AI
Am I the only one who found the course hard to follow? 6:31 basically means: using properties of variance, with the n observations of X i.i.d. Bernoulli, the Bernoulli variance is E[x^2] - E[x]^2 = 1^2*p + 0^2*(1-p) - (1^2*p + 0^2*(1-p))^2 = p - p^2 = p(1-p). Having this variance, we can derive the variance of X_bar: var(X_bar) = var([X1+...+Xn]/n). Using properties of variance, we pull the 1/n out squared, so var(X_bar) = 1/n^2 * var(X1+...+Xn). Since the Xi are i.i.d., they all have the same variance, so we get n*var(Xi) divided by n^2, i.e. var(X_bar) = var(Xi)/n. To standardize it (in the CLT), we divide by the square root of var(Xi)/n, which is what he wrote on the board. So it all roots back to Prof. Tsitsiklis' probability course.
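To see that in numbers, here is a quick simulation sketch (my own, assuming NumPy; p = 0.3 and n = 100 are arbitrary illustrative choices) checking that var(X_bar) comes out to p(1-p)/n:

```python
# Sketch (not from the lecture): empirical variance of the Bernoulli
# sample mean vs. the theoretical value p(1-p)/n.
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.3, 100, 200_000

# Draw `reps` independent samples of size n and compute each sample mean.
x_bars = rng.binomial(1, p, size=(reps, n)).mean(axis=1)

print(x_bars.var())     # empirical var(X_bar), close to 0.0021
print(p * (1 - p) / n)  # theoretical p(1-p)/n = 0.0021
```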
i'm a fan, professor! the breakdowns. the pacing. the reading of the room. noice.
This lecture is basically broken by a continuity mismatch with Lecture 1: it references concepts and theorems that were never discussed in the previous lecture.
Nothing is broken: go to the URL link, get the lecture slides, get the HW assignments, and everything you need to understand what is going on is there. Philippe Rigollet is excellent and discusses fairly complicated material with ease. Excellent job. My two complaints are that the sound volume is way too low (whoever recorded the first lectures did a lousy job), and that the slides are not properly exported to PDF; some math symbols are missing in some of the equations.
Ahh don't you love it when the teacher says "I am going to pick up this guy..." and the camera zooms in to the point where you CAN'T SEE WHAT THE "GUY" IS!!! ahhhhhhh so good bravo bravo.
was not expecting that scooter after watching part 1
Why is sound volume so low?
for anyone watching the course now, you can install the volume boost extension for chrome to increase the sound volume
My dear friend, he was lying in the first lecture. Numbers can combine in many different ways. Which is definitely human consciousness.
You can get the slides if you go to the link in the description (and read the missing part in this video).
Excellent lecture; it took me 3 days to understand all the concepts presented.
At 55:23 it is not equal to alpha; it is equal to (1 - alpha), since that is the way he defined the q (you can see that in the slides).
Notice the inequality sign being reversed
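To spell out the convention (my reconstruction from the slides, assuming q_(alpha/2) is defined as the (1 - alpha/2) quantile of the standard normal, not a quote from the lecture):

```latex
% Assumed convention: q_{\alpha/2} is the (1-\alpha/2) quantile of Z ~ N(0,1),
% so the one-sided and two-sided statements are equivalent:
\[
  \mathbb{P}\bigl(Z > q_{\alpha/2}\bigr) = \tfrac{\alpha}{2}
  \quad\Longleftrightarrow\quad
  \mathbb{P}\bigl(\lvert Z\rvert \le q_{\alpha/2}\bigr) = 1 - \alpha .
\]
```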
Is it really excellent if it took you 3 days to understand?
If one has not learned all the prerequisites, taking 3 days to understand all the concepts is very good. Learning the five or more prerequisites (linear algebra, calculus I and II, real analysis, and 18.600 Probability and Random Variables) takes months or a year. @@phillustrator
Of anyone, the Massachusetts Institute of Technology should be able to level the volume so we can hear it. Bleeding-edge technology, but they make videos with 1990s PBS vibes. WOW
Thanks for the videos though!
The first video was recorded in Fall 2017. The rest of the lectures were recorded in Fall 2016, but video of Lecture 1 was not available. Am I the only one who noticed this?
I guess you are right. I watched video 1, but I find it difficult to follow this (cont.) video.
How did it go from intro 1 to this? When did he talk about the central limit theorem etc.? Is he solving exercises? wtf
When taking this class you are supposed to have taken a probability course first.
@Irving Ceron He meant the probability course provided by MIT (6.041). ruclips.net/video/j9WZyLZCBzs/видео.html
this video and the rest of the series is from Fall 2016, but the first video was from Fall 2017 (mentioned in OCW @ ocw.mit.edu/courses/mathematics/18-650-statistics-for-applications-fall-2016/lecture-videos/lecture-1-introduction-to-statistics/) - hence the inconsistency in the content. As for CLT, you can refer to MIT 6.041 video lectures 19 and 20.
The discontinuity from the first video adds quite some confusion.
better to watch the Khan Academy statistics playlist before MIT18.650
good advice
A new intro stats series should be recorded either online (COVID baby!) or in person
I wish they would upload their COVID-era stuff online. But I think it is restricted by some US laws stating that educational online content released to the general public must be accessible to all people, so they have to take care of subtitling and also of blurring the faces of students in class because of privacy laws.
I wish it were simpler, because all the best universities are teaching online, and I hope the content becomes accessible to common folks like us.
Oh no! Part 1 is missing! :(
The sound level is too low, maybe re-upload with increased volume?
The first lecture was not recorded. We are hoping to record it when the course is taught again in the Fall. In the meantime, please check out the course site here ocw.mit.edu/18-650F16 where you will find the slides that were used in all of the lectures.
Hi! Thank you very much for the recordings!
The first part of this lecture (lecture 2) is missing, and that's why it's really difficult to catch up with the rest of the lecture! It would be really helpful if you could upload the newest version of this lecture.
Even though there is now a lecture 1 recording from 2017, it doesn't cover the same material as the missing 2016 lecture 1 had covered (the central limit theorem, the quantity q_(alpha/2)), so anyone trying to learn from this will have to do a lot of their own legwork.
I find these videos quite interesting; however, the disorganized board work that many teachers fall into bothers me. Unclear and messy handwriting. It would be nice if they could improve on those aspects.
I really liked the lecture. Thanks MIT!
Is it supposed to be undergraduate level stat? I absolutely understood nothing.
This is undergrad yes :) It assumes you know some real analysis which many undergrads know as well!
@@placeholder6811 eh, more like you know basic probability. Like Bertsekas' Introduction to Probability, Hogg & Tanis' Probability and Statistical Inference, or Blitzstein's Introduction to Probability.
why did he ride a scooter during lecture?
When he talks about Hoeffding's inequality, shouldn't the tails be larger compared to a standard normal? I think smaller tails would result in a tighter upper bound (i.e. more confidence that the sample mean is close to mu), which wouldn't make sense when n is small.
I think you are right
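A small numeric sketch of that intuition (my own; it assumes SciPy for the normal CDF, and n = 50, eps = 0.1 are arbitrary choices). In the Bernoulli worst case sigma = 1/2, Hoeffding's tail bound is indeed the larger, looser one; that looseness is the price for it holding at every n:

```python
# Compare Hoeffding's bound P(|X_bar - mu| > eps) <= 2*exp(-2*n*eps^2)
# with the CLT approximation 2*(1 - Phi(eps*sqrt(n)/sigma)), sigma = 1/2.
import math
from scipy.stats import norm

n, eps = 50, 0.1
hoeffding = 2 * math.exp(-2 * n * eps**2)
clt_approx = 2 * (1 - norm.cdf(eps * math.sqrt(n) / 0.5))

print(hoeffding)   # ~0.736: valid for every n, but loose
print(clt_approx)  # ~0.157: tighter, but only an approximation for large n
```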
I can barely hear the audio. Please remake this video next time.
I don't understand
You'd think MIT would figure out the equation for chalkboard erasers. I guess that's more of a Harvard specialization.
@7:00 ish... If I am understanding correctly, the limiting distribution is not always Gaussian. There might be cases when the CLT isn't valid, such as when n is small. Are there any other cases? And how were the bounds defined for the kiss example? If someone can explain, I would appreciate it.
Another case is when the variance of the distribution is unbounded, such as the Cauchy distribution. When n -> infinity, the average of Cauchy samples is still Cauchy (not Gaussian). 18.600 discusses this.
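A quick sketch of that fact (my own, assuming NumPy): the average of n standard Cauchy draws is again standard Cauchy, so its spread does not shrink as n grows:

```python
# The interquartile range of the sample mean of standard Cauchy draws
# stays ~2 (the IQR of a standard Cauchy) whether n = 1 or n = 1000.
import numpy as np

rng = np.random.default_rng(1)
for n in (1, 1000):
    means = rng.standard_cauchy(size=(100_000, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    print(n, q75 - q25)  # ~2.0 in both cases: averaging never concentrates
```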
Too bad this course is marred by the discontinuity... There seems to be a whole lecture missing; they should have added at least the second lecture from 2017 as well. It's better to have some overlap than missing material.
Pathetic job on the sound recording! Can't hear anything even with an external speaker.
Is this basic statistics??
I am worried that I don't understand....
Can I have the Stellar link where the notes and assignments are loaded?
See the course materials on MIT OpenCourseWare at: ocw.mit.edu/18-650F16. Best wishes on your studies!
Thanks for the video professor
maybe use a microphone next time
😂😂
Me grabbing a large speaker so I can get a distorted voice that is actually within decibel levels humans can perceive
Turning volume to maximum on desktop allows one to listen to this lecture without any problem. Using mobile may render this lecture inaudible even with maximum volume.
At 0:22, you need a better duster.
In my years of college, I never saw a teacher describe a complicated equation so deeply. Most of them don't even know the details; they just put it in the PowerPoint, explain it, and the students sleep quietly.
No part uno?
Will the first part of this lecture be added as a separate video or will this video be replaced?
amazing lecture. Thank you very much.
I spent my years as a student. To teach something to someone, you should first motivate them by showing the point they will reach if they grasp what is taught. That is why you should first solve a lot of different real-life examples. In this course, students can't see the end of the tunnel.
If everyone thought that way then we wouldn't have the mathematics we do today. Often the real world examples don't exist until dozens or hundreds of years later.
@@ryanjackson0x why did you write that? That is not true. Maths developed through physical examples. In the past, scientists were interested in more than one science plus maths. They tried to solve their own problems. Their motivation was the problem itself.
It's very kind of you
Here I am trying to learn English from a teacher with a French accent who teaches statistics. What is the possibility that I will be successful?
Not a lot, considering your grammar. But you can try :)
@@georgeivanchyk9376 In my native language I learned to speak before learning grammar.
@@profetadosecxxi true. English isn't my first language but I easily get my point across and that makes sense. I never even thought about the grammar of my native language. Everything just fits right in and works normally and makes sense ever since I was 4 yr old. But with English it was a long journey till high school
English has strict grammar rules, but they're not essential once you grasp the language. He is French, and that should motivate you. I was taught calculus by a Sri Lankan lecturer whose spoken English was bad, but it didn't deter me from learning the subject.
America gives everyone the opportunity to use his or her talent to the full in this great country, no matter what accent he or she has. You'll get used to the accent of any professor MIT allows to teach within a few weeks.
What's with the sound, guys??!?
The slide at 07:43 seems incorrect. If we are targeting a confidence interval at level 1 - alpha, shouldn't it be q_(alpha/2) and not q_alpha?
Absolutely well done and definitely keep it up!!! 👍👍👍👍👍
the quality is so low
Great lecture! I have enjoyed it; thanks.
Isn't CLT about the distribution of your estimated mean being normal (rather than calculating the mean from a very large sample)? i.e. calculate your mean 1000 times and those 1000 means will follow a normal distribution
Well, if you look at the statement of the theorem, it says that (X1+...+Xn)/n, when properly standardized, converges to the standard normal. So it is about the sample mean being normally distributed, but only in the limit; that's why it's called the central *limit* theorem, and that's why we need a large sample size. If you took a small sample a large number of times (like 1000) and calculated the mean each time, there's no reason to think that the means thus obtained would be normally distributed (unless the distribution you're sampling from is normal). To take an extreme example, assume that your sample size is just 1 and you take the mean 10000 times: the mean will equal the random variable itself each time, so the means will have the same distribution as the r.v. you're sampling from. I might have misunderstood your question though.
Yes. The mean of n samples (A) approximately follows a normal distribution, and the same goes for the sum of those n samples (B); they are just on different scales. Multiply the mean of A by n and you get the mean of B; multiply the SD of A by n (since B = n*A) and you get the SD of B. This follows from how mean and variance add over sums of i.i.d. variables (with bounded mean and variance).
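A simulation sketch of the point made in these replies (my own, assuming NumPy and SciPy; Exponential(1) is an arbitrary skewed choice): with sample size 1 the "means" keep the parent's skewness, while with larger n they flatten toward Gaussian:

```python
# Repeat the sample mean many times for a skewed distribution.
# With n = 1 the "means" keep the exponential's skewness (~2);
# with n = 200 they are close to Gaussian (skewness ~ 2/sqrt(n) ~ 0.14).
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)
for n in (1, 200):
    means = rng.exponential(scale=1.0, size=(50_000, n)).mean(axis=1)
    print(n, skew(means))  # ~2.0 for n=1, ~0.14 for n=200
```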
Why is he using a bicycle?
Confusing
Someone get that gent a chair and a overhead projector.
because they lied....
They lied to me and let me run up then energy exhaustion
Why can't we get the complete recordings of 2017 so it's continuous? :(
There weren't any major changes between the two course offerings. Lecture 1 was missing from the 2016 recording of the course, so only Lecture 1 was recorded (in 2017) to patch that omission. See the course on MIT OpenCourseWare for the materials at: ocw.mit.edu/18-650F16. Best wishes on your studies!
@@mitocw thanks! Understood. The problem is that there was a significant change in the transition from lecture 1 to 2, so it's not possible to understand this lecture without more context. The slides help, but it's still not great in terms of just watching the videos... hence the suggestion to perhaps publish the 2017 course instead (or additionally)? Anyway, just what seems possible from an outsider's view; I assume there's a reason for having patched it this way.
why is the professor on a scooter?
Injury
@1:11:38 What does he mean by "lambda comes from here"? Why does it come from the variance of the limiting distribution? And where does the next expression come from, the one he divides lambda by? Why does he do that?
That's just standardizing the normal to be N(0,1). He does this to show that Slutsky can be applied (together with the LLN) so that lambda can be replaced by lambda hat.
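Spelled out (my reconstruction in the lecture's notation, with lambda hat = 1/Tn bar):

```latex
% Delta method with g(x) = 1/x (so g'(1/\lambda) = -\lambda^2) turns the CLT
%   \sqrt{n}\,(\bar{T}_n - 1/\lambda) \to N(0, 1/\lambda^2)
% into
%   \sqrt{n}\,(\hat{\lambda} - \lambda) \to N(0, \lambda^2).
% Dividing by \lambda standardizes; Slutsky (with \hat{\lambda} \to \lambda
% in probability) lets us divide by \hat{\lambda} instead:
\[
  \frac{\sqrt{n}\,(\hat{\lambda} - \lambda)}{\lambda}
    \xrightarrow{(d)} \mathcal{N}(0,1)
  \quad\Longrightarrow\quad
  \frac{\sqrt{n}\,(\hat{\lambda} - \lambda)}{\hat{\lambda}}
    \xrightarrow{(d)} \mathcal{N}(0,1).
\]
```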
nice trike bro
Where did q and alpha come from?
Can anyone explain what exactly alpha is?
He is probably the worst professor I ever saw to be completely honest.
applied mathematics
I just remembered 18650 is a battery model 😂
Poor Recording.. can't hear anything
Ah college hasn’t changed! He started with an interesting example, then went into technical speak. He is now relying on equations he knows but the students don’t. He should be giving real common sense examples then using the equations. I am teaching AP stats and he just seems to jump into complicated equations. If this is a first year stats course I feel sorry for the students. It seems teaching at the university level hasn’t improved in almost 40 years.
True. I felt the same back in 2007, from my first statistics class to the second... I remember wondering if I had skipped a class! And today I felt the same from video 1 to video 2...
lol He said in the first video that you should know calculus, linear algebra and some probability. Don't be a puss; if you teach AP stats this should be easy.
This is not a first-year course. As far as I heard from MIT staff, it is positioned as a graduate-level course for non-math students ("non-math" in MIT's understanding still implies quite solid knowledge of math). It assumes the students are hands-on with calculus and probability theory, and have had some exposure to linear algebra (all the way up to eigendecomposition) and multivariable calculus.
This course requires quite some effort from the learners. It certainly has an "academic", rather than practical, flavor. Personally, I found it very interesting and useful (I completed the 18.6501x version on edX).
I agree
Aw come on, the sound is bad!!
Did anyone write down the topics he covered?
Topics, lecture slides, and assignments are available on MIT OpenCourseWare at: ocw.mit.edu/18-650F16. Best wishes on your studies!
Why can't I watch it?
If you are having trouble viewing this on RUclips, you can also get these videos from the Internet Archive and iTunes U: archive.org/details/MIT18.650F16 or itunes.apple.com/us/itunes-u/id1262009852.
Thanks MIT
Textbook...........Please...!!!!!
There was no required text for this course.
They had a book, All of Statistics by Wasserman, that was recommended in the first video.
Teaching is just a paycheck to this guy.
1:01:08
These statistics are not suited for psychology.
Introduction to brain damages.
Test
How is this an intro?
also, why is the teacher looking at students as if they offended his dead grandfather's ashes?
WTF is this? i'm sorry, but this is not an intro class at all
Pretentious. MIT you can do better!
ungrateful fellow
Totally useless......this shows that not everything that comes from MIT is useful
Coming from 6.041, this course still seems too difficult for me... 🥲
Prerequisites
Probability theory at the level of 18.440 Probability and Random Variables ( ocw.mit.edu/courses/18-440-probability-and-random-variables-spring-2014 ). Some linear algebra (matrices, vectors, eigenvalues). Best wishes on your studies!