Learning is entirely feasible when this guy is your teacher.
Absolutely (and there is more than one color pair at our disposal to disambiguate the two different concepts)
Does anyone else find this guy absolutely hilarious for some reason? Something about the look on his face makes you feel like he's constantly thinking "Yeah, I'm killing this lecture right now". When he whips out that smug half smile I can't help but laugh out loud. You can tell he loves teaching. Great set of lectures.
I agree, I saw wisdom in his face. #respect
actually the updating process is there in his mind, and always expressed in his face
Such a marvelous lecture!
The logical sequence, the explanation, the jokes, the intuitive PowerPoint animations, ...
I wish all my lectures were like this
Agreed, this guy is SO good; great job sprinkling in jokes to keep everyone's attention
where was the joke?
If this video is confusing to you, consider the following:
The example at 9:34 is only to show that we know something about the entire set, based on a sample. Basically, it says the bigger the sample, the closer nu is to mu (Hoeffding).
At 28:00, forget the above example. We are not trying to make a hypothesis for that example. The new values have nothing to do with above example. From this point on, we have a random data set (X) with an unknown function f. We want to know if we could make a hypothesis h to predict results. In other words: is learning feasible?
So in the new bin, the proportion of green points is a measure of how correct a hypothesis is. We do not know how many are green, so we take a sample. In this sample, we get the ratio of correct to incorrect results of the hypothesis, and this says something about the entire bin (Hoeffding). So if the sample is sufficiently big and has a lot of positive predictions, then yes, learning is feasible.
Or not? -> 33:30
okay? okay.
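If it helps, here is a minimal Python sketch of the single-bin experiment (the mu value and sample sizes are made up for illustration; in the real setting mu is invisible to the learner): draw N marbles from a bin whose true red fraction is mu, and watch the sample fraction nu settle toward mu as N grows, which is all Hoeffding is promising.

```python
import random

# Toy single-bin experiment: mu is the (normally unknown) fraction of red
# marbles in the bin; nu is the fraction observed in a random sample.
mu = 0.37  # hypothetical value, chosen just for this demo

for N in (10, 100, 10_000):
    sample = [random.random() < mu for _ in range(N)]  # True = red marble
    nu = sum(sample) / N
    print(f"N={N:6d}  nu={nu:.3f}  |nu-mu|={abs(nu - mu):.3f}")
# The deviation |nu - mu| shrinks as N grows: exactly what Hoeffding bounds.
```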
Yeah, it's so misleading. The same marbles play two different roles: first they're used for measuring the probability of getting green, and the second time they're used for checking whether a hypothesis h is correct (h(x)=f(x)) or wrong (h(x)!=f(x)). He's a good professor, but this part is confusing.
Can you please tell me how exactly mu is defined? By "picking", do we mean I pour both colors of balls into the bin and mu is the percentage of each color in there after mixing? And how do we define picking a red ball, given that balls are picked as a sample, say 9 balls at a time, with the observed fraction called nu? How is that incorporated with mu?
Well, this was for a long time the most baffling point of the entire lecture for me. However, when complemented with the book, it suddenly hit me: although we cannot explicitly compute f(x) when comparing it to g(x), since f is unknown, and hence the colors themselves are completely unknown to us, what we CAN do is view randomly picked x's as samples from a probability distribution P, where x is red with probability mu and green with probability 1 - mu. That's it.
How much must this person know, if he calls such a big concept just a simple tool. Respect.
19:00
Now, between you and me, I prefer the original formula better. Without the 2.
However, the formula with the 2s has the distinct advantage of being … true. So we have to settle for that.
Best quote ever on the Hoeffding's Inequality. :)
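For what it's worth, you can see why the 2 is needed with a tiny worked case (my own numbers, not from the lecture): take a fair bin (mu = 0.5), a sample of size N = 1, and epsilon = 0.4. Then |nu - mu| = 0.5 > epsilon with certainty, so the hypothetical bound without the 2 is violated, while the true two-sided bound survives (trivially).

```python
import math

# mu = 0.5, N = 1: nu is either 0 or 1, so |nu - mu| = 0.5 always.
eps, N = 0.4, 1
p_deviation = 1.0                          # P[|nu - mu| > eps], exactly

without_2 = math.exp(-2 * eps**2 * N)      # ~0.726: would-be bound fails
with_2 = 2 * math.exp(-2 * eps**2 * N)     # ~1.452: true bound holds

print(p_deviation > without_2)   # True -> the formula without the 2 is false
print(p_deviation <= with_2)     # True -> the formula with the 2 is ... true
```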
Professor, your lectures are so enjoyable that I look forward to "learning" :). Thank you!
Brilliant lecture. A different approach from Andrew Ng's. Loved it!!
It's way better than Andrew Ng's. I am a math major.
Abu-Mostafa is a real genius at explaining complex things in simple terms... The real game changer is Hoeffding's inequality, because it allows us to model the learning problem in a way that bounds the uncertainty independently of the unknown parameter (mu). The only thing that remains is a tradeoff between error tolerance (epsilon) and sample size (N), captured by the relation
P(|nu - mu| > epsilon) <= 2 exp(-2 epsilon^2 N)
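As a rough sketch of that tradeoff (illustrative numbers, not from the lecture): solving 2*exp(-2*epsilon^2*N) <= delta for N shows how big a sample a given tolerance requires.

```python
import math

def min_sample_size(eps: float, delta: float) -> int:
    # Smallest N with 2*exp(-2*eps^2*N) <= delta, solved from Hoeffding's bound.
    return math.ceil(math.log(2 / delta) / (2 * eps**2))

# e.g. to get |nu - mu| <= 0.05 except with probability 0.05:
print(min_sample_size(eps=0.05, delta=0.05))   # 738 samples
```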
The Coursera ML course assumes you're an idiot to start with, teaches you little, and then proclaims you an "expert". This course assumes substantial background, teaches you things in depth, and at the end is still humble about how much knowledge it gave you. Andrew Ng's YouTube lectures recorded at Stanford are quite good though.
Very well put, I was really frustrated with the coursera videos when I found this series and the experience has been much better.
@@s9chroma210 how're you nowadays??
@@supriyamanna715 I'm doing quite well. Did this and a few other courses, which really helped!
@@s9chroma210 which Coursera course are you talking about that's "dumbing things down"?
Thanks to the professor and Caltech for sending me back to my youth! I recall myself excited by the outstanding lecturers I met then.... His speech and passion perfectly colorize the topic and, I believe, are of great help to foreign students' understanding.
56:41 ooo, that's the hypothesis!! Nothing I need more than that. Thanks Professor!!!
I have not completely understood the analogy between learning and the 1000 coins. At 43:08 we try different hypotheses (because the bins are different) and try to select the best according to the sample. At 48:52 we have the same hypothesis (because the bins are the same) and try different samples.
This lecture is perfect! I recommend complementing it with Andrew Ng's lecture 9 on learning theory from his YouTube machine learning course. This prof is VERY good at conveying the intuition, while Ng goes more deeply into the maths. They complement each other perfectly.
This is such a nice complementary course to Andrew Ng’s videos. I would seriously consider paying for quality lectures like these. Eternally grateful to Caltech for providing this one free of charge!
It would be great to see the Professor in some courses on Coursera. He is one of the best I've ever heard. Thanks!
The coin analogy for multiple bins was SO SO SO SO GOOD.
Dr. Mostafa, thank you for the interesting lectures. I watched these while running on the treadmill for many days until they made sense. I am happy that you let the world watch your great work and amazing lecture style. More students would love math if they had you for a teacher. Thank you!
Your comment about running on a treadmill for many days while trying to understand this lecture made me laugh 😂
The Q&A session is great; it answers a lot of the questions in my mind, especially since I'm not looking at other materials.
Props for the method of explaining Hoeffding's Inequality. I find that going through each element of the equation separately helped a lot in understanding it. Congratulations!
In a world of "bestest" MOOCs, this playlist stands apart.
@45:24: I got 5 heads!!! Actually a total of 7 consecutive heads before my first tails. You always think it happens to someone else...
Day 2 done. Amazing lecture. Thank You Professor Yaser and Caltech for making them open to public.
The professor here is making a subtle but very important point. He is saying that given a set of samples, some hypothesis or other **must** agree with the data. And the more hypotheses there are, the more likely it gets that one of them will agree with the data (capital M in his lecture). This is a guaranteed fact (that some hypothesis will agree with the sample); we want to make sure that the probability of accidental agreement is small. Recall the professor's coin toss example.
What do you mean by "set of samples"? I thought when we're talking about multiple bins, we're talking about the same sample but different hypotheses applied to it.
@@Satelliteua We are not talking about the same sample from each bin. The way I understand the problem is this: We have 1 bin containing all the possible data points (marbles). For each h (a possible hypothesis), we extract a random sample from the bin. Now, each h actually changes mu in the bin (since each h changes the colors of the marbles based on its conformity to f) - this is why the prof represents the problem as multiple bins. Now what happens when we pull out random samples from each bin? It might be the case that all the samples pulled out are correctly classified by h. Does that then mean that these data points were correctly classified because h tends to f? Well, not necessarily. There are two things at play here. Let's first assume we are dealing with a single h (a single bin). From this bin we pull out a sample of data and check its classification accuracy. If we keep pulling out samples, then we might get a sample where h gets it all correct. So in this case, it was just luck that h got it all correct. Now though, we are dealing with multiple h (bins), and from each bin we are pulling out different samples. So now, like the coin example, we are actually likely to find the case (the h) where we pull out all heads, even though h is not close to f. This is why as M grows, we are likely to find a wrong model. This is what I understood. Sorry if my ideas are all over the place, it's a difficult concept to put into words.
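A quick simulation makes this concrete (the coin counts match the lecture's analogy; everything else is my own scaffolding): with 1000 fair coins flipped 10 times each, some coin comes up all heads in roughly 63% of runs, even though any single coin only does so with probability 1/1024.

```python
import random

def some_coin_all_heads(num_coins: int = 1000, flips: int = 10) -> bool:
    # Does at least one of num_coins fair coins land heads on every flip?
    return any(all(random.random() < 0.5 for _ in range(flips))
               for _ in range(num_coins))

trials = 5_000
hits = sum(some_coin_all_heads() for _ in range(trials))
print(hits / trials)   # ~0.63: among many "hypotheses", a perfect-looking
                       # one shows up by luck alone
```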
@@Omar-kw5ui Actually this is quite correct!
@@Omar-kw5ui I started watching the lectures recently. Your argument is correct. But if in real life I am doing some machine learning with some hypothesis and I find that all samples are 'green' (they agree with the function's y points), then should I still abandon this particular hypothesis, saying it's just pure luck, and keep on searching for other hypotheses where most likely I will never come close? I guess he is saying that by pointing out the coin-toss person who got all 5 heads. But then which hypothesis should I settle on as my 'g'?
Superb teacher! His lectures are so clear and intuitive that they make 'learning' delightful.
Love both his voice and jokes. A brilliant professor
Thumbs up to Professor Abu-Mostafa. Fantastic professor, fantastic sense of humor.
This is an amazing lecture. Looking forward to watching the next lecture videos.
The best way to get a grasp of all these lectures is to do all the homeworks and projects. If you are a "learning machine", then start to read conference papers on machine learning... so you will be ready for research.
Where can I find conference papers?
Prof. Abu-Mostafa is the man! Super cool guy.
I cannot thank you much for this lecture. You make machine learning math a piece of cake.
Thank you Caltech and Prof
What does it mean to have a stringent tolerance (around 57:57 in the video)? Basically, what if the inequality gives 2?
he is literally the manifestation of "how to teach"
The professor teaches so well. Glad they made the video.
Professor Yaser Abu-Mostafa is amazing!
He sounds just like King Julian, the king of the lemurs, I keep on hoping he starts singing "I like to move it move it"
I've been laughing for 10 minutes straight
excellent lecture. It looks like the inequality in the final verdict is so loose that the bound on P[|E_in(g) - E_out(g)| > eps] can easily exceed 1, making it vacuous.
I was waiting for the Indian girl to ask questions here as well. Will expect her again in the future videos.
32:20 Am I understanding correctly that we can ignore the probability of the data because it is somehow hidden in mu and nu?
I did not see that mentioned anywhere. Dr. Yaser has a book describing the course content in more detail, called "Learning From Data".
The book provides genuine additional insight to the lectures and vice versa.
Thank you sir for the recordings. Now I hope I can pass my course in "Statistical Foundation of Machine Learning". :)
Thanks for the excellent lecture. Here are a couple of questions:
At 42:22: just to be more rigorous, would the P notation in this Hoeffding inequality depend on both X and y?
At 50:02: how exactly is g defined here in order for this inequality to hold? The inequality seems to require that g minimize |Ein - Eout|? But that is not intuitive. Instead, it is more intuitive to have a g that minimizes Ein (or eventually Eout), based on the definitions of Ein and Eout earlier.
For my second question, I see it now, because it is a less-than-or-equal sign there, instead of an equals sign. That inequality always holds since g is one of the h's in H. Thus whatever criterion defines g is fine, as long as g is one of the h's.
P is a selection probability assigned to X, i.e., it defines the probability of selecting certain x's. It has nothing to do with y (or Y).
28:47
How do we compare the hypothesis to the target function if we never know what the target function is?
That's what I'll like to know. Please reply if you have figured it. Thanks.
True, we do not know the target function itself, but what we do know is its value at certain points, the points in the dataset that we use to train our hypothesis. I believe the prof. refers to comparing the two functions at those points.
Don't equate "we don't know what the target function is" with "we don't know anything about the target function". As someone below already stated, we do know something about the target function, namely its values at the points that we sampled. So the more we sample, the more we get to know about the target function. Probability comes in to allow us to state, with a certain probability, that the pattern in the bin will be "sufficiently similar" (this is the epsilon definition) to the sample chosen, as long as the sample is large enough.
From the data we already have.
Thanks for the excellent lecture. Really enjoying them. Very well explained.
Awesome lecture. The 10-flip coin experiment was surprising and pretty interesting. A few observations:
1) By choosing a sufficiently big M, I can get an inequality like P[bad event] < 2, which is a no-brainer.
2) An absolute value of the "bound" is meaningless unless I know "mu", the unknown quantity. That said, practically we can overcome this with some educated guesses, and possibly the central limit theorem too.
3) I am surprised there is no talk of the central limit theorem... I was expecting something to be proved based on it. Possibly Hoeffding is related to it.
cont. the formula with M on the RHS will be used, which is very conservative. Unfortunately, most hypothesis spaces are not finite, i.e., they contain an infinite number of hypotheses, so you can't use M to measure model complexity. In that case, we resort to something called the VC dimension and derive generalization bounds based on that.
He fields questions like a total G, I love this course
I like how he says okaaaaaay !
There are 90 million Egyptians pronouncing it exactly like that XD
He didn't state Hoeffding's inequality in full generality: for the formula at 20:30 to hold, the value of nu must be bounded within a range of 1 (which it is here, since nu lies in [0,1]).
I think the original Hoeffding inequality applies when the hypothesis is specified BEFORE you see the data (e.g. a crazy hypothesis like: if it is raining, then approve the credit card, otherwise don't). However, in reality we will learn a specific hypothesis using the data (e.g. using least squares to learn the regression coefficients). In that case, the learned hypothesis is g, and you can consider it as chosen from a hypothesis space (H). If the hypothesis space is finite, of size M, then
The professor's book also says that there is a problem because we use the same data points for all hypotheses instead of generating new data for each hypothesis. So it breaks the assumption of independence of data generation. Why wasn't it mentioned in the lecture?
I like the explanation that a complex g is too tuned to the historical data and likely to get a bigger Eout. I'm wondering whether most financial models in 2009 were like that: not many people could understand them, so few economists realized the crash was coming until it was too late.
How did we sum up the RHS in Hoeffding's inequality? I mean, each of the hypotheses could have a different bound (epsilon) and hence a different exponential term. So how do they sum up to be substituted by "M" times the exponential? Also, if we keep the bound the same for each inequality, won't the number of samples "N" change? How is the exponential consistent across all the hypotheses? Am I missing something?
Hmm...a lot of questions here. You start off by defining the maximum "acceptable" deviation of your selected hypothesis. This acceptable value is epsilon. The deviation is between the in-sample error and the out-of-sample error. You cannot guarantee this, but you can ensure that the chance of exceeding this deviation is smaller than a certain probability. This is why the whole probability thing is brought into the discussion. Now g is just one of the h's in the hypothesis set. So if the in- and out-of-sample errors of g deviate by more than epsilon, this implies that (at least) one of the h's deviates by more than epsilon, so we can say that it must be the case that h1 deviates by more than epsilon or h2 deviates by more than epsilon and so on. The probability of deviation between in- and out-of-sample is independent of the number of red and green balls, i.e., it is independent of any h; that is why each h has the same bound.
Thanks a lot for sharing. This will surely be of help to me in my M.Sc.
at 36:26 min shouldn't it be epsilon instead of nu?
Thank you very much - a brilliant work
I'm getting confused towards the end. Once you have g, what use is it to compare it to every h in H? Surely, g is the best h in H, that's how it became g. Also, why does he add M to the inequality at the very end? Doesn't that just increase the value, the bigger H is? So with a large H and a subsequent large M, won't the comparison be totally redundant? I think he made that point at the end, but I don't see why he added it in, in the first place.
13:00 Isn't this 'problem' just Hume's original problem of induction?
It is indeed!
You know the output for in-sample cases (the training set). So if the output matches, the hypothesis is green for that sample. (The target function still remains unknown.)
Thanks for pointing it out! I don't know how I could have missed looking for an ML course on Coursera. The problem will be the overlap with the Cryptography one from Dan Boneh.
The more Hulft runs, the clearer we see, That more data brings us closer to certainty!
What was the conclusion of the lecture? I mean how did we prove that learning is feasible?
The main objective of learning, as laid out in the lecture, is finding a hypothesis that behaves similarly for the training data (in-sample) and test data (out-of-sample). No matter the performance of the hypothesis on the sample, if we can prove that the hypothesis performs approximately the same in-sample and out-of-sample, then we have essentially proved that learning is feasible, i.e., generalizing beyond the in-sample points is possible. The final modification to Hoeffding's formula states that with a reasonable choice of M, epsilon and N, the probability of in-sample performance deviating from out-of-sample performance can indeed be bounded to an acceptable limit, thus proving learning is feasible. The fact that we can still learn even though M is infinite in nearly all the models we come across is proved in the Theory of Generalization lecture. Thanks.
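If it helps to see that final bound with numbers plugged in (the M, eps, N values below are just illustrations), here is P[|Ein(g) - Eout(g)| > eps] <= 2*M*exp(-2*eps^2*N) as a small calculator:

```python
import math

def hoeffding_bound(M: int, eps: float, N: int) -> float:
    # Union-bound version from the lecture: 2 * M * exp(-2 * eps^2 * N).
    return 2 * M * math.exp(-2 * eps**2 * N)

print(hoeffding_bound(M=1,     eps=0.1, N=1000))  # ~4e-9: single hypothesis
print(hoeffding_bound(M=10**6, eps=0.1, N=1000))  # ~0.004: still meaningful
print(hoeffding_bound(M=10**6, eps=0.1, N=100))   # huge (>1): bound is vacuous
```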
"Is learning feasible?" here means: can we, based on observations of our in-sample data, make statements about our out-of-sample data? In other words, can we generalize observations on our selected sample to the entire population (in the bin)?
It's been an excellent lecture so far, though I am not very clear on what script H and each h mean.
I am doing the ML course on Coursera. It is a very good start for someone who wants to get going, learn what machine learning is all about, and do some exercises to get a feel for it. That said, simple derivations in calculus, which you should know from high school, are skipped and just the final formula is given, which is a little disappointing. I don't see how anyone can do machine learning without knowing basic calculus. Too much emphasis is placed on being nice.
Hello and thank you for the wonderful lectures!
I'm new in this field and I am trying to combine it with Computational Geometry. As such, my problems are unique in the sense that the training sets can (usually) be constructed at the will of the modeller. The data are always (potentially) there; it is a matter of choosing which to produce. I was wondering if there is a theoretical or practical approach to an optimized selection of training samples from the whole? Does that relate to assigning a specific (hopefully in some way optimal) P(x), e.g. a uniform distribution which takes samples uniformly from the whole? Or is random selection still good enough in this case?
Thank you in advance.
Theodore.
Umm... I'm probably missing something ;) What formula is he using to get to the 63% probability? Each coin gets 10 straight heads once in every 1,024 runs on average (if we ran it infinitely many times, the proportion should be 1 over 2 to the N, right? So 1 over 2 to the 10, i.e., once every 1,024 times roughly). So because each coin is independent, doesn't that mean the probability should be almost 100%? (1000/1024)
Ah, Ok... So even if you had 100 trillion flips each with 99.9% chances of being heads, you still have 0.01x0.01... (100 trillion times) of chances to get all tails.
For this example, you have 1/1024 chances of it being heads 10 consecutive times, so you have (1-(1/1024)) chances of it having at least one of the 10 flips being tails... That is, if you have 1 in 1024 chances of it being heads 10 straight times, you have 1023 in 1024 chances of it not being that. And if it is not that, it means that, at least, there is one tails somewhere (at least one) that would break the chain. So over 1000 repetitions, you have (1-(1/1024)) to the 1000, or (1023/1024) to the 1000, or 37% chances to get at least one tail on each set of 10. So 63% chances approx. to get 10 consecutive heads.
That being said, I still believe if the chances are 1 in 1024 to get 10/10 heads, for each 1024 attempts when the number of attempts goes towards infinity, we should get at least one of those to be 10 straight heads? So maybe it has to do with distribution? Like sometimes you can get 2 or more sets of 10 straight heads in your lot of 1000, while other times you may get none. So the chances of you finding (in a lot of 1000 tries) at least 1 set of 10 straight heads is 63%? (because they can form clusters, and sometimes you will get a group with none)
Or maybe it doesn't have to do with that? I mean, what are probabilities, really? Say you have 99.9% chances to get heads and 0.01% to get tails. You do it twice and the chances to get at least one time heads are really high, of course. But there is 1/10,000 chances of actually being tails and tails... So if you go towards infinity, you might think the distribution would be 99.9% of the time, no matter where or the order, you get heads, and 0.01% of the time you would get tails. But for N tries, there is 0.01 to the N chances of actually being all tails... So you can do it 100 trillion times, or go towards infinity, and there is still a very, very, very small, but real chance to, well, get all tails. So the chance is there, and now let's suppose it happens... Now, if that slim chance was the way events unfolded, then that option would happen forever, infinite times, and the 99.9% chances would mean nothing. You might say, well, but if we run the experiment again, now we will probably get 99.9% of the time heads. So the 99.9% vs 0.01% probability isn't wrong... But actually, this new set of samples can be concatenated to the last, as they go towards infinity and the premise is that this will happen (eventually) infinite times, and ALL times, as one single time not getting tails would break the chain...
So now we might say it is unlikely, but now think of a person seeing it, witnessing the event... Wouldn't they say chances are 100% tails?
So one important thing is that probabilities don't guarantee you will get heads and tails in a proportion of 1/1024 and 1023/1024. They really don't. A probability of 90% doesn't mean something will happen 90% of the time, but that we believe it has '9 chances out of ten to be that'. But once the drawing is made, it can happen only 70% of the time, or 2% of the time, and stay like this forever... At least that's my understanding of it after giving it some thought!
You are correct that the probability of getting 10 heads is 1/(2^10). Let's call this a. The probability of NOT getting 10 heads in 1000 flips is (1-a)^1000 and getting at least one such result is 1 - (1-a)^1000 = 62.36%
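In code, the exact computation (same numbers as above):

```python
a = 1 / 2**10          # P(one fair coin lands 10 straight heads) = 1/1024
p = 1 - (1 - a)**1000  # P(at least one of 1000 coins does)
print(p)               # 0.6236...
```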
Yeah, I know ;) I have to admit it puzzled me for a while until I figured it out (see paragraph three), though!
Thanks for the reply and clear explanation!
Really Brilliant lecture ..
Great explanation sir. I like ur accent. Your accent is really like a Ratahan accent.
Since mu depends on the probability distribution, should it not be constant for all bins, i.e., all h? And shouldn't it be nu that changes with h and the bins? Why is mu different for different bins?
Thank you Caltech and Professor. But please, can you help me? It's been decades since I touched maths, so I have to catch up. Tell me which links I must read and understand so I can get back here and follow along like you smart guys.
This means that we assume the data we have was sampled in the same way real data occurs. It does not seem a trivial assumption to me, but it makes a lot of sense to need such an assumption.
Anyway, great lecture!
Why not write the RHS of Hoeffding's inequality as min(1, 2exp(-2N eps^2)), since a probability cannot exceed 1 anyway?
beautiful so far !!
One thing I couldn't figure out, though, is how the target function and the hypothesis would agree. How does the comparison occur?
Whatever happens in the bin is hypothetical. Just assume you have chosen a hypothesis h. This will agree with the target function in some cases and differ in others over the entire set of inputs, which is possibly infinite. The main takeaway is that you can compare it on the sample, the training data, for which the value of the target function is available. Thus the essence is: if you see that the hypothesis chosen by you agrees with the values of the target function on the sample, it will probably behave the same for out-of-sample data points, within a threshold (according to Hoeffding's formula). Feel free to ask if you have further queries.
Sir, can you please explain the hypothesis and target function in the bin-of-marbles problem through some mathematical expression (as an example)?
fantastic lecture !!!! thanks a lot
Remind myself that this is just foundation, and that it is dry and Zzzz.... but must ....keep...going..
an hour later... really, the summary is that the more your model caters to a specific sample, the more prone it is to failure when it comes to the unknown. It is like a Fourier series: fitting the data too well can lead to not actually learning at all >...
Maybe you'd like Stanford's course by Andrew Ng better; Google it and check it out. I like it.
No that was the last lecture. This one wasn't really about that.
How is the probability distribution over X factored into the learning process? The marbles (sample) from the bin (space) are subject to the probability distribution. How does the probability affect learning? I only know that the multi-bin problem necessitates the modification of the plain-vanilla Hoeffding's inequality, and the multiple bins are brought about by the number of hypotheses in the hypothesis set, not by the probability distribution over the X space.
The essence of a probability distribution is that it enables you to state that the pattern you observe in your sample will, with some particular probability, reflect the pattern in the bin. Making a statement about the situation in the bin based on what you observe in your sample IS the learning statement. You cannot generalize (or learn) more than this. Suppose I select all green marbles in my sample. Can I say that all the marbles in the bin are green if the sample size is 10, 100 or 1000000? The answer is no. In fact I cannot state anything certain about the content of the bin no matter how large my sample is or how it is made up. Saying I cannot say anything certain about the bin is the same as saying I cannot learn anything certain about the bin.
I did not understand the union bound concept. My doubt is: shouldn't the upper bound (the probability that the selected hypothesis is bad) be (1/M) times (2M exp{-2e^2 N}), assuming each hypothesis is equally likely to be selected? As an analogy, consider this question:
"There are two bags containing white and black balls. In the first bag there are 8 white and 6 black balls and in the second bag there are 4 white and 7 black balls. One ball is drawn at random from any of these two bags. Find the probability of this ball being black."
In the above question assume that selecting a bad ball signifies a bad event. Thus, P(bad event)=1/2*6/14+1/2*7/11=(1/2)*(6/14+7/11). In this example, M=2.
I have the same doubt right now. Did you find your answer?
There is no probability associated with selecting hypotheses. Probability is only associated with selecting the sample data (x). The learning algorithm will (likely) examine many or all hypotheses. It chooses a particular hypothesis based on various criteria; probability has nothing to do with this selection. Your thought error is probably that you saw that a hypothesis was "selected" and assumed this was a probability concept. This is (understandably) confusing. It would probably have been better to say a hypothesis was "chosen", to stay away from probability terminology.
this is awesome man, tks.
@ 34:20, I understand that the marble is green if the target function corresponds to the hypothesis used. However, didn't the professor say the target function itself is unknown?
I think he meant that the marble is green if yhat equals y, namely the target value, instead of the underlying target function.
Is a new hypothesis h_avg, defined as an average over a subset of hypotheses in H, necessarily also in H? Or does it depend on the functional form of H?
No; H can be any set of hypotheses. The set does not have to have any form of arithmetic closure.
Something like Feynman's lectures is being attempted.
47:00 how did he obtain 63%?
For one coin to get all heads: (1/2)^10. Therefore 1-(1/2)^10 is the probability of getting at least 1 tail. The likelihood of every coin getting at least 1 tail is (1-(1/2)^10)^1000, so we subtract that result from 1 to find the probability that at least one coin got no tails.
In slide 23, why does the probability equation for g depend on all hypotheses when we pick only one out of the multiple hypotheses? Shouldn't it equal the probability of the hypothesis chosen?
g is one of the hypotheses h, so what the dependency statement says is that if something applies to g, it must therefore apply to (at least) one of the h's.
How would you know the value of E(out)?
Thank you for the lecture. It was really insightful, though it's hard for me to capture it all. I liked the questions that the students asked. Why do we have multiple bins? They were cute though haha
55:59 Overfitting!!!
awesome! thanks
Ng's lectures are hardly even within the realm of a true 'lecture'. His command of the English language is very limited and he rarely explains anything beyond the iteration of formulas and proofs. You would be better served by reading a book on the subject.
Abu-Mostafa is a TRUE teacher who walks you through the process. Of the dozens of lectures on this complex subject, his have the best compromise between content and approachability.
/opinion
Hello. What should I do if I misunderstood a lot of this talk? It may be a matter of language, because I'm not from an English-speaking country; other courses, plain programming ones, I understand, but things about machine learning specifically I don't. What should I do? Study math, or what...
Try to become smarter
shitty comment sry, this can't help me ;-)
Just a bit of humor :) Turn on the subtitles, so you eliminate the listening-comprehension problem (if that's what it is).
This is something you can do to understand this lecture better:
1. Turn the subtitles on (by clicking the subtitles button)
2. If you don't understand something, watch and think over and over again until you totally get it.
3. This (work.caltech.edu/lectures.html#lectures) contains the lecture's slides; I think it helps.
Happy learning!
Try other machine learning courses, for instance on Udacity and Coursera, then come back to this one. Also, if you haven't had calculus, linear algebra, and probability, this isn't going to make a lot of sense to you. So if you lack that math background, then go study those topics one at a time, then come back.
Brilliant, ya Yaser.
Is learning feasible? 8:07
I'm confused about what the green and red marbles mean. If you pick random marbles from the bin, aren't they random? What is learning in this context?
Ahh I had to watch it twice. Abstract representations
You pick a hypothesis function h, a proxy for the unknown target function f. For a particular value x, if h(x)=f(x) we color the marble green, otherwise red. This rule colors all the marbles. It is important to realize that WE can't see the marbles in the bin, but we do know they are colored. What we are trying to do is find an h which has the lowest number of marbles colored red, because a red marble x means h(x) is not equal to f(x). At this point probability does not come into the discussion; we are simply coloring the marbles. The marble colors are not random; the marbles you pick are. In other words, you did not pick a random red marble, you randomly happened to pick a marble that was red.
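A tiny sketch of that coloring step, with made-up stand-ins for f and h (in the real setting f is unknown and the colors are invisible to us):

```python
# Hypothetical target and hypothesis, purely for illustration.
def f(x): return x % 3 == 0   # the "unknown" target function
def h(x): return x % 2 == 0   # our candidate hypothesis

bin_of_marbles = range(1000)
colors = ["green" if h(x) == f(x) else "red" for x in bin_of_marbles]
mu = colors.count("red") / len(colors)   # fraction of disagreement with f
print(mu)   # the coloring is deterministic; randomness enters only when
            # we sample marbles from the bin
```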
Nice analogy!!
This is the right answer (1/1024) for the first question in the "coin analogy"
Many thanks to UFRJ for the translation. An excellent initiative for all of us independent learners.
Probably Approximately Correct :-o I have the book by Leslie Valiant
The Prof. is amazing. He also looks like Prince Charles.
I love this guy !
Lol, I love the lectures, but why the heck did he use "mew" (mu) and "new" (nu)? It is so confusing. Just pick two completely different-sounding names: use the sample mean x-bar and the population mean mu, lol, duh.