Lecture 11 - Overfitting

  • Published: 12 Sep 2024

Comments • 37

  • @elanmart • 10 years ago • +29

    I've done only first-semester math at my uni, and I can understand this lecturer almost perfectly. He's definitely not overfitting :). Amazing teacher, truly.
    Thanks, Caltech!

  • @Augustinephillips • 9 years ago • +43

    Man! What a fantastic professor.

  • @raulferreira2633 • 8 years ago • +11

    This guy has amazing teaching skills. Thanks for making such high-quality content available.

  • @eecejk • 4 years ago • +3

    Very clear, informative lecture! I especially like the part from 54:30 to 58:00, since I have never found such a clear description of the bias-variance trade-off. Thank you very much!

  • @arjunatm • 2 years ago • +2

    Why did I miss this series until 2021? 😮😮 The professor explains things really well. 💙💙

  • @yuwuxiong1165 • 6 years ago • +6

    The analogy for deterministic noise is enlightening... so 对牛弹琴 (playing the lute to a cow) can be interpreted as: the complexity of the music is deterministic noise to the cow... lol...

  • @BellaNguyenDietrich • 10 years ago • +2

    Great lecturer. Very accessible for beginners trying to understand the fundamental concepts of machine learning. Thank you.

  • @entaditorium • 11 years ago • +4

    Interesting logo for the course!

  • @pt77780 • 9 years ago • +10

    "Welcome back" :D

    • @andysilv • 8 years ago • +4

      Sort of a brand

  • @chetan5848 • 6 years ago

    Amazing lecture! WHAT an explanation of deterministic noise!

  • @jiunjiunma • 12 years ago • +6

    I couldn't stop laughing when I heard that the machine is hallucinating.

  • @HajjAyman • 12 years ago • +1

    I learned something new today: that machines may hallucinate! LOL.
    Great lecture and poignant examples.

  • @AndyLee-xq8wq • 1 year ago

    Much clearer about overfitting now.

  • @vikasfavr • 11 years ago • +2

    Nice work, sir, and thanks for this video.

  • @gamer966 • 8 years ago • +3

    If stochastic noise is the "normal" noise from a noisy target function, and deterministic noise is the noise produced by the target function being more complex than the hypothesis set you have, then there is still no explanation of why the 10th-order polynomial overfit when trying to fit a noiseless 10th-order polynomial (it has no stochastic noise by definition, and it can't have deterministic noise because the hypothesis set is as complex as the target function). Did I miss something?

    • @Bing.W • 7 years ago • +6

      He explained this at 26:48. A 10th-order polynomial is supposed to fit a 10th-order target well, but only when there are enough samples. When you don't have enough samples, you can get a good enough Ein, but you get a really bad generalization error because of the VC-dimension inequality. Compared to the 2nd-order model, the 10th-order one is then overfitting. (Note that the definition of overfitting is comparative.) In other words, deterministic noise is produced by the mismatch of model/target complexity, and also by the finite number of samples.

    • @GuitarViolinCovers • 5 years ago

      To also clarify, the 'noiseless' target function in his example was a 50th-order polynomial, so the overfitting there was due to deterministic noise (16:41). All the 10th-order polynomials he used in his explanation were 'noisy', hence the learning curves had an expected error > 0 even at large N (26:41).
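
For anyone who wants to see the effect discussed in this thread numerically, here is a minimal sketch (not the lecture's exact experiment; the 10th-order target, sample size, and noise level are made up) comparing a 2nd-order and a 10th-order polynomial fit on a small noisy sample:

```python
# Fit 2nd- and 10th-order polynomials to a few noisy samples of a 10th-order target
# and compare in-sample vs out-of-sample squared error.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
f = Polynomial(rng.standard_normal(11))      # a fixed 10th-order target f(x)

N, sigma = 15, 0.5                           # few samples, some stochastic noise
x_train = rng.uniform(-1, 1, N)
y_train = f(x_train) + sigma * rng.standard_normal(N)
x_test = np.linspace(-1, 1, 1000)
y_test = f(x_test)                           # measure E_out against the clean target

for degree in (2, 10):
    fit = Polynomial.fit(x_train, y_train, degree)
    e_in = np.mean((fit(x_train) - y_train) ** 2)
    e_out = np.mean((fit(x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: E_in = {e_in:.3f}, E_out = {e_out:.3f}")
# Typically the 10th-order fit wins on E_in but loses badly on E_out at this sample size.
```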

  • @aminabbasi2936 • 5 years ago

    Fantastic prof.

  • @The-Daily-AI • 2 years ago

    Amazing lesson

  • @Nestorghh • 12 years ago • +1

    What a great class!

  • @markh1462 • 5 years ago

    Brilliant lecture!

  • @googlandroid176 • 2 years ago

    Early stopping seems to make the unstated assumption that E_out will not come back down.
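
For context, early stopping is usually implemented by monitoring a validation estimate and keeping the best model seen so far, which makes the assumption in this comment explicit: training stops shortly after the first sustained rise in validation error. A generic sketch (SGDRegressor and the synthetic data are stand-ins, not anything from the lecture):

```python
# Generic early-stopping sketch: keep training while the validation error improves,
# and stop (with patience) at the first sustained rise; this implicitly assumes
# that E_out will not come back down later.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 10))
y = X @ rng.standard_normal(10) + 0.3 * rng.standard_normal(200)   # noisy linear target
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
best_err, best_params, patience, bad = np.inf, None, 5, 0
for epoch in range(500):
    model.partial_fit(X_tr, y_tr)                       # one pass over the training data
    err = np.mean((model.predict(X_val) - y_val) ** 2)  # validation proxy for E_out
    if err < best_err - 1e-6:
        best_err, best_params, bad = err, (model.coef_.copy(), model.intercept_.copy()), 0
    else:
        bad += 1
        if bad >= patience:                             # assume E_out won't come back down
            break
model.coef_, model.intercept_ = best_params             # roll back to the best epoch
print(f"stopped after epoch {epoch}, best validation MSE = {best_err:.4f}")
```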

  • @yvonker7408 • 9 years ago • +1

    Hi,
    At 42:16: "Stochastic noise increases => overfitting increases; deterministic noise increases => overfitting increases."
    At 1:13:16: "[The values of deterministic noise and stochastic noise have] nothing to do with the overfitting aspect."
    Can someone explain this to me? Thank you.

    • @yvonker7408 • 9 years ago

      Got an answer to my question here: book.caltech.edu/bookforum/showthread.php?t=503
      My second quotation is wrong.

    • @Bing.W • 7 years ago

      The point is that the exact values themselves do not reflect the severity of the overfitting. You cannot say that a smaller bias leads to less overfitting; a model may get a smaller bias precisely by overfitting to the samples. At the same time, the bias value does not reflect the deterministic-noise level. The latter is reflected by the complexity of the target (when it is above the model complexity): the higher the complexity, the more noise, hence more overfitting.
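
For reference in this thread, the standard bias-variance decomposition with noise (notation close to, though not necessarily identical to, the lecture's slides): σ² is the power of the stochastic noise, the bias term plays the role of the deterministic noise, and, as the reply above says, neither value by itself tells you how badly a particular model will overfit.

```latex
\mathbb{E}_{\mathcal{D}}\!\left[ E_{\text{out}}\!\left( g^{(\mathcal{D})} \right) \right]
  \;=\;
  \underbrace{\sigma^{2}}_{\text{stochastic noise}}
  \;+\;
  \underbrace{\mathbb{E}_{x}\!\left[ \big( \bar{g}(x) - f(x) \big)^{2} \right]}_{\text{bias (deterministic noise)}}
  \;+\;
  \underbrace{\mathbb{E}_{x}\!\left[ \mathbb{E}_{\mathcal{D}}\!\left[ \big( g^{(\mathcal{D})}(x) - \bar{g}(x) \big)^{2} \right] \right]}_{\text{variance}}
```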

  • @beattoedtli1040 • 5 years ago

    When I draw learning curves for decision trees or support vector machines, I have so far never seen overfitting in this sense. If I take my decision tree of depth 500, it will still be better on the test set than a depth-200 model, say. I get no added value on the validation set, of course. Can anyone show me a decision tree that overfits?!
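
For what it's worth, here is a small synthetic experiment (scikit-learn, with a made-up noisy labelling task, nothing from the lecture) where an unrestricted decision tree does overfit in the comparative sense used in this lecture:

```python
# On noisy labels with few samples, a depth-unlimited tree typically beats a shallow
# tree on the training set but loses on the test set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # simple true rule...
flip = rng.random(len(y)) < 0.2                   # ...plus 20% label noise
y = np.where(flip, 1 - y, y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for depth in (2, None):                           # shallow vs unrestricted
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train acc={tree.score(X_tr, y_tr):.2f}, "
          f"test acc={tree.score(X_te, y_te):.2f}")
# The unrestricted tree memorizes the flipped labels (train acc near 1.0) and
# typically generalizes worse than the depth-2 tree.
```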

  • @sreeharshapedada • 4 years ago

    The goal of this lecture is at 1:06:30 .. 💥

  • @vissarionchristodoulou6822 • 5 years ago

    Does anyone happen to know what graph-creating tool he is using or a tool with similar capabilities?

    • @subhasish-m • 4 years ago • +1

      Programs such as MATLAB, Mathematica, Maple, and perhaps Octave could get you similar results

  • @Mateusz-Maciejewski • 5 years ago

    It's funny that he has the same facial expression after answering each question in each video: some sort of head shake to signal that he has just finished the answer. Great lecture, by the way. I still don't quite understand the notion of deterministic noise, since if we choose a richer model we make this noise smaller, yet we end up with a worse result if we have a small N. He explained it during the lecture and during the Q&A, but I still don't get why he calls the bias a noise.

    • @zhuojunqiao6229 • 5 years ago • +2

      I also struggled with the same question, and here is my understanding. First of all, what is bias? Bias can be interpreted as the distance between the best h in a given H (the best that H can do) and the target function, which is also the definition of deterministic noise. Then, why call it a noise? According to the example given by the professor, if what you want to learn (the target function) is far beyond your learning ability (the model complexity of H), it will mislead you, like the child trying to learn complex numbers using only the real numbers, which happens to be the definition of noise.
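
A compact way to state the reply above (h* is the best hypothesis in H, and ḡ the average learned hypothesis, which is roughly h* for a reasonable model): the deterministic noise is the part of f that H cannot capture, and the bias is essentially its power, just as σ² is the power of the stochastic noise.

```latex
\text{deterministic noise at } x:\quad f(x) - h^{*}(x),
\qquad
\text{bias} \;=\; \mathbb{E}_{x}\!\left[ \big( \bar{g}(x) - f(x) \big)^{2} \right]
\;\approx\; \mathbb{E}_{x}\!\left[ \big( h^{*}(x) - f(x) \big)^{2} \right]
```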

  • @MOHSINALI-bk2qo • 6 years ago • +1

    Sir, you are the god of machine learning. Thank you, sir, for the nice explanations.

  • @warmaxxx • 11 years ago • +4

    I had no idea machines could hallucinate lol

    • @zenicv • 11 months ago • +1

      10 years later, the above comment and lecture are all too relevant.

    • @warmaxxx • 11 months ago

      Damn, I don't even remember watching this @@zenicv