The Bias Variance Trade-Off

  • Published: 5 Jan 2025

Comments • 82

  • @superman39756
    @superman39756 2 years ago +5

    This channel will explode soon - the quality of the content is too good. Thank you!

  • @taotaotan5671
    @taotaotan5671 2 years ago +2

    Wow, this video truly opened my mind.
    I have heard this term from ML people many, many times, but it remained vague until I watched this video!

  • @kellsierliosan4404
    @kellsierliosan4404 3 years ago +7

    I am studying an MSc in Stats at a decent uni and I have to say that your channel is damn amazing. Good job there; the intuition that you manage to put into your videos is mind-blowing. You gained a subscriber :)

    • @Mutual_Information
      @Mutual_Information  3 years ago

      Thank you! Very happy to have you. More good stuff coming soon :)

    • @Mutual_Information
      @Mutual_Information  3 years ago

      And if you think it’d be helpful to your classmates, please share it with them 😁

  • @arongil
    @arongil 1 year ago +1

    I love the humor at the end ("if you make the heroic move of checking my sources in the description"). I'm learning so much from you, thank you!

  • @charudattamanwatkar8340
    @charudattamanwatkar8340 2 years ago +12

    Your videos have the perfect balance between rigor and simplicity. Kudos to you! Keep making such great content. You're destined to be really successful. 🎉

  • @karanshah1698
    @karanshah1698 3 years ago +23

    The moment you flashed the decomposed equation, it clicked for me that this looks a lot like the epistemic and aleatoric uncertainty components. P.S.: We need much more quality content like this on high-end academic literature, please keep going full throttle. You earned my subscription!

    • @Mutual_Information
      @Mutual_Information  3 years ago +2

      Thank you very much! I’m not familiar with those components, but I’m glad to hear you are seeing relationships I don’t :) and will do, I have 4-5 videos in the pipeline. New one every 3 weeks!

  • @ConnectinDG
    @ConnectinDG 3 years ago +4

    I have been reading a lot on the bias-variance trade-off and have been using it for some time now. But the way you explained it with amazing visuals was mind-blowing and very intuitive to understand. I totally like your content and will keep waiting for more like this in the future.

  • @Boringpenguin
    @Boringpenguin 2 years ago +3

    This is probably the best take on the Bias-Variance Trade-Off I have ever seen on YouTube; the one from ritvikmath is a close second.
    Please don't ever stop making videos like this, great stuff :)

  • @hspadim
    @hspadim 3 years ago +8

    Incredible work, man! I’m truly looking forward to more content!

  • @winoo1967
    @winoo1967 3 years ago +9

    Great video!! The beginning as a creator on YT is pretty hard, so don't give up.

    • @Mutual_Information
      @Mutual_Information  3 years ago +2

      Thank you! I won’t, especially with the encouragement

  • @AbhishekJain-bv6vv
    @AbhishekJain-bv6vv 3 years ago +2

    Until 7:22, I thought this was very theoretical, but as soon as you started the animations, everything made more sense and became clear. Truly incredible, amazing work. Lots of love from India, and please keep up the good work. You are the 3blue1brown of data science.

    • @Mutual_Information
      @Mutual_Information  3 years ago

      Thank you, encouragement like this means a lot. I’ll make sure to keep the good stuff coming :)

    • @AbhishekJain-bv6vv
      @AbhishekJain-bv6vv 3 years ago

      @Mutual_Information I am a student at IIT Kanpur (one of the premier institutes of India), and I am currently taking a course, Statistical Methods for Business Analytics. Here is the link to the playlist (the lecture slides are in the description):
      ruclips.net/p/PLEDn2e5B93CZL-T8Srj_wz_5FIjLMMoW-
      Just play any video in it and tell me, would you be willing to learn from these videos? The way of teaching is lagging far behind in our country.

  • @kashvinivini2264
    @kashvinivini2264 3 years ago +1

    Highly underrated video! Great work

  • @vladvladislav4335
    @vladvladislav4335 3 years ago +1

    Hey DJ, the quality of your videos is mind-blowing; I subscribed even before watching the video to the end. I'm 100% sure your channel will blow up in the near future!

    • @Mutual_Information
      @Mutual_Information  3 years ago

      Thank you brother! I’m very happy to hear you like them and excited to have you as a sub. More to come!

  • @FarizDarari
    @FarizDarari 2 years ago +1

    Simply awesome explanation!

  • @juanvelez3889
    @juanvelez3889 3 years ago +3

    This is sooooo good. Thanks a lot for sharing your knowledge in such an amazing explanation!

  • @Lucifer-wd7gh
    @Lucifer-wd7gh 3 years ago +2

    I can see a bright future for this channel. Good job man. Keep uploading ❤️
    From the United States Of India 🇮🇳😆

  • @loukafortin6225
    @loukafortin6225 3 years ago +1

    this needs more views

  • @cerioscha
    @cerioscha 1 year ago +1

    Great video, thanks! I've never seen this explained in a regression context, only for classification in terms of VC dimension.

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Glad you appreciate it. This is an old video and I've since learned to lighten up on the on-screen text, but I'm glad it still works for some

  • @tirimula
    @tirimula 1 year ago +1

    Awesome graphical visualization.

  • @akhilezai
    @akhilezai 3 years ago +7

    This is GOLD

  • @akhaita
    @akhaita 3 years ago +1

    Beautifully done!

  • @orvvro
    @orvvro 3 years ago +3

    Thank you, very clear video

  • @revooshnoj4078
    @revooshnoj4078 2 years ago +1

    Clearly explained, thanks!

  • @sathyakumarn7619
    @sathyakumarn7619 2 years ago

    This video clearly deserves a lot more views than this. Keep up the good work.

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Thanks! Slowly things are improving. I think eventually more people will come to appreciate this one.

  • @JoseManuel-pn3dh
    @JoseManuel-pn3dh 1 year ago +1

    Thanks, you're carrying my MSc

  • @NoNTr1v1aL
    @NoNTr1v1aL 3 years ago +1

    Amazing video!

  • @belgion11
    @belgion11 2 months ago +1

    Great video, thank you!

  • @antoinestevan5310
    @antoinestevan5310 3 years ago +1

    Really nice content and intuitions, liked it a lot!

  • @MP-if2kf
    @MP-if2kf 2 years ago +1

    Great video!

  • @murilopalomosebilla2999
    @murilopalomosebilla2999 3 years ago +1

    Well explained! Thanks!!

  • @wexwexexort
    @wexwexexort 11 months ago

    fantastic visualizations

  • @navanarun
    @navanarun 1 year ago

    Thank you! This is amazing content.

  • @Kopakabana001
    @Kopakabana001 3 years ago +1

    Awesome info!

  • @eulefranz944
    @eulefranz944 3 years ago +1

    Excellent.

  • @peterkonig9537
    @peterkonig9537 1 year ago +1

    Super cool stuff.

  • @partyhorse420
    @partyhorse420 1 year ago +1

    Have you seen recent results in deep learning that show larger neural networks have both lower bias and lower variance than smaller models? Past a point, more parameters give less variance, which is amazing! See "Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition" (Adlam et al.)

    • @Mutual_Information
      @Mutual_Information  1 year ago

      I hadn't seen this before, but now that I've read some of it, it's quite an interesting idea. Maybe it explains some of the weird behavior observed in the Grokking paper? I'm still mystified by how these deep NNs sometimes defy the typical U shape of test error.. wild! Thanks for sharing
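
      To make the double-descent observation in this thread concrete, here is a toy Python sketch (a hypothetical setup, not the paper's): random ReLU features with a minimum-norm least-squares readout. Test error typically peaks near the interpolation threshold (number of features ≈ number of training points) and falls again as the feature count grows.

        # Toy double descent: minimum-norm least squares on random ReLU features.
        import numpy as np

        rng = np.random.default_rng(0)

        def make_data(n):
            x = rng.uniform(-1, 1, size=(n, 1))
            y = np.sin(3 * x[:, 0]) + 0.1 * rng.standard_normal(n)
            return x, y

        def relu_features(x, W, b):
            return np.maximum(0.0, x @ W + b)

        n_train = 30
        x_tr, y_tr = make_data(n_train)
        x_te, y_te = make_data(500)

        for p in [5, 10, 20, 30, 40, 100, 300, 1000]:
            W = rng.standard_normal((1, p))   # fixed random first layer
            b = rng.standard_normal(p)
            # lstsq returns the minimum-norm solution when p > n_train
            beta, *_ = np.linalg.lstsq(relu_features(x_tr, W, b), y_tr, rcond=None)
            test_mse = np.mean((relu_features(x_te, W, b) @ beta - y_te) ** 2)
            print(f"p = {p:4d}   test MSE = {test_mse:.3f}")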

  • @manueltiburtini6528
    @manueltiburtini6528 2 years ago +1

    A masterpiece of YT

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      I'm glad you think so.. I was actually thinking about re-doing this one

  • @LvlLouie
    @LvlLouie 3 years ago +1

    Subscribed! I want to learn this stuff but I'm not sure where to start!

    • @Mutual_Information
      @Mutual_Information  3 years ago

      Well I may be biased, but I think this channel is a fine place to start :)

  • @Throwingness
    @Throwingness 2 years ago

    I love the channel. I have a few topic requests... KL Divergence. Diffusion Networks. Policy Gradient RL models.

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Policy Gradient RL methods will be out this summer! Diffusion.. that's a whole beast I don't have plans for right now. I'd need to learn quite a bit to get up to speed. KL Divergence, for sure I'll do that. Possibly later this year.

    • @Throwingness
      @Throwingness 2 years ago

      @Mutual_Information Diffusion. Did you see DALL-E 2? It's a milestone. I can't wait for the music and videos a system like this will create.

  • @theleastcreative
    @theleastcreative 11 months ago +1

    It feels like you're reading out of that textbook on the table behind you

    • @Mutual_Information
      @Mutual_Information  11 months ago

      The whole channel started b/c I actually wanted to write a book on ML.. but then I figured few people would read it, so I might as well communicate the same material on a YT channel, where it had a better chance. Literally, I'd say "It's a textbook in video format". But then I realized it can make the videos very dense and a little dry. So I've evolved a bit since.

  • @melodyparker3485
    @melodyparker3485 3 years ago +3

    Do you use the Manim Python Library for your animation?

    • @Mutual_Information
      @Mutual_Information  3 years ago +3

      No, though I should explore that one day. I use a personal library that leans heavily on Altair, a declarative Python plotting library based on Vega-Lite (which itself builds on d3). A minimal Altair example is sketched after this thread.

    • @melodyparker3485
      @melodyparker3485 3 years ago +2

      @Mutual_Information Cool!
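
      The author's library is personal and unreleased, but for the curious, a minimal Altair sketch looks roughly like this (the data and styling here are invented for illustration):

        # Minimal Altair example: a declarative line chart from a DataFrame.
        import altair as alt
        import numpy as np
        import pandas as pd

        x = np.linspace(-3, 3, 200)
        df = pd.DataFrame({"x": x, "y": np.sin(x)})

        chart = (
            alt.Chart(df)
            .mark_line()
            .encode(x="x:Q", y="y:Q")  # ":Q" marks the fields as quantitative
            .properties(width=400, height=250, title="sin(x)")
        )

        chart.save("sine.html")  # renders in a browser; PNG export needs extra tooling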

  • @MegaSesamStrasse
    @MegaSesamStrasse 1 year ago

    So can I understand bias and variance in terms of a sampling distribution from which my specific model is taken? If the model is very complex, the mean of this sampling distribution will be quite close to the true value, but since the variance of this distribution is so large, it is unlikely that my specific model represents the true value (but not impossible?). And if the model is very low in complexity, the variance of the sampling distribution will be quite small, but since the expected value of the sampling distribution is far from the true value, it is again very unlikely that my specific model represents the true value?

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      That sounds about right. Think of it this way: there is some true data-generating mechanism that is unknown to your model. A complex model is more likely to be able to capture it. If you re-sample from the true data-generating process.. fit the model.. and look at the average of those fits.. then that average will match the mean of the true distribution. This is what I mean when I say "the complex model can 'capture' the true data-generating mechanism". Aka, the model is low bias. However, the cost of such flexibility is that the model produces very different ("high variance") fits over different re-samplings of the data.
      Does that make sense?
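
      The re-sample / re-fit / average procedure described above is easy to simulate. A minimal sketch (a hypothetical setup: polynomial fits to a known sine-plus-noise generating process) that estimates bias^2 and variance of the prediction at a single test point x0:

        import numpy as np

        rng = np.random.default_rng(1)

        def f(x):                      # the true (normally unknown) mean function
            return np.sin(2 * x)

        def sample_dataset(n=40, noise=0.3):
            x = rng.uniform(-2, 2, n)
            return x, f(x) + noise * rng.standard_normal(n)

        x0 = 1.0
        for degree in [1, 3, 9]:       # increasing model complexity
            preds = []
            for _ in range(2000):      # re-fit over many re-sampled datasets
                x, y = sample_dataset()
                preds.append(np.polyval(np.polyfit(x, y, degree), x0))
            preds = np.array(preds)
            bias2 = (preds.mean() - f(x0)) ** 2
            print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {preds.var():.4f}")

      In this toy setup, the low-degree fit shows large bias^2 and small variance, while the high-degree fit shows the reverse, which is exactly the trade-off in the video.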

  • @jadtawil6143
    @jadtawil6143 3 years ago +1

    Subscribed. Would you mind sharing how you quickly make the visuals with the math equations? I'd love to use a similar resource for my students.

    • @Mutual_Information
      @Mutual_Information  3 years ago

      Hey Jad. I have plans to open-source my code for this, but it's not ready yet. I'll make an announcement when it's ready.

  • @bajdoub
    @bajdoub 3 years ago +2

    Excellent video! One question: in practice, what is the relationship between EPE and the mean squared error (MSE) loss we usually optimize for in regression problems? Is EPE an expected value of MSE? Or is MSE only related to the bias term in EPE? Or are they completely unrelated?

    • @Mutual_Information
      @Mutual_Information  3 years ago +1

      Glad you enjoyed it! They are certainly related :) To make MSE and EPE comparable, the first thing we'd have to do is integrate EPE(x_0) over the domain of x, which we can call EPE, as you do. In that case, MSE is a biased estimate of EPE (to answer your question, it's an estimate of the whole of EPE - not any one of the terms). The MSE is going to be more optimistic/lower than EPE. This is because, when fitting, you chose parameters to make MSE low.. if you had many parameters, you could make MSE really low (overfitting!). But EPE measures how good your model is relative to p(x, y) - more parameters don't necessarily mean a better model! To get a better estimate, you could look at MSE out of sample, and that's what we do to determine those hypers. (A small numerical sketch follows after this thread.)

    • @bajdoub
      @bajdoub 3 years ago +2

      @Mutual_Information Thanks so much for taking the time to reply! I will need some time, probably another pass of the video and putting things on paper, before I digest it all :-D But you have given me all the elements of an explanation. Keep up the good work, your videos are some of the best out there, you set the bar very high! :-)

    • @Mutual_Information
      @Mutual_Information  3 years ago +1

      @bajdoub Thanks! It means a lot. I'll try to keep the standard high :)
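
      A small numerical sketch of the optimism discussed in this thread (a hypothetical setup: polynomial regression on sine-plus-noise data). Training MSE keeps falling as parameters are added, while held-out MSE, a far less biased stand-in for EPE, typically traces the familiar U shape:

        import numpy as np

        rng = np.random.default_rng(2)

        def sample(n=50, noise=0.4):
            x = rng.uniform(-2, 2, n)
            return x, np.sin(2 * x) + noise * rng.standard_normal(n)

        x_tr, y_tr = sample()            # one training set, fixed across fits
        x_te, y_te = sample(n=2000)      # large held-out sample

        for degree in range(1, 11):
            coeffs = np.polyfit(x_tr, y_tr, degree)
            mse_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
            mse_te = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
            print(f"degree {degree:2d}: train MSE = {mse_tr:.3f}, test MSE = {mse_te:.3f}")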

  • @virgenalosveinte5915
    @virgenalosveinte5915 3 months ago +1

    awesome

  • @AlisonStuff
    @AlisonStuff 3 years ago +2

    Woo!

    • @Mutual_Information
      @Mutual_Information  3 years ago +1

      Haha thank you sister

    • @AlisonStuff
      @AlisonStuff 3 years ago +2

      @Mutual_Information You're welcome, brother. How are you? How was your day?

  • @ClosiusBeg
    @ClosiusBeg 3 years ago +1

    Man, please more pictures..

  • @ilyboc
    @ilyboc 3 years ago +1

    😮😮😯❤️

  • @kashvinivini2264
    @kashvinivini2264 3 years ago +1

    Please provide subtitles for foreign language speakers!

    • @Mutual_Information
      @Mutual_Information  3 years ago +1

      I have a list of outstanding changes I need to make, and this is one of them. I'll make it a priority! Thanks for the feedback

  • @arminkashani5695
    @arminkashani5695 2 years ago +1

    Great explanation! Thanks so much.