8.3 Bias-Variance Decomposition of the Squared Error (L08: Model Evaluation Part 1)

  • Published: 22 Jan 2025

Comments • 37

  • @kairiannah
    @kairiannah 1 year ago

    This is how you teach machine learning; respectfully, the prof. at my university needs to take notes!

  • @elnuisance
    @elnuisance 3 years ago +4

    This was lifesaving. Thank you so much, Sebastian, especially for explaining why the 2ab cross-term equals 0 when deriving the decomposition.
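
    A short sketch of that step, under the lecture's setup where the target y is treated as a fixed value and the expectation E[.] is taken over training sets (so E[y_hat] is a constant): writing a = y - E[y_hat] and b = E[y_hat] - y_hat,

        E[(y - y_hat)^2] = E[(a + b)^2] = a^2 + 2a*E[b] + E[b^2],   with   E[b] = E[y_hat] - E[y_hat] = 0,

    so the 2ab cross-term vanishes, leaving E[(y - y_hat)^2] = (y - E[y_hat])^2 + E[(y_hat - E[y_hat])^2], i.e., bias^2 + variance.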

  • @bluepenguin5606
    @bluepenguin5606 2 years ago +5

    Hi Professor, thank you so much for the excellent explanation!! I learned the bias-variance decomposition a long time ago but never fully understood it until I watched this video! The detailed explanation of each definition helps a lot. Also, the code implementation helps me not only understand the concepts but also implement them in a real application, which is the part I always struggle with! I'll definitely find time to watch the other videos to make my ML foundation more solid.

  • @whenmathsmeetcoding1836
    @whenmathsmeetcoding1836 2 years ago +1

    This was wonderful, Sebastian. After searching, I found no other video on YouTube with such an explanation.

  • @PriyanshuSingh-hm4tn
    @PriyanshuSingh-hm4tn 2 years ago +1

    The best explanation of bias & variance I've encountered so far.
    It would be helpful if you could include the "noise" term too.

    • @SebastianRaschka
      @SebastianRaschka  2 years ago +1

      Thanks! Haha, I would defer the noise term to my statistics class but yeah, maybe I should do a bonus video on that. A director's cut. :)
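
    For readers curious about that noise term: the standard extension (not covered in this video) assumes noisy targets y = f(x) + eps with E[eps] = 0 and Var[eps] = sigma^2, which adds an irreducible-error term to the decomposition:

        E[(y - y_hat)^2] = (f(x) - E[y_hat])^2 + E[(y_hat - E[y_hat])^2] + sigma^2
                         = bias^2 + variance + irreducible error,

    where the expectation is taken over both the training sets and the target noise eps.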

  • @khuongtranhoang9197
    @khuongtranhoang9197 3 years ago +2

    Do you know that you are doing truly good work! Clear down to every single detail.

  • @gurudevilangovan
    @gurudevilangovan 3 years ago +2

    Thank you so much for the bias-variance videos. Though I intuitively understood it, these equations never made sense to me before I watched the videos. Truly appreciated!!

    • @SebastianRaschka
      @SebastianRaschka  3 years ago

      Awesome, I am really glad to hear that I was able to explain it well :)

  • @ashutoshdave1
    @ashutoshdave1 2 years ago +1

    Thanks for this! It provides one of the best explanations 👏

    • @SebastianRaschka
      @SebastianRaschka  2 years ago

      Thanks! Glad to hear!

    • @ashutoshdave1
      @ashutoshdave1 2 years ago

      @SebastianRaschka Hi Sebastian, I visited your awesome website resource for ML/DL. Thanks again. Can't wait for the Bayesian part to be completed.

  • @XavierL-m6g
    @XavierL-m6g 1 year ago

    Thank you so much for the intuitive explanation! The notation is easy to understand, and it just instantly clicked.

  • @krislee9296
    @krislee9296 3 years ago +1

    Thank you so much. This helped me understand the bias-variance decomposition perfectly from a mathematical standpoint.

  • @imvijay1166
    @imvijay1166 2 years ago +2

    Thank you for this great lecture series!

  • @siddhesh119369
    @siddhesh119369 1 year ago

    Hi, thanks for teaching, really helpful 😊

  • @justinmcgrath753
    @justinmcgrath753 11 months ago

    At 10:20, the bias comes out backward because the error should be y_hat - y, not y - y_hat. The "true value" in an error is subtracted from the estimate, not the other way around. This is easily remembered by thinking of a simple random variable with mean mu and error e: y = mu + e. Thus, e = y - mu.
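
    Worth noting alongside this comment: the two sign conventions differ only in the sign of the bias, and since the decomposition uses the squared bias, the end result is the same either way:

        bias = E[y_hat] - y  (estimate minus truth)   vs.   bias = y - E[y_hat]  (as in the video);
        (E[y_hat] - y)^2 = (y - E[y_hat])^2,   so   E[(y - y_hat)^2] = bias^2 + variance  is unaffected.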

  • @andypandy1ify
    @andypandy1ify 4 years ago +3

    This is an absolutely brilliant video, Sebastian - thank you.
    I have no problem deriving the bias-variance decomposition mathematically, but no one seems to explain what the variance or expectation is taken with respect to: is it just one value? Multiple training sets? Different values within one training set? You explained it excellently.

  • @kevinshao9148
    @kevinshao9148 3 years ago

    Thanks for the great video! One question: at 8:42, why is y constant? y = f(x) here also has a distribution, i.e., it is a random variable; is that correct? And when you say "apply the expectation on both sides," is this expectation over y or over x?

    • @SebastianRaschka
      @SebastianRaschka  3 years ago +1

      Good point. For simplicity, I assumed that y is not a random variable but a fixed target value instead.

    • @kevinshao9148
      @kevinshao9148 3 years ago

      @SebastianRaschka Thank you so much for the reply! Yeah, that's where my confusion sticks. So what do you take the expectation over? If you take the expectation over all the x values, then you cannot make this assumption, right?
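
    One way to make the "expectation over training sets at a fixed point" reading concrete is a small simulation. The sketch below is a hypothetical setup (not the lecture's notebook): it draws many training sets from the same data-generating process, refits a model on each, and estimates bias and variance of the predictions at one fixed test point whose target value is held constant.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)

    def f(x):                              # noise-free target function (illustrative assumption)
        return np.sin(x)

    x_query = np.array([[1.5]])            # one fixed test point
    y_true = f(x_query).item()             # the fixed target value y at that point

    preds = []
    for _ in range(500):                   # "expectation over training sets": redraw and refit 500 times
        X_train = rng.uniform(0, 5, size=(30, 1))
        y_train = f(X_train).ravel() + rng.normal(0, 0.3, size=30)
        model = DecisionTreeRegressor(max_depth=3).fit(X_train, y_train)
        preds.append(model.predict(x_query).item())

    preds = np.array(preds)
    bias_sq = (y_true - preds.mean()) ** 2   # squared bias at the fixed point
    variance = preds.var()                   # variance of predictions across training sets
    mse = np.mean((y_true - preds) ** 2)     # expected squared error across training sets
    print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}, mse = {mse:.4f}")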

  • @tykilee9683
    @tykilee9683 3 years ago +1

    So helpful😭😭😭

  • @Rictoo
    @Rictoo 7 months ago

    I have a couple of questions. Regarding the variance, is this calculated across different parameter estimates given the same functional form of the model? Also, these parameter estimates depend on the optimization algorithm used, right? That is, the model predictions come from 'empirically derived models' rather than some theoretically optimal parameter combination for a given functional form. If so, would this mean that, technically speaking, there is an additional source of error in the loss calculation, something like 'implementation variance', because our model likely does not have the most optimal parameters compared to some theoretical optimum? Hope this makes sense; I'm not a mathematician. Thanks!

  • @bashamsk1288
    @bashamsk1288 1 year ago

    When you say bias^2 + variance, is that for a single model?
    In the beginning you said bias and variance are for different models trained on different datasets; which one is it?
    If we consider a single model, then is the bias nothing but the mean error and the variance the mean squared error?

  • @1sefirot9
    @1sefirot9 3 years ago

    Any good sources or hints on dataset stratification for regression problems?

    • @SebastianRaschka
      @SebastianRaschka  3 years ago +1

      Not sure if this is the best way, but personally I approached that by manually specifying bins for the target variable and then proceeding with stratification like for classification (a short code sketch follows after this thread). There may be more sophisticated techniques out there, though, e.g., based on KL divergence or so.

    • @1sefirot9
      @1sefirot9 3 years ago

      @SebastianRaschka Hm, given a sufficiently large number of bins this should be a sensible approach, and easy to implement. I will play around with that. I am trying some of the things taught in this course on the Walmart Store Sales dataset (available from Kaggle); a naive training of LightGBM already returns marginally better results than what the instructor on Udemy had (he used XGBoost with hyperparameters returned by the AWS SageMaker auto-tuner).
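
    A minimal sketch of the binning approach described in the reply above, assuming a NumPy target array and scikit-learn; the data, bin count, and splitter here are illustrative choices, not from the lecture:

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    X = rng.normal(size=(1000, 5))                    # toy feature matrix
    y = rng.gamma(shape=2.0, scale=3.0, size=1000)    # skewed continuous target

    # Bin the continuous target into quantile-based classes, then stratify on the
    # bin labels exactly as one would for a classification problem.
    n_bins = 10
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    y_binned = np.digitize(y, edges)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y_binned, random_state=0
    )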

  • @DeepS6995
    @DeepS6995 3 years ago

    Professor, does your bias_variance_decomp work in Google Colab? It did not for me. It worked just fine in Jupyter. But the problem with Jupyter is that bagging is way slower (that's my computer) than what I could get in Colab.

    • @SebastianRaschka
      @SebastianRaschka  3 years ago

      I think Google Colab has a very old version of MLxtend as the default. I recommend the following (a usage sketch follows after this thread):
      !pip install mlxtend --upgrade

    • @DeepS6995
      @DeepS6995 3 years ago

      @SebastianRaschka It works now. Thanks for the prompt response.
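
    For reference, a minimal sketch of how MLxtend's bias_variance_decomp is typically called for a regression estimator after the upgrade; the data and model below are illustrative, not taken from the lecture notebook:

    import numpy as np
    from mlxtend.evaluate import bias_variance_decomp
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    # Toy regression data (illustrative only)
    rng = np.random.default_rng(1)
    X = rng.uniform(0, 5, size=(500, 1))
    y = np.sin(X).ravel() + rng.normal(0, 0.3, size=500)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=1
    )

    # Averages over bootstrap rounds drawn from the training set
    avg_loss, avg_bias, avg_var = bias_variance_decomp(
        DecisionTreeRegressor(max_depth=4),
        X_train, y_train, X_test, y_test,
        loss='mse', num_rounds=100, random_seed=1
    )
    print(f"avg expected loss: {avg_loss:.3f}, avg bias: {avg_bias:.3f}, avg variance: {avg_var:.3f}")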

  • @jayp123
    @jayp123 7 months ago

    I don't understand why you can't multiply 'E', the expectation, by 'y', the constant.