Least Squares vs Maximum Likelihood

  • Published: Feb 4, 2025

Comments • 33

  • @datamlistic
    @datamlistic  6 months ago +2

    An explanation of the Normal Distribution equation can be found here: ruclips.net/video/WCP98USBZ0w/видео.html

    • @blitzkringe
      @blitzkringe 6 months ago +1

      I click on this link and it leads me to a video with a comment containing this link, and I click on that link, etc. When do I stop?

  • @MiroslawHorbal
    @MiroslawHorbal 6 months ago +22

    The maximum likelihood approach also lets you derive regularised regression. All you need to do is add a prior assumption on your parameters. For instance, if you assume your parameters come from a Gaussian distribution with 0 mean and some fixed value for sigma, maximising the resulting posterior (the MAP estimate) gives least squares with an L2 regularisation term (a quick numerical check is sketched just below).
    It's pretty cool
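    A minimal sketch of that check (not from the video; the synthetic data and the two sigma values below are made-up assumptions, and it assumes numpy/scipy are available): maximising the log-posterior under a zero-mean Gaussian prior on the weights matches the closed-form ridge solution with lambda = sigma_noise^2 / sigma_prior^2.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=100)

    sigma_noise = 0.5   # assumed observation-noise std
    sigma_prior = 1.0   # assumed std of the Gaussian prior on the weights

    def neg_log_posterior(w):
        # -log p(y | X, w) - log p(w), with constants dropped
        return (np.sum((y - X @ w) ** 2) / (2 * sigma_noise ** 2)
                + np.sum(w ** 2) / (2 * sigma_prior ** 2))

    w_map = minimize(neg_log_posterior, np.zeros(3)).x

    lam = sigma_noise ** 2 / sigma_prior ** 2                  # implied L2 strength
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

    print(np.allclose(w_map, w_ridge, atol=1e-3))              # True: the two estimates agree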

    • @datamlistic
      @datamlistic  6 months ago +1

      Thanks for the insight! It sounds like a really interesting possible follow-up video. :)

  • @elia0162
    @elia0162 6 months ago +6

    I still remember when I thought I had discovered this on my own, and then I got a reality check that it had already been discovered

  • @kevon217
    @kevon217 6 months ago +4

    Great explanation of the intuition. Thanks!

  • @jafetriosduran
    @jafetriosduran 6 months ago +1

    A brief and excellent explanation of a doubt I always had, thank you very much

  • @the_nuwarrior
    @the_nuwarrior 6 months ago

    This video is great for refreshing your memory, excellent

  • @placidesulfurik
    @placidesulfurik 6 months ago +19

    Your math implies that the gaussian distributions should be vertical, not perpendicular to the linear regression line.

    • @gocomputing8529
      @gocomputing8529 6 months ago +4

      I agree. This would imply that the noise is on the Y variable, while X has no noise

    • @IoannisNousias
      @IoannisNousias 6 months ago +3

      The visuals should have been concentric circles. The distributions are the likelihood of the hypothesis (θ) given the data, data here being y,x. It’s a 2D heatmap.

    • @placidesulfurik
      @placidesulfurik 6 months ago

      @@IoannisNousias ah, fair enough

    • @IoannisNousias
      @IoannisNousias 6 months ago

      @@placidesulfurik in fact, this is still a valid visualization, since it’s a reprojection to the linear model. He is depicting the expected trajectory, as explained by each datapoint.

    • @datamlistic
      @datamlistic  5 months ago

      @@IoannisNousias +1

  • @PplsChampion
    @PplsChampion 6 months ago +1

    awesome explanation

  • @yaseral-saffar7695
    @yaseral-saffar7695 6 months ago

    @3:14 Is it really correct that the standard deviation does not depend on theta? I'm not sure, as it depends on the square of the errors (y - y_hat), which depends on y_hat, which itself depends on theta.

  • @et2124
    @et2124 6 months ago +3

    According to the formula at 2:11, I don't see how the Gaussian distributions are perpendicular to the line rather than to the x axis.
    Therefore, I believe you made a mistake in the image at 2:09.

    • @jorgecelis8459
      @jorgecelis8459 6 months ago +2

      indeed

    • @datamlistic
      @datamlistic  5 months ago

      @placidesulfurik asked the same question. The answer is in that thread.

  • @MikeWiest
    @MikeWiest 6 months ago

    Cool, thank you!

    • @datamlistic
      @datamlistic  6 months ago +1

      Thanks! Happy you liked the video!

  • @markburton5318
    @markburton5318 6 months ago

    Given that the best estimate of a normal distribution is not normal, what would be the function to minimise? And what if the distribution is unknown? What would a non-parametric function to minimise be?

  • @theresalwaysanotherway3996
    @theresalwaysanotherway3996 6 months ago +1

    love the video, seems like a natural primer to move into GLMs

    • @datamlistic
      @datamlistic  6 months ago +3

      Happy to hear you liked the explanation! I could create a new series on GLMs if enough people are interested in this subject.

  • @6PrettiestPics
    @6PrettiestPics 6 months ago

    Subbed! You love to see it.

  • @boredofeducation-sb6kr
    @boredofeducation-sb6kr 6 months ago +3

    Great video! But what's the intuition for why the Gaussian is the natural distribution here?

    • @blitzkringe
      @blitzkringe 6 months ago +3

      Central limit theorem. Natural random events are composed of many smaller events, and even if the distribution of the individual events isn't Gaussian, their sum approximately is (a small simulation sketch follows below).
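      A minimal simulation sketch of that point (arbitrary numbers, purely illustrative, assuming numpy/scipy): sums of many uniform draws already look Gaussian, as measured by near-zero skewness and excess kurtosis.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      # 10,000 sums, each of 50 uniform(-1, 1) draws (individually non-Gaussian)
      sums = rng.uniform(-1, 1, size=(10_000, 50)).sum(axis=1)

      # Both statistics are 0 for a Gaussian; they come out close to 0 here
      print(stats.skew(sums), stats.kurtosis(sums))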

    • @MiroslawHorbal
      @MiroslawHorbal 6 months ago +1

      You can think of the model as:
      Y = mX + b + E
      where E is an error term. A common assumption is that E is normally distributed around 0 with some unknown variance. Due to linearity, Y is then normally distributed, centered at mX + b.
      You can derive other formulas for regression by making different assumptions about the error distribution, but using a Gaussian is the most common.
      For example, you can derive least absolute deviation (where you minimize the absolute difference rather than the squared difference) by assuming your error distribution is a Laplace distribution. This results in a regression that is more robust to outliers in the data (a small sketch comparing the two losses follows below).
      In fact, you can derive many different forms of regression based on the assumptions about the distribution of the error terms.
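      A minimal sketch of that least-squares vs. least-absolute-deviation contrast (synthetic data with one injected outlier; the numbers are arbitrary assumptions, and it assumes numpy/scipy):

      import numpy as np
      from scipy.optimize import minimize

      rng = np.random.default_rng(2)
      x = np.linspace(0, 10, 50)
      y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=50)
      y[-1] += 40.0  # one gross outlier

      def fit(loss):
          # Fit y ~= m*x + b by minimizing the given pointwise loss on the residuals
          obj = lambda p: np.sum(loss(y - (p[0] * x + p[1])))
          return minimize(obj, x0=[0.0, 0.0], method="Nelder-Mead").x

      m_ls = fit(np.square)[0]   # Gaussian errors -> least squares
      m_lad = fit(np.abs)[0]     # Laplace errors  -> least absolute deviation

      # The least-squares slope is pulled up by the outlier; the LAD slope stays near 3
      print(m_ls, m_lad)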

    • @Eta_Carinae__
      @Eta_Carinae__ 6 months ago

      @@MiroslawHorbal Yes... Laplace-distributed residuals have their place in sparsity and all, but as to the OP's question, the Gaussian makes certain theoretical results far easier. The proof of the CLT is out there... it requires the use of highly unintuitive objects like moment-generating functions, but at a very high level, the answer is that the diffusion kernel is a Gaussian and is an eigenfunction of the Fourier transform... and there's a deep connection between the relationship between RVs and their probabilities, and functions and their Fourier transforms.

  • @digguscience
    @digguscience 6 months ago +1

    I have seen the concept of least squares in artificial neural networks. This material is very important for learning ANNs.