Logistic Regression with Maximum Likelihood

  • Published: 22 Aug 2024
  • Logistic regression is a statistical model that predicts the probability that a random variable belongs to a certain category or class. In this video we use the sigmoid function to form our hypothesis (statistical model). We then form the likelihood function as a Bernoulli distribution over a data set and, using the maximum likelihood estimation method, estimate the model parameters with the gradient ascent algorithm. (A code sketch of this procedure follows the description below.)
    ** SUBSCRIBE:
    www.youtube.co...
    ** Follow us on Instagram for more endless engineering:
    / endlesseng
    ** Like us on Facebook:
    / endlesseng
    ** Check us out on twitter:
    / endlesseng
    ** Cat photo is courtesy of Dan Perry on Flickr and is licensed under Creative Commons Attribution 2.0 Generic (CC BY 2.0).
    Source: www.flickr.com...
    License: creativecommon...
    ** Dog photo: Available in the public domain at Pxhere.
    source: pxhere.com/en/...
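
    The description outlines the full pipeline: a sigmoid hypothesis, a Bernoulli likelihood, and gradient ascent on the log-likelihood. Below is a minimal sketch of that procedure (NumPy-based; the data set, learning rate, and iteration count are illustrative assumptions, not values from the video):

    ```python
    import numpy as np

    def sigmoid(z):
        # Hypothesis: maps theta^T x_bar to a probability in (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def fit_logistic(X, y, rate=0.1, iters=5000):
        # x_bar = [1, x0, ..., xn]: prepend a column of ones for the intercept
        X_bar = np.hstack([np.ones((X.shape[0], 1)), X])
        theta = np.zeros(X_bar.shape[1])
        for _ in range(iters):
            # Gradient of the Bernoulli log-likelihood: X_bar^T (y - sigma(X_bar theta))
            grad = X_bar.T @ (y - sigmoid(X_bar @ theta))
            theta += rate * grad / len(y)  # gradient ASCENT step
        return theta

    # Tiny illustrative data set: one feature, binary labels
    X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
    y = np.array([0, 0, 0, 1, 1, 1])
    theta = fit_logistic(X, y)
    print(theta)  # [intercept, slope]
    ```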

Comments • 87

  • @karimafifi5501
    @karimafifi5501 4 years ago +15

    Best explanation on the internet. Thank you.

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Thank you! I am glad you enjoyed it, feel free to check out my other videos and subscribe to the channel

  • @gisellcelis5375
    @gisellcelis5375 2 months ago

    I have been looking for a good explanation in books and other videos but I couldn't understand this topic until I found your video. Thank you! :)

  • @RanCui
    @RanCui 2 years ago

    This is so far the best video I've watched on YouTube that explains this topic

  • @hetav6714
    @hetav6714 3 years ago +3

    Damn this guy is awesome in explaining stuff. Really good work mate!

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thank you for watching! I am glad you found the video useful

  • @amargupta7123
    @amargupta7123 1 year ago

    I have searched through so many explanations but finally understood it through your videos. Thanks!

  • @lieutanant8058
    @lieutanant8058 5 years ago +17

    I'm wondering how the hell this is filmed. Some mirror magic or does he actually write backwards?

    • @EndlessEngineering
      @EndlessEngineering  5 years ago +5

      It is filmed using Engineering Magic!

    • @alvinphantomhive3794
      @alvinphantomhive3794 4 years ago +2

      He did write backwards.. that's an engineering concept: if adding doesn't solve the problem, then subtract it :)

    • @pratikd5882
      @pratikd5882 4 years ago +3

      Video camera placed behind the board, video footage flipped horizontally while editing.

    • @alvinphantomhive3794
      @alvinphantomhive3794 4 years ago

      @@pratikd5882 nah, that makes sense..

    • @Pmarmagne
      @Pmarmagne 3 years ago +2

      @@pratikd5882 Nah, I think this guy was obviously bored, so he did the entire video writing backwards

  • @TheElementFive
    @TheElementFive 2 years ago

    Just enough math to capture the intuition behind the algorithm. You got yourself a sub sir.

    • @EndlessEngineering
      @EndlessEngineering  2 years ago

      Thanks for watching and subscribing! Glad you liked the video! Let me know if you have any topics you would like to see videos on in the future

  • @drachenschlachter6946
    @drachenschlachter6946 1 year ago

    Best video on YouTube on this topic!

  • @nazrulhassan6310
    @nazrulhassan6310 2 years ago

    Just a fabulous explanation. There are tons of videos on logistic regression but most of them gloss over the mathematical details.

    • @EndlessEngineering
      @EndlessEngineering  2 years ago

      Thank you for watching! Glad you found it clear and useful. Feel free to share and subscribe to the channel

  • @seb2118
    @seb2118 3 years ago +2

    I need more videos man!!
    Awesome work. Please post more videos or helpful links that explain as well as you do!

    • @EndlessEngineering
      @EndlessEngineering  3 years ago +1

      Thank you for the kind words! I have been a little busy and not posting much lately, I am sorry about that! Let me know what topics you would like videos on! Thank you for watching

  • @abdulgadirhussein2244
    @abdulgadirhussein2244 4 years ago +1

    Great video dude and very neatly explained. Please keep up the great work!

  • @tuannguyenxuan8919
    @tuannguyenxuan8919 2 years ago

    Thank you! This video helped me a lot to understand what MLE in logistic regression does

  • @swarnimartist3355
    @swarnimartist3355 4 years ago +1

    Amazing explanation....just wow

    • @EndlessEngineering
      @EndlessEngineering  4 years ago +1

      Thank you Shrish, glad you enjoyed the video. Please feel free to like the video and subscribe to the channel, thank you for the support.

  • @patrickbyamasu1353
    @patrickbyamasu1353 3 years ago

    This is one of the best explanations of the math behind logistic regression I have ever seen!

  • @anilsarode6164
    @anilsarode6164 4 years ago +1

    Well done.

  • @heesukson3581
    @heesukson3581 3 years ago

    This is the best explanation ever. Thx!

    • @EndlessEngineering
      @EndlessEngineering  3 years ago

      Thank you! I am glad you found this video clear and useful. Please let me know if there are other topics you would like to see videos on

  • @batatambor
    @batatambor 4 years ago +1

    When we are talking about linear regression we normally have to satisfy a few assumptions (linearity, normality of errors, homoskedasticity, etc.). Do these conditions also have to hold in order to perform a logistic regression? You have, after all, a kind of linear regression inside the sigmoid function, right?

    • @EndlessEngineering
      @EndlessEngineering  4 years ago +2

      The logistic model assumes a linear relationship between the predictor variables and the log-odds of the event. Logistic regression can be seen as a special case of the generalized linear model, and is thus analogous to linear regression, but logistic regression makes two key different assumptions. First, the conditional distribution y ∣ x is a Bernoulli distribution rather than a Gaussian distribution. Second, the predicted values are probabilities and are therefore restricted to (0,1).
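
      In equation form (a hedged restatement of the reply above, using the video's x_bar notation for the input vector with a 1 prepended):

      ```latex
      % Linear regression assumes a Gaussian conditional distribution:
      %   y | x ~ N(theta^T x_bar, sigma^2)
      % Logistic regression assumes y | x ~ Bernoulli(p), with linearity
      % in the log-odds, which keeps p restricted to (0, 1):
      \log\frac{p}{1-p} = \theta^{\top}\bar{x}
      \quad\Longleftrightarrow\quad
      p = \sigma(\theta^{\top}\bar{x}) = \frac{1}{1 + e^{-\theta^{\top}\bar{x}}}
      ```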

  • @pratikd5882
    @pratikd5882 4 years ago +2

    Thanks a ton!!, you explained it so well !! Please keep making videos on ML math topics.

  • @Felicidade101
    @Felicidade101 5 years ago +2

    subscribed because this was so good :D

    • @EndlessEngineering
      @EndlessEngineering  5 years ago

      Thank you! I am glad you liked the video and subscribed; please feel free to provide any other feedback on topics you would like to see covered. Endless Engineering is committed to providing our followers with the content they want!

  • @abdulkareemridwan8762
    @abdulkareemridwan8762 4 years ago

    Endless Engineering.. fantastic work.. you explain every bit... really appreciate it

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Thank you Abdulkareem. I am glad you found the video useful. Thanks for watching!

  • @igormishurov1876
    @igormishurov1876 4 years ago +1

    Please come out with new lessons! Very clear and cool!

  • @majedalhajjaj8965
    @majedalhajjaj8965 4 years ago

    May God give you health, a wonderful explanation. I remember you from when you explained aircraft structures.

  • @kamrangurbanov4364
    @kamrangurbanov4364 4 years ago +1

    That was really helpful. Thank you very much

  • @avestaabdulrahman6549
    @avestaabdulrahman6549 3 years ago

    Thank you indeed!

  • @yoyoIITian
    @yoyoIITian 5 years ago +1

    Wow! Nice explanation. Thank you so much.

    • @EndlessEngineering
      @EndlessEngineering  5 years ago

      Glad you liked it! Let me know if there are any other topics you are interested in learning!

  • @salmanashraf2153
    @salmanashraf2153 4 years ago

    It was explained simply and was easy to comprehend. Thanks.

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Hi Salman, I am glad you enjoyed the video. Thank you for watching

  • @taiworidwan194
    @taiworidwan194 2 years ago +1

    Thanks for the video.
    Please, how can one optimize the coefficients of a Logistic Regression Model using a Genetic Algorithm?

  • @user-xt9js1jt6m
    @user-xt9js1jt6m 4 years ago +1

    Nice explanation sir
    What if there is an intercept?
    Do we have to estimate multiple parameters simultaneously??

    • @EndlessEngineering
      @EndlessEngineering  4 years ago +1

      Hi Kailas, thank you for watching!
      That is a great question, and the answer is YES! In fact the formulation I show here does estimate an intercept; it is included in the notation x_bar (a bar on top of the vector x). x_bar is the vector x with an element of value 1 prepended, like this --> x_bar = [1, x0, x1, ..., xn]. So the parameter that multiplies the element of value 1 is the intercept.
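
      Concretely (a small worked expansion of the reply's notation; the two-feature case is just an illustrative assumption):

      ```latex
      % With x = [x0, x1] and x_bar = [1, x0, x1]:
      \theta^{\top}\bar{x} = \theta_0 \cdot 1 + \theta_1 x_0 + \theta_2 x_1
      % theta_0 multiplies the constant 1, so it plays the role of the
      % intercept, exactly as b0 does in linear regression.
      ```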

    • @user-xt9js1jt6m
      @user-xt9js1jt6m 4 years ago

      @@EndlessEngineering understood!!
      Thank you sir for replying.
      When I was trying to understand the entire mechanism I came across various methods like gradient descent and Newton-Raphson.
      In gradient descent:
      x_new = x_initial - rate * slope
      In Newton-Raphson:
      x_new = x_initial - log likelihood(x_initial) / derivative of it
      Am I correct??
      Which is better??
      Thank you once again for your reply!!🙏🙏🙏

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      @@user-xt9js1jt6m Newton's method requires computing the second derivative of your cost function (the Hessian) and inverting it. This uses more information and would converge faster than gradient descent (which only requires the first derivative).
      However, requiring the Hessian to exist imposes a stricter smoothness condition, and inverting it might be computationally challenging in some cases.
      So I would say neither is absolutely better; it just depends on the problem/data you are dealing with.
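
      As a minimal sketch of the two update rules for the logistic log-likelihood (NumPy-based; the function names and step size are illustrative assumptions, not from the thread):

      ```python
      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      def gradient_ascent_step(theta, X_bar, y, rate=0.1):
          # First-order: theta <- theta + rate * gradient of the log-likelihood
          grad = X_bar.T @ (y - sigmoid(X_bar @ theta))
          return theta + rate * grad

      def newton_step(theta, X_bar, y):
          # Second-order: theta <- theta - H^{-1} grad, where the Hessian of
          # the log-likelihood is H = -X_bar^T W X_bar with W = diag(p(1-p)).
          # Solving the linear system avoids forming an explicit inverse.
          p = sigmoid(X_bar @ theta)
          grad = X_bar.T @ (y - p)
          H = -(X_bar.T @ np.diag(p * (1 - p)) @ X_bar)
          return theta - np.linalg.solve(H, grad)
      ```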

    • @user-xt9js1jt6m
      @user-xt9js1jt6m 4 years ago

      @@EndlessEngineering okay!!!
      Thank you!!!
      I appreciate your sincerity!!
      Thank you once again sir!!!

  • @praneethd.v1292
    @praneethd.v1292 2 years ago +1

    Why is the probability multiplied in the example, and why is it a power in the actual equation? Time: 4:52 of the video

    • @EndlessEngineering
      @EndlessEngineering  2 years ago

      Praneeth, that is the equation for a Bernoulli distribution, which takes the value 1 with probability p and the value 0 with probability 1 - p. This type of distribution is nice for a binary problem like the one we are solving in the video.
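
      In symbols (a short worked check of the reply above, writing p for sigma(theta_transpose * x_bar)):

      ```latex
      % Bernoulli pmf in "power" form:
      P(y) = p^{y}(1-p)^{1-y}, \qquad y \in \{0, 1\}
      % Plugging in the two cases recovers the "multiplied" example:
      %   y = 1:  P(1) = p^1 (1-p)^0 = p
      %   y = 0:  P(0) = p^0 (1-p)^1 = 1 - p
      ```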

  • @willianarboleya5651
    @willianarboleya5651 4 years ago +1

    Thanks, bald Jack Black. It was very helpful.

  • @keving.7871
    @keving.7871 4 years ago

    Very good explanation!

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Thank you Keven. I am glad you found it helpful and enjoyed the video!

  • @fadydawra
    @fadydawra 4 years ago +1

    Thank you a lot!
    Could you explain kernel methods?

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Thank you for watching Fadi. I have more videos coming, I will try to do one on Kernel methods

  • @nizardwicahyo76
    @nizardwicahyo76 4 years ago

    Thank you, sir! Great explanation!

  • @k.8597
    @k.8597 1 year ago +1

    You forgot to write (1-y_{i})log(1-sig(theta..) right after taking the log :)
    Edit: nvm, you fixed it right after

  • @user-cy9zf4oz2i
    @user-cy9zf4oz2i 5 months ago

    But why is it x bar? Isn't it x_i?

  • @tayebmustafa8606
    @tayebmustafa8606 5 years ago

    Thank you very much, it really helps.

    • @EndlessEngineering
      @EndlessEngineering  5 years ago

      Hello Tayyeb, I am glad you found this helpful. Let me know if there are any other topics you are interested in learning!

  • @anshulsaini5401
    @anshulsaini5401 3 years ago

    The explanation is very good. I still have some doubts regarding logistic regression.
    Gradient descent/ascent is used to minimize the error, right? In linear regression what we did was choose random values for b1 & b0 and then check our error (MSE) using that slope and intercept. We iterated again and again until we got a slope and intercept (b1 and b0) where the error was minimum, which means the global minimum.
    1) Why are we using gradient ascent here?
    2) In linear regression we chose random values for b0 and b1, so what will those random values be here in logistic regression?

    • @EndlessEngineering
      @EndlessEngineering  2 years ago

      1) We are using gradient ascent here to maximize the log likelihood function that we derived. You could use gradient descent if you multiply the log likelihood by -1; then you are minimizing -1 * likelihood, which is the same as maximizing the likelihood.
      2) The randomly initialized values here are the model parameters in the hypothesis, that is, the vector theta I mention. In the example I am showing it is the same model as linear regression; the number of parameters depends on the length of the input vector x

  • @gjyohe
    @gjyohe 4 years ago

    Awesome!

  • @pariasmukeba7977
    @pariasmukeba7977 1 year ago

    How can we help? Do you have a Udemy course we can buy? I feel like a mathematician after this video

  • @karl-henridorleans5081
    @karl-henridorleans5081 5 years ago +2

    First things first, thank you very much for this explanation!! Liked and subscribed immediately!
    One thing that is still unclear to me is how at timestamp 12:06 you get the value of x_bar from that derivative. At the beginning x_bar was set to be a vector of the form x_bar = [1, x_0, ..., x_n]. I did the math (i.e. the derivative of the sigmoid function w.r.t. theta), and I do not understand that connection. Any help would be welcome!

    • @EndlessEngineering
      @EndlessEngineering  5 years ago +2

      Hi Karl. Thanks for watching and subscribing!
      That is a very good question, let me see if I can clarify.
      The derivative we want to compute is d sigma(theta_transpose * x_bar) / d theta. Using the chain rule, let
      a = theta_transpose * x_bar Eq(1)
      then we can write that as
      d sigma(a) / d theta = [d sigma(a) / d a] * [d a / d theta] Eq(2)
      based on Eq(1) we can write
      [d a / d theta] = d (theta_transpose * x_bar) / d theta --> which is equal to x_bar Eq(3)
      and
      d sigma(a) / d a = sigma(a) * (1 - sigma(a)) = sigma(theta_transpose * x_bar) * (1 - sigma(theta_transpose * x_bar)) Eq(4)
      Substitute Eq(3) and Eq(4) into Eq(2), and substitute the value of a from Eq(1), all into the original derivative:
      d sigma(theta_transpose * x_bar) / d theta = sigma(theta_transpose * x_bar) * (1 - sigma(theta_transpose * x_bar)) * x_bar
      I hope that clears it up

    • @nickhall7793
      @nickhall7793 5 years ago

      @@EndlessEngineering Very helpful explanation thank you!

  • @boussagmanmorad9473
    @boussagmanmorad9473 1 year ago

    What does he mean by x bar?

  • @madagascar9407
    @madagascar9407 4 years ago

    Why are you writing the sum symbol and then still doing +...+ ? I don't think it's supposed to be like that

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      Thanks for your question. The term affected by the sum symbol was too long to fit on one line, so I had to break it into multiple lines. It is valid to use the +...+ notation in that case, since I am using the same index (i) to show that the other term belongs in the sum as well.
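
      For reference, the term under the sum is presumably the full log-likelihood; the following is a reconstruction in the video's notation and is an assumption, since the board layout itself is not reproduced here:

      ```latex
      \ell(\theta) = \sum_{i=1}^{N}\Bigl[\, y_i \log\sigma(\theta^{\top}\bar{x}_i)
        + (1 - y_i)\log\bigl(1 - \sigma(\theta^{\top}\bar{x}_i)\bigr) \Bigr]
      % Split across lines with the shared index i, the +...+ simply signals
      % that both pieces sit under the same sum.
      ```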

  • @corybeyer1
    @corybeyer1 3 years ago

    art

  • @AdarshSingh-qk2rj
    @AdarshSingh-qk2rj 4 years ago

    Why does theta, which we want to find, need to be increased using the maximum likelihood function?

    • @AdarshSingh-qk2rj
      @AdarshSingh-qk2rj 4 years ago +1

      Normally we want to maximize the likelihood (consequently the log likelihood). This is the reason why we call this method maximum likelihood estimation. We want to determine the parameters in such a way that the likelihood (or log likelihood) is maximized.
      When we think about the loss function we want to have something that is bounded by 0 from below and is unbounded for positive values. Our goal is to minimize the cost function. Hence, we take the negative of the log likelihood and use it as our cost function.
      It is important to note that this is just a convention. You could also take the log likelihood and maximize it, but in this case, we would not be able to interpret it as a cost function.
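
      In symbols (a standard restatement of the convention described above; J and ell are conventional names, not necessarily the video's):

      ```latex
      \hat{\theta} = \arg\max_{\theta} \ell(\theta)
                   = \arg\min_{\theta} \bigl(-\ell(\theta)\bigr)
                   = \arg\min_{\theta} J(\theta),
      \qquad J(\theta) := -\ell(\theta) \ge 0
      % J >= 0 because the Bernoulli likelihood is a product of
      % probabilities, so ell(theta) = log(likelihood) <= 0.
      ```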

  • @Felicidade101
    @Felicidade101 5 years ago

    hero!

  • @sanchitgoyal6720
    @sanchitgoyal6720 4 years ago

    What is x bar at 13:18?

    • @EndlessEngineering
      @EndlessEngineering  4 years ago

      It is the vector of inputs with a one prepended to it; see 1:47