Why do we use "e" in the Sigmoid?

  • Published: 16 Sep 2024
  • Why do we use the mathematical constant "e" in the sigmoid function?
    My Patreon : www.patreon.co...
    Sigmoid Video : • The Sigmoid : Data Sci...
    Visuals Created with Excalidraw : excalidraw.com/

Comments • 39

  • @halibrahim
    @halibrahim 1 year ago +14

    Thank you for making us love math even more.

  • @pawanbhatt314
    @pawanbhatt314 1 year ago +3

    Thank you for bringing this intuitive video; I just had this thought yesterday.
    Please keep uploading videos like this. It makes my intuition stronger and brings me closer to statistics.

  • @ramirolopezvazquez4636
    @ramirolopezvazquez4636 1 year ago +2

    Thanks! I really appreciate these bits of useful, subtle, and insightful ideas about common objects in data science

  • @SuperMtheory
    @SuperMtheory 1 year ago +4

    Makes sense. dy/dx = y(1 - y) if k = e. Great video!
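
    A quick numerical check of that identity - a minimal sketch assuming the standard sigmoid y = 1/(1 + e^-x); the names here are illustrative, not from the video:

    ```python
    # Numerical check: for y = 1/(1 + e^(-x)), dy/dx equals y * (1 - y).
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-5.0, 5.0, 11)
    y = sigmoid(x)

    # Central-difference estimate of dy/dx.
    h = 1e-6
    dydx = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)

    # Should print ~0 (up to finite-difference error).
    print(np.max(np.abs(dydx - y * (1.0 - y))))
    ```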

  • @sashayakubov6924
    @sashayakubov6924 3 months ago

    Love this nonchalant explanation :)

  • @JoeBurnett
    @JoeBurnett 17 days ago

    Thank you for this explanation!

  • @apoorvatiwari8287
    @apoorvatiwari8287 1 year ago

    Nice explanation. Clarifies everything

  • @ChocolateMilkCultLeader
    @ChocolateMilkCultLeader 1 year ago +1

    The best in the game for this kind of content

  • @reekdas9219
    @reekdas9219 6 months ago

    thanks bro, interesting video!

    • @ritvikmath
      @ritvikmath  6 months ago

      Glad you liked it!

  • @JMBalaguer
    @JMBalaguer 1 year ago

    Thanks for the explanation!

  • @Justin-zw1hx
    @Justin-zw1hx 1 year ago

    always high quality content

  • @jasdeepsinghgrover2470
    @jasdeepsinghgrover2470 1 year ago +1

    Actually, there is a better reason, but I am still not sure about it... The sigmoid is derived through linear regression on the log odds of the two classes. So mx + c = ln(p/(1-p)), which gives p = 1/(1 + e^-(mx+c)).
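
    Spelling that derivation out as a worked equation (the standard log-odds argument, with m, c, and p as in the comment above):

    ```latex
    mx + c = \ln\frac{p}{1-p}
    \;\Longrightarrow\;
    \frac{p}{1-p} = e^{mx+c}
    \;\Longrightarrow\;
    p = \frac{e^{mx+c}}{1 + e^{mx+c}} = \frac{1}{1 + e^{-(mx+c)}}
    ```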

  • @marcfruchtman9473
    @marcfruchtman9473 1 year ago

    Great info. Thank you.

  • @user-nm4sz3vr6x
    @user-nm4sz3vr6x 1 year ago +1

    Huh, so I guess this is like a tradeoff of annoyances where using e upfront is just less annoying than discovering ln(k) much later.
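
    A small sketch of where that ln(k) comes from - assuming a base-k sigmoid s_k(x) = 1/(1 + k^(-x)), whose derivative is ln(k) * s_k(x) * (1 - s_k(x)); the extra factor collapses to 1 only when k = e:

    ```python
    # For y = s_k(x) = 1/(1 + k^(-x)), dy/dx = ln(k) * y * (1 - y).
    import numpy as np

    def sigmoid_k(x, k):
        return 1.0 / (1.0 + k ** (-x))

    x = np.linspace(-3.0, 3.0, 7)
    h = 1e-6
    for k in (2.0, np.e, 10.0):
        y = sigmoid_k(x, k)
        dydx = (sigmoid_k(x + h, k) - sigmoid_k(x - h, k)) / (2.0 * h)
        # Mismatch is ~0 for every k; only k = e makes ln(k) = 1 and
        # leaves the clean dy/dx = y * (1 - y).
        print(k, np.max(np.abs(dydx - np.log(k) * y * (1.0 - y))))
    ```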

  • @giorda77
    @giorda77 1 year ago

    Really good explanation. Keep it up :)

  • @r0cketRacoon
    @r0cketRacoon 1 month ago

    amazing

  • @masster_yoda
    @masster_yoda 1 year ago

    This is a nice explanation; however, one question is left open for me: we interpret the result of the sigmoid as a probability, so sigmoid(x) gives the probability of something being classified into some category. Let's assume the standard sigmoid(x) gives a value of 0.7. When I change the sigmoid to use some other number k instead of e, this probability would change - say it is now 0.9 instead of 0.7. That seems semantically completely different from 0.7. So I would conclude that, with respect to the interpretation as a probability, it is not arbitrary whether we choose e or some other number k.

    • @MalTimeTV
      @MalTimeTV 1 year ago +1

      When we use the sigmoid function, we do so because it maps from the real numbers to the interval [0, 1]. So, in practice, regression values can be mapped to probabilities. Like you say, you might have some regression value (x) that maps to 0.7. But what you are generally interested in is not the 0.7 itself, but rather the value of the sigmoid for the given data point relative to other data points. A concrete example might help to clarify:
      Say we have a bunch of predictors (from a linear regression, say) - e.g. weather data for temperature and pressure at some given location. We want to combine these via a linear relationship y = b0 + b1 x temp + b2 x pressure, and then map y (a real value) to a probability of rain. So we use the sigmoid, and we might get 0.7, like you say, for a given observation of temperature and pressure. Does that mean we have a model which predicts a 70% chance of rain? Not necessarily - and probably not even close. In practice, you use the 70% relative to the values of the other observations. You might set a threshold of 50% and classify all values above it as "expecting rain" and all values below as "not expecting rain". But then you might find that the 50% threshold does not hold up when you apply your model to historical data with known outcomes. However, if you tune the threshold (exploring other possible values, e.g. 20%, 21%, ..., 69%, 70%), you might find that a threshold of 30% yields very high accuracy (even on data which you set aside and did not train on).
      So, in other words, in practice you rarely take the sigmoid as a literal mapping from the real line to the probability line. You just let it perform a mapping onto [0, 1] because this helps to define, in some sense, a classification rule; once you have that rule, you can fine-tune the threshold to optimise your model (a sketch of this threshold sweep follows this thread). A long answer, I know, but I figured I would share this since I had wondered the same thing for quite a long time - until I saw how things worked in practice.

    • @thankforyourvideo
      @thankforyourvideo 10 months ago

      @@MalTimeTV thanks a lot for your answer
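
    A minimal sketch of the threshold sweep described in the reply above. The dataset, the coefficients b0, b1, b2, and the rain rule are all invented for illustration:

    ```python
    # Sweep classification thresholds over sigmoid scores on synthetic weather data.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    temp = rng.normal(20.0, 5.0, n)        # synthetic temperatures
    pressure = rng.normal(1013.0, 8.0, n)  # synthetic pressures
    # Invented ground truth: rain is likelier when it is cool and low-pressure.
    rain = rng.normal(0.0, 1.0, n) + 0.2 * (1013 - pressure) + 0.1 * (15 - temp) > 0

    # Pretend these coefficients came from a fitted model (pure assumptions).
    b0, b1, b2 = 202.0, -0.1, -0.2
    score = 1.0 / (1.0 + np.exp(-(b0 + b1 * temp + b2 * pressure)))  # sigmoid

    # The best cutoff need not be 0.5; keep the threshold with the best accuracy.
    thresholds = np.linspace(0.05, 0.95, 19)
    accuracy = [np.mean((score > t) == rain) for t in thresholds]
    best = thresholds[int(np.argmax(accuracy))]
    print(f"best threshold: {best:.2f}, accuracy: {max(accuracy):.3f}")
    ```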

  • @jakobpirs1392
    @jakobpirs1392 1 year ago +1

    Very similar to the logit function.

  • @uusserrrreesssuuu
    @uusserrrreesssuuu 1 year ago

    Are operations with 'e' more expensive than with 2 or 3?

  • @16876
    @16876 1 year ago

    very nice 😎

  • @gamuchiraindawana2827
    @gamuchiraindawana2827 10 months ago

    It looks like the logistic function

  • @luispericchi9347
    @luispericchi9347 1 year ago

    I like my curves like that

  • @nononnomonohjghdgdshrsrhsjgd
    @nononnomonohjghdgdshrsrhsjgd 4 months ago

    sorry, you didn't explain anything.
