finally, someone decided to record the best videos on Loss functions ... thanks!
Glad it was helpful!
I really liked the fact that you explained this idea using the student example throughout the video. This really helped to "keep it real". Great work!
I really appreciate how you throw in 'sanity checks' in your videos. An important skill to learn and use!
Beautifully illustrated. I like how you rephrase or represent an observation to uncover the how and why of things.
Cheers,
b
Thank you so much! Soon I have an exam in text mining and coming from humanitarian studies I struggled a bit... You explained it so clearly... Thank you for your job, really appreciated!
Thanks a lot. It was the best explanation I have ever heard.
You're most welcome
You are a gem!! Never had anyone clear up my fundamentals so deeply before.
He explains every topic in a simple way with great intuition, and keeps things as simple as possible so learners don't get bored.
I am learning AI and this video is the best explanation of Sigmoid function and what we need it for. Thanks
This video is wonderful. it really helps me understand why people design this function in the first place! Thanks!
Thanks a lot for explaining why it's needed at the first place, it helps with the intuition a lot. Loved this!
Thank you for such a clear explanation of the sigmoid function in terms of information!
HOLY SHIT I'M MINDBLOWN BY THIS VIDEO
Really neat and basic way of understanding the purpose
Cannot thank you more for your splendid lucid video on sigmoid
Probably the best explanation of this function I've seen. I have a question though: when you use the chain rule, AFAIK if you set, say, y = 1+e^-s, then dy/ds = -e^-s. So this would be the numerator and the denominator would be (1+e^-s)^2. So I don't see where that extra - comes from.
Conceptually, the step you're missing is that this is a composite of two functions, and the outer one is a reciprocal. Differentiating 1/u gives -1/u^2: the power in the denominator goes up and a minus sign comes along with it.
The chain rule is d/dx f(g(x)) = f'(g(x))*g'(x). Here, f(u) = 1/u with u = g(x) = 1+exp(-x).
So f'(u) = -1/u^2 and g'(x) = -exp(-x).
Putting these together, we get -1/(1+exp(-x))^2 * (-exp(-x)) = exp(-x)/(1+exp(-x))^2. The two minus signs cancel.
I prefer to use the product rule rather than the chain rule myself for simple calculations like this.
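For anyone who wants to sanity-check the result exp(-x)/(1+exp(-x))^2 numerically, here is a minimal Python sketch. The function names are made up for illustration and are not from the video; the check compares the closed form against a central finite difference.

```python
import math

def sigmoid(x):
    # Standard logistic function 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # Closed form obtained from the chain rule in the comment above
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

def central_difference(f, x, h=1e-5):
    # Numerical approximation of f'(x); h is an arbitrary small step
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-2.0, 0.0, 1.5):
    analytic = sigmoid_prime(x)
    numeric = central_difference(sigmoid, x)
    print(f"x={x:+.1f}  analytic={analytic:.8f}  numeric={numeric:.8f}")
```

If the sign were wrong, the two columns would differ by a factor of -1, which makes this an easy check to run.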
Thank you so much for very clear explanation.!
You are welcome!
Excellent explanation, thank you Ritvik! Greetings from Argentina
Glad you enjoyed it!
very smart way of explaining!
Woww .I think you are one of the best teachers I've ever seen. 👏
Thank you! 😃
Fantastic explanation
Simple & intuitive explanation. Cheers!
Bravo!!.... This gives very interesting intuition.
This is one of the best videos for understanding the sigmoid function. Thanks a lot, Ritvik
Beautifully put together, well done! And thank you so much for your efforts!
great work, your explanations are better than the explanations of my lecturers at UCL
Excellent work, can you make a video about the RELU and what makes it so effective?
Great in-depth tutorial on Sigmoid function!
Great explanation! Simplicity is the best policy.
this man is really awesome , hope he would be my college professor
Fantastically explained brother ! really amazed by your teaching method !
Great. Thank you for this explanation. Super.
Glad it was helpful!
Superb explanation
That's a fantastic explanation, thanks a lot
will start watching your videos
Lovely concept 👌
Very good video. Thanks
Just amazingly explained! Thanks!
You are welcome!
thank you for the excellent content, I am glad I subscribed. do you cover other subjects? I feel like I need some refresher lessons!
amazing video
Another great video, thanks for your efforts
Many thanks!
The absolute best. Huge, huge fan of the channel
I love this guy😭
Awesome, you explained it in such a simple way. Thank you so much.
thanks a lot! keep making this kind of video, you make it so easy to understand!
VEEERY NICE keep them coming 🥰
Thank you, I really understood it easily. You are awesome, please continue
Fantastic explanation, thank you so much for this video!
Thanks for making it so simple!
Just wow, great one, thanks man
Amazing explanation! May your tribe increase
honest work, good job!
Thank you 😊
'A student got a score of 9, that's why I have very, very high evidence that he is going to drop out.' So is "a student drops out" the alternative hypothesis, or is it the null hypothesis?
Love this channel! Keep it up man
Great explanation.. you've got a subscriber
Welcome aboard!
This is perfect. Thank you.
You're the best!
agree!
He's the best
Thanks :)
You look indeed very interesting in this video.
Well explained! Great!
thanks for this content
You are amazing ❤
thanx, now i understand!
Thank you!!
I am glad I have subscribed to your channel. Very informative.
Welcome aboard!
Thanks!
This is better than university
thank you, a great video again!
thank you !!
I am glad I am here.
Thank you so much for that video!! Question: are there any other functions like the sigmoid function that are used in data science and ML? And how are they used? Thanks
Sigmoid is an activation function. ML uses about a dozen: en.wikipedia.org/wiki/Activation_function
Because neurons are like scales, a squashing function balances out the extreme lows and highs so outliers don't mess up your predictions
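To make the "squashing" idea concrete, here is a small Python sketch using two common activation functions, sigmoid and tanh. The input scores are invented for illustration, and the branch inside `sigmoid` is only there to avoid overflow on very negative inputs.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function: avoids exp() overflow for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Made-up scores, including two extreme outliers
scores = [-1000.0, -3.0, 0.0, 3.0, 1000.0]

squashed_sigmoid = [sigmoid(s) for s in scores]    # every value lands in [0, 1]
squashed_tanh = [math.tanh(s) for s in scores]     # every value lands in [-1, 1]

for s, a, b in zip(scores, squashed_sigmoid, squashed_tanh):
    print(f"score={s:+8.1f}  sigmoid={a:.4f}  tanh={b:+.4f}")
```

However extreme the input, the output stays inside a fixed range, which is the sense in which outliers stop dominating.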
Thank you for giving your insight!
Thanks man
You earned a subscriber
I don't understand the rate of change part... If the score changes from 0 to 1, why should the probability increase more than when it changes from 9 to 10? 0 to 1 just makes it a tiny bit more probable that a student will not drop out, but it's still very unsure.
4:19 why is it a huge change in relative terms?
Hey, thanks for the intuitive explanation. I had a question: the cumulative distribution function of the normal distribution is sigmoidal. Would that be a reasonable explanation for why the sigmoid makes sense when it comes to describing the probability of naturally occurring phenomena?
@Ritvik Is there any video you uploaded answering questions in the comments?
@@anishbabus576 It sucks but he has a very demanding day job (Academia), so gotta be happy with what we get.
Great explanation, thanks! I understand the Sigmoid function limits unbounded numbers to be between 0 and 1 and, at the same time, states the diminishing value of marginal increases. But, why use the Euler’s number e? Would any other number do?
Any other base greater than 1 would give the same S-shape, just stretched or compressed horizontally, since a^-x = e^(-x ln a). What you lose is convenience: with e the derivative is simply sigmoid(x)*(1 - sigmoid(x)), while any other base brings along an extra factor of ln(a).
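A quick way to see what changing the base does, as a minimal Python sketch. `sigmoid_base` is a hypothetical helper written for this comparison, and it assumes a base a > 1. The comparison rests on the identity a^-x = e^(-x ln a).

```python
import math

def sigmoid(x):
    # Usual sigmoid with base e
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_base(x, a):
    # Hypothetical variant: same formula with base a (assumed a > 1) instead of e
    return 1.0 / (1.0 + a ** (-x))

a = 2.0
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    with_base_a = sigmoid_base(x, a)
    rescaled_e = sigmoid(x * math.log(a))           # base-e sigmoid, input scaled by ln(a)
    mirror_sum = with_base_a + sigmoid_base(-x, a)  # stays at 1 if the curve is symmetric about (0, 0.5)
    print(f"x={x:+.1f}  base-2={with_base_a:.6f}  rescaled={rescaled_e:.6f}  f(x)+f(-x)={mirror_sum:.6f}")
```

The first two columns agree, so a different base only rescales the horizontal axis.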
Can you suggest a book where sigmoid and softmax are explained?
why is it a huge change from 0 to 1 or from 0 to -1 in relative terms?
I do not understand why this is the case, could you please explain that?
Brilliant
thanks!
The shape of S in Sigmoid. I see what you did there
Nice
Great
It's not entirely clear to me why a jump in s from 0 to 1 is considered more significant than a jump from 9 to 10. Doesn't that depend on many assumptions? For example, a student being absent on a single day might not actually be significant (could be a valid reason), but might tip s from 0 to 1. In general, why is the region around s=0 considered/needed to be more "sensitive"?
He does emphasise that 0 to 1 is a big relative change compared with 9 to 10. What is the relative change when going from 0 to 1? It is (1 - 0) / 0, which is undefined and effectively infinite, while the relative change going from 9 to 10 is (10 - 9) / 9 = 1/9.
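Separately from the relative-change argument, you can just measure how much the sigmoid output moves for each jump. A minimal Python sketch, assuming the plain sigmoid 1/(1+e^-s) with no extra scaling (variable names are illustrative):

```python
import math

def sigmoid(s):
    # Plain logistic function, no scaling or offset assumed
    return 1.0 / (1.0 + math.exp(-s))

# Change in output probability for a one-unit jump in the score s
delta_near_zero = sigmoid(1) - sigmoid(0)    # jump from 0 to 1
delta_far_out = sigmoid(10) - sigmoid(9)     # jump from 9 to 10

print(f"0 -> 1 : {sigmoid(0):.6f} -> {sigmoid(1):.6f}  (change {delta_near_zero:.6f})")
print(f"9 -> 10: {sigmoid(9):.6f} -> {sigmoid(10):.6f}  (change {delta_far_out:.6f})")
```

The first jump moves the output by roughly 0.23, the second by less than 0.0001, because the curve is steepest at s=0 and nearly flat by s=9. Whether that sensitivity is appropriate for a given problem is a modelling assumption, as the question above points out.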
Why specifically use the sigmoid function in logistic regression when there are other probabilistic functions available?
Thanks Ritvik
bariya
based
God 🙏🙏
sigmoid make waifu?