Activation Functions - EXPLAINED!

  • Published: 19 Oct 2024
  • We start with the whats/whys/hows. Then delve into details (math) with examples.
    Follow me on Medium: towardsdatasci...
    REFERENCES
    [1] Amazing discussion on the "dying relu problem": www.quora.com/...
    [2] Saturating functions that "squeeze" inputs: stats.stackexc...
    [3] Plot math functions beautifully with desmos: www.desmos.com/
    [4] The paper on Exponential Linear units (ELU): arxiv.org/abs/...
    [5] Relatively new activation function (swish): arxiv.org/pdf/...
    [6] Used an image of activation functions from Pawan Jain's blog: towardsdatasci...
    [7] Why bias in Neural Networks? stackoverflow....

Comments • 159

  • @desalefentaw8658
    @desalefentaw8658 4 years ago +40

    wow, one of the best highlights of activation functions on the internet. Thank you for doing this video

  • @GauravSharma-ui4yd
    @GauravSharma-ui4yd 4 years ago +29

    Awesome as always. Some points to ponder; correct me if I am wrong.
    1. ReLU is not just an activation but can also be thought of as a self-regularizer, as it switches off all those neurons whose values are negative, so it's a kind of automatic dropout.
    2. A neural net with just an input and an output layer, with softmax at the output layer, is logistic regression. When we add hidden layers to this network with no hidden activations, it's more powerful than vanilla logistic regression, as it is now taking linear combinations of linear combinations with different weight settings. But it still results in linear boundaries.
    Lastly, your contributions to the community are very valuable; they clear up a lot of nitty-gritty details in a short time. Keep going like this :)

    • @generichuman_
      @generichuman_ 2 years ago +5

      No, dropout is different. Random sets of neurons are turned off in order to cause the neurons to form redundancies which can make the model more robust. In the case of dying Relu, the same neurons are always dead, making them useless. Dropout is desirable and deliberate, dying Relu is not.
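Point 2 of the parent comment can be checked numerically. The sketch below is a minimal illustration in plain Python, not code from the video: the weights `W1`, `b1`, `W2`, `b2` are made-up values, and `U` and `V` are just names chosen here for the collapsed layer. It shows that two stacked linear layers with no activation in between compute exactly what one linear layer computes, which is why the boundary stays linear.

```python
# Two linear layers with no activation collapse into a single linear layer.
# All weights are made-up values for illustration only.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

W1 = [[1.0, -2.0], [0.5, 3.0], [2.0, 1.0]]  # hidden layer: 2 inputs -> 3 units
b1 = [0.1, -0.2, 0.3]
W2 = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.5]]    # output layer: 3 units -> 2 logits
b2 = [0.0, 1.0]

def two_layer(x):
    """Hidden layer followed by output layer, no activation in between."""
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)

# Collapsed parameters: U = W2 W1 and V = W2 b1 + b2.
U = matmul(W2, W1)
V = add(matvec(W2, b1), b2)

def one_layer(x):
    """A single linear layer using the collapsed parameters."""
    return add(matvec(U, x), V)
```

For any input, `two_layer(x)` and `one_layer(x)` agree up to floating-point error, so the extra hidden layer changes how the weights are parameterized but not the family of functions the network can represent.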

  • @SkittlesWrap
    @SkittlesWrap 6 months ago +1

    Straight to the point. Nice and super clean explanation for non-linear activation functions. Thanks!

  • @UdemmyUdemmy
    @UdemmyUdemmy 1 year ago +76

    the screeching noise is irritating... else nice tutorial

  • @PrymeOrigin
    @PrymeOrigin 10 months ago

    One of the best explanations I've come across

  • @jhondavidson2049
    @jhondavidson2049 3 years ago +5

    I'm learning deep learning right now and using the Deep Learning book published by MIT Press for it. That's kind of complicated for me to understand, especially these parts, because I'm still an undergrad and have zero previous experience with this. Thank you for explaining this so well.

  • @otabeknajimov9697
    @otabeknajimov9697 1 year ago

    best explanation of activation functions I've ever seen

  • @fahadmehfooz6970
    @fahadmehfooz6970 3 years ago +1

    Amazing! Finally I am able to visualise vanishing gradients and dying ReLU.

  • @adrianharo6586
    @adrianharo6586 3 years ago +6

    Great video!
    The disappointed gestures were a bit too much x'D
    A question I did have as a beginner:
    What does it mean for a sigmoid gradient to "squeeze" values, as in they become smaller and smaller as they back propagate?

    • @AnkityadavGrowConscious
      @AnkityadavGrowConscious 3 years ago

      It means that sigmoid function will always output a value between 0 and 1 regardless of any real number input. Notice the mathematical formula and graph of a sigmoid function for better clarity. Any real number will be converted to a number between 0 and 1. Hence sigmoid is said to "squeeze" values.
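Since the question above is about gradients, a small numeric sketch may help alongside the reply about the output range. The sigmoid's derivative is s(z)(1 - s(z)), which never exceeds 0.25. Backpropagation multiplies one such factor per sigmoid layer, so the product shrinks quickly with depth. The code below is a simplified illustration, not the video's code: it ignores the weight terms that also appear in the real chain rule, and the helper name `chained_gradient` is hypothetical.

```python
import math

def sigmoid(z):
    """Squeeze any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z)), at most 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)

def chained_gradient(pre_activations):
    """Product of sigmoid derivatives along a chain of layers.

    Simplification: weight factors from the chain rule are left out.
    """
    g = 1.0
    for z in pre_activations:
        g *= sigmoid_grad(z)
    return g

peak = sigmoid_grad(0.0)                      # the largest possible value
ten_layers = chained_gradient([0.0] * 10)     # best case through 10 layers
saturated = sigmoid_grad(8.0)                 # far from zero, nearly flat
```

Even in the best case (every pre-activation at 0), ten sigmoid layers scale the gradient by 0.25 ten times over; with saturated units the factor per layer is far smaller.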

  • @deepakkota6672
    @deepakkota6672 4 years ago +8

    Wooo, did I just notice the complex explained simply? Thanks! Looking forward to more videos.

  • @PritishMishra
    @PritishMishra 3 years ago +7

    The thing I love most about your videos is the fun you add... Learning becomes a bit easier

  • @linuxbrad
    @linuxbrad 1 year ago

    7:48 "once it hits zero the neuron becomes useless and there is no learning" this explains so much, thank you!

  • @oheldad
    @oheldad 4 years ago +3

    Great video! And what is even greater are the useful references you add in the description. (For me, (1)+(7) answer the questions I asked myself at the end of your video, so it was on point!) Thank you!

    • @CodeEmporium
      @CodeEmporium  4 years ago

      Haha. Glad the references are useful! :)

  • @RJYL
    @RJYL 1 year ago

    Great explanation of activation functions. I like it so much.

  • @shivendunsahi
    @shivendunsahi 4 years ago +5

    I discovered your page just yesterday and might I say, YOU'RE AWESOME! Thanks for such good content bro.

    • @CodeEmporium
      @CodeEmporium  4 years ago +3

      Thanks homie! Will dish out more soon!

  • @rishabhmishra279
    @rishabhmishra279 2 years ago +2

    Great explanation! And the animations with the math formulas and their visualization are awesome!! Many thanks!

  • @deepaksingh9318
    @deepaksingh9318 4 years ago

    Wow... Perfect and easiest way to explain it..
    Everyone talks about what activations do, but nobody shows how it actually looks behind the algos..
    And you explain things in the easiest way, which makes them so easy to understand and remember..
    So a big like for all your videos..
    Could you make more and more on DL? 😄

    • @CodeEmporium
      @CodeEmporium  3 years ago

      Thank you. I'm always thinking of more content :)

  • @kanehooper00
    @kanehooper00 5 months ago

    Excellent job. There is way too much "mysticism" around neural networks. This shows clearly that for a classification problem all the neural net is doing is creating a boundary function. Of course it gets complicated in multiple dimensions. But your explanations and use of graphs are excellent

  • @the-tankeur1982
    @the-tankeur1982 6 months ago +4

    I hate you for making those noises. I want to learn; comedy is something I would pass on

  • @nguyenngocly1484
    @nguyenngocly1484 4 years ago

    With ReLU f(x)=x is connect, f(x)=0 is disconnect. A ReLU net is a switched system of dot products, if that means anything to you.
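One possible reading of the "switched system" remark, sketched in plain Python: for a given input, each ReLU either passes its pre-activation through unchanged (gate = 1, "connect") or blocks it (gate = 0, "disconnect"), so the network reduces to dot products wired together by those gates. The tiny network and its weights `W1` and `w2` are made-up values, and `switched_forward` is a hypothetical name used only for this illustration.

```python
def relu(z):
    return max(0.0, z)

# Made-up weights: 2 inputs -> 3 hidden ReLU units -> 1 output.
W1 = [[1.0, -1.0], [-2.0, 0.5], [0.5, 0.5]]
w2 = [1.0, 2.0, -1.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def forward(x):
    """Ordinary forward pass with ReLU on the hidden layer."""
    hidden = [relu(dot(row, x)) for row in W1]
    return dot(w2, hidden)

def switched_forward(x):
    """Same computation written as dot products plus on/off switches."""
    pre = [dot(row, x) for row in W1]
    gates = [1.0 if z > 0 else 0.0 for z in pre]  # 1 = connect, 0 = disconnect
    return sum(w * g * z for w, g, z in zip(w2, gates, pre))
```

Both functions return the same value for every input; only the set of gates that are switched on changes from one input region to another.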

  • @mikewang8368
    @mikewang8368 4 years ago

    better than most professors, thanks for great video

  • @alonsomartinez9588
    @alonsomartinez9588 2 years ago

    Awesome vid! Small suggestion: I might check the volume levels. During the screaming at 0:56 it was a bit painful to my ear and possibly sounded like audio clipping.

  • @rasikannanl3476
    @rasikannanl3476 4 months ago

    great .. so many thanks ... need more explanation

  • @shrikanthnc3664
    @shrikanthnc3664 3 years ago

    Great explanation! Had to switch to earphones though :P

  • @alifia276
    @alifia276 3 years ago

    Thank you for sharing! This video cleared my doubts and gave me a good introduction to learn further

  • @dazzykin
    @dazzykin 4 years ago +4

    Can you cover tanh activation? (Thanks for making this one so good!)

    • @CodeEmporium
      @CodeEmporium  4 years ago +5

      I wonder if there is enough support that warrants a video on just tanh. Will look into it though! And thanks for the compliments :)

  • @sgrimm7346
    @sgrimm7346 1 year ago

    Excellent explanation. Thank you.

  • @phucphan4195
    @phucphan4195 2 years ago

    thank you very much, this is really helpful

  • @fazlayrabby3709
    @fazlayrabby3709 20 hours ago

    Can you please explain this "No gradient means no learning"?
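One way to read "no gradient means no learning" is through the gradient descent update, w_new = w - lr * dL/dw: if the gradient is zero, the weight does not move. The sketch below is an illustration with a single weight feeding a ReLU, not the video's code; `sgd_step`, the learning rate, and all numbers are hypothetical.

```python
def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def sgd_step(w, x, upstream_grad, lr=0.1):
    """One gradient descent update for a single weight feeding a ReLU.

    z = w * x is the pre-activation; by the chain rule,
    dL/dw = upstream_grad * relu'(z) * x.
    """
    z = w * x
    grad_w = upstream_grad * relu_grad(z) * x
    return w - lr * grad_w

# Active neuron (z = 1.0 > 0): the weight is updated.
w_alive = sgd_step(w=0.5, x=2.0, upstream_grad=1.0)

# Inactive neuron (z = -1.0 < 0): the gradient is 0, so the weight stays put.
w_dead = sgd_step(w=-0.5, x=2.0, upstream_grad=1.0)
```

If a neuron's pre-activation is negative for every training example, every update looks like the second call, which is the "dying ReLU" situation discussed elsewhere in this thread.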

  • @wagsman9999
    @wagsman9999 1 year ago

    Beautiful explanation!

  • @malekaburaddaha5910
    @malekaburaddaha5910 3 years ago +1

    Thank you very much for the great, and smooth explanation. This was really perfect.

    • @CodeEmporium
      @CodeEmporium  3 years ago

      Much appreciated Malek! Thanks for watching!

  • @epiccabbage6530
    @epiccabbage6530 1 year ago +1

    What are the axes on these graphs? Is it inputs, or input*weights + bias for linear?

    • @NITHIN-tu7qo
      @NITHIN-tu7qo 8 months ago

      Did you get an answer for it?

  • @AmirhosseinKhademi-in6gs
    @AmirhosseinKhademi-in6gs 1 year ago

    but we cannot use ReLU for the regression of functions with high degrees of derivatives!
    In that case, we should still go with infinitely differentiable activation functions like "Tanh", right?
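The point about higher derivatives can be made concrete with a finite-difference check. Away from x = 0, ReLU is a straight line, so its second derivative is 0, while tanh has a non-zero second derivative, -2 tanh(x)(1 - tanh(x)^2). The sketch below only illustrates that difference; whether it matters for a given regression task is a separate question. The helper `second_derivative` and the step size `h` are choices made for this example.

```python
import math

def relu(x):
    return max(0.0, x)

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# ReLU is linear on either side of 0, so its curvature there is zero.
relu_dd = second_derivative(relu, 1.0)

# tanh is smooth everywhere and has non-zero curvature at x = 1.
tanh_dd = second_derivative(math.tanh, 1.0)
tanh_dd_exact = -2.0 * math.tanh(1.0) * (1.0 - math.tanh(1.0) ** 2)
```

The numerical estimate for tanh matches the closed form closely, while the ReLU estimate is zero up to rounding error.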

  • @cheseremtitus1501
    @cheseremtitus1501 4 years ago

    Amazing presentation, easy and captivating to grasp

  • @SeloniSinha
    @SeloniSinha 4 months ago

    wonderful explanation!!!

  • @ShivamPanchbhai
    @ShivamPanchbhai 3 years ago

    this guy is genius

  • @AymaneArfaoui
    @AymaneArfaoui 4 months ago

    What do x and y represent in the graph you use to show the cat and dog points?

  • @simranjoharle4220
    @simranjoharle4220 1 year ago

    This was really helpful! Thanks!

  • @Mohammed-rx6ok
    @Mohammed-rx6ok 2 years ago

    Amazing explanation and also funny 😅👏👏👏

  • @tarkatirtha
    @tarkatirtha 2 years ago

    Lovely intro! I am learning at the age of 58!

  • @DrparadoxDrparadox
    @DrparadoxDrparadox 2 years ago +1

    Great video. Could you explain what U and V are equal to in this equation: o = Ux + V? And how did you come up with the decision boundary equation, and how did you determine the values of w1 and w2?
    Thanks in advance

  • @najinajari3531
    @najinajari3531 4 years ago

    Great video and great page :) Which software do you use to make these videos?

    • @CodeEmporium
      @CodeEmporium  4 years ago +1

      Thanks! I use Camtasia Studio for the editing; Photoshop and draw.io for the images.

  • @myrondcunha5670
    @myrondcunha5670 3 years ago

    THIS HELPED SO MUCH! THANK YOU!

  • @eeera-op8vw
    @eeera-op8vw 4 months ago

    good explanation for a beginner

  • @superghettoindian01
    @superghettoindian01 1 year ago

    Another great video 🎉🎉🎉!

  • @yachen6562
    @yachen6562 4 years ago +1

    Really awesome video!

  • @kellaerictech
    @kellaerictech 1 year ago

    That's a great explanation

  • @meghnasingh9941
    @meghnasingh9941 4 years ago +4

    wow, that was really helpful, thanks a ton!!!!

    • @CodeEmporium
      @CodeEmporium  4 years ago +1

      Glad to hear that. Thanks for watching!

  • @younus6133
    @younus6133 4 years ago +1

    oh man, amazing explanation. Thanks

  • @ankitganeshpurkar
    @ankitganeshpurkar 3 years ago

    Nicely explained

  • @youssofhammoud6335
    @youssofhammoud6335 4 years ago

    What I was looking for. Thanks!

  • @linuxbrad
    @linuxbrad 1 year ago

    9:03 what do you mean "most neurons are off during the forward step"?

  • @MrGarg10may
    @MrGarg10may 1 year ago +1

    Then why aren't leaky ReLU and ELU used everywhere in LSTMs, GRUs, Transformers...? Why is ReLU used everywhere?

  • @mohammadsaqibshah9252
    @mohammadsaqibshah9252 1 year ago

    This was an amazing video!!! Keep up the good work!

  • @prashantk3088
    @prashantk3088 4 years ago +1

    really helpful.. thanks

  • @wucga9335
    @wucga9335 11 months ago

    So how do we know when to use ReLU or leaky ReLU? Do we just use leaky ReLU altogether in all cases?
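For readers comparing the two, here is a minimal side-by-side sketch in plain Python. It does not settle when to prefer one over the other; it only shows the mechanical difference. The slope `alpha=0.01` is a commonly used value and is an assumption here, not something stated in the video.

```python
def relu(z):
    """Standard ReLU: passes positive inputs, outputs 0 otherwise."""
    return max(0.0, z)

def relu_grad(z):
    """Gradient of ReLU: 0 for non-positive inputs, so no update signal."""
    return 1.0 if z > 0 else 0.0

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: keeps a small slope alpha for negative inputs."""
    return z if z > 0 else alpha * z

def leaky_relu_grad(z, alpha=0.01):
    """Gradient of leaky ReLU: alpha instead of 0 for negative inputs."""
    return 1.0 if z > 0 else alpha
```

For a negative pre-activation such as -3.0, `relu_grad` returns 0.0 while `leaky_relu_grad` returns 0.01, so a leaky unit still receives a small update signal where a ReLU unit receives none.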

  • @aaryamansharma6805
    @aaryamansharma6805 4 years ago +1

    awesome video

  • @vasudhatapriya6315
    @vasudhatapriya6315 1 year ago

    How is softmax a linear function here? Shouldn't it be non linear?

  • @pouyan74
    @pouyan74 2 years ago +1

    I've read at least three books on ANN's so far, but it's only now, after watching this video, that I have the intuition of what exactly is going on and how do activation functions break linearity!

  • @fredrikt6980
    @fredrikt6980 3 years ago

    Great explanation. Just add more contrast to your color selection.

    • @CodeEmporium
      @CodeEmporium  3 years ago

      My palette is rather bland, I admit

  • @kphk3428
    @kphk3428 3 years ago +1

    1:16 I couldn't see that there were different colors so I was confused.
    Also I found the voicing of the training neural net annoying. But some people may like what other people dislike, so it's up to you to keep on voicing them.

    • @gabe8168
      @gabe8168 3 years ago +1

      the dude is making these videos alone, if you don't like his voice that's on you, but he can't just change his voice

  • @Nathouuuutheone
    @Nathouuuutheone 2 years ago

    What decides the shape of the boundary?

  • @yukuchan
    @yukuchan 7 days ago +1

    nice video ♥

  • @mangaenfrancais934
    @mangaenfrancais934 4 years ago +1

    Great video, keep going !

  • @programmer4047
    @programmer4047 4 years ago +1

    So, we should always use leaky reLU

  • @TheAscent_
    @TheAscent_ 4 years ago

    @6:24 How does passing what is a straight line into the softmax function also give us a straight line? Isn't the output, and consequently the decision boundary, a sigmoid?
    Or is it the output before passing it into the activation function that counts as the decision boundary?

    • @CodeEmporium
      @CodeEmporium  3 years ago

      6:45 - The line corresponds to those points in the feature space (the 2 feature values) where the sigmoid's height is 0.5.
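The reply above can be checked with a few lines of Python. Since sigmoid(z) = 0.5 exactly when z = 0, the set of points where the output is 0.5 is the straight line w1*x1 + w2*x2 + b = 0, even though the sigmoid itself is curved. The weights `w1`, `w2`, `b` below are hypothetical values, not the ones from the video, and `boundary_x2` is a helper named here for illustration (it assumes `w2` is non-zero).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights for a model with two input features.
w1, w2, b = 2.0, -1.0, 0.5

def predict(x1, x2):
    """Sigmoid output for the point (x1, x2)."""
    return sigmoid(w1 * x1 + w2 * x2 + b)

def boundary_x2(x1):
    """Solve w1*x1 + w2*x2 + b = 0 for x2 (requires w2 != 0)."""
    return -(w1 * x1 + b) / w2

# Every point on the line gets a sigmoid height of 0.5.
heights_on_line = [predict(x1, boundary_x2(x1)) for x1 in (-2.0, 0.0, 3.0)]
```

Points on one side of the line produce outputs above 0.5 and points on the other side produce outputs below 0.5, which is what makes the line the decision boundary.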

  • @prakharrai1090
    @prakharrai1090 2 years ago

    Can we use a linear activation with hinge loss to get a linear SVM for binary classification?

  • @jigarshah1883
    @jigarshah1883 4 years ago

    Awesome video man !

  • @jamesdunbar2386
    @jamesdunbar2386 3 years ago

    Quality video!

  • @tahirali959
    @tahirali959 4 years ago +1

    good work bro keep it up

  • @ronin6158
    @ronin6158 4 years ago

    it should be possible to let (part of) the net optimize its own activation function no?

  • @bartekdurczak4085
    @bartekdurczak4085 4 months ago

    Good explanation, but the noises are a little bit annoying. Thank you though, bro

  • @LifeKiT-i
    @LifeKiT-i 1 year ago

    With graphical calculator, your explanation is sanely clear!! thank you!!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much for the kind comment! Glad the strategy of explaining is useful :)

  • @jaheerkalanthar816
    @jaheerkalanthar816 2 years ago

    Thanks mate

  • @patite3103
    @patite3103 3 years ago

    Amazing!

  • @uzairkhan7430
    @uzairkhan7430 3 years ago +1

    awesome

  • @masthanjinostra2981
    @masthanjinostra2981 3 years ago

    Benefited a lot

  • @x_ma_ryu_x
    @x_ma_ryu_x 2 years ago +1

    Thanks for the tutorial. I found the noises very cringe.

  • @undisclosedmusic4969
    @undisclosedmusic4969 4 years ago +3

    Swish: activation function. Swift: programming language. More homework, less sound effects 😀
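For reference, the Swish activation mentioned above (reference [5] in the description) is usually written as swish(x) = x * sigmoid(beta * x). The sketch below is a plain-Python illustration with `beta` fixed at 1.0 by default, which is an assumption made here for simplicity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def swish(x, beta=1.0):
    """Swish activation: the input scaled by its own sigmoid gate."""
    return x * sigmoid(beta * x)

# For large positive x the gate is close to 1, so swish(x) is close to x.
# For large negative x the gate is close to 0, so swish(x) is close to 0.
# Unlike ReLU, small negative inputs give small negative outputs.
samples = {x: swish(x) for x in (-10.0, -1.0, 0.0, 1.0, 10.0)}
```

The shape resembles ReLU for large inputs but stays smooth around zero.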

  • @ehsankhorasani_
    @ehsankhorasani_ 3 years ago

    good job thank you

  • @igorpostoev2077
    @igorpostoev2077 3 years ago

    Thanks man)

  • @ShinsekaiAcademy
    @ShinsekaiAcademy 3 years ago

    thanks my man.

  • @VinVin21969
    @VinVin21969 3 years ago

    Plot twist: it's not that the boundary no longer changes; the vanishing gradient makes the gradient so small that we can assume it is negligible

  • @harishp6611
    @harishp6611 4 years ago

    yes! I liked it. Keep it up.

  • @farhanfadhilah5247
    @farhanfadhilah5247 4 years ago

    this is helpful, thanks :)

  • @francycharuto
    @francycharuto 3 years ago

    gold, gold, gold.

  • @jhondavidson2049
    @jhondavidson2049 3 years ago

    Amazing!!!!!!!!!!!!!!!!!

  • @Acampandoconfrikis
    @Acampandoconfrikis 3 years ago

    thanks brah

  • @abdussametturker
    @abdussametturker 3 years ago

    thx. subscribed

  • @zaidalattar2483
    @zaidalattar2483 3 years ago

    Perfect explanation!... Thanks

  • @keanuhero303
    @keanuhero303 4 years ago

    What's the +1 node on each layer?
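In many network diagrams a "+1" node is a constant input whose outgoing weights act as the bias terms; reference [7] in the description covers why a bias is needed. Assuming that is what the video's diagram shows, the sketch below demonstrates that adding a bias `b` is the same as appending a constant 1 to the input and treating `b` as one more weight. The function names and numbers are hypothetical.

```python
def neuron_with_bias(x, w, b):
    """Weighted sum of the inputs plus an explicit bias term."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def neuron_with_plus_one_node(x, w_ext):
    """Same neuron, with the bias stored as the weight of a constant +1 input."""
    x_ext = list(x) + [1.0]  # append the "+1" node to the inputs
    return sum(wi * xi for wi, xi in zip(w_ext, x_ext))

x = [2.0, -1.0]
w = [0.5, 1.5]
b = 0.25

explicit = neuron_with_bias(x, w, b)
via_plus_one = neuron_with_plus_one_node(x, w + [b])
```

Both calls return the same pre-activation, so drawing the bias as a "+1" node is a notational convenience that lets the bias be learned like any other weight.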

  • @splytrz
    @splytrz 4 years ago

    I've been trying to make a convolutional autoencoder for mnist, and at first I used sigmoid activation on the convolutional part and it couldn't make anything better than just a black screen on the output but when I removed all activation functions it worked well. Does anyone have any idea why that happened?

    • @fatgnome
      @fatgnome 4 years ago

      Are the outputs properly scaled back to pixel values after being squeezed by sigmoid?

    • @splytrz
      @splytrz 4 years ago

      @@fatgnome Yes. Otherwise the output wouldn't match with images. Also I checked model.summary() every time I made changes to the model.

  • @Edu888777
    @Edu888777 3 years ago

    I still don't understand what an activation function is

  • @abd0ulz942
    @abd0ulz942 1 year ago

    Learn activation functions with Dora.
    But honestly, it is good

  • @ExMuslimProphetMuhammad
    @ExMuslimProphetMuhammad 3 years ago +1

    Bro, the video is probably good, but a lot of people may not click just from seeing your pic on the thumbnail. I'm here just to let you know this: avoid putting your face on the thumbnail or in the video, as no one is interested in seeing the educator while watching technical videos.

    • @CodeEmporium
      @CodeEmporium  3 years ago +1

      You clicked. That's all i care about ;)

  • @harshmankodiya9397
    @harshmankodiya9397 3 years ago

    gr8 exp

  • @frankerz8339
    @frankerz8339 3 years ago

    nice

  • @t.lnnnnx
    @t.lnnnnx 3 years ago

    followeeeed

  • @히안-p3j
    @히안-p3j 2 months ago

    I don't understand..

  • @wadyn95
    @wadyn95 4 years ago

    Wtf what's the sound of pictures...