Which Activation Function Should I Use?

  • Published: 6 Aug 2024
  • All neural networks use activation functions, but the reasons for using them are rarely made clear! Let's discuss what activation functions are, when they should be used, and what the differences between them are.
    Sample code from this video:
    github.com/llSourcell/Which-A...
    Please subscribe! And like. And comment. That's what keeps me going.
    More Learning resources:
    www.kdnuggets.com/2016/08/role...
    cs231n.github.io/neural-networ...
    www.quora.com/What-is-the-rol...
    stats.stackexchange.com/quest...
    en.wikibooks.org/wiki/Artific...
    stackoverflow.com/questions/9...
    papers.nips.cc/paper/874-how-...
    neuralnetworksanddeeplearning....
    / activation-functions-i...
    / mathematical-foundatio...
    Join us in the Wizards Slack channel:
    wizards.herokuapp.com/
    And please support me on Patreon:
    www.patreon.com/user?u=3191693
    Follow me:
    Twitter: / sirajraval
    Facebook: / sirajology Instagram: / sirajraval
    Signup for my newsletter for exciting updates in the field of AI:
    goo.gl/FZzJ5w
    Hit the Join button above to sign up to become a member of my channel for access to exclusive content! Join my AI community: chatgptschool.io/ Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available):
    www.wagergpt.co

Comments • 462

  • @Skythedragon 7 years ago +261

    Thanks, my biological neural network has now learned how to choose activation functions!

    • @SirajRaval 7 years ago +23

      awesome

    • @GilangD21 6 years ago +1

      Hahahah

    • @rs-tarxvfz 4 years ago

      Remember, the whole is not just the sum of its parts: the behaviour of the whole differs from that of its elements.

  • @StephenRoseDuo 7 years ago +35

    From experience I'd recommend, in order: ELU (exponential linear units) >> leaky ReLU > ReLU > tanh, sigmoid. I agree that you basically never have an excuse to use tanh or sigmoid.

    • @gorkemvids4839 6 years ago +1

      I'm using tanh but I always read saturated neurons as 0.95 or -0.95 while backpropagating so the gradient doesnt disappear.

    • @JorgetePanete 5 years ago

      @@gorkemvids4839 doesn't*
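
    For reference, a minimal NumPy sketch (an illustration, not code from the video's repo) of the activations ranked in the thread above, so their shapes and saturation behaviour can be compared side by side:

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))            # saturates toward 0 and 1

        def tanh(x):
            return np.tanh(x)                          # saturates toward -1 and 1

        def relu(x):
            return np.maximum(0.0, x)                  # zero output (and zero gradient) for x < 0

        def leaky_relu(x, alpha=0.01):
            return np.where(x > 0, x, alpha * x)       # small slope keeps a gradient alive for x < 0

        def elu(x, alpha=1.0):
            return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))  # smooth, saturates at -alpha

        x = np.linspace(-5, 5, 11)
        for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu),
                        ("leaky_relu", leaky_relu), ("elu", elu)]:
            print(f"{name:>10}: {np.round(f(x), 2)}")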

  • @TheCodingTrain 7 years ago +203

    Great video, super helpful!

    • @SirajRaval 7 years ago +29

      thx Dan love u

    • @eointolster 7 years ago +13

      You are both awesome

    • @terigopula 6 years ago +2

      I absolutely love the energy you both have in your videos :)

    • @silverreyes7912 6 years ago +2

      Be so cool if you both did a collab video!

  • @cali4nicated 4 years ago +1

    Wow, man, this is a seriously amazing video. Very entertaining and informative at the same time. Keep up the great work! I'm now watching all your other videos :)

  • @BOSS-bk2jx 6 years ago +3

    I love you man, 4 f***** months passed and my stupid prof could not explain it as you did, not even partially. Keep up the good work.
    Thanks a lot

  • @quant-trader-010 2 years ago +1

    I really like your videos as they strike the very sweet spot between being concise and precise!

  • @drhf1214 5 years ago +1

    Amazing video! Thank you! I'd never heard of neural networks until I started my internship. This is really fascinating.

  • @pouyan74 4 years ago +1

    Dude! DUUUDE! You are AMAZING! I've read multiple papers already, but now the stuff is really making sense to me!

  • @gydo1942 6 years ago +1

    This guy needs more subs. Finally a good explanation. Thanks man!

  • @kalreensdancevelventures5512 3 years ago

    By far the best machine learning videos I've watched. Amazing work! Love the energy and vibe!

  • @CristianMargiotta 7 years ago +1

    Valuable introduction to generative methods for establishing sense in artificial intelligence. A great way of bringing things together and expressing them in one single, discrete language.
    Thanks Siraj Raval, great!

  • @prateekraghuwanshi5645 7 years ago

    Super clear & concise. Amazing simplicity. You rock!!!

  • @grainfrizz 6 years ago

    I gained a lot of understanding and got that "click" moment after you explained linearity vs non-linearity. Thanks man. Keep up w/ the dank memes. My dream is that some day I'd see a collab video between you, Dan Shiffman, and 3Blue1Brown. Love lots from the Philippines!

  • @slowcoding 5 years ago

    Cool. Your lecture cleared the cloud in my brain. I now have a better understanding of the whole picture of activation functions.

  • @rafiakhan8721 1 year ago +1

    Really enjoyed the video as you add subtle humor in between.

  • @jb.1412 7 years ago

    Hard stuff made easy. Congrats on a great video! Keep it up, mate!

  • @supremehype3227 5 years ago +1

    Excellent and entertaining at a high level of entropy reduction. A fan.

  • @WillTesler 6 years ago

    Love this video so much. Helped me so much with my LSTM RNN network.

  • @MrJnsc 6 years ago

    Learning more from your videos than from all my college classes together!

  • @gigeg7708 5 years ago +1

    Super Siraj Raval!!!!! Great compilation, bro.

  • @gowriparameswaribellala4423 5 years ago

    Great explanation of activation functions. Now I need to tweak my model.

  • @hussain5755 7 years ago +1

    Just watched your speech at the TNW Conference 2017. I am really happy that you are growing every day. You are my motivation and my idol. Proud of you, love you.

  • @plouismarie 7 years ago

    @Siraj
    NNs can potentially grow in so many directions, you will always have something to explain to us.
    As you used to say, 'this is only the beginning'.
    And ohh maaan! You're so clear when you explain NNs ;)
    Please keep doing what you're doing again and again and again... and again!
    You are for NNs what Neil deGrasse Tyson is for astrophysics.
    Thx for sharing the GitHub source that details each activation function.

  • @TuyoIsaza 5 years ago +1

    Dude.... exactly what I needed. Thanks again!

  • @nicodaunt 5 years ago

    Digging your vids and enthusiasm from Portland, Oregon!

  • @CrazySkillz15 5 years ago

    Thank you so much for such informative content explained with such clarity after taking so much effort. Appreciate it! :) :D

  • @joshiyogendra 6 years ago

    This guy makes learning so much fun!

  • @waleedtahir2072 7 years ago +1

    Dank memes and dank learning, both in the same video. Who would have thought. Thanks Raj!

  • @akhilguptavibrantjava 6 years ago

    Crystal clear explanation, just loved it.

  • @venkateshkolpakwar5757 5 years ago

    Presentation is good, learned how to choose the activation function. Thanks for the video, it helped a lot.

  • @JeromeFortias 6 years ago

    Just great training, I LOVE how you did it :-)

  • @MohammedAli-pg2fw 5 years ago

    Thanks @Siraj. What an amazing and easy-to-digest explanation.

  • @captainwalter 4 years ago

    Hey Siraj, just wanted to say thanks again. Apparently you got carried away and got busted being sneaky with crediting. I still respect your hustle and hunger. I think your means justify your ends; if you didn't make the moves that you did to prop up the image etc., I probably wouldn't have found you and your resources. At the end of the day, you are in fact legit because you really bridge the gap of 1) knowing what you're talking about (I hope), 2) empathizing with someone learning this stuff (needed to break it down), 3) raising awareness about low-hanging fruit that people outside the realm might not be aware of. Thank you again!!!!

  • @jennycotan7080 8 months ago

    Sir, likes for your memetics and fun explanation! All the spice you add to this video might bring some tech kids like me into the realm of Machine Learning!
    (And today, a mysterious graph sheet with the plot of max(0, x), a.k.a. the ReLU function, appeared in my High School Maths notebook, between the pages about piecewise functions, after I got up and arrived at school.)

  • @anjali7778 4 years ago

    Woah! Thanks man, you made things so clear!!!

  • @hectoralvarorojas1918 7 years ago

    Hi Siraj:
    Your videos are great!
    CONGRATULATIONS!

  • @dipeshbhandari4746 4 years ago

    Your channel is GOLD!

  • @rahulsbhatt 5 years ago

    Entire video is a GEM 💎
    Totally makes sense to use ML

  • @LongDanzi 4 years ago

    Thanks, that was super helpful!

  • @nrewik 7 years ago

    Thanks Siraj. Awesome explanation.
    I am new to deep learning. It would be great if you could make videos about regularization and cost functions.

  • @madhumithak3338 3 years ago

    Your way of teaching is so cool and crazy :)

  • @dyjiang1350 6 years ago

    This video is very easy to understand!

  • @robertodisco 3 years ago

    Awesome video! Thanks!

  • @yatinarora9650 4 years ago

    Now I understand why we are using these activation functions. Till now I was just using them; now I know why I am using them. Thanks Siraj.

  • @jindagi_ka_safar 5 years ago

    Great insight on activation functions, thanks.

  • @jflow5601 4 years ago

    Thanks, very nice explanation.

  • @guilhermeabreu3131 3 years ago

    Excellent explanation!!! You're really funny and I loved the way you explain things. Thank you!!!

  • @killthesource4740 4 years ago +2

    I have a question about the vanishing gradient problem when using sigmoid. Could sigmoid be a more useful activation function when using shortcut connections in the NN?
    For those who don't know what these are: in a normal neural network each layer is connected to the next, but not directly to later layers (a neuron n11 in layer 1 is connected to a neuron n21 in layer 2, but n11 isn't connected to a neuron n31 in the third layer). A shortcut connection exists when normally unconnected layers are connected directly (like a connection between neurons n11 and n31), bypassing the layer in between (n21) while still also connecting to n21.
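
    A minimal NumPy sketch (hypothetical and greatly simplified) of the shortcut connection described above: the output of layer 1 is added directly to the output of layer 2, so the signal gets a path that bypasses the sigmoid in between. This is essentially the idea behind residual networks.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        rng = np.random.default_rng(0)
        x  = rng.normal(size=(4, 8))          # a batch of 4 inputs with 8 features
        W1 = 0.1 * rng.normal(size=(8, 8))
        W2 = 0.1 * rng.normal(size=(8, 8))

        h1 = sigmoid(x @ W1)                  # layer 1
        h2 = sigmoid(h1 @ W2)                 # layer 2
        out_plain    = h2                     # plain stack: the signal passes through both sigmoids
        out_shortcut = h2 + h1                # shortcut: h1 also feeds the next layer directly
        print(out_plain.shape, out_shortcut.shape)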

  • @gwadada6969 2 years ago

    Awesome explanation.

  • @anithapriya5601 4 years ago

    Superb video!!

  • @sedthh 7 years ago +19

    But isn't ReLU a linear function? You mentioned at the beginning that linear functions should be avoided, as both calculating backpropagation on non-linear functions and classifying data points that do not fit a single hyperplane are easier. Or did I get the whole thing wrong?

    • @jeffwells641 6 years ago +14

      It's not linear because any negative x sits at zero on the y axis. "Linear" basically means "straight line". The ReLU line is bent, hard, at 0. So it's linear if you're only looking at x > 0 or x < 0, but if you look at the whole line it's kinked in the middle, which makes it non-linear.

    • @anshu957 5 years ago +4

      It is a piecewise-linear function, which is essentially a non-linear function. For more info, google "piecewise linear functions".

    • @10parth10 4 years ago +1

      The sparsity of the activations adds to the non-linearity of the neural net.

    • @UnrecycleRubdish 2 years ago

      @@10parth10 That explanation helped. Thanks
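
    A quick check (illustrative only) that ReLU is not a linear map: a linear function f must satisfy f(a + b) = f(a) + f(b), and ReLU breaks this as soon as the signs of a and b differ, which is exactly the kink at zero discussed above.

        import numpy as np

        def relu(x):
            return np.maximum(0.0, x)

        a, b = 2.0, -3.0
        print(relu(a + b))         # relu(-1.0) -> 0.0
        print(relu(a) + relu(b))   # 2.0 + 0.0  -> 2.0, so relu(a + b) != relu(a) + relu(b)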

  • @mswai5020 5 years ago

    Loving the KEK :) Awesome, Siraj :) Can you do a piece on CFR+ and its geopolitical implications?

  • @amitmauryathecoolcoder267 4 years ago

    Bro, I loved your content.

  • @akompsupport 7 years ago

    Hey Siraj, here is a great trick: show us a neural net that can perform inductive reasoning! Great videos as always, keep them coming! Learning so much!

  • @bem7069 7 years ago

    Thank you for posting this.

  • @prasadphatak1503 3 years ago

    Good work!

  • @drip888 1 year ago

    OMG, this is the first time I am seeing his video and it's quite entertaining.

  • @datasciencetutorials8537 3 years ago

    Thanks for this superb video.

  • @Zerksis79 7 years ago

    Thanks for another great video!

  • @bazluhrman 6 years ago

    This is really good, great teacher!

  • @harveynorman8787 5 years ago

    This channel is gold! Thanks

  • @ManajitPal95 5 years ago +5

    Update: There is another activation function called "ELU" which can be faster than "ReLU" when it comes to speed of training. Try it out guys! :D

  • @vijayabhaskar-j 7 years ago +7

    Great video! Also, could you make a video on how to choose the number of hidden layers and the number of nodes in each layer?

    • @SirajRaval 7 years ago +4

      will do thx

    • @TheQuickUplifts 5 years ago

      If I understand the subject right, you only ever need one hidden layer, because of Cover's theorem.

  • @paulbloemen7256 5 years ago +1

    1. The (activation) value of a neuron should be between 0 and 1, right? ReLU has a leaking minimum around 0; shouldn't ReLU also have a (leaking) maximum around 1?
    2. Is there one best activation function, delivering the best neural network with the least amount of effort, such as the number of tests needed and the computing power?
    3. Should weights and biases be between 0 and 1, or between -1 and 1? Or any different values?
    4. Against vanishing and exploding gradients: can this be prevented with a (leaking) correction minimum and maximum for the weights and biases? There would be some symmetry then with the activation function suggested in the first paragraph.

  • @midhunrajr372 5 years ago

    Awesome explanation. +1 for creating such a big shadow over the Earth.

  • @nihaltahariya8858 7 years ago

    Amazing video sir .......

  • @Djneckbeard 6 years ago

    Great video!!

  • @satyamskillz 4 years ago +1

    "I can't control the gradient" was the best part of the video.

  • @user-xl9zr5is2b 7 years ago

    This video is so helpful! Thx!

  • @mohamednoordeen6331 7 years ago +2

    Very helpful video, thanks a lot. We introduce activation functions precisely to add non-linearities, so how does ReLU, which looks linear, hold up against other non-linear functions? Can you please give the correct intuition behind this? Thanks in advance :)

  • @harshmankodiya9397 3 years ago

    Thanks for simplifying.

  • @AashishKumar1 7 years ago +3

    Great video Siraj. Keep up the good work.

  • @jalengg 4 years ago

    Fantastic video

  • @maloukemallouke9735 4 years ago

    Hi, thank you so much. I am wondering how you can select the right number of hidden layers to get the best results (if you could explain, please).
    Thanks

  • @akshaysreekumar1997 6 years ago

    Awesome video. Can you explain a bit more about why we aren't using an activation function in the output layer?

  • @samuelajayi3748 5 years ago

    This was awesome

  • @cenyingyang1611 2 years ago

    Curious: why does ReLU avoid the vanishing gradient problem? When z is below 0, since y is always 0, the gradient seems to be 0, which means the gradient vanishes? Or do I misunderstand the vanishing gradient?
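
    A small illustrative sketch of the distinction the question above is getting at: backprop multiplies layer-wise derivatives together, and sigmoid's derivative is at most 0.25, so the product shrinks geometrically with depth, while ReLU's derivative is exactly 1 for positive pre-activations. A ReLU unit whose pre-activation is negative does pass zero gradient, but that is the separate "dying ReLU" issue rather than the depth-wise vanishing one.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def sigmoid_grad(x):
            s = sigmoid(x)
            return s * (1.0 - s)           # peaks at 0.25, smaller everywhere else

        def relu_grad(x):
            return (x > 0).astype(float)   # 1 for positive inputs, 0 for negative

        z = np.full(20, 1.0)               # pre-activations of 20 stacked layers, all positive
        print(np.prod(sigmoid_grad(z)))    # roughly 0.2**20, effectively vanished
        print(np.prod(relu_grad(z)))       # 1.0, the gradient passes through unchanged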

  • @alialkerdi1732 5 years ago

    Hello Sir Siraj, would you please answer the following questions?
    I am a PhD student, and I would like to use the artificial neural network approach to measure the productive efficiency of 31 farms, as well as their production and cost functions. My questions are:
    Are 31 farms enough for applying this approach? What is the minimum number of observations that must be obtained for the research?
    Is it a sufficient number, or do I need more farms?
    Note that I have many relevant independent variables.
    What type of activation functions should I use for the efficiency measurement and for the production and cost function estimation?
    Many thanks in advance.

  • @toadfrommariokart64 3 years ago

    Despised the stale memes, loved the explanation.

  • @WenjunLv 5 years ago

    great video!

  • @mohamednoordeen6331 7 years ago +1

    Initially we apply activation functions to squash the output of each neuron into the range (0, 1) or (-1, 1). But with ReLU the range is [0, ∞) and the output can be arbitrarily large. Can you please give the correct intuition behind this? Thanks in advance :)

  • @evanchakrabarti3276 6 years ago

    Thanks, super helpful video. I've been confused about softmax... I've been implementing a basic backprop network in Python and I've gotten stuck on it. I know it's a function that returns probabilities and makes the network's output vector sum to 1, but I don't know how to implement it or its derivative.
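
    A minimal, numerically stable softmax sketch (an illustration with hypothetical helper names). Subtracting the maximum before exponentiating avoids overflow and does not change the result. When softmax is paired with a cross-entropy loss, the gradient with respect to the logits simplifies to (probabilities - one-hot targets), which is why most implementations fuse the two rather than differentiating softmax on its own.

        import numpy as np

        def softmax(z):
            z = z - np.max(z, axis=-1, keepdims=True)   # shift for numerical stability
            e = np.exp(z)
            return e / np.sum(e, axis=-1, keepdims=True)

        def softmax_cross_entropy_grad(z, y_one_hot):
            return softmax(z) - y_one_hot               # gradient with respect to the logits z

        z = np.array([2.0, 1.0, 0.1])
        y = np.array([1.0, 0.0, 0.0])
        print(softmax(z))                               # probabilities, sums to 1
        print(softmax_cross_entropy_grad(z, y))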

  • @nikksengaming933 6 years ago

    I love watching these videos, even if I don't understand 90% of what he is saying.

  • @RastaZak 6 years ago

    I have a question... For a sigmoid activation function with an output close to 1, would the vanishing gradient problem still cause no signal to flow through it? Or would it instead cause the output to be permanently saturated? Either way it would be an issue, but I'm just trying to wrap my head around this.

  • @sahand5277 5 years ago +4

    8:44 I liked this motto on the wall.

  • @Omar-kw5ui 5 years ago

    You covered in 8 minutes half of what my AI principles course covered on learning in three and a half hours. Nice.

  • @antonylawler3423 7 years ago

    Excellent, as usual.
    I think the reason ReLU wasn't popular before now is that it is mathematically inelegant, in that it can't be used in commutable functions, whereas a sigmoid function can.
    It does beg the question, though: if ReLU is being used, do we need to use the backpropagation algorithm at all? Perhaps some simpler recursive algorithm could be used.

  • @aiMonk 7 years ago

    Which software do you use to create the neural network and activation function animations, like at 1:15 to 2:03 and 5:27 to 5:54?

  • @ilyassalhi 7 years ago +3

    Siraj, your videos inspired me to study machine learning. I've been learning Python for the past month and am looking to start playing around with more advanced stuff. Do you have any good book recommendations for machine or deep learning, or online resources that beginners should start with?

    • @SirajRaval 7 years ago

      awesome. Watch my playlist Learn Python for Data Science.

    • @lubnaaashaikh8901 7 years ago

      Siraj Raval Do you have videos on MATLAB using NNs?

  • @DelandaBaudLacanian 2 years ago

    So if ReLU is best for the hidden layers and softmax/linear is best for the output, what is best for the input layer? Sorry, I'm new, but your video makes a lot of sense.
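
    A hedged Keras sketch (assuming TensorFlow 2.x; the layer sizes are arbitrary) of the layout the question above refers to: the input layer has no activation because it only holds the data, the hidden layers use ReLU, and the output layer uses softmax for multi-class classification.

        import tensorflow as tf

        model = tf.keras.Sequential([
            tf.keras.Input(shape=(784,)),                     # input: raw features, no activation
            tf.keras.layers.Dense(128, activation="relu"),    # hidden layer 1
            tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 2
            tf.keras.layers.Dense(10, activation="softmax"),  # output: class probabilities
        ])
        model.compile(optimizer="adam", loss="categorical_crossentropy")
        model.summary()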

  • @fersilvil 7 years ago +3

    If we use GAs we do not need differentiable activation functions; we can even build our own function. The issue is the backpropagation method, which limits the choice of activation functions.

  • @UsmanAhmed-sq9bl 7 years ago +2

    Siraj, great video. Your views on the Parametric Rectified Linear Unit (PReLU)?
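
    For context, a tiny NumPy sketch (illustrative only) of PReLU next to leaky ReLU: the formula is the same, but PReLU's negative slope alpha is a parameter learned during training rather than a constant fixed by hand.

        import numpy as np

        def leaky_relu(x, alpha=0.01):       # alpha fixed ahead of time
            return np.where(x > 0, x, alpha * x)

        def prelu(x, alpha):                 # alpha is trained along with the weights
            return np.where(x > 0, x, alpha * x)

        x = np.array([-2.0, -0.5, 0.0, 1.5])
        print(leaky_relu(x))
        print(prelu(x, alpha=0.25))          # e.g. a learned slope of 0.25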

  • @captainwalter 4 years ago

    I have a question, Siraj; I wonder if you'll be able to hit this. Can I do a deep dive into learning / relearning math and physics topics via a programming language like Python as the primary way of manipulating equations, versus pencil and paper? I feel like this would be a lot faster and pay off. Or is there a good tutorial or simulation one could build where they interact with many fields in the context of a language like Python?

  • @clark87 5 years ago

    Siraj, you are a good AI teacher.

  • @abhayranade5815 6 years ago +1

    Excellent

  • @jchhjchh 6 years ago

    Hi, I am confused. Does ReLU kill the neuron only during the forward pass, or also during the backward pass?

  • @ColacX 7 years ago

    Just gotta say, Siraj, you are amazing, because I only understand half of what you say.

  • @caner19959595 7 years ago

    Thank you Siraj for the video. It is actually a problem that I would like to point out in my thesis, so do you have any academic references describing this problem?

  • @jchhjchh 6 years ago

    Hi Siraj, what if I would like to use my own custom activation function which is not in TensorFlow? How can I implement this? I hope you consider doing a tutorial on this. Thanks! :)
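
    One hedged way to plug a custom activation into Keras/TensorFlow (assuming TF 2.x; this is a sketch, not the only approach): any function built from TensorFlow ops can be passed directly as a layer's activation argument, and automatic differentiation takes care of the gradient.

        import tensorflow as tf

        def swish(x):
            return x * tf.sigmoid(x)          # a custom activation built from TF ops

        layer = tf.keras.layers.Dense(32, activation=swish)
        y = layer(tf.random.normal((4, 16)))  # forward pass using the custom activation
        print(y.shape)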

  • @Rfranceli1981 1 year ago

    Excellent!!