The Fundamental Problem with Neural Networks - Vanishing Gradients

  • Published: 29 Nov 2024

Comments • 30

  • @HoHoHaHaTV
    @HoHoHaHaTV 1 year ago +5

    My respect for Ritvik grows exponentially each time I see his explanations. He can beat any prof when it comes to explaining these things. I just feel so lucky to have come across this channel.

  • @erickmacias5153
    @erickmacias5153 2 years ago +11

    I just found this channel like 3 days ago and it has been very useful and interesting! Thank you very much!

  • @geoffreyanderson4719
    @geoffreyanderson4719 2 years ago +8

    Yes, great topic. Absolutely, some of the top ways to fight off vanishing gradients are ReLU (and other advanced activation functions) and residual nets (skip connections).
    It's also quite possible to add your own custom residual blocks to any deep network; it's not necessary to use only the ResNet blocks that the framework provides. TensorFlow's functional API makes it pretty straightforward to add skip layers, plus the necessary aggregation layer to combine the main path with the skip path, to layer types other than convolutional and computer-vision specific. So while ResNet was originally designed for computer vision, it's not married to that at all.
    Additional solid help to fight off those vanishing gradients comes from
    - batch normalization, which basically conditions the signal going into the next hidden layer; and
    - smarter initialization of the weights in all your layers, like He initialization when using ReLU (and other initializations suited to other activation functions). See the sketch after this list.
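
    A minimal sketch of the ideas above (not from the video; the layer sizes and structure are assumptions chosen for illustration): a dense block with a skip connection built with the Keras functional API, He initialization for the ReLU layers, and batch normalization feeding the next block.

    ```python
    # Sketch only: a residual (skip) block on dense layers, with He
    # initialization and batch normalization, using the Keras functional API.
    # All layer sizes are arbitrary placeholders.
    import tensorflow as tf
    from tensorflow.keras import layers

    inputs = tf.keras.Input(shape=(64,))

    # Main path: two dense layers; He initialization pairs well with ReLU.
    x = layers.Dense(64, activation="relu", kernel_initializer="he_normal")(inputs)
    x = layers.Dense(64, kernel_initializer="he_normal")(x)

    # Skip path: add the block's input back in, then apply the nonlinearity.
    x = layers.Add()([x, inputs])
    x = layers.Activation("relu")(x)

    # Batch normalization conditions the signal going into the next block.
    x = layers.BatchNormalization()(x)

    outputs = layers.Dense(1)(x)
    model = tf.keras.Model(inputs, outputs)
    ```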

  • @arontapai5586
    @arontapai5586 2 years ago +5

    Very informative!!! Are you planning to make videos on RNNs (LSTM...) and other types of network models?

    • @ritvikmath
      @ritvikmath  2 years ago +9

      Yup, they'll likely be coming out within the next month!

  • @shubhampandilwar8448
    @shubhampandilwar8448 2 years ago +3

    Very well explained. I always had trouble understanding this topic, but this video helped me to comprehend it intuitively.

  • @pushkarparanjpe
    @pushkarparanjpe 1 year ago

    Some questions inspired by your video.
    - The earliest layers see the severest form of vanishing gradients. Do later layers undergo vanishing gradients sequentially?
    - So what if the earliest layers' weights get stuck; learning can still happen due to weight updates at later layers, right?
    - Can we use vanishing gradients for neural architecture depth search? Start with many layers; train; identify the early layers that got stuck; discard them and keep a shallower network. This sounds like there is something wrong with it; will this work?

  • @CptJoeCR
    @CptJoeCR 2 years ago +2

    Love it as always!
    May I suggest a future video topic: Bayesian Change Point Detection. BCP has so many components that you already have covered (sampling techniques, MCMC, Bayesian statistics) that I think it would make for a great video! (and I'm still slightly confused how it all comes together in the end! lol)

  • @siddhantrai7529
    @siddhantrai7529 2 years ago +4

    Very well addressed, thank you for the video.

  • @ChocolateMilkCultLeader
    @ChocolateMilkCultLeader 2 years ago +1

    You can also add some bias to your networks in one of the intermediate layers

  • @shreypatel9379
    @shreypatel9379 2 years ago

    One of the best channels I've found on YouTube (along the lines of 3Blue1Brown and other such channels). Keep up the good work!

  • @marvinbcn2
    @marvinbcn2 2 years ago +1

    Excellent video. You perfectly convey the intuition. Only one doubt left: I cannot see why ReLU is a good solution, given that the gradient vanishes to 0 for negative values. How do we compute backpropagation then?
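
    A tiny illustrative sketch related to this question (the example values are assumptions): ReLU contributes a derivative factor of 1 wherever its input is positive and 0 where it is negative, whereas the sigmoid derivative never exceeds 0.25, so products of sigmoid derivatives shrink much faster with depth.

    ```python
    # Sketch only: compare the per-unit derivative factors that enter
    # backpropagation for ReLU vs. sigmoid at a few sample pre-activations.
    import numpy as np

    z = np.array([-2.0, -0.5, 0.5, 2.0])

    relu_grad = (z > 0).astype(float)         # 1 where active, 0 where inactive
    sigmoid = 1.0 / (1.0 + np.exp(-z))
    sigmoid_grad = sigmoid * (1.0 - sigmoid)  # at most 0.25 (reached at z = 0)

    print("ReLU derivative   :", relu_grad)
    print("sigmoid derivative:", sigmoid_grad)
    ```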

  • @tomdierickx5014
    @tomdierickx5014 2 years ago +1

    Another gem! Great insights! 🔬

  • @listakurniawati8946
    @listakurniawati8946 2 years ago

    Omg thank you so much!!! You saved my thesis ❤❤

  • @posthocprior
    @posthocprior 2 years ago +1

    Excellent explanation.

  • @MachineLearningStreetTalk
    @MachineLearningStreetTalk 2 years ago

    Awesome video! Nice channel

  • @christophersolomon633
    @christophersolomon633 2 years ago

    Outstanding Video. Really well explained.

  • @ChocolateMilkCultLeader
    @ChocolateMilkCultLeader 2 years ago

    Never heard anyone call it the most important problem. Interesting viewpoint.

    • @geoffreyanderson4719
      @geoffreyanderson4719 2 years ago

      Great depth is where you get the most exponentiation effect, thus the worst vanishing or explosion. But great depth is where the bulk of the power of deep neural nets comes from.
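
      A rough numeric sketch of that exponentiation effect (a simplified assumption: one sigmoid-derivative factor per layer, with weight factors ignored): the product of per-layer derivatives shrinks geometrically with depth.

      ```python
      # Sketch only: the backpropagated gradient picks up roughly one
      # activation-derivative factor per layer; with sigmoid (derivative <= 0.25)
      # the running product shrinks geometrically as depth grows.
      import numpy as np

      def sigmoid_grad(z):
          s = 1.0 / (1.0 + np.exp(-z))
          return s * (1.0 - s)

      rng = np.random.default_rng(0)
      for depth in (5, 20, 50):
          factors = sigmoid_grad(rng.normal(size=depth))  # one factor per layer
          print(depth, np.prod(factors))                  # vanishes at large depth
      ```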

  • @seeking9145
    @seeking9145 2 years ago

    Super nice explanation!!!

  • @pushkarparanjpe
    @pushkarparanjpe 1 year ago

    Thanks once again!

  • @codematrix
    @codematrix 1 year ago

    Another way to help mitigate vanishing gradients is to adjust your learning rate.

  • @aliasgher5272
    @aliasgher5272 2 years ago

    I'm impressed, Sir!

  • @n.m.c.5851
    @n.m.c.5851 1 year ago

    thanks

  • @xiaoweidu4667
    @xiaoweidu4667 1 year ago

    as always