
Tutorial 7- Vanishing Gradient Problem

  • Published: 21 Jul 2019
  • The vanishing gradient problem occurs when we try to train a neural network model using gradient-based optimization techniques. It was a major obstacle to training deep neural network models about 10 years back, because it led to very long training times and degraded model accuracy. (A short, illustrative numeric sketch of this effect follows the description below.)
    Below are the various playlists created on ML, Data Science and Deep Learning. Please subscribe and support the channel. Happy learning!
    Deep Learning Playlist: • Tutorial 1- Introducti...
    Data Science Projects playlist: • Generative Adversarial...
    NLP playlist: • Natural Language Proce...
    Statistics Playlist: • Population vs Sample i...
    Feature Engineering playlist: • Feature Engineering in...
    Computer Vision playlist: • OpenCV Installation | ...
    Data Science Interview Question playlist: • Complete Life Cycle of...
    You can buy my book on Finance with Machine Learning and Deep Learning from the below url
    amazon url: www.amazon.in/...
    🙏🙏🙏🙏🙏🙏🙏🙏
    YOU JUST NEED TO DO 3 THINGS to support my channel:
    LIKE, SHARE & SUBSCRIBE TO MY YOUTUBE CHANNEL
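
As a rough illustration of the effect described above, the sketch below multiplies one sigmoid-derivative factor per layer and shows how quickly the gradient signal decays with depth; the layer counts and the shared weight of 0.5 are assumptions chosen only for the demo, not values taken from the video.

```python
# A rough, illustrative sketch (not from the video): backpropagation through a
# stack of sigmoid layers multiplies one sigmoid'(z) * w factor per layer, so
# the gradient signal shrinks geometrically with depth. Depths and the shared
# weight value below are assumptions chosen only for the demo.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never exceeds 0.25

z = 0.0   # pre-activation where sigmoid'(z) is at its maximum of 0.25
w = 0.5   # assumed modest weight shared by every layer
for depth in (2, 5, 10, 20):
    grad_factor = (sigmoid_derivative(z) * w) ** depth
    print(f"{depth:2d} layers -> gradient factor ~ {grad_factor:.2e}")
```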

Comments • 196

  • @kumarpiyush2169 • 4 years ago • +123

    Hi Krish, dL/dW'11 should be [dL/dO21 · dO21/dO11 · dO11/dW'11] +
    [dL/dO21 · dO21/dO12 · dO12/dW'11] as per the last chain-rule illustration. Please confirm.

    • @rahuldey6369 • 3 years ago • +12

      ...but O12 is independent of W11; in that case, won't the 2nd term be zero?

    • @RETHICKPAVANSE • 3 years ago • +1

      wrong bruh

    • @ayushprakash3890 • 3 years ago • +2

      We don't have the second term.

    • @Ajamitjain • 3 years ago • +1

      Can anyone clarify this? I too have this question.

    • @grahamfernando8775 • 3 years ago • +29

      @Ajamitjain dL/dW'11 should be [dL/dO21 · dO21/dO11 · dO11/dW'11]
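
For readers following this thread: written out in the video's notation, the expression the replies converge on (treating O12 as independent of W'11, so the second path drops out) is:

```latex
\frac{\partial L}{\partial W'_{11}}
  = \frac{\partial L}{\partial O_{21}} \cdot
    \frac{\partial O_{21}}{\partial O_{11}} \cdot
    \frac{\partial O_{11}}{\partial W'_{11}}
```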

  • @mahabir05 • 4 years ago • +34

    I like how you explain and end your class with "never give up". It's very encouraging.

  • @Xnaarkhoo • 3 years ago • +16

    Many years ago in college I enjoyed watching videos from IIT - before the MOOC era. India had, and still has, many good teachers! It brings me joy to see that again. It seems Indians have a gene for pedagogy.

  • @Vinay1272 • 2 years ago • +6

    I have been taking a well-known, world-class course on AI and ML for the past 2 years and none of the lecturers have made me as interested in any topic as you have in this video. This is probably the first time I have sat through a 15-minute lecture without distracting myself. What I realise now is that I didn't lack motivation or interest, nor was I lazy - I just did not have lecturers whose teaching inspired me enough to take an interest in the topics; yours did.
    You have explained the vanishing gradient problem very well and very clearly. It shows how strong your concepts are and how knowledgeable you are.
    Thank you for putting out your content here and sharing your knowledge with us. I am so glad I found your channel. Subscribed forever.

  • @tosint • 4 years ago • +11

    I hardly comment on videos, but this is a gem. One of the best videos explaining the vanishing gradient problem.

  • @PeyiOyelo • 4 years ago • +43

    Sir, or as my Indian friends say, "Sar", you are a very good teacher and thank you for explaining this topic. It makes a lot of sense. I can also see that you're very passionate; however, the passion kind of makes you speed up the explanation a bit, making it a little hard to understand sometimes. I am also very guilty of this when I try to explain things that I love. Regardless, thank you very much for this and the playlist. I'm subscribed ✅

    • @amc8437 • 3 years ago • +3

      Consider reducing playback speed.

  • @ltoco4415 • 4 years ago • +7

    Thank you sir for making this confusing concept crystal clear. Your knowledge is GOD level 🙌

  • @lekjov6170 • 4 years ago • +36

    I just want to add this mathematically, the derivative of the sigmoid function can be defined as:
    *derSigmoid = x * (1-x)*
    As Krish Naik well said, we have our maximum when *x=0.5*, giving us back:
    *derSigmoid = 0.5 * (1-0.5) --------> derSigmoid = 0.25*
    That's the reason the derivative of the sigmoid function can't be higher than 0.25

    • @ektamarwaha5941 • 4 years ago

      COOL

    • @thepsych3 • 4 years ago

      cool

    • @tvfamily6210 • 4 years ago • +13

      should be: derSigmoid(x) = Sigmoid(x)[1-Sigmoid(x)], and we know it reaches maximum at x=0. Plugging in: Sigmoid(0)=1/(1+e^(-0))=1/2=0.5, thus derSigmoid(0)=0.5*[1-0.5]=0.25

    • @benvelloor • 4 years ago

      @tvfamily6210 Thank you!

    • @est9949 • 4 years ago

      I'm still confused. The weight w should be in here somewhere. This seems to be missing w.
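
A quick numeric check of the point settled in this thread, using sigmoid'(x) = sigmoid(x)·(1 - sigmoid(x)); the test grid below is arbitrary:

```python
# Quick numeric check of the point settled in this thread:
# sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) peaks at x = 0 with value 0.25.
# The test grid is arbitrary.
import numpy as np

x = np.linspace(-10.0, 10.0, 10001)
s = 1.0 / (1.0 + np.exp(-x))
ds = s * (1.0 - s)

print("max of sigmoid'(x):", ds.max())    # ~0.25
print("attained at x =", x[ds.argmax()])  # ~0.0
```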

  • @bhavikdudhrejiya4478 • 4 years ago

    Very nice way to explain.
    Learned from this video:
    1. Compute the error: (actual output - model output)^2
    2. To reduce the error we backpropagate, i.e. we find new weights
    3. New weight = old weight - change in the weight
    4. Change in the weight = learning rate x d(error)/d(old weight)
    5. The new weight comes out nearly equal to the old weight, because the derivative of sigmoid ranges between 0 and 0.25 and the chained product of such factors becomes tiny, so the weight barely updates
    6. This is the vanishing gradient problem
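
A tiny sketch of steps 3-5 above; the learning rate, the starting weight, and the assumed 10-layer chain of 0.25 factors are made-up illustrative numbers:

```python
# Sketch of steps 3-5 above; every number here is made up for illustration.
learning_rate = 0.01
old_weight = 0.8

# In a deep sigmoid network the backpropagated gradient is a product of many
# per-layer factors that are each at most 0.25, so it can become vanishingly small.
gradient = 0.25 ** 10  # ~9.5e-7, standing in for dL/d(old_weight)

new_weight = old_weight - learning_rate * gradient
print(old_weight, new_weight)  # nearly identical -> the weight barely updates
```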

  • @sapnilpatel1645 • 1 year ago • +1

    so far best explanation about vanishing gradient.

  • @gultengorhan2306 • 2 years ago • +1

    You are teaching better than many other people in this field.

  • @marijatosic217 • 3 years ago • +3

    I am amazed by the level of energy you have! Thank you :)

  • @rushikeshmore8890 • 4 years ago

    Kudos sir, I am working as a data analyst and have read lots of blogs and watched videos, but today I finally got the concept clear. Thanks for all the stuff.

  • @satyadeepbehera2841 • 4 years ago • +3

    Appreciate your way of teaching which answers fundamental questions.. This "derivative of sigmoid ranging from 0 to 0.25" concept was nowhere mentioned.. thanks for clearing the basics...

    • @mittalparikh6252 • 3 years ago

      Look for Mathematics for Deep Learning. It will help

  • @al3bda • 3 years ago • +1

    Oh my god, you are a good teacher. I really fell in love with how you explain and simplify things.

  • @deepthic6336 • 4 years ago

    I must say this: normally I am the kind of person who prefers to study on my own and crack it. I never used to listen to any lectures till date, because I just don't understand them and I dislike the way they explain without passion (not all, though). But you are a gem and I can see the passion in your lectures. You are the best, Krish Naik. I appreciate it and thank you.

  • @aidenaslam5639 • 4 years ago • +3

    Great stuff! Finally understand this. Also loved it when you dropped the board eraser

  • @vikrantchouhan9908 • 2 years ago • +2

    Kudos to your genuine efforts. One needs sincere efforts to ensure that the viewers are able to understand things clearly and those efforts are visible in your videos. Kudos!!! :)

  • @piyalikarmakar5979 • 3 years ago

    One of the best videos clarifying the vanishing gradient problem. Thank you sir.

  • @vishaljhaveri6176 • 2 years ago

    Thank you, Krish Sir. Nice explanation.

  • @koraymelihyatagan8111 • 2 years ago

    Thank you very much, I was wandering around the internet to find such an explanatory video.

  • @classictremonti7997 • 3 years ago

    So happy I found this channel! I would have cried if I found it and it was given in Hindi (or any other language than English)!!!!!

  • @venkatshan4050 • 2 years ago • +1

    Marana mass explanation🔥🔥. Simple and very clearly said.

  • @meanuj1 • 5 years ago • +4

    Nice presentation..so much helpful...

  • @mittalparikh6252 • 3 years ago • +1

    Overall got the idea, that you are trying to convey. Great work

  • @maheshsonawane8737 • 1 year ago

    Very nice, now I understand why the weights don't update in RNNs. The main point is that the derivative of sigmoid is between 0 and 0.25. The vanishing gradient is associated with the sigmoid function. 👋👋👋

  • @shmoqe • 2 years ago

    Great explanation, Thank you!

  • @manujakothiyal3745 • 4 years ago • +1

    Thank you so much. The amount of effort you put is commendable.

  • @benoitmialet9842 • 2 years ago • +1

    Thank you so much, great quality content.

  • @benvelloor • 4 years ago • +1

    Very well explained. I can't thank you enough for clearing all my doubts!

  • @sumeetseth22 • 4 years ago

    Love your videos, I have watched and taken many courses but no one is as good as you

  • @skiran5129 • 2 years ago

    I'm lucky to see this wonderful class.. Tq..

  • @himanshubhusanrath2492 • 2 years ago

    One of the best explanations of vanishing gradient problem. Thank you so much @KrishNaik

  • @YashSharma-es3lr • 3 years ago

    Very simple and nice explanation. I understood it on the first pass.

  • @MauiRivera • 3 years ago

    I like the way you explain things, making them easy to understand.

  • @prerakchoksi2379 • 4 years ago • +1

    I am doing deep learning specialization, feeling that this is much better than that

  • @naresh8198 • 1 year ago

    crystal clear explanation !

  • @elielberra2867 • 1 year ago

    Thank you for all the effort you put into your explanations, they are very clear!

  • @faribataghinezhad • 2 years ago

    Thank you sir for your amazing video. that was great for me.

  • @MrSmarthunky • 4 years ago

    Krish.. You are earning a lot of Good Karmas by posting such excellent videos. Good work!

  • @MsRAJDIP • 5 years ago • +2

    Tomorrow I have an interview; I'm clearing all my doubts with your videos 😊

  • @classictremonti7997 • 3 years ago

    Krish...you rock brother!! Keep up the amazing work!

  • @khiderbillal9961 • 3 years ago • +1

    Thanks sir, you really helped me.

  • @nazgulzholmagambetova1198 • 2 years ago

    great video! thank you so much!

  • @yousufborno3875 • 4 years ago

    You should get Oscar for your teaching skills.

  • @tonnysaha7676 • 3 years ago

    Thank you thank you thank you sir infinite times🙏.

  • @aaryankangte6734 • 2 years ago

    Sir, thank you for teaching us all the concepts from the basics. Just one request: if there is a mistake in your videos, please rectify it, as it confuses a lot of people who watch them. Not everyone reads the comment section, and they may blindly believe what you say. Please look into this.

  • @sekharpink • 5 years ago • +33

    You specified the derivative of the loss with respect to w11 dash incorrectly; you missed the derivative of the loss with respect to O21 in the equation. Please correct me if I am wrong.

    • @sekharpink • 5 years ago

      Please reply

    • @ramleo1461 • 5 years ago

      Even I have this doubt.

    • @krishnaik06 • 5 years ago • +28

      Apologies for the delay...I just checked the video and yes I have missed that part.

    • @ramleo1461 • 5 years ago • +12

      @krishnaik06 Hey! You don't have to apologise; on the contrary, you are doing us a favour by uploading these useful videos. I was a bit confused and wanted to clear my doubt, that's all. Thank you for the videos... Keep up the good work!!

    • @rajatchakraborty2058 • 4 years ago

      @krishnaik06 I think you have also missed the w12 part in the derivative. Please correct me if I am wrong.

  • @swapwill • 4 years ago

    The way you explain is just awesome

  • @nabeelhasan6593 • 2 years ago

    Very nice video sir , you explained very well the inner intricacies of this problem

  • @yoyomemory6825 • 4 years ago • +1

    Very clear explanation, thanks for the upload.. :)

  • @b0nnibell_ • 4 years ago

    you sir made neural network so much fun!

  • @adityashewale7983 • 1 year ago

    Hats off to you sir, your explanation is top level. Thank you so much for guiding us...

    • @DEVRAJ-np2og • 1 month ago

      Did you complete his full playlist?

  • @hiteshyerekar2204 • 5 years ago • +4

    Nice video Krish. Please make practical, hands-on videos on gradient descent, CNN, and RNN.

  • @daniele5540 • 4 years ago • +1

    Great tutorial man! Thank you!

  • @nikunjlahoti9704 • 2 years ago

    Great Lecture

  • @skviknesh • 3 years ago • +1

    I understood it. Thanks for the great tutorial!
    My query is: the weight update vanishes as more layers are added, and when the new weight ≈ the old weight, further training becomes useless.
    What would the output of such a model look like, and will we even reach the global minimum?

  • @GunjanGrunge • 2 years ago

    that was very well explained

  • @ambreenfatimah194 • 3 years ago

    Helped a lot....thanks

  • @krishj8011 • 3 years ago

    Very nice series... 👍

  • @Haraharavlogs • 6 months ago

    You are a legend, Naik sir.

  • @abhinavkaushik6817 • 3 years ago

    Thank you so much for this

  • @abdulqadar9580 • 2 years ago

    Great efforts Sir

  • @salimtheone • 1 year ago

    very well explained 100/100

  • @muhammadarslankahloon7519 • 3 years ago • +2

    Hello sir, why is the chain rule explained in this video different from the previous chain-rule video? Kindly clarify. Thanks for such an amazing series on deep learning.

  • @susmitvengurlekar • 3 years ago

    Understood completely! If the weights hardly change, there is no point in training further. But I have a question: where can I use this knowledge and understanding I just acquired?

  • @manikosuru5712 • 5 years ago • +4

    As usual, extremely good, outstanding...
    And a small request: can we expect this in code (Python) in the future?

  • @sunnysavita9071 • 5 years ago

    Your videos are very helpful. Good job and good work, keep it up...

  • @Thriver21 • 1 year ago

    nice explanation.

  • @spicytuna08 • 2 years ago

    you teach better than ivy league professors. what a waste of money spending $$$ on college.

  • @AnirbanDasgupta • 3 years ago

    excellent video

  • @nola8028 • 2 years ago

    You just earned a +1 subscriber ^_^
    Thank you very much for the clear and educative video

  • @sowmyakavali2670 • 3 years ago • +1

    Hi Krish,
    Everyone says that Wnew = Wold - η * dL/dWold, and theoretically we know that dL/dWold means the slope.
    But in a practical scenario L is a single scalar value and Wold is also a single scalar value.
    Then how is dL/dWold actually calculated?
    Also, coming to the activation function, you explain it theoretically;
    can you explain it with practical values rather than with a predefined function or module?
    We know how to find a module, import it and use it, but we don't know the practical calculation.
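
One concrete way to answer the "how is dL/dWold actually calculated" part with plain numbers is a finite-difference slope on a single sigmoid neuron with a squared-error loss; all values below (input, target, starting weight, learning rate) are arbitrary assumptions for illustration, not anything from the video.

```python
# A concrete, numbers-only way to see dL/dW_old: a single sigmoid neuron with a
# squared-error loss, and a finite-difference estimate of the slope. The input,
# target, starting weight and learning rate are arbitrary illustrative values.
import math

def loss(w, x=2.0, y_true=1.0, b=0.0):
    y_pred = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid(w*x + b)
    return (y_true - y_pred) ** 2

w_old, eps, lr = 0.5, 1e-6, 0.1
dL_dw = (loss(w_old + eps) - loss(w_old - eps)) / (2 * eps)  # numerical slope
w_new = w_old - lr * dL_dw

print("dL/dw  ~", dL_dw)
print("w_old =", w_old, "-> w_new =", w_new)
```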

  • @Kabir_Narayan_Jha • 5 years ago • +1

    This video is amazing and you are an amazing teacher. Thanks for sharing such amazing information.
    Btw, are you from Bangalore?

  • @naimamushfika1167 • 1 year ago

    nice explanation

  • @anusuiyatiwari1800 • 3 years ago

    Very interesting

  • @neelanshuchoudhary536 • 4 years ago • +1

    Very nice explanation, great :)

  • @AA-yk8zi • 3 years ago

    Thank you so much

  • @lalithavanik5022 • 3 years ago

    Nice explanation sir

  • @hokapokas • 5 years ago • +4

    Good job bro, as usual... Keep up the good work. I have a request: please make a video on implementing backpropagation.

    • @krishnaik06 • 5 years ago • +1

      The video has already been made. Please have a look at my deep learning playlist.

    • @hokapokas • 5 years ago

      @krishnaik06 I have seen that video, but it's not implemented in Python. If you have a notebook you can refer me to, please share.

    • @krishnaik06 • 5 years ago • +3

      With respect to the implementation in Python, please wait till I upload some more videos.

  • @amitdebnath2207 • 3 months ago

    Hats Off Brother

  • @BalaguruGupta • 3 years ago

    Thanks a lot sir for the wonderful explanation :)

  • @magicalflute • 4 years ago

    Very well explained. The vanishing gradient problem, as per my understanding, is that the optimizer cannot do its job (reduce the loss) because the old and new weights end up almost equal. Please correct me if I am wrong. Thanks!!

  • @melikad2768 • 3 years ago • +1

    Thank youuuu, its really great:)

  • @arunmeghani1667 • 3 years ago

    great video and great explanation

  • @gouthamkarakavalasa4267 • 1 year ago

    Gradient descent is applied to the cost function, right? -1/m Σ (y*log(y_pred) + (1-y)*log(1-y_pred))... In that case, if it were applied to the activation function instead, how would the algorithm reach the global minimum?

  • @feeham • 2 years ago

    Thank you !!

  • @suryagunda4038 • 3 years ago

    May god bless you ..

  • @raj4624 • 2 years ago

    superb

  • @louerleseigneur4532 • 3 years ago

    Thanks krish

  • @naughtyrana4591 • 4 years ago

    Salutations to the Guru 🙏

  • @winviki123 • 5 years ago • +2

    Could you please explain why bias is needed in neural networks along with weights?

    • @Rising._.Thunder • 4 years ago

      It is because you want to control or shift a given neuron's output into a certain range. For example, if the neuron's weighted-sum input is always between 9 and 10, you can add a bias of -9 so that the value going into the activation lies between 0 and 1.
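
A small numeric illustration of the reply above; the pre-activation 9.5 and the bias -9 are just example numbers:

```python
# Illustration of the reply above; 9.5 and -9 are just example numbers.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

weighted_sum = 9.5
print(sigmoid(weighted_sum))           # ~0.99993 -> saturated, derivative ~0
print(sigmoid(weighted_sum + (-9.0)))  # ~0.62    -> back in a sensitive range
```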

  • @khiderbillal9961 • 3 years ago • +1

    So the ReLU function is the best solution for avoiding the vanishing gradient problem.
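
ReLU is indeed the usual remedy discussed for this problem (alongside careful weight initialization and related tricks). A minimal sketch of why, under the assumption that the units stay in their positive, active region:

```python
# Minimal sketch, assuming the units stay in their positive (active) region:
# ReLU'(z) = 1 for z > 0, so the per-layer factor does not shrink the gradient
# the way sigmoid's <= 0.25 factor does.
def relu_derivative(z):
    return 1.0 if z > 0 else 0.0

depth = 20
sigmoid_chain = 0.25 ** depth               # vanishes
relu_chain = relu_derivative(1.0) ** depth  # stays 1.0 for active units
print(sigmoid_chain, relu_chain)
```

The caveat is that units with negative pre-activations get a zero derivative, which is its own issue (the "dying ReLU" problem).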

  • @nirmalroy1738 • 4 years ago

    super video...extremely well explained.

  • @narsingh2801 • 4 years ago

    You are just amazing. Thnx

  • @gaurawbhalekar2006 • 4 years ago

    excellent explanation sir

  • @karth12399 • 4 years ago • +3

    Sir, you are saying the derivative of sigmoid is between 0 and 0.25. I understand that.
    But how does that imply that the derivative of O21 with respect to O11 should be less than 0.25?
    Could you please help me understand that assumption?

    • @rish_hyun • 4 years ago

      He agreed that he inadvertently got it wrong; I found his comment somewhere in this thread.

    • @jsridhar72 • 3 years ago

      The output of every neuron in a layer is the sigmoid of a weighted sum of its inputs. Since sigmoid is applied as the activation function in every neuron (here O21 is the output after applying the sigmoid), that derivative factor is between 0 and 0.25.
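
Spelling that step out with assumed notation (z21 for the weighted sum feeding O21, w for the weight connecting O11 to that neuron) shows both where the 0.25 bound enters and the weight factor that rides along with it:

```latex
O_{21} = \sigma(z_{21}), \qquad z_{21} = w\,O_{11} + \dots
\quad\Longrightarrow\quad
\frac{\partial O_{21}}{\partial O_{11}} = \sigma'(z_{21})\, w,
\qquad 0 < \sigma'(z_{21}) \le 0.25
```

So each backward step contributes at most 0.25 times a weight; unless the weights are large, the product of many such factors shrinks toward zero.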

  • @dhananjayrawat317 • 4 years ago

    best explanation. Thanks man

  • @shahidabbas9448 • 4 years ago • +1

    Sir, I'm really confused about the actual y value, please can you tell us about that? I thought it would be our input value, but here there are many input values with one predicted output.

  • @gowthamprabhu122 • 4 years ago • +1

    Can someone please explain why the derivative keeps getting smaller at each earlier layer? i.e., why does layer two have a lower derivative of output with respect to its input?