Deep Learning (CS7015): Lec 3.4 Learning Parameters: Gradient Descent

  • Published: 2 Oct 2024

Comments • 65

  • @ethanhunt7279
    @ethanhunt7279 5 years ago +76

    The way this guy maintains a positive atmosphere while teaching is very rare among teachers. Hope we get more courses from him.

  • @rohit2761
    @rohit2761 2 years ago +13

    This guy is literally the most knowledgeable, and he knows the art and magic of how to teach. He is just like MIT's Gilbert Strang. Even better, because he knows where students get stuck.
    The positive atmosphere, crystal-clear explanation, and a great voice.
    The best course on NPTEL.

  • @jayantpriyadarshi9266
    @jayantpriyadarshi9266 4 years ago +90

    "The top 20% might get pissed off, but I have a philosophy that I teach for the bottom 20% of the class."
    Love you, sir.

  • @ambujmittal6824
    @ambujmittal6824 5 years ago +31

    This is by far the best explanation of gradient descent I have ever come across! Thank you so much for this :D

  • @keshavsingh489
    @keshavsingh489 5 years ago +18

    This teacher is the best of all. I spent 6 months looking for such detailed explanations but couldn't find them in any single place.
    Please make his machine learning lectures available online too.

    • @thearindampal
      @thearindampal 5 years ago

      ruclips.net/p/PLH-xYrxjfO2VsvyQXfBvsQsufAzvlqdg9

    • @keshavsingh489
      @keshavsingh489 5 years ago

      @@thearindampal This is the Deep Learning series.

    • @thearindampal
      @thearindampal 5 years ago +1

      @@keshavsingh489 I am not sure whether Prof. Khapra did a course on ML. This video is part of the DL course, and the whole playlist is the one I posted.

  • @sayakchakrabarty
    @sayakchakrabarty 4 years ago +2

    We need teachers like this in all the best institutes of India; they are quite rare.

  • @subarnasamanta4945
    @subarnasamanta4945 4 years ago +18

    Mitesh sir is one of the best teachers I have seen on NPTEL; especially the way he emphasizes the hidden maths is amazing.

  • @srikanthtammina1900
    @srikanthtammina1900 5 years ago +15

    Perfect explanation of gradient descent: first the mathematical derivation, and then the associated code to demonstrate it with graphical illustrations. Thank you.

  • @santhoshkumark4463
    @santhoshkumark4463 9 months ago

    Just loved watching your class... it's amazing... the way you teach.

  • @lakshakumara
    @lakshakumara 1 year ago

    Best teacher; he touches everything from the very fundamental to the comprehensive. Best of luck.

  • @commonsense1019
    @commonsense1019 1 year ago +1

    The best of all time.
    Seriously, man, he is so good at this.

  • @sourajitmukherjee1786
    @sourajitmukherjee1786 3 years ago +1

    Excellent lecture, Sir. The Hessian and gradient explanation was very helpful and made things clearer.

  • @charugatra2566
    @charugatra2566 5 years ago +6

    Thank you. All your lectures are great. Also, teaching everything from the basics helps us understand quickly.

  • @murugavelish
    @murugavelish 4 years ago +1

    Nice series of lectures about deep learning. Thank you so much!

  • @lakshmir7327
    @lakshmir7327 4 months ago

    Thank you so much, sir. Best teaching.

  • @pranjalnama2420
    @pranjalnama2420 1 year ago +3

    Best teacher; I have never seen such a calm and positive teacher. 😍 We want more teachers like you, sir.

  • @krishnachauhan2822
    @krishnachauhan2822 2 years ago +2

    I don't see any role for the error function in the last code implementation. We already apply the gradients to x and y, right?

    • @himanshu5891
      @himanshu5891 1 year ago

      In the code, the error function is used to stop the algorithm. We have to input some threshold value, and once the difference between successive error values becomes smaller than that threshold, the algorithm stops because the error has converged. A minimal sketch of such a loop is shown below.
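
      The sketch below is illustrative, not the lecture's exact code: a sigmoid-neuron fit in the spirit of the video's toy example, with assumed data points, initial values, and threshold, showing how the error function drives the stopping criterion described above.

          # Gradient descent with an error-based stopping criterion (illustrative sketch)
          import numpy as np

          X = np.array([0.5, 2.5])            # toy inputs (assumed, for illustration)
          Y = np.array([0.2, 0.9])            # toy targets

          def f(x, w, b):                     # sigmoid neuron
              return 1.0 / (1.0 + np.exp(-(w * x + b)))

          def error(w, b):                    # squared-error loss over the data
              return 0.5 * np.sum((f(X, w, b) - Y) ** 2)

          def grad_w(w, b):
              fx = f(X, w, b)
              return np.sum((fx - Y) * fx * (1 - fx) * X)

          def grad_b(w, b):
              fx = f(X, w, b)
              return np.sum((fx - Y) * fx * (1 - fx))

          w, b, eta, threshold = -2.0, -2.0, 1.0, 1e-8
          prev_err = error(w, b)
          for _ in range(10000):
              w, b = w - eta * grad_w(w, b), b - eta * grad_b(w, b)
              err = error(w, b)
              if abs(prev_err - err) < threshold:   # error has converged: stop
                  break
              prev_err = err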

  • @NabidAlam360
    @NabidAlam360 8 months ago

    What outstanding teaching skill and what an amazing personality! I wish every university were blessed with a teacher like him!

  • @nitingoyal5515
    @nitingoyal5515 2 years ago +2

    Sir, please, we want your ML series as well.

  • @amitattafe
    @amitattafe 2 years ago +1

    I have a question:
    Gradient descent is a nice algorithm for fast convergence, but it often oscillates and can need quite a number of iterations to reach a minimum.
    It also often gets stuck in a local minimum rather than the global minimum.
    So why don't we try a hybrid algorithm that finds the global minimum at a lower iteration cost?
    I would suggest a genetic algorithm first (for faster convergence towards the global minimum) and then gradient descent.
    What are your thoughts on this??

  • @jsridhar72
    @jsridhar72 7 months ago

    He teaches really well. However, even after watching the video, the math is incomprehensible for a poor math guy like me.

  • @bharatbajoria
    @bharatbajoria 4 years ago

    A very good explanation. Thank you, Sir.

  • @akashdevbanshi9109
    @akashdevbanshi9109 3 months ago

    Awesome explanation of one of the hardest parts of ML.

  • @vikrampande5243
    @vikrampande5243 5 months ago

    Thank you Dr. Khapra. Basics are as important as advanced concepts, I really enjoyed this video.

  • @nptel1punith929
    @nptel1punith929 7 months ago

    "Give me the feedback that we don't need the basic stuff; I am going to ignore it anyway!!!!" A WOW moment for me, as I was expecting a teacher just like this one. This one's a gem.

  • @vinodjain6866
    @vinodjain6866 1 year ago

    It's very important for teachers to be composed and calm, and sir is a real example of that!

  • @monu_pathak-iitb
    @monu_pathak-iitb 8 months ago

    The real gems of the IITs are the professors who know how to teach students so that they never have any confusion afterwards. The rest are only researchers; they also have knowledge, but they aren't able to express it in a proper manner.

  • @sb239
    @sb239 2 years ago

    At 27:48, are we moving in the direction of the gradient or opposite to the direction of the gradient? Can anyone explain?

  • @rohit2761
    @rohit2761 10 months ago

    The man, the myth, the legend: Sir Mitesh Khapra. God of deep learning.

  • @Nikhil-hi1qs
    @Nikhil-hi1qs 11 months ago

    Can you please tell us more about how to choose the learning rate (eta) here, or are you treating that as a parameter itself?

  • @chalapathinagavarmabhupath8432
    @chalapathinagavarmabhupath8432 5 years ago +1

    Super, you are awesome.

  • @puneetsinha1732
    @puneetsinha1732 1 year ago

    Which textbook did he mention at 9:04? The one for the Hessian?

  • @NaveenKumar-ud5mq
    @NaveenKumar-ud5mq 8 months ago

    One heck of a lecture it is. Amazed...

  • @umairalvi7382
    @umairalvi7382 4 years ago +1

    Best

  • @swagatmishra9350
    @swagatmishra9350 3 years ago

    One of the best explanations. Thank you very much, sir.

  • @sankarayachitula4328
    @sankarayachitula4328 4 years ago +1

    What is u transpose? And is the 180° angle between uᵀ and the gradient, not between u and the gradient? Please clarify, sir.

    • @desiquant
      @desiquant 3 years ago

      u = Δθ. It is just a notation for convenience.

    • @desiquant
      @desiquant 3 years ago +2

      The vector is still u, and taking a transpose does not change the direction of the vector. Why did we take the transpose? To form the dot product between the vector u and the gradient ∇L(θ). That is all. The direction of u remains the same.
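
      To make the angle argument concrete, here is the standard first-order step sketched in the lecture's notation (β is the angle between u = Δθ and the gradient):

          \mathcal{L}(\theta + \eta u) \approx \mathcal{L}(\theta) + \eta\, u^{T}\nabla_{\theta}\mathcal{L}(\theta)
                                       = \mathcal{L}(\theta) + \eta\,\lVert u\rVert\,\lVert\nabla_{\theta}\mathcal{L}(\theta)\rVert \cos\beta

      The dot product uᵀ∇L(θ) is most negative when cos β = −1, i.e. β = 180°, so the steepest decrease comes from moving opposite to the gradient, giving the update θ(t+1) = θ(t) − η ∇L(θ(t)).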

  • @copaceanubobi6101
    @copaceanubobi6101 3 years ago

    Can someone tell me how we plot the error surface?
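
    One minimal way to do this, sketched below under assumed toy data (reusing the illustrative error(w, b) from the sketch further up, not the lecture's exact code): evaluate the error on a grid of (w, b) values and draw it as a 3D surface with matplotlib.

        # Plotting an error surface over a grid of (w, b) values (illustrative sketch)
        import numpy as np
        import matplotlib.pyplot as plt

        X = np.array([0.5, 2.5])            # assumed toy data, as above
        Y = np.array([0.2, 0.9])

        def f(x, w, b):
            return 1.0 / (1.0 + np.exp(-(w * x + b)))

        def error(w, b):
            return 0.5 * np.sum((f(X, w, b) - Y) ** 2)

        W, B = np.meshgrid(np.linspace(-6, 6, 100), np.linspace(-6, 6, 100))
        E = np.vectorize(error)(W, B)       # error value at every grid point

        ax = plt.figure().add_subplot(projection='3d')
        ax.plot_surface(W, B, E, cmap='viridis')
        ax.set_xlabel('w'); ax.set_ylabel('b'); ax.set_zlabel('error')
        plt.show()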

  • @yashmalajain1270
    @yashmalajain1270 5 months ago

    Where can we get the notes for the whole course?

  • @vamkarthik
    @vamkarthik 1 year ago

    In the code shown in this lecture, the eta value is chosen to be 1.0. However, in the earlier part, we said we can ignore the higher-order terms of the Taylor expansion because eta is usually very small and the higher-order terms are negligible. But here the higher-order terms in eta are not negligible. Why does our approach still work? In this kind of situation, can we get better convergence by including the higher-order derivatives as well?

    • @nishantsharma7596
      @nishantsharma7596 4 months ago

      The higher-order terms involve higher powers of eta multiplied by higher derivatives of the loss along u, so those contributions are usually small enough to be ignored; the expansion is written out below.
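
      For reference, here is the expansion under discussion written out one more order (a standard second-order Taylor sketch; ∇²L is the Hessian mentioned in the lecture):

          \mathcal{L}(\theta + \eta u) = \mathcal{L}(\theta) + \eta\, u^{T}\nabla_{\theta}\mathcal{L}(\theta)
              + \frac{\eta^{2}}{2}\, u^{T}\nabla^{2}_{\theta}\mathcal{L}(\theta)\, u + O(\eta^{3})

      The neglected terms carry factors of η², η³, ... together with higher derivatives, so the linear term usually dominates for small η; with η = 1.0 the linear approximation is rougher, but each step can still reduce the loss as long as it does not overshoot the minimum.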

  • @SumanDas-fx5vu
    @SumanDas-fx5vu 1 year ago

    Best teacher

  • @anjalijaiswal956
    @anjalijaiswal956 1 year ago

    At 14:46, beta is the angle between "u" and the gradient of L, rather than "u transpose"?

    • @beluga.314
      @beluga.314 1 year ago

      Yes. That's what he corrected it to.

  • @ayanbizz
    @ayanbizz 4 years ago +1

    OK, I understood the concept of taking a step opposite to the gradient. But what is the direction of the gradient? I am not able to visualize that.

    • @bharathraman2098
      @bharathraman2098 4 years ago +2

      The gradient also sets the size of the step: as the point traverses the error surface and moves closer to the optimal point, the gradient, and hence the step, becomes smaller. The "opposite direction" means we step against the gradient, which is why the step size decreases as we get closer.

    • @dailyDesi_abhrant
      @dailyDesi_abhrant 4 years ago +3

      Look, it basically says that you have to go against the gradient (the slope). Imagine a parabola with its minimum at the origin. If you are in the second quadrant, you have to move in the positive x direction, but at that point the slope is negative. Similarly, when you are in the first quadrant, the slope there is positive, but you have to move in the negative x direction to reach the minimum.

    • @ruturajjadhav8905
      @ruturajjadhav8905 3 years ago

      @@bharathraman2098 What is the step size?

    • @bharathraman2098
      @bharathraman2098 3 years ago +1

      @@ruturajjadhav8905 The step size is the amount by which the algorithm moves over the error surface in each iteration; it changes dynamically because it depends on the gradient. Gradient descent tries to minimize the error by finding the optimal point, which should be the lowest point on the surface.

    • @ruturajjadhav8905
      @ruturajjadhav8905 3 years ago

      @@bharathraman2098 What is the step size in the above video? I am not getting it.
      And how do we visualise the direction of the gradient?
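
      A tiny numeric sketch of the direction question (illustrative, not from the video): for f(x, y) = x² + y² the gradient is (2x, 2y), which points away from the minimum at the origin, i.e. uphill; stepping opposite to it moves you downhill.

          # The gradient points uphill; stepping against it decreases f (illustrative sketch)
          def grad(x, y):
              return 2 * x, 2 * y                   # gradient of f(x, y) = x^2 + y^2

          x, y, eta = 3.0, -2.0, 0.1
          for _ in range(5):
              gx, gy = grad(x, y)
              x, y = x - eta * gx, y - eta * gy     # move against the gradient
              print(f"x={x:+.3f}  y={y:+.3f}  f={x*x + y*y:.3f}")   # f keeps decreasing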

  • @akshayshah483
    @akshayshah483 5 years ago

    It's amazing.

  • @ruturajjadhav8905
    @ruturajjadhav8905 3 years ago

    I understood the concept of taking a step opposite to the gradient. But what is the direction of the gradient? I am not able to visualize that. Anyone?

    • @BRIJESHKUMAR-jn3lm
      @BRIJESHKUMAR-jn3lm 3 years ago

      It gives the direction in which the increase in the value of the function is maximum. In the case of a function of one variable, it is simply the slope of the tangent.

    • @shubhamkumar-nw1ui
      @shubhamkumar-nw1ui 2 years ago

      @@BRIJESHKUMAR-jn3lm Does the gradient mean the direction in which the "increase" is maximum or the "change" is maximum?

    • @BRIJESHKUMAR-jn3lm
      @BRIJESHKUMAR-jn3lm 2 years ago +1

      @@shubhamkumar-nw1ui Increase.
      A change can be either an increase or a decrease, but the gradient specifically means the direction in which the function value increases.
      It is the same as the direction one follows while climbing a hill.

  • @kollivenkatamadhukar5059
    @kollivenkatamadhukar5059 2 years ago

    Why is the loss function 1/2*(f(x)-y)^2 at 21:52? Can someone help?

    • @akashkumbar5621
      @akashkumbar5621 1 year ago

      It is taken for the sake of simplicity, so that when you take the derivative, the 2 gets cancelled out (worked out below).

    • @sumitsaha-s8h
      @sumitsaha-s8h 27 days ago

      It should be without the 1/2, as the number of points is only 1 for the mean squared error loss function.
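
      For reference, the worked step behind the 1/2 (a standard calculation; the constant only rescales the loss and does not move its minimum):

          \mathcal{L} = \tfrac{1}{2}\,(f(x) - y)^{2}
          \quad\Rightarrow\quad
          \frac{\partial \mathcal{L}}{\partial w} = \tfrac{1}{2}\cdot 2\,(f(x) - y)\,\frac{\partial f(x)}{\partial w} = (f(x) - y)\,\frac{\partial f(x)}{\partial w}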

  • @gauravlotey4263
    @gauravlotey4263 4 years ago

    Have you uploaded ML lectures too?

  • @surinderdhawan1061
    @surinderdhawan1061 5 years ago +2

    Andrew Ng's version 2