Deep Learning (CS7015): Lec 2.6 Proof of Convergence of Perceptron Learning Algorithm

  • Published: 2 Oct 2024

Comments • 53

  • @manoranjansahu7161
    @manoranjansahu7161 1 year ago +8

    Best explanation I have seen on Perceptron!

  • @sarcastitva
    @sarcastitva 4 years ago +22

    "Please, we are done!" - That's what she said.

  • @parvalunawat4788
    @parvalunawat4788 2 months ago

    That was an Awesome Explanation 👌👌👌

  • @saptarshisanyal6738
    @saptarshisanyal6738 2 months ago

    Watching it in 2024. In today's world we have brilliant tools like Desmos to build better geometric intuition for what he says in this series of tutorials. NPTEL lectures are generally very dry, and Prof. Khapra was not able to explain it clearly.
    Somehow I felt that the dots were not connected.

  • @vivekrachakonda5803
    @vivekrachakonda5803 11 months ago +2

    Very clear explanation on proof of convergence.

  • @abhishek-tandon
    @abhishek-tandon 1 year ago +2

    Brilliant teacher

  • @pattabhidyta7991
    @pattabhidyta7991 1 year ago +1

    @ 14:14 too good 😀

  • @sohambasu660
    @sohambasu660 2 years ago +1

    Why is ||p(i)||² = 1 in the proof for the denominator?

  • @sahilkhanna8332
    @sahilkhanna8332 1 year ago +1

    What is even going on here

  • @saurabhsuman2506
    @saurabhsuman2506 6 years ago +4

    Why is ||pi||^2 = 1 at 12:01?

    • @gauravlotey4263
      @gauravlotey4263 4 years ago

      @ravi gurnatham Why is 2*Wi*Pi taken negative?

    • @arvind31459
      @arvind31459 4 years ago +2

      Because all the inputs are normalized before training the perceptron... refer to the setup @ 5:43

    • @wishurathore7214
      @wishurathore7214 3 years ago

      @@gauravlotey4263 Because pi was misclassified.

    • @amoghsinghal346
      @amoghsinghal346 2 years ago

      P is a unit norm
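The replies above can be illustrated with a minimal sketch (the points here are made up, not from the lecture): normalizing each input to unit length is exactly what makes ||p_i||² = 1 in the proof's denominator.

```python
import numpy as np

# Hypothetical example points (not from the lecture): divide each input
# by its Euclidean norm so that every p_i has unit length.
points = np.array([[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]])
normalized = points / np.linalg.norm(points, axis=1, keepdims=True)

for p in normalized:
    print(np.dot(p, p))  # each squared norm is 1 (up to float error)
```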

  • @soumyasen1483
    @soumyasen1483 3 years ago +1

    At 13:20, even though roughly speaking, we could say cos (beta) grows proportional to sqrt(k), how would one prove that mathematically?

    • @RandomLolaRandom
      @RandomLolaRandom 3 years ago

      Take the expression on the right-hand side, divide it by √k, and find the limit as k→∞. The division is how you would find the proportionality constant between two expressions; the limit calculates the proportionality as k goes to infinity. It approaches 𝛿 as k grows.
      Another approach to show that "if k is unbounded then cos(β) is also unbounded" is to calculate the limit as k→∞ of just the right-hand side of the inequality in the slide (completely ignoring the proportionality to √k). Use L'Hôpital's rule and simplify to find the limit.
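To make the replies above concrete, here is one standard way to get the √k growth rigorously (a sketch assuming w(0) = 0, ||w*|| = 1, and unit-norm inputs for simplicity; the lecture's slide may carry extra constant terms):

```latex
% After k updates, each of which adds a misclassified point p
% (so w^{(k-1)} \cdot p \le 0) with \|p\| = 1:
\begin{align*}
  w^{(k)} \cdot w^* &\ge k\,\delta,
    \quad \text{where } \delta = \min_{p}\, w^* \cdot p > 0,\\
  \|w^{(k)}\|^2 &= \|w^{(k-1)}\|^2 + 2\,w^{(k-1)} \cdot p + \|p\|^2
    \le \|w^{(k-1)}\|^2 + 1 \le k,\\
  \cos\beta &= \frac{w^{(k)} \cdot w^*}{\|w^{(k)}\|}
    \ge \frac{k\,\delta}{\sqrt{k}} = \sqrt{k}\,\delta.
\end{align*}
% Since \cos\beta \le 1, this forces k \le 1/\delta^2: a finite bound.
```

So cos β grows at least as fast as √k times the constant δ, which is the proportionality the lecture appeals to.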

  • @sohambasu660
    @sohambasu660 2 years ago

    Shouldn't it be Σ wi·xi > -w0, because w0 (bias) = -(threshold)?
    Am I missing something here?

  • @Vivekagrawal5800
    @Vivekagrawal5800 4 years ago +1

    Why is w and x considered positive throughout?

    • @anjushac7960
      @anjushac7960 4 years ago +5

      There is no assumption about w; it can be positive or negative. The only assumption about x is that it is normalized and that it belongs to the positive class. If any x belongs to the negative class, the negative of that x is taken, which will in turn be in the positive class.
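A minimal sketch of the trick described above (points and labels are made up for illustration): multiplying each point by its ±1 label folds the negative class into the positive one, so a single condition w · p > 0 covers both classes.

```python
import numpy as np

# Illustrative data, not from the lecture.
X = np.array([[1.0, 2.0], [-1.0, -0.5], [2.0, 1.0]])
y = np.array([1, -1, 1])   # +1 = positive class, -1 = negative class

P = X * y[:, None]         # negate negative-class points

# If w separates the data, then w . p > 0 for every folded point p.
w = np.array([1.0, 1.0])
print(P @ w > 0)           # all True for this separating w
```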

  • @RakeshGupta-ft6cc
    @RakeshGupta-ft6cc 4 years ago +1

    In the entire proof, we haven't used the statement that the data is linearly separable. So even if the data is linearly inseparable, wouldn't this proof still work?
    But it shouldn't. So how does the fact that the data is linearly separable affect this proof?

    • @rajeevsebastian6513
      @rajeevsebastian6513 4 years ago +13

      But if it is not linearly separable, would there be a w*? Please understand that w* in this case exists and is just unknown; in your case, it does not exist. So how would you say anything about the cos β between wt and w*?

    • @aswathik4709
      @aswathik4709 2 years ago

      This proof is done assuming w* exists, that is, assuming there is a line completely separating these two sets of points. The moment w* doesn't exist, the proof is invalid.
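A quick way to see why this premise matters (an illustrative sketch, not the lecture's code): on XOR no separating w* exists, so the update loop never reaches a mistake-free pass and has to be capped.

```python
import numpy as np

# XOR labels are not linearly separable, even with a bias term.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# Absorb the bias via a constant feature and fold classes by the label.
P = np.hstack([X, np.ones((4, 1))]) * y[:, None]

w = np.zeros(3)
for epoch in range(1000):        # cap the iterations: no w* exists
    mistakes = 0
    for p in P:
        if w @ p <= 0:           # misclassified: perceptron update
            w += p
            mistakes += 1
    if mistakes == 0:            # a clean pass would mean convergence
        break
print(mistakes)                  # still > 0 after the cap
```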

  • @marcuswonder5057
    @marcuswonder5057 5 years ago +17

    This is surprisingly well explained, thank you.
    For anyone watching this - don't be put off by his accent. The content is worth watching :)

  • @dipaknidhi3969
    @dipaknidhi3969 4 years ago +1

    Sir, why do you take the transpose of W?

    • @sarrae100
      @sarrae100 3 years ago +2

      For matrix multiplication, the dimensions have to be made compatible first.

  • @Abhisheism
    @Abhisheism 6 years ago +1

    Why do we normalize the inputs?

    • @SahanGamage99
      @SahanGamage99 3 years ago

      For sigmoidal activation functions, gradient descent can operate faster, as the gradients are large when the input magnitudes are low.

    • @Abhisheism
      @Abhisheism 3 years ago

      @@SahanGamage99 I appreciate your reply, but it's already been two years since I posted this, and I figured it out later. Thanks BTW

    • @bavidlynx3409
      @bavidlynx3409 2 years ago +1

      @@Abhisheism Well, others might have the same doubt, so it's helpful for them.
      *BTW, I am others*

  • @sohambasu660
    @sohambasu660 2 years ago

    Why does Σ wixi > w0 need to be proved? Shouldn't we prove Σ wixi > 0 instead of Σ wixi > w0?
    Please, somebody help.

    • @amoghsinghal346
      @amoghsinghal346 2 years ago

      P is a unit norm

    • @umang9997
      @umang9997 1 year ago

      I think you are referring to 1:44.
      A. Σ wixi > W0 has the summation from i=1 to n
      B. Σ wixi > 0 has the summation from i=0 to n
      If you look closely,
      B ----> w0x0 + w1x1 + w2x2 + ... + wnxn > 0
      and
      A ----> w1x1 + w2x2 + ... + wnxn > W0
      which is the same as
      w0x0 + w1x1 + w2x2 + ... + wnxn > 0 with w0 = -W0 and x0 = 1,
      and which is the same as B.
      Here w0 is also called the bias and W0 is called the threshold.
      Hence proving Σ wixi > W0 with i=1 to n
      is the same as proving Σ wixi > 0 with i=0 to n.
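The algebra in the reply above can be spot-checked numerically (all numbers here are made up for illustration): with x0 = 1 and w0 = -W0, the two conditions always agree.

```python
import numpy as np

# Arbitrary illustrative values for inputs, weights, and threshold.
x = np.array([0.4, -1.2, 2.0])
w = np.array([1.5, 0.3, -0.7])
W0 = 0.5                                  # threshold

lhs = (w @ x) > W0                        # form A: sum over i = 1..n
x_aug = np.concatenate(([1.0], x))        # prepend x0 = 1
w_aug = np.concatenate(([-W0], w))        # prepend w0 = -W0
rhs = (w_aug @ x_aug) > 0                 # form B: sum over i = 0..n
print(lhs == rhs)                         # True for any x, w, W0
```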

  • @nandinimishra9153
    @nandinimishra9153 1 year ago +1

    Does P' mean all possible positive points?

    • @sufiyanadam
      @sufiyanadam 1 month ago

      Technically, it is the union of all the positive points and the negated negative points.

  • @rafaelantoniogomezsandoval3785
    @rafaelantoniogomezsandoval3785 5 years ago +1

    Nicela

  • @manmeetpatel9475
    @manmeetpatel9475 1 year ago +1

    How is ||pi||² = 1?

    • @sufiyanadam
      @sufiyanadam 1 month ago

      Because pi is normalized to unit norm, so ||pi||² = 1.

  • @bharathraman2098
    @bharathraman2098 4 years ago

    If delta represents the minimum increase over the set of points P, then k could actually be more than 't', right? For example, imagine there is a point p which is large and violates the angle requirement. Then the increase in each iteration is the minimum over p, represented by delta. If this iteration needs to run multiple times to overcome the violation for a specific point, then overall k can exceed 't'. Am I missing something here?

    • @anjushac7960
      @anjushac7960 4 years ago +1

      I think you are confusing 't' (the total number of time steps) with the total number of data points. In general, in one pass we iterate over all the data points, but there is no guarantee that it will converge if we do only that much. So we have to repeat the process, i.e., iterate over the data points again. Maybe after a few cycles of iterations it will converge. The theorem only guarantees that it will converge after a finite number of cycles.

    • @IIMRaipur_is_a_fraud_institute
      @IIMRaipur_is_a_fraud_institute 10 months ago

      You answer your own question and yet do not realize it. If multiple iterations are needed for a particular point, then those multiple iterations are added to 't'. Therefore, even in such a scenario, k cannot exceed 't'.
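The discussion above can be sketched as a small experiment (data and seed are illustrative, not from the lecture): on linearly separable, unit-norm inputs, the total number of updates k across all passes stays finite, however many epochs that takes.

```python
import numpy as np

# Illustrative separable data: labels come from a hidden linear rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.where(X @ np.array([1.0, 2.0]) > 0, 1, -1)

P = X * y[:, None]                              # fold classes by label
P /= np.linalg.norm(P, axis=1, keepdims=True)   # unit-norm inputs

w = np.zeros(2)
k = 0                        # total updates across all epochs
while True:
    mistakes = 0
    for p in P:
        if w @ p <= 0:       # misclassified: perceptron update
            w += p
            k += 1
            mistakes += 1
    if mistakes == 0:        # one clean pass over the data: converged
        break
print(k)                     # a finite total update count
```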

  • @vishnuvardhan6625
    @vishnuvardhan6625 7 months ago

    Where can I get slides for these lectures?

    • @rishiRdutta
      @rishiRdutta 7 months ago

      Google CS7015 and download handouts.

  • @surajkumar156
    @surajkumar156 4 years ago +1

    Sir, it is somewhat confusing whether the elements of P and N are in order (ascending/descending) or random.

  • @prithwishguha309
    @prithwishguha309 2 months ago

    I already said this in your BS degree lecture, so I'm not going to repeat myself, but no, the whole proof is wrong because the whole statement is wrong. You can't prove that the perceptron always reaches convergence when we clearly know it doesn't; it depends on the starting point... but to be honest this course is much, far better than that one 😂 other than the bad intro, bad intro music and bad outro 🤣