Lecture 10 - Neural Networks

  • Published: 15 Jan 2025

Comments • 111

  • @erikperillo9054 · 9 years ago · +128

    It is amazing how you can simplify a subject just by explaining the right things in the right order.

  • @jhpaik1 · 11 years ago · +5

    An excellent teacher. I now understand why he received the Feynman award. Thank you so much, Prof. Mostafa. I wish universities had many more teachers like you.

  • @agl3797 · 4 years ago · +9

    Love this professor! Calm and serious, yet exciting and funny.

  • @AleifrLeifrson · 10 years ago · +54

    Yaser Abu-Mostafa for president!

  • @StevenSarasin · 9 years ago · +11

    This lecture made so much sense it almost made the material seem obvious. After many other sources failed to sink in, I feel like I found an oasis in the desert.
    It helped me to pause the video occasionally and attempt to work out what the next step would be. I measured my understanding at any given time by how well I predicted the next slide.

  • @CaNNaDark · 10 years ago · +25

    Enjoyed the explanation, amazing!
    "Sorry, we denied credit because lambda is less than .5" :)

  • @HajjAyman · 12 years ago · +2

    Very enjoyable lecture! I really liked the build up of the network from simple perceptrons and how the back-propagation algorithm is derived. Can't wait to learn about support vector machines.

  • @tiagodnalves · 11 years ago

    Best lecture I found on YouTube about neural networks. Very clear.

  • @LegoDude3258 · 3 years ago

    Thanks for making a YouTube channel. As a young teenager, it is cool that I can watch these.

  • @Nestorghh · 12 years ago · +1

    Genius! It's unbelievable how well he explains.

  • @mhchitsaz · 12 years ago · +1

    Could not be more clear and descriptive. Great lecture.

  • @sendhilchokkalingam629 · 6 years ago · +1

    Thorough examination of the topic; I doubt anyone could do better than Abu-Mostafa.

  • @laraditzel · 9 years ago · +7

    Extraordinary class.

  • @AlexanderPolomodov · 10 years ago · +11

    Very good explanation of neural networks.
    Amazing lecturer, I will look for more of his courses :)

    • 10 years ago

      It always impresses me how we can apply theory to anything. If you look at the results, we have failed horribly at almost everything for the last three thousand years.

    • @konstantinkotsev · 10 years ago

      He is indeed a really good lecturer; he explains things clearly and makes them easy to grasp.

  • @hsamiful · 11 years ago

    Egyptian professors always have the art of teaching and clarifying hard topics.

  • @brainstormingsharing1309 · 4 years ago · +1

    Absolutely well done and definitely keep it up!!! 👍👍👍👍👍

  • @paladin1410 · 11 years ago

    The intro to SGD is excellent. Thanks a lot.

  • @canchirulo · 12 years ago · +2

    Thank you! Very helpful! Greetings from Guatemala!

  • @entaditorium · 11 years ago · +1

    This video helped me implement a backpropagation network which solves the XOR problem.
    Thanks.

    • @vigneshshetty7228 · 4 years ago

      Can you please share the code?

    • @entaditorium · 4 years ago

      @@vigneshshetty7228 github.com/hiraditya/neural-network

    • @vigneshshetty7228 · 4 years ago

      datascience.stackexchange.com/questions/73725/implementing-neural-network-using-caltech-course
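
Following up on this thread, here is a minimal sketch of what such an implementation might look like, using the lecture's tanh units and backpropagation deltas; it is not the code linked above. The 2-2-1 architecture, learning rate, and epoch count are arbitrary choices, and an unlucky random initialization can still land in a poor local minimum.

```python
# A small 2-2-1 tanh network trained with backpropagation (SGD) on XOR,
# in the spirit of the lecture's notation. All hyperparameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

# XOR inputs with a bias coordinate x0 = 1; targets in {-1, +1}
X = np.array([[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0, -1.0])

W1 = rng.normal(scale=0.5, size=(3, 2))  # layer 1: input (with bias) -> 2 hidden units
W2 = rng.normal(scale=0.5, size=(3, 1))  # layer 2: hidden (with bias) -> 1 output

eta = 0.1
for epoch in range(10000):
    for n in rng.permutation(len(X)):              # SGD: one example at a time
        x0 = X[n]                                  # layer-0 outputs (with bias)
        s1 = x0 @ W1                               # hidden signals
        x1 = np.concatenate(([1.0], np.tanh(s1)))  # hidden outputs (with bias)
        x2 = np.tanh(x1 @ W2)[0]                   # network output

        # deltas (dE/ds) for squared error E = (x2 - y)^2, with theta'(s) = 1 - theta(s)^2
        d2 = 2 * (x2 - y[n]) * (1 - x2 ** 2)
        d1 = (1 - np.tanh(s1) ** 2) * (W2[1:, 0] * d2)  # bias weight propagates no delta

        W2 -= eta * d2 * x1[:, None]               # gradient step, output layer
        W1 -= eta * np.outer(x0, d1)               # gradient step, hidden layer

for xn in X:
    h = np.concatenate(([1.0], np.tanh(xn @ W1)))
    print(xn[1:], np.sign(np.tanh(h @ W2))[0])     # expect the XOR pattern -1, 1, 1, -1
```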

  • @osquigene · 8 years ago · +1

    Very nice talk, feels like (almost) nothing is missing :), thank you for putting this online.

    • @osquigene · 8 years ago

      To be a bit more precise on the "(almost)": I think that adding a slide with the complete partial derivative chains for two consecutive w, w^{(L)} and w^{(L-1)}, would help to show why the recursive computation works (showing the common \delta^{(L)} term). This could even be extended to three levels, but if I'm not mistaken this would lead to a 7-term formula, which may be confusing.
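
For reference, a sketch of those two chains in the lecture's notation, where w^{(l)}_{ij} feeds x^{(l-1)}_i into the signal s^{(l)}_j, x^{(l)}_j = theta(s^{(l)}_j), and delta^{(l)}_j is the partial derivative of e with respect to s^{(l)}_j; the shared delta^{(L)} term is what makes the recursion work:

```latex
\[
\frac{\partial e}{\partial w^{(L)}_{ij}}
  = \frac{\partial e}{\partial s^{(L)}_{j}}\,
    \frac{\partial s^{(L)}_{j}}{\partial w^{(L)}_{ij}}
  = \delta^{(L)}_{j}\, x^{(L-1)}_{i},
\qquad
\frac{\partial e}{\partial w^{(L-1)}_{ij}}
  = \Big( \sum_{k} w^{(L)}_{jk}\, \delta^{(L)}_{k} \Big)\,
    \theta'\big(s^{(L-1)}_{j}\big)\, x^{(L-2)}_{i}
  = \delta^{(L-1)}_{j}\, x^{(L-2)}_{i}.
\]
```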

  • @mnfchen · 11 years ago · +1

    Very good lecture about a very intriguing technique. Thoroughly enjoyed :)

  • @rathapech · 10 years ago · +2

    I really like the lecture. It's so well explained

  • @genkidama7385 · 11 years ago

    This teacher is very good.

  • @bouzidiaymen2512 · 7 years ago · +2

    Thank you... from Tunisia.
    Thank you very much.

  • @profyao · 11 years ago

    So clear! Great professor!

  • @sidali9930 · 8 years ago · +1

    Thank you... from Algeria.
    Thank you very much.

  • @welovfree · 6 years ago · +1

    What are the prerequisites for this course?

  • @autripat · 10 years ago · +1

    Really starts at 4:32 into the presentation

  • @mario7501 · 7 years ago

    Brilliant lecturer!

  • @michaelmellinger2324 · 2 years ago

    @4:10 He says neural nets aren't the model of choice these days and people might choose SVMs, for example. This is from 2012. When did NNs make the jump to being a great choice again?

    • @MrAdriandipsf · 2 years ago

      I feel this is still true in industry and academia. NNs achieve SOTA and thus grab the headlines, but they are computationally expensive and are not always deployed in production. As far as NLP goes, NNs were revitalized in 2017 thanks to the attention mechanism and again in 2018 with pseudo-labeling (MLM). The former allowed you to fit, on a single GPU, large models that would have required a supercomputer with CNNs. The latter provided access to petabytes of labeled information. Still, people use perceptrons and regexes because they're way faster and don't require a 16 GB GPU for inference.

  • @akankshachawla2280 · 3 years ago · +2

    "It's a funny array but its a legitimate array"

  • @timestamp196 · 7 years ago

    That was explained well enough for even me to understand...

  • @vimanyuaggarwal1420 · 6 years ago

    Factor no.7 is important in your case, just like 42 is the ultimate answer to the ultimate question of life, universe, and everything!

  • @shemleong7571 · 9 years ago · +3

    For the recap of lecture 9, how come he says that gradient descent requires a 'twice differentiable' function? Isn't it only the first derivative?

    • @charlesaydin2966 · 8 years ago · +1

      +Shem Leong For a local minimum to exist, not only should the 1st derivative equal zero, but also the 2nd derivative should exist and be > 0.

    • @markh1462 · 6 years ago · +3

      @@charlesaydin2966 That statement is false. I can give you a counterexample right now where a local minimum exists while neither the 1st nor the 2nd derivative exists, e.g., a "V"-shaped function. Besides, Googling the assumptions for gradient descent, all sources I found state that only the 1st derivative needs to exist. 2nd-order derivatives are only needed if you want to use more sophisticated approaches such as Newton's method, etc. See:
      www.stat.cmu.edu/~ryantibs/convexopt-F15/scribes/05-grad-descent-scribed.pdf
      or
      www.cs.cornell.edu/courses/cs4780/2015fa/web/lecturenotes/lecturenote07.html
      Can you provide a proof or a reference for your claim? It does not make any sense. Sure, if the 2nd-order derivative exists and is > 0, you can conclude it's a min, but it's completely illogical to say that a local min does not exist unless the 2nd derivative exists.

    • @ZombieLincoln666 · 6 years ago

      Yes, I think he misspoke. He was probably thinking about a second-order method.
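
To summarize the distinction this thread is circling (a summary, not something from the slides): the gradient descent update itself only ever evaluates the first derivative, while second derivatives enter only in second-order methods such as Newton's method.

```latex
\[
\mathbf{w}(t+1) = \mathbf{w}(t) - \eta\,\nabla E_{\text{in}}\big(\mathbf{w}(t)\big)
\quad\text{(gradient descent)},
\qquad
\mathbf{w}(t+1) = \mathbf{w}(t)
  - \big[\nabla^{2} E_{\text{in}}\big(\mathbf{w}(t)\big)\big]^{-1}
    \nabla E_{\text{in}}\big(\mathbf{w}(t)\big)
\quad\text{(Newton's method)}.
\]
```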

  • @billykotsos4642 · 5 years ago

    This guy knows his stuff

  • @AndyLee-xq8wq · 1 year ago

    amazing

  • @HasamMahroos · 11 years ago

    incredible prof

  • @00354binit · 12 years ago

    Nice & clear explanation.

  • @ddg170 · 10 years ago · +23

    Dear professor, why do you never take a sip of water?

  • @double_j3867 · 8 years ago · +1

    His video explanation of SGD should be put in place of the current Wikipedia page.

  • @ProgrammingTime · 11 years ago

    Really great

  • @fradamor · 8 years ago · +1

    These lectures are very, very good... but why are neural networks, like some other topics, not included in the book "Learning from Data"?
    Thank you, Prof. Yaser, for your work. It is very useful.

    • @nguyengoctu · 8 years ago · +2

      You can download the e-chapters that cover the other topics (SVM, neural networks, ...) at book.caltech.edu/bookforum/forumdisplay.php?f=149

    • @fradamor · 8 years ago

      Great!!!!

    • @manishsharma-uc5xg · 7 years ago

      Is it mandatory to register?

  • @Ruhgtfo · 3 years ago

    Finally I know why my lecturer skipped explaining this.

  • @carlhenninglubba1098 · 9 years ago · +1

    The clarity of his lectures is amazing.
    But in this one, the left graph on the 5th slide is confusing. Taking a certain sample or a subset of samples to compute the direction for the next step does not change your position in the w-space. Only the surface of E_in(w) is changed. Hard to plot, but the shown visualization is just not helping at all.

    • @mohezz8 · 9 years ago

      +Koral Linski I think you actually do change your position in the w-space right after picking a single training example. This is clearer in the pseudocode at 1:00:00, where you update the weights before picking the next example.

    • @carlhenninglubba1098 · 9 years ago

      +Mohamed Ezz Yes, right. Of course you change the position in w-space in a learning step. But this is not what I meant.
      In slide 5 he wants to illustrate the difference between SGD and GD over all training samples. The figure only shows that there are several local minima in the error function E_in(w) for ALL SAMPLES over the w-space. Considering one learning step, the choice of a subset of samples alters the shape of E_in(w), but not your position in w-space. That's why you get a different gradient, a different direction, and finally overcome small suboptimal minima.

    • @mohezz8 · 9 years ago

      +Koral Linski Got you. Right, you see a different E(w) surface for each batch/example.
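
For concreteness, the two update rules being contrasted in this thread can be written as follows, with e(h(x_n), y_n) the error on a single example: both moves start from the same current w, but the gradient comes from a different error surface.

```latex
\[
\text{GD: } \mathbf{w} \leftarrow \mathbf{w} - \eta\,\frac{1}{N}\sum_{n=1}^{N}
  \nabla_{\mathbf{w}}\, \mathrm{e}\big(h(\mathbf{x}_{n}), y_{n}\big),
\qquad
\text{SGD: } \mathbf{w} \leftarrow \mathbf{w} - \eta\,
  \nabla_{\mathbf{w}}\, \mathrm{e}\big(h(\mathbf{x}_{n}), y_{n}\big)
  \text{ for one randomly chosen } n.
\]
```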

  • @brandomiranda6703 · 8 years ago · +1

    Are the lecture slides available somewhere?

    • @vivekseth3210 · 8 years ago · +2

      +Brando Miranda yeah, they are available here: work.caltech.edu/telecourse.html

  • @shaunmike · 12 years ago

    Nice O.O; very helpful for my midterm

  • @SaimonThapa · 6 years ago

    Can anyone explain what he says in the final slide?

  • @CumPirate · 7 years ago

    This is a 100 level course?

  • @jonatanisse6362 · 7 years ago

    I didn't quite get that. Was it "okay" before or after the partial derivative?

  • @kassahundessie1361 · 12 years ago

    helpful! UoG, Ethiopia

  • @snylekkie · 8 years ago

    great

  • @jadmam · 3 years ago

    Excuse me, I have a problem understanding slide #6. Since "u" and "v" are datasets from the user and the movie, where are the learnable parameters?

    • @viniciusdeavilajorge5053 · 2 years ago

      The learnable parameters are the weights associated with each feature that the movies have and that the customer apparently prefers (based on their watch history). The algorithm then tries to predict the rating that customer would give to a specific movie; if this rating crosses a threshold, the algorithm will suggest that movie.

    • @jadmam · 2 years ago

      Thank you very much.
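
One way to make this concrete, using the matrix-factorization setup from the lecture's movie-rating example (the factor dimension K and learning rate eta are arbitrary here): the learnable parameters are the user factors u_i and the movie factors v_j themselves, and SGD nudges both after seeing each known rating r_ij.

```latex
\[
\mathrm{e}_{ij}(\mathbf{u}_i, \mathbf{v}_j)
  = \Big( r_{ij} - \sum_{k=1}^{K} u_{ik} v_{jk} \Big)^{2},
\qquad
\mathbf{u}_i \leftarrow \mathbf{u}_i
  + 2\eta\, \big( r_{ij} - \mathbf{u}_i^{\top} \mathbf{v}_j \big)\, \mathbf{v}_j,
\qquad
\mathbf{v}_j \leftarrow \mathbf{v}_j
  + 2\eta\, \big( r_{ij} - \mathbf{u}_i^{\top} \mathbf{v}_j \big)\, \mathbf{u}_i.
\]
```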

  • @brainstormingsharing1309 · 4 years ago · +1

    👍👍👍👍👍

  • @Joel-jl9sb · 9 years ago · +1

    This video seems good. I will watch it when I find the time :)

  • @Ejeenachannel · 5 years ago

    There is no neural network topic included in the textbook. Can anyone send me the textbook? Thanks.

    • @dillonpena1310 · 4 years ago

      Neural networks is one of the "dynamic e-chapters": amlbook.com/

  • @BirgerBurgerBargir · 8 years ago

    I'm a little confused about computing the error of the output of a neuron in the final layer.
    Let's say I have a neuron signal "s", an activation function "f(x)", a neuron output "x" which is the signal fed through the activation function (i.e., x = f(s)), and finally the derivative of the activation function f'(x).
    For this particular neuron, what is the equation to find the neuron error "d"?

    • @Bing.W · 7 years ago

      The error is not computed for one neuron. It is computed for the whole network (as the hypothesis model). That is the Ein, a function existing only after the final layer. For your question, Ein'(s) = Ein'(x) * f'(s), assuming x = f(s).
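
In the commenter's notation (signal s, output x = f(s), target y), the final-layer delta works out as below; the squared error and tanh are just the choices used in the lecture, so this is a sketch rather than the only possibility.

```latex
\[
d \;=\; \frac{\partial \mathrm{e}}{\partial s}
  \;=\; \frac{\partial \mathrm{e}}{\partial x}\,\frac{\partial x}{\partial s}
  \;=\; \mathrm{e}'(x)\, f'(s);
\qquad
\text{with } \mathrm{e}(x) = (x - y)^{2} \text{ and } f = \tanh:\quad
d = 2\,(x - y)\,\big(1 - x^{2}\big).
\]
```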

  • @michaeladams4999 · 3 years ago

    Yup

  • @pierredehandschutter8592 · 7 years ago

    I think there is an error on slide 19: when he takes theta', it should be (1 - s^2) and not (1 - x^2). Could someone confirm it, please?

    • @Bing.W · 7 years ago

      No, it is not an error. theta'(s) = 1 - theta(s)^2 = 1 - x^2.

    • @kungfupanda2686 · 7 years ago

      1-theta^2(s)
      we know that [theta(s) = x]
      => 1-x^2
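
Spelled out, since the activation on that slide is the hyperbolic tangent and its output is denoted x:

```latex
\[
\theta(s) = \tanh(s), \qquad
\theta'(s) = 1 - \tanh^{2}(s) = 1 - \theta(s)^{2} = 1 - x^{2}
\quad \text{because } x = \theta(s).
\]
```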

  • @roelofvuurboom5939 · 4 years ago

    On not slavishly following a biological model: "We get a plane that flies but doesn't flap its wings." :-)

  • @eachonly · 12 years ago

    Greetings from AU.

  • @jhabindrakhanal7366 · 11 years ago

    good

  • @thankyouthankyou1172 · 1 month ago

    20:44

  • @liquiddonkey6530 · 3 years ago

    The YouTube algorithm has taken my pivot from dancing cats to self-education a bit too far this time...

  • @williamnelson4968 · 12 years ago

    ok

  • @sheikhakbar2067 · 4 years ago

    YouTube algorithm, here I come again!

  • @ppascasl · 2 years ago

    1:04:12 hahaha

  • @calculus661 · 7 years ago

    It seems like there is only one student in the room, lol.

    • @rafaelespericueta734 · 6 years ago

      No, there was a room full of Caltech students in his (very popular) class. The other person you hear asking questions is relaying questions from the online students who were watching the video stream.

  • @evolivid · 9 years ago

    Hooray, I'm the 666th like!!! As of now there are 666 likes on this video.

  • @RelatedGiraffe · 10 years ago

    He looks like the guy from Scrubs :P

  • @indrajitbagchi7313 · 7 years ago

    Ah......the big bang theory

  • @madScientist404 · 7 years ago · +1

    If he says "OK" one more time, I go nuts...