LSTM Networks: Explained Step by Step!

  • Published: 23 Oct 2024

Comments • 48

  • @rajpulapakura001
    @rajpulapakura001 11 months ago +8

    At first I was inclined to click away from the video because of the unorthodox explanation of LSTMs in "steps", which was different from what I had seen in other videos and blog posts that focus on the infamous LSTM diagram. However, I was struggling to fully grasp LSTMs, so I decided to give the video a try. And it paid off! I can't believe LSTMs are that simple! This video is absolutely essential for understanding LSTMs at a fundamental level.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +19

    The great thing about your videos is that I am always guaranteed to learn something, and to learn it with a much better understanding.

  • @LifeKiT-i
    @LifeKiT-i 1 year ago +4

    I listened to a 2-hour lecture in my MSc data science course and still didn't know what was happening. Your video explains it in a succinct way!!! Thank you!!!

  • @juaneshberger9567
    @juaneshberger9567 1 year ago +8

    Your videos (specifically sampling and deep learning videos) helped me a lot during my master's. Thanks for all the videos!

  • @karanmaniyar9086
    @karanmaniyar9086 1 year ago +3

    I've been trying to understand LSTMs through multiple blogs and videos, but the thing I kept missing was why they need to be this complex. You specifically targeted that point of view, which makes this one of the best videos on the topic: you showed why there was a need for an LSTM and how it fills the gaps, and that made it very easy to understand. Could you please also list the references for the video, so that anyone who wants to go deeper into the concepts can do so? Thanks a lot for this video!

  • @chaitrab9253
    @chaitrab9253 1 year ago +1

    Please continue the same good work, blending mathematics with simple real-world examples. Fantastic explanation 👍

  • @vzinko
    @vzinko 1 year ago +2

    Best explanation of LSTMs on the internet

  • @thankgoodnessitstheweekend2860
    @thankgoodnessitstheweekend2860 8 months ago

    Thank you so much for your videos! They are super informative and much more intuitive than the hundreds of slides I have from my master's class. Keep up the great work!

  • @charleskangai4618
    @charleskangai4618 1 year ago +1

    Extremely good and helpful! A great, genuine desire to help learners by explaining difficult ideas in a most self-effacing manner! Many thanks!

  • @carlosenriquehuapayaavalos6297
    @carlosenriquehuapayaavalos6297 1 year ago +2

    Thanks for the video!! Just what I needed for my ML midterm exam. Will be waiting for the Transformers topic, which I believe builds upon this concept.

  • @pushkarparanjpe
    @pushkarparanjpe 1 year ago

    This is an extremely good explanation. Thanks for all the effort and sharing!!

  • @rizkabritania2429
    @rizkabritania2429 8 months ago

    Super helpful. I can't thank you enough for making this explanation.

  • @DarkAtom04
    @DarkAtom04 1 year ago

    Amazing explanation!

  • @KarthikNaga329
    @KarthikNaga329 1 year ago +2

    Great video as always!
    The part that still perplexes me:
    How does the LSTM "know" what is important (like dog) and when to actually use that to predict the next word?

    • @Arjun----
      @Arjun---- 1 year ago

      It learns through lots of training, just like every other aspect of DL

  • @jianhuali4080
    @jianhuali4080 9 months ago

    great job on this topic explanation!

  • @_Sam_-zh7sw
    @_Sam_-zh7sw 1 year ago +3

    Hi Ritvik,
    It would be really great if you could create videos explaining the maths behind ML models like SVM and PCA. I am also curious about ODEs, PDEs, real analysis, complex analysis and stochastic calculus, but the thing is that I want to explore topics relevant to financial engineering, so that I could read all the quant finance related textbooks. I am a professional and really don't have time to read all the applied maths textbooks 😅.

    • @gordongoodwin6279
      @gordongoodwin6279 11 months ago

      He has covered the ML models you listed in depth. It's illegal to cover ODEs and PDEs though, because nobody likes them.

  • @golnoushghiasi7698
    @golnoushghiasi7698 1 year ago

    So happy you did this video!!! :D Thank you for all the great work!

  • @romainjouhameau2764
    @romainjouhameau2764 1 year ago

    Really easy to follow! Thanks a lot.

  • @juaneshberger9567
    @juaneshberger9567 1 year ago

    A video about transformers and GANs in this style would be awesome as well.

  • @billdepo1
    @billdepo1 1 year ago

    Just amazing intuition! Thanks so much for the great content.

    • @ritvikmath
      @ritvikmath  1 year ago

      Glad you enjoyed it!

    • @isha2567
      @isha2567 1 year ago

      Thanks! Could you also explain GRUs?

  • @prateekcaire4193
    @prateekcaire4193 1 year ago

    Thank you very much! I have a few questions:
    1. Could you please explain the reasoning behind using a candidate cell state and why the tanh activation function is necessary?
    2. I have noticed that many implementations, papers, or blogs I have read use a concatenation of h[t-1] and x[t] with a single learnable weight matrix W, instead of the separate U and V matrices used in this video. Can you clarify why this is the case?
    3. Despite the success of the model in predicting words, I remain somewhat skeptical about how it achieves such accuracy. :)

    • @Arjun----
      @Arjun---- 1 year ago

      1. The candidate cell state is simply the same weighted combination of the input and previous hidden state, just like all the other gates. However, the other gates pass through a sigmoid because we want to essentially binarize them. We want them to be ~0 or 1, which is what a sigmoid does. This way, these gates function as True/False activations, letting data through with a 1 and stopping data with a 0. With the candidate cell state, we don't want a binarized output. Rather, we want all the information from the input and the hidden state. Applying tanh bounds the values between -1 and 1, which allows it to retain more information (basically we don't want 0s).
      To summarize, we use the candidate cell state to obtain the actual information from the input and prior hidden state. This is why tanh is applied. The candidate cell state is then multiplied by the input gate (whose values are pushed toward 0 or 1), and that determines how much of its information should be passed on to the actual cell state.
      2. The concatenation of the weight matrices is the same thing as what he showed; it's just a little bit more efficient to store all the numbers in a single matrix. Conceptually it's exactly the same.
      3. That's kind of how it is with all of deep learning lol
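
      For anyone who wants to see the algebra from this reply spelled out, below is a rough NumPy sketch of a single LSTM step (variable names like U_f and V_f follow the video's separate-matrices notation, biases are omitted, and the numbers are random, so this is an illustrative sketch rather than a reference implementation). The last lines check point 2: stacking [x_t; h_{t-1}] and applying one concatenated matrix gives the same result as the separate U and V matrices.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        # Hypothetical sizes: input dimension d, hidden dimension h
        d, h = 4, 3
        rng = np.random.default_rng(0)

        # Separate matrices, as in the video: U_* acts on x_t, V_* acts on h_{t-1}
        U_f, V_f = rng.normal(size=(h, d)), rng.normal(size=(h, h))  # forget gate
        U_i, V_i = rng.normal(size=(h, d)), rng.normal(size=(h, h))  # input gate
        U_o, V_o = rng.normal(size=(h, d)), rng.normal(size=(h, h))  # output gate
        U_c, V_c = rng.normal(size=(h, d)), rng.normal(size=(h, h))  # candidate cell state

        x_t = rng.normal(size=d)     # current input
        h_prev = rng.normal(size=h)  # previous hidden state
        c_prev = rng.normal(size=h)  # previous cell state

        # Gates: sigmoid pushes values toward 0 or 1 (soft on/off switches)
        f_t = sigmoid(U_f @ x_t + V_f @ h_prev)
        i_t = sigmoid(U_i @ x_t + V_i @ h_prev)
        o_t = sigmoid(U_o @ x_t + V_o @ h_prev)
        # Candidate cell state: tanh keeps the information, bounded in (-1, 1)
        c_tilde = np.tanh(U_c @ x_t + V_c @ h_prev)

        # New cell state: forget gate scales old memory, input gate scales the candidate
        c_t = f_t * c_prev + i_t * c_tilde
        # Hidden state: output gate decides how much of the cell state is exposed
        h_t = o_t * np.tanh(c_t)

        # Concatenated form seen in many papers/blogs: W = [U V] applied to [x_t; h_{t-1}]
        W_f = np.hstack([U_f, V_f])
        z = np.concatenate([x_t, h_prev])
        assert np.allclose(sigmoid(W_f @ z), f_t)  # same numbers, just stored in one matrix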

  • @santiagolicea3814
    @santiagolicea3814 6 months ago

    I love your videos, keep up the awesome work!!!

    • @ritvikmath
      @ritvikmath  6 months ago

      Thank you! Will do!

  • @davigiordano3288
    @davigiordano3288 9 months ago

    Thank you

  • @teetanrobotics5363
    @teetanrobotics5363 1 year ago +1

    Hi Ritvik. It would be amazing if you could better organize the playlists (chronological order, and the right videos in the right playlists).

  • @nasgaroth1
    @nasgaroth1 1 year ago +1

    Awesome, like always - good job ;-)

  • @pauledam2174
    @pauledam2174 1 year ago

    This is probably a not-so-great comment, but something seems "wrong" with the gradient descent method (and the chain rule) if it generates a vanishing gradient problem (or an exploding gradient problem). I mean, it isn't really "matching reality", because in reality we can speak about a character on page 500 who was introduced on page 1. We are forced to apply this "band aid" of LSTM because the basic method of gradient descent is not "good enough", or is in some way artificial. Does anyone agree with this? I am not sure what I mean by "matching reality".
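
    A rough back-of-the-envelope sketch of the effect this comment is pointing at (the 0.9 per-step factor is made up, purely to show the scale): in a plain RNN, backpropagation through time multiplies many per-step derivatives together, so a signal from "page 1" has essentially vanished by "page 500", and the LSTM's additive cell-state path is the patch for exactly that.

      # Toy illustration of vanishing gradients in a plain RNN (hypothetical numbers)
      per_step_factor = 0.9  # assumed magnitude of each per-step derivative dh_t/dh_{t-1}
      for steps in (10, 100, 500):
          grad_scale = per_step_factor ** steps
          print(f"{steps:>3} steps back: gradient scaled by ~{grad_scale:.2e}")
      # prints roughly 3.49e-01, 2.66e-05 and 1.32e-23 -- the long-range signal is gone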

  • @sophia17965
    @sophia17965 1 year ago

    Isn't h9 also affected by x9?

  • @huntcookies5156
    @huntcookies5156 1 year ago +1

    what da dogg doin😄😄

  • @anantsingh75
    @anantsingh75 11 months ago

    Bro literally explained what the dog doing

  • @tamirfri1
    @tamirfri1 1 year ago +1

    who cares? we have transformers

    • @ritvikmath
      @ritvikmath  1 year ago +4

      It’s part of the series building up to more complex things

    • @Gowthamsrinivasan
      @Gowthamsrinivasan 1 year ago +1

      LSTMs are still surprisingly powerful for a lot of applications.