At first I was inclined to click away from the video because of the unorthodox explanation of LSTM in "steps", which was different to what I had seen in other videos and blog posts which focus on the infamous LSTM diagram. However, I was struggling to fully grasp LSTMs so I decided to give the video a try. And it paid off! I can't believe LSTMs are that simple! This video is absolutely essential for understanding LSTMs at a fundamental level.
I listened to a 2-hour lecture in my MSc in data science and still didn't know what was happening. Your video explains it in a succinct way!!! Thank you!!!
The great thing about your videos is that I am always guaranteed to learn something, and to learn it with a much better understanding.
Happy to hear that!
Your videos (specifically sampling and deep learning videos) helped me a lot during my master's. Thanks for all the videos!
Thanks for watching!
Best explanation of LSTMs on the internet
Thanks!
I've been trying to understand LSTMs through multiple blogs and videos, but the thing I kept wondering was why they need to be this complex. You specifically targeted that point of view, which is what makes this one of the best videos: you showed why there was a need for an LSTM and how the gaps could be filled, and that made it very easy to understand. Could you please also list the references for the video, so that anyone who wants to go deeper into the concepts can follow up? That would be very helpful! Thanks a lot for this video!
Please continue the same good work, blending mathematics with simple real-life examples. Fantastic explanation👍
Thank you so much for your videos! They are super informative and much more intuitive than the hundreds of slides I have from my master's class. Keep up the great work!
Extremely good and helpful! A genuine desire to help learners by explaining difficult ideas in a most self-effacing manner! Many thanks!
Thanks for the video!! Just what I needed for my ML midterm exam. Will be waiting for the Transformers topic, which I believe builds upon this concept.
Glad it was helpful!
This is an extremely good explanation. Thanks for all the effort and sharing!!
Super helpful. I can't thank you enough for making this explanation.
Amazing explanation!
great job on this topic explanation!
I love your videos, keep up the awesome work!!!
Thank you! Will do!
So happy you did this video!!! :D Thank you for all the great work!
You are so welcome!
Really easy to follow! Thanks a lot.
Glad it helped!
Great video as always!
The part that still perplexes me:
How does the LSTM "know" what is important (like dog) and when to actually use that to predict the next word?
It learns through lots of training, just like every other aspect of DL
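To make "lots of training" concrete, here is a minimal sketch of next-word prediction with an LSTM (my own toy setup, assuming PyTorch; none of these names come from the video). Nothing ever tells the network that "dog" is important; gradient descent just adjusts the gate weights so that keeping useful words in the cell state lowers the prediction loss.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 32, 64
embed = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
head = nn.Linear(hidden_dim, vocab_size)

params = list(embed.parameters()) + list(lstm.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, 20))   # fake token ids; real data would be text
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict the next word at every position

for step in range(100):
    h, _ = lstm(embed(inputs))                   # hidden states carry whatever the gates kept
    logits = head(h)                             # (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()                              # gradients flow back through the gates
    opt.step()
```

With real text instead of random token ids, the forget and input gates end up learning which tokens (like the subject of the sentence) are worth holding on to.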
Just amazing intuition! Thanks so much for the great content.
Glad you enjoyed it!
Thanks! Could you also explain GRUs?
A video about transformers and GANs in this style would be awesome as well.
Thanks for the suggestion!
Hi Ritvik,
It would be really great if you could create videos explaining the maths behind ML models like SVM and PCA. I am also curious about ODEs, PDEs, real analysis, complex analysis, and stochastic calculus, but the problem is that I want to explore topics which are relevant to financial engineering, so I could read all the quant finance related textbooks. I am a professional and really don't have time to read all the applied maths textbooks 😅.
He has covered the ML models you listed in depth. It's illegal to cover ODEs and PDEs though, because nobody likes them.
Awesome, like always - good job ;-)
Thank you! Cheers!
Thank you very much! I have a few questions:
1. Could you please explain the reasoning behind using a candidate cell state and why the tanh activation function is necessary?
2. I have noticed that many implementations, papers, and blogs use a concatenation of h[t-1] and x[t] with a single learnable weight matrix W, instead of the separate U and V used in this video. Can you clarify why this is the case?
3. Despite the success of the model in predicting words, I remain somewhat skeptical about how it achieves such accuracy. :)
1. The candidate cell state is the same weighted combination of the input and the previous hidden state as all the other gates. However, the other gates pass through a sigmoid because we want to essentially binarize them: we want them to be ~0 or ~1, which is what a sigmoid gives. This way, the gates function as True/False activations, letting data through with a 1 and stopping data with a 0. With the candidate cell state, we don't want a binarized output; rather, we want all the information from the input and the hidden state. Applying tanh bounds the values between -1 and 1, which lets it retain more information (basically we don't want 0s).
To summarize, we use the candidate cell state to obtain the actual information from the input and the prior hidden state, which is why tanh is applied. The candidate cell state is then multiplied by the input gate (which IS ~0s and 1s), and that determines how much of its information gets written into the actual cell state.
2. Concatenating the weight matrices is the same thing as what he showed; it's just a little more efficient to store all the numbers in a single matrix. Conceptually it's exactly the same though (the small sketch below this reply checks it numerically).
3. That's kind of how it is with all of deep learning lol
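To make both points concrete, here is a small NumPy sketch of a single LSTM step (not from the video; using U for the hidden-state weights and V for the input weights is my assumption about its notation). It shows the sigmoid gates next to the tanh candidate, and checks that one concatenated matrix W = [U | V] acting on [h; x] matches separate U and V.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H, D = 4, 3                      # hidden size, input size (arbitrary toy values)
rng = np.random.default_rng(0)

def mats():                      # one (U, V, b) triple per gate
    return rng.normal(size=(H, H)), rng.normal(size=(H, D)), np.zeros(H)

(U_f, V_f, b_f), (U_i, V_i, b_i) = mats(), mats()   # forget gate, input gate
(U_o, V_o, b_o), (U_c, V_c, b_c) = mats(), mats()   # output gate, candidate

def lstm_step(x, h_prev, c_prev):
    f = sigmoid(U_f @ h_prev + V_f @ x + b_f)        # ~0/1: keep or drop old cell state
    i = sigmoid(U_i @ h_prev + V_i @ x + b_i)        # ~0/1: how much candidate to write
    o = sigmoid(U_o @ h_prev + V_o @ x + b_o)        # ~0/1: how much state to expose
    c_tilde = np.tanh(U_c @ h_prev + V_c @ x + b_c)  # candidate cell state in (-1, 1)
    c = f * c_prev + i * c_tilde                     # candidate scaled by the INPUT gate
    h = o * np.tanh(c)                               # output gate scales the new hidden state
    return h, c

# Concatenated form: W = [U | V] applied to [h; x] gives the exact same pre-activation.
x, h_prev = rng.normal(size=D), rng.normal(size=H)
W_f = np.hstack([U_f, V_f])
assert np.allclose(W_f @ np.concatenate([h_prev, x]) + b_f,
                   U_f @ h_prev + V_f @ x + b_f)
```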
Hi Ritvik. It would be amazing if you could better organize the playlists (chronological order, and the right videos in the right playlists).
Noted!
Thank you
This is probably a not-so-great comment, but something seems "wrong" with the gradient descent method (and the chain rule) if it generates a vanishing gradient problem (or an exploding gradient problem). I mean, it isn't really "matching reality", because in reality we can speak about a character on page 500 who was introduced on page 1. We are forced to apply this "band-aid" of the LSTM because the basic method of gradient descent is not "good enough", or is in some way artificial. Does anyone agree with this? I am not sure what I mean by "matching reality".
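For what it's worth, the vanishing/exploding issue isn't really gradient descent being "wrong"; it's just arithmetic. Backpropagation through time multiplies one factor per time step, so a long chain of factors a bit below 1 dies out and a chain a bit above 1 blows up. A toy illustration (my own numbers, not from the video):

```python
import numpy as np

steps = 100
per_step_small = np.full(steps, 0.9)   # stand-in for per-step derivatives a bit below 1
per_step_large = np.full(steps, 1.1)   # stand-in for per-step derivatives a bit above 1

print(np.prod(per_step_small))  # ~2.7e-05 -> the gradient from 100 steps back has vanished
print(np.prod(per_step_large))  # ~1.4e+04 -> or it explodes
```

The LSTM's additive cell-state update (with a forget gate near 1) is exactly the fix: it gives the gradient a path that avoids that long chain of multiplications, so the character from page 1 can still influence page 500.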
Isn't h9 also affected by x9?
what da dogg doin😄😄
Bro literally explained what the dog doing
who cares? we have transformers
It’s part of the series building up to more complex things
LSTMs are still surprisingly powerful for a lot of applications.