LSTM Networks - EXPLAINED!

  • Published: 26 Nov 2024

Comments • 152

  • @CodeEmporium
    @CodeEmporium  1 year ago +1

    For details and code on building a translator using a transformer neural network, check out my playlist "Transformers from scratch": ruclips.net/video/QCJQG4DuHT0/видео.html

  • @AzharKhan-to2ll
    @AzharKhan-to2ll 2 years ago +8

    I learnt about LSTMs from so many sources, but no one explained it this well. This is some amazing content you are creating. It should be preserved.

  • @from-chimp-to-champ1
    @from-chimp-to-champ1 2 years ago +1

    This is the first time I've seen someone "unroll" the LSTM network and actually demonstrate it. No professor I've seen could do this; they only showed the picture of cells anybody could find on the internet. Thank you very much, good job!

  • @kennethatz39
    @kennethatz39 5 years ago +19

    Programmed my first LSTM with that video. Really good introduction to this topic. Right amount of math, architecture, background (GRU, RNN etc) and coding.

  • @simoneparvizi775
    @simoneparvizi775 2 years ago

    WTF did I just watch... man, in a 15 min video you explained so many topics, and smoothly... I'm new to AI but what you did was impressive. You explained the meaning of everything while simplifying math concepts BUT STILL putting them in... thank you for your work, I really appreciate it

    • @CodeEmporium
      @CodeEmporium  2 years ago +1

      I'm super glad you appreciate this style. I'm trying to make more videos like this as of late too. :)

  • @sepehr_fard
    @sepehr_fard 5 years ago +4

    I'm not done watching it, but I had to leave a comment first. I think you have really cracked the way to make people understand. Not a single professor has ever taught the way you have, unfortunately. I have always wanted someone's teaching to start from the ground up and just explain everything before diving into math. You even explained what os and random were; I've never seen that before in CS 😂. So thank you, I really enjoy your video. For once someone understood that a student watching this might not have a PhD in mathematics, so explaining what each variable means and what the big picture is might save them hours of being lost and confused. Keep up the great work man, I really appreciate this channel; it's a hidden gem!!!!

  • @LifeKiT-i
    @LifeKiT-i 1 year ago

    I study MSc computer science at HKU, but your teaching is much better than my professor's. OMG

  • @anibhatia
    @anibhatia 4 years ago +14

    Amazing video.. I love how you covered a number of different concepts and explained each one with due integrity

  • @loopuleasa
    @loopuleasa 6 years ago +58

    pretty in-depth view on this
    I like your pacing better than Siraj, also the simplicity

    • @CodeEmporium
      @CodeEmporium  6 years ago +9

      Thanks a lot! I'm going for a "here is why we do things the way we do" approach. Glad that you (and many others) find it interesting.

    • @beingnothing34
      @beingnothing34 4 years ago +7

      This man is a different beast! Way better and hence shouldn't be compared to Siraj! :) Great video.

    • @farenhite4329
      @farenhite4329 4 years ago +8

      Arun Kumar the scandal showed why Siraj was so much worse at explaining than this guy.

    • @ifmondayhadaface9490
      @ifmondayhadaface9490 4 years ago

      Farenhite oof yeah

    • @SwapnilGusani
      @SwapnilGusani 4 years ago +5

      @@beingnothing34 Dude Siraj was a fraud.

  • @AlistairWalsh
    @AlistairWalsh 4 years ago +2

    Really like your conversational explanations. Great detail presented in a palatable manner.

  • @NeilWiddowson
    @NeilWiddowson 3 years ago +11

    This makes so much more sense than my lecture...

  • @185283
    @185283 4 years ago +2

    Hello, I have a question: at 8:50 you mentioned x(0)... x(n) as inputs. If you had the sentence "Hello World", would a vector of "Hello" be x(0) and a vector of "World" be x(1)? If so, x(0) and x(1) will require 2 LSTM cells; will one line of "model.add(LSTM)" have two LSTM cells to process "Hello World"? How can we visualize more than one LSTM layer then?
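    A minimal NumPy sketch (illustrative only, not the video's code) of how one LSTM "layer" is a single cell whose weights are reused at every time step, so "Hello" and "World" pass through the same cell at t=0 and t=1:

    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, U, b):
        # One LSTM cell step: all four gates computed from input x and previous hidden state.
        H = h_prev.shape[0]
        z = W @ x + U @ h_prev + b          # shape (4*H,)
        f = sigmoid(z[0:H])                  # forget gate
        i = sigmoid(z[H:2*H])                # input gate
        o = sigmoid(z[2*H:3*H])              # output gate
        g = np.tanh(z[3*H:4*H])              # candidate cell state
        c = f * c_prev + i * g
        h = o * np.tanh(c)
        return h, c

    rng = np.random.default_rng(0)
    D, H = 5, 3                              # input dim (e.g. a word vector), hidden size
    W = rng.normal(size=(4*H, D))
    U = rng.normal(size=(4*H, H))
    b = np.zeros(4*H)

    # "Hello World" -> two time steps x0, x1 through the SAME cell (weights shared)
    x0, x1 = rng.normal(size=D), rng.normal(size=D)
    h, c = np.zeros(H), np.zeros(H)
    h, c = lstm_step(x0, h, c, W, U, b)      # t = 0: "Hello"
    h, c = lstm_step(x1, h, c, W, U, b)      # t = 1: "World"
    ```

    A second `model.add(LSTM(...))` layer would simply consume the sequence of h vectors produced here, which is one way to visualize stacked LSTM layers.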

  • @h4ck314
    @h4ck314 6 years ago +10

    I like the quality of your content, I'll definitely watch your other videos !

    • @CodeEmporium
      @CodeEmporium  6 years ago +1

      Thanks sooo much! Enjoy your stay ;)

  • @bopon4090
    @bopon4090 4 years ago

    Thank you sooo much for linking references in the description.

  • @jeroenritmeester73
    @jeroenritmeester73 3 years ago

    Thank you SO MUCH for giving some examples of each architecture. I'm following multiple ML courses at uni, but everything is abstracted away behind mathematical jargon and never gets back to basics.

  • @hoangphamviet1241
    @hoangphamviet1241 2 years ago

    Great video!!! Everything I can see and understand from the video makes compelling sense to me. Thank you so much!!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      You are very welcome (sorry I am so late)

  • @akompsupport
    @akompsupport 1 year ago

    Good overview! Still relevant. LSTMs have come a long way; they were important for the development of the LLMs that are showing SOTA performance on NLP as of this date, no?

  • @ChaminduWeerasinghe
    @ChaminduWeerasinghe 3 years ago

    Your explanation is amazing. Love the way you joke; it makes the video more interesting❤️

  • @happyduck70
    @happyduck70 2 years ago

    Had a mighty laugh on the Sepp Hochreiter joke, thanks!

  • @pablovillarroel3109
    @pablovillarroel3109 5 years ago +1

    Such a great video, you explain everything so clearly and at a good pace, liked and subscribed!

  • @ИванНикитин-ч7б
    @ИванНикитин-ч7б 3 years ago

    Can't understand some points. Suppose I have a set of temperature values or daily closing prices (just one linear sequence) and I need to forecast 3 future days' values from 10 previous days' values. The question is: which values do I put into the first LSTM cell, which into the second cell, and so on? The second question is how many LSTM cells I need for this calculation; does the LSTM cell count depend on the previous-days count or the future-days count?
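    One common answer, sketched as hypothetical NumPy code (a sliding-window framing; the function name is illustrative): each of the 10 previous days becomes one time step of the unrolled LSTM, and the 3 future days are the target of a dense output head, so the cell count follows the input window, not the forecast horizon.

    ```python
    import numpy as np

    def make_windows(series, n_in=10, n_out=3):
        # Slide a window over one linear sequence: each sample is 10 past
        # values (the LSTM's 10 unrolled time steps) and 3 future values
        # (the target of e.g. a Dense(3) head after the LSTM).
        X, y = [], []
        for i in range(len(series) - n_in - n_out + 1):
            X.append(series[i:i + n_in])
            y.append(series[i + n_in:i + n_in + n_out])
        return np.array(X), np.array(y)

    series = np.arange(20, dtype=float)      # toy "closing prices"
    X, y = make_windows(series)
    # X.shape == (8, 10): 10 time steps per sample feed the unrolled cells
    # y.shape == (8, 3): the 3 future days are predicted by the output layer
    ```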

  • @piotrgrzegorzek8039
    @piotrgrzegorzek8039 5 years ago +1

    Hi! Just a question: does an LSTM predict on sequences of FEATURES in ONE SAMPLE, or sequences of SAMPLES (outputs) in ONE BATCH? E.g. I need to predict the next number as many-to-one. I fit the first sample as x1=1, x2=2 with output y=3, the next sample as x1=4, x2=5, y=6. Now, does the model look at the sequence of features (x1, x2) or the sequence of samples (y, which are outputs of the model)?
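    For what it's worth, the answer usually lies in the 3-D input shape Keras-style LSTMs expect; a small illustrative sketch using the values from the question:

    ```python
    import numpy as np

    # Keras-style LSTM input is 3-D: (samples, time_steps, features).
    # "x1=1, x2=2 -> y=3" framed as a sequence of TIME STEPS within one sample:
    X = np.array([[[1.0], [2.0]],            # sample 1: steps 1, 2
                  [[4.0], [5.0]]])           # sample 2: steps 4, 5
    y = np.array([3.0, 6.0])
    # X.shape == (2, 2, 1): 2 samples, 2 time steps each, 1 feature per step.
    # The LSTM reads the steps inside one sample; samples in a batch are
    # independent, so it does NOT model a sequence across samples/outputs.
    ```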

  • @onomatopeia891
    @onomatopeia891 1 year ago

    Can you explain further what the hidden size argument is for in the LSTM? Many say it is the dimensionality of the output, but I don't get it. The sample explanations of LSTMs I saw only have 1 dimension, so what does it mean when the hidden size (or number of units, as some refer to it) is more than 1?
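    A rough sketch of what hidden size means, using a simplified recurrent update in NumPy (a real LSTM has four such weight blocks, one per gate; the names here are illustrative):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    time_steps, features, hidden_size = 7, 4, 16

    # The hidden state is a VECTOR of length hidden_size, so the weight
    # matrices map the features and the previous state into hidden_size dims.
    x_seq = rng.normal(size=(time_steps, features))
    h = np.zeros(hidden_size)
    Wx = rng.normal(size=(hidden_size, features))
    Wh = rng.normal(size=(hidden_size, hidden_size))

    outputs = []
    for x_t in x_seq:                        # simplified RNN-style update;
        h = np.tanh(Wx @ x_t + Wh @ h)       # an LSTM has 4 such blocks (gates)
        outputs.append(h)
    outputs = np.array(outputs)
    # outputs.shape == (7, 16): one hidden_size-dim vector per time step,
    # which is why hidden size is also "the dimensionality of the output".
    ```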

  • @georgebarnett121
    @georgebarnett121 5 years ago +1

    Don't BatchNorms and He initialization fix vanishing/exploding gradients? ResNet actually fixed model degradation, where deeper models perform worse than smaller models. Deeper networks should be able to learn identity connections if the optimal model is a smaller one; the ResNet shortcut connection allows easy learning of mappings similar to identity mappings.
    How does this affect LSTMs? Why can't we just include BatchNorms to fix vanishing/exploding gradients?

  • @thalesogoncalves1
    @thalesogoncalves1 4 years ago

    Excellent video, dude! It's awesome when someone embraces both the theoretical *and* practical parts. Thanks a lot

  • @mohammadyahya78
    @mohammadyahya78 1 year ago

    Thank you very much. Why does the gradient explode as a function of t/d at 7:19, please?

  • @sooryaprakash6390
    @sooryaprakash6390 1 year ago

    Mind-blowing video! Thanks for making it.

  • @friedrichwilhelmhufnagel3577
    @friedrichwilhelmhufnagel3577 1 year ago

    Hello! Your link to your Coursera videos is seemingly broken/expired. Can I find videos from you on Coursera, and can you recommend more learning material like courses and books to me?
    Thank you!
    Great videos.

  • @DiaboloMootopia
    @DiaboloMootopia 3 years ago

    Great video. Is it possible that the graphic at 9:45 is mislabeled? h_t is coming out at the top right, where I thought o_t should be emerging.

  • @beshosamir8978
    @beshosamir8978 2 years ago

    Hi, I need some help here:
    why did we decide to make the next hidden state equal to the long-term memory after filtering it? Why not make the next hidden state equal to the long-term memory (Ct) directly?

  • @9899895384
    @9899895384 4 years ago

    wow, your explanation is so simple!

  • @TheLOL9842
    @TheLOL9842 3 years ago

    Gosh can't wait for that video on GRU that's coming pretty soon! Besides the joke, Thanks for the video!

  • @shaythuramelangkovan5800
    @shaythuramelangkovan5800 3 years ago

    Hi Siraj, could you explain why we use a dense layer ?

  • @yangwang9688
    @yangwang9688 4 years ago

    Max length of the sentence is 40, but why set LSTM units to 128? What is the output size of the LSTM?

  • @internationalenglish7413
    @internationalenglish7413 5 years ago +1

    You are very good. Someday, you will be a great professor.

  • @exoticme4760
    @exoticme4760 4 years ago

    Is it not better to use word embeddings rather than character vectors?

  • @threeMetreJim
    @threeMetreJim 5 years ago +1

    Very good and informative video, shame about how many adverts though.

  • @1UniverseGames
    @1UniverseGames 3 years ago

    Do you have any videos about using RNN models for cyber threat attacks, or any source to look to for studying it?

  • @wiebetje00
    @wiebetje00 4 years ago

    You cut the text into semi-redundant sequences of maxlen characters, but how does the model or performance change if you change the value of maxlen?

  • @tylersnard
    @tylersnard 4 months ago

    Thank you for this. What are U, V, and W at 8:44?

  • @auslei
    @auslei 4 years ago

    nice and concise.. good work buddy

  • @Небудьбараном-к1м
    @Небудьбараном-к1м 4 years ago

    Isn't 128 too large a hidden size? I'm building an LSTM network; my input shape is [300, 5], and using hidden_size=128 results in vanishing gradients.
    Also, what happens if I add more layers to the dense net that comes after the LSTM? Will this architecture be able to learn? Because the LSTM "requires" a relatively large learning rate, which is often too large for a typical FC network, I'm guessing this will cause some crazy instability as a whole. I hope you can help me with these annoying questions :). Thanks a lot for sharing your knowledge!

  • @jayasreechaganti9382
    @jayasreechaganti9382 2 years ago

    Sir, can you do a video of an RNN example by giving numerical values?

  • @xinyuma5358
    @xinyuma5358 4 years ago

    Hi, why do we use tanh in an RNN, considering it is a bad activation function? Can we use ReLU?

  • @rahimdehkharghani
    @rahimdehkharghani 4 years ago

    I really liked this clear explanation.

  • @kamalmanchu3060
    @kamalmanchu3060 3 years ago

    This is phenomenal....great explanation dude..... ❤️

  • @mentalmodels5
    @mentalmodels5 5 years ago +2

    I'm confused about the part where he says "Gradient will now explode/vanish as a function of tau/d" 7:06
    Can someone explain this to me?

    • @dfnoshamps
      @dfnoshamps 5 years ago

      If I understand correctly: since the gradient propagates through jumps over d units instead of directly to the next one, the explosion should happen at a "smoother rate"

    • @mentalmodels5
      @mentalmodels5 5 years ago

      @@dfnoshamps Thanks for the reply. What I don't get is why it should happen at a smoother rate if you just add a skip connection?
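      A toy scalar illustration of the claim being discussed (assuming the shortest gradient path through skip connections spaced d steps apart has about t/d hops; the numbers are illustrative):

      ```python
      # With a shared recurrent weight w, the gradient through t steps scales
      # like w**t. If skip connections every d steps provide a shortest path
      # of t/d hops, the dominant term scales like w**(t/d): it still
      # explodes/vanishes, but at a much smoother rate.
      w, t, d = 1.5, 40, 10
      direct_path = w ** t          # ~ 1.1e7
      skip_path = w ** (t / d)      # ~ 5.06
      ```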

  • @captiandaasAI
    @captiandaasAI 1 year ago

    Great lecture!!! Damn good explanation..

  • @medhnhadush4320
    @medhnhadush4320 2 years ago

    awesome explanation. thank you

  • @kamalchapagain8965
    @kamalchapagain8965 5 years ago +2

    Thanks ! Simply the best.

  • @ObviouslyASMR
    @ObviouslyASMR 4 years ago +17

    I'm new to AI so this might be a silly question, but I thought the weights were randomly initialized; how is it possible it performed so well in the first epoch? I assumed the characters would be completely random, but they make at least some semblance of words already. Or is there already some learning done before the end of the first epoch?
    Btw thanks so much for the video! Way clearer than others I've watched

    • @siddheshbalshetwar3869
      @siddheshbalshetwar3869 4 years ago +1

      The prediction sentence is printed after the epoch... so yes, it did learn 'something' in that epoch; that's why it makes a little sense

    • @ObviouslyASMR
      @ObviouslyASMR 4 years ago +1

      @@siddheshbalshetwar3869 Thanks man. I think when I wrote this comment I was under the impression that the printed sentences were from during training and before backprop, but I realize now that, first of all, the backprop would've probably been done in batches, and second of all that, like you said, the sentences are printed after the final backprop in that epoch

    • @siddheshbalshetwar3869
      @siddheshbalshetwar3869 4 years ago +1

      @@ObviouslyASMR yeah any time man

    • @lynnlo
      @lynnlo 4 years ago

      The weights are randomized; the goal of a neural network is to make a bad guess and turn it into a better one.

    • @harriethurricane8617
      @harriethurricane8617 3 years ago +1

      OMG definitely didn't expect to see my favorite ASMR channel here lol

  • @ethiomusic3217
    @ethiomusic3217 2 years ago

    How do I use the CTC loss function for training on variable-length sequences??? Can you help me??

  • @justingoh3750
    @justingoh3750 2 years ago

    Great video! My only complaint is that I cannot find your video explaining GRUs like you said you would =p

    • @CodeEmporium
      @CodeEmporium  2 years ago +1

      Yea I did not do that and got caught up with some other videos later on :) My bad

  • @rainfeedermusic
    @rainfeedermusic 4 years ago

    I liked the explanation but unfortunately could not understand why exploding gradients are more of a problem in an RNN rather than a DNN. I mean, the W that gets propagated from h(t-1) to h(t) can also be such that when one W is >1 the next could be <1.

    • @zd676
      @zd676 4 years ago +1

      In a DNN, Ws can be different from layer to layer, so a W>1 in one layer can be offset by a W<1 in another. In an RNN, the weights get shared, so if W>1 or W<1, the same factor compounds at every time step.
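      A toy numeric contrast of the two cases (illustrative scalar values only, not the video's notation):

      ```python
      import numpy as np

      # Feed-forward net: per-layer weights can offset each other.
      layer_ws = [2.0, 0.5, 2.0, 0.5]          # >1 and <1 alternate
      ff_grad_scale = np.prod(layer_ws)        # stays well-behaved (== 1.0)

      # RNN: the SAME weight is reused at every time step, so it compounds.
      w, steps = 2.0, 10
      rnn_grad_scale = w ** steps              # explodes (== 1024.0)
      ```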

  • @huangbinapple
    @huangbinapple 3 years ago

    Starts at 9:00

  • @charlieangkor8649
    @charlieangkor8649 3 years ago

    I don't understand it. Suddenly a pic full of math symbols pops up, and it's not labeled what the inputs, outputs, neurons, connections, and weights are

  • @osci5124
    @osci5124 2 years ago

    Great video, you are really good at explaining logically

  • @samfelton5009
    @samfelton5009 3 years ago

    what's your source for the images throughout this video? I'd love to use them in my own work!

  • @nathanaelsatrianugraha3381
    @nathanaelsatrianugraha3381 2 years ago

    Hello, I'm new here. I want to ask: how do we know the values of Wi, Wf, Wo, Wc? Are they randomized? Thank you. BTW, nice video
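    Yes, typically they start random and are then learned. A hedged sketch of one common scheme (Glorot-style scaling; the function and names are illustrative, not the video's code):

    ```python
    import numpy as np

    def init_lstm_weights(features, hidden_size, seed=0):
        # The four gate matrices (input, forget, output, candidate) start as
        # small random values and are then adjusted by backpropagation;
        # biases usually start deterministic (zeros).
        rng = np.random.default_rng(seed)
        scale = np.sqrt(2.0 / (features + hidden_size))
        shape = (hidden_size, features + hidden_size)
        return {name: rng.normal(0.0, scale, size=shape)
                for name in ("Wi", "Wf", "Wo", "Wc")}

    weights = init_lstm_weights(features=4, hidden_size=8)
    # Each of Wi, Wf, Wo, Wc has shape (8, 12) and differs run to run
    # unless the seed is fixed.
    ```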

  • @bubblesgrappling736
    @bubblesgrappling736 4 years ago

    is "cell" equal to neuron? it seems to be like the case.
    But at 8:45, when you say that each sequence element goes through a cell each, then i am confused, is the cell really modelling the entire model?

  • @adampaslawski8859
    @adampaslawski8859 3 years ago

    This is a great video , thanks for making it

  • @elisimic4371
    @elisimic4371 4 years ago

    High Quality Content!

  • @joshualee3172
    @joshualee3172 4 years ago

    What are the dimensions of the weights?

  • @rabinthapa1431
    @rabinthapa1431 4 years ago

    Bro, can you make a video on implementing Convolutions and LSTMs?

  • @nezbut7
    @nezbut7 3 years ago

    this was very helpful! thank you

  • @ApriliyanusPratama
    @ApriliyanusPratama 6 years ago +1

    Excellent explanation. Can you show me where I can get the full math derivation of the backward pass of an LSTM?

    • @CodeEmporium
      @CodeEmporium  6 years ago

      Thanks! A quick google search takes me here: arunmallya.github.io/writeups/nn/lstm/index.html#/
      It seems good.

  • @doubtunites168
    @doubtunites168 5 years ago +1

    what kind of sorcery is this?

  • @Below10IQ
    @Below10IQ 5 years ago

    Loading weights generates different results than when it was trained.

  • @Firewalker124
    @Firewalker124 5 years ago +1

    Got a specific question: I am currently trying to classify motion in a 3d animation. Basically I get a bunch of 3d vectors that I am trying to relate over time. More specifically, I want to check if the movement of the bones and joints is too fast. So my thought was to use an LSTM to check that. I would use the 3d vectors for each frame as input to an LSTM cell, yet I am not quite sure how to set each cell (each frame) in relation to the next one. Any tips? :D

    • @soareverix
      @soareverix 2 years ago

      This is a really interesting problem I'm interested in as well, for VR purposes! Did you ever solve it?

    • @Firewalker124
      @Firewalker124 2 years ago +1

      @@soareverix Well, it was a topic for a possible master's thesis for myself. I thought a bit about it but changed the topic due to some other hardware-related problems. However, I had an idea of how to enter all the necessary information into the LSTM that could work. I'm currently still working, so maybe I'll write back later with the idea. In my case it wasn't VR but motion capture of movements.

  • @coolvideos2829
    @coolvideos2829 2 years ago

    How can we predict the market using math? I believe it's possible through Fourier series and a few other views. Please help 🆘 I just don't understand how to get the waveform of the market and then calculate a point in time to predict the price. It sounds simple but idk what to do.

    • @CodeEmporium
      @CodeEmporium  2 years ago

      Hmm. The stock market is very hard to predict. It depends on factors that go beyond historical trends. It's a fun toy problem, but not super realistic to model. I have a video of me attempting to build a model for this too. It's one of my more recent videos

  • @cooky123
    @cooky123 2 years ago

    Good video, thank you.

  • @danielpiskorski9447
    @danielpiskorski9447 4 years ago

    Great video! Thank you

  • @akashkewar
    @akashkewar 4 years ago

    keep doing the good stuff man.

    • @CodeEmporium
      @CodeEmporium  4 years ago +1

      Thanks dude. I'm always about that good stuff.

  • @hangchen
    @hangchen 2 years ago

    Best part 6:32

  • @karthiksrini7178
    @karthiksrini7178 5 years ago

    The presentation is at its best. What software are you using?

    • @CodeEmporium
      @CodeEmporium  5 years ago

      Thanks for the compliments Karthik. I use Camtasia Studio for editing my videos

  • @CCCC-lu2st
    @CCCC-lu2st 4 years ago +6

    The way he pronounced "Sepp Hochreiter" blew my brains 😅

  • @PopMusicFilms
    @PopMusicFilms 4 years ago

    Bro you are fire. I was struggling in my deep learning course and this LSTM video really helped

  • @ethiomusic3217
    @ethiomusic3217 2 years ago

    Good videos, but I have some questions please

  • @yulinliu850
    @yulinliu850 6 years ago

    Excellent lecture! Many Thanks!

  • @loriando7698
    @loriando7698 5 years ago

    You are doing a good job! But I do not really understand: in this case your chars values are unique characters, so why, after converting back into text, is it not unique ones, but words in the alphabet instead?

  • @dompatrick8114
    @dompatrick8114 2 years ago

    6:37 Lmao the comedic timing, I died.

  • @kostasgeorgiou2417
    @kostasgeorgiou2417 5 years ago

    I love your videos, please make more!

  • @nomadlyyy111
    @nomadlyyy111 3 years ago

    The equations for the GRUs are wrong; they should have h_{t-1}, not h_t

  • @loyodea5147
    @loyodea5147 5 years ago

    Thank you for the great video!

  • @swathykrishna9618
    @swathykrishna9618 4 years ago

    Good explanation. Can you do one video on the Xception model? Plz

    • @CodeEmporium
      @CodeEmporium  4 years ago

      Thanks! I have already done an Xception explanation. Check out my video on "Depthwise Separable Convolution - explained"

  • @cliccme
    @cliccme 5 years ago

    Hi, I have one question regarding the BiLSTM neural network. Should I ask here or on your Quora profile? Thanks

  • @TheShubham67
    @TheShubham67 4 years ago

    Really Awesome stuff

  • @gauravkumar6534
    @gauravkumar6534 5 years ago

    Hi, your video was nice. I request you to make a video on LSTM for speech recognition, please.

  • @MrMrjacky7
    @MrMrjacky7 4 years ago

    Hi! I have some sequences generated from some initial conditions. What model should I use to generate a sequence from some initial condition based on the data I have?
    seq2seq models usually predict the following data of a series but don't generate sequences from initial conditions.

    • @shubhamdotdkhema
      @shubhamdotdkhema 4 years ago

      You should probably try OpenAI GPT-2... it will generate sentences for you given an initial context (or even a single word).

  • @rakshithak.sgowda7155
    @rakshithak.sgowda7155 4 years ago

    Hi sir, can you please send me the code for this project if you have it: "Developing an Efficient Deep Learning-Based Trusted Model for Pervasive Computing Using an LSTM-Based Classification Model"

  • @arshadhashmi7938
    @arshadhashmi7938 4 years ago

    How can I get this code?

  • @mikefda12
    @mikefda12 2 years ago

    At 5:24, what is that e-looking symbol called?

    • @CodeEmporium
      @CodeEmporium  2 years ago

      The symbol is the set-membership sign "∈" (it looks like an epsilon), which means "belongs to". So x(i) belongs to the set of real-valued vectors with D dimensions. Simply put, x(i) is a vector of real numbers with D dimensions.

    • @mikefda12
      @mikefda12 2 years ago

      @@CodeEmporium thank you

  • @anishjain8096
    @anishjain8096 5 years ago

    Brother, I don't understand many things. How do I do well and learn more advanced concepts?

  • @jhonysilver5208
    @jhonysilver5208 3 years ago

    Good video!

  • @robinmuller2402
    @robinmuller2402 2 years ago

    Yeah, we have no question marks in German 3:32

  • @LightFykki
    @LightFykki 6 years ago

    Amazing video, thanks!

  • @dubey_ji
    @dubey_ji 4 years ago

    thank you so much !

  • @tharindawicky
    @tharindawicky 5 years ago

    thanks

  • @selman6753
    @selman6753 4 years ago

    Andrew Ng style

  • @rayo3117
    @rayo3117 3 years ago

    Your videos are funny when you're German like me