The Narrated Transformer Language Model

  • Published: Nov 4, 2024

Comments • 229

  • @parthchokhra948
    @parthchokhra948 4 years ago +247

    Your blog on Illustrated Transformer was my intro to Deep Learning with NLP. Thanks for the amazing contributions to the community.

    • @jc_777
      @jc_777 3 years ago +6

      Yeah it is being referenced in my DL class too. Truly great content for new learners!

    • @ahmeterdonmez9195
      @ahmeterdonmez9195 26 days ago

      @jc_777 Gemini also refers to Mr. Alammar's blog post 👍

  • @andresjvazquez
    @andresjvazquez 2 years ago +36

    Dear Teacher Alammar, thanks to this video I was accepted into a BYU lab as an external researcher (even though I didn’t finish college) and have been invited by my professor to participate with the lab in CASP15. You really changed the course of my life by demystifying such complex topics for non-traditional learners like me. I’m eternally in your debt.

  • @ans1975
    @ans1975 4 years ago +28

    The Illustrated Transformer blog is a masterpiece!

  • @Roshan-xd5tl
    @Roshan-xd5tl 2 years ago +18

    Your ability to explain and break down complex topics into simpler and intuitive sections is legendary. Thank you for your contribution!

  • @ayush612
    @ayush612 3 years ago +4

    I remember seeing your Transformer blog, Jay... It was legendary!! It was referred to by other YouTubers as well... And thanks a lot for the wonderful explanation as well!

  • @bighit7596
    @bighit7596 3 years ago +2

    You have a gift for explaining complex material... many other technical talks assume the audience is very knowledgeable and is attending the session just for networking.

  • @nileshkikle8112
    @nileshkikle8112 9 months ago

    Outstanding job demystifying the inner working details of the Transformer model architecture! All the illustrations and animations for the inference working are awesome. Thank you for taking all the time and sharing your understanding with all of us. Kudos! 👍

  • @maruthiprasad8184
    @maruthiprasad8184 11 months ago

    Amazing explanation. My search to understand transformers ended here. You have done a wonderful job; thank you so much for the simplest explanation I have ever seen.

  • @diogo.magalhaes
    @diogo.magalhaes 4 years ago +10

    Jay, as a PhD student, I'm a fan of your ability to explain complex topics in a very simple, illustrated and didactic way! I always recommend your 'illustrated' posts to my colleagues. Thanks again for this great video, keep up the good work!

  • @drtariqahmadphd3372
    @drtariqahmadphd3372 3 years ago +1

    Never been more excited by a YouTube channel than when I saw this guy had a channel.

  • @kalinda619
    @kalinda619 3 years ago +2

    A phenomenal extension of your blog post. Commenting for that bump in the recommendation algorithm!

    • @arp_ai
      @arp_ai  3 years ago

      Thank you! Much appreciated!

  • @goelnikhils
    @goelnikhils 1 year ago

    I haven't seen such a clear explanation of Transformers and decoder LMs. Amazing work, Jay.

  • @curiouspie1264
    @curiouspie1264 1 year ago

    One of the most comprehensive video and blog overviews of Transformers I've seen. Thank you. 🙏

  • @JimBob-lq1db
    @JimBob-lq1db 10 months ago

    Thank you for this great explanation. Visualize, visualize, visualize: the best way to understand how it works.

  • @quietkael7349
    @quietkael7349 4 years ago +9

    Thank you so much for all the tireless work you do for us visual learners out there! I’m looking forward to videos where you get into your excellent visualizations of the underlying matrix operations. Your visual abstractions both at the flow chart level and matrix/vector level have really shaped my mental model for what I think about when I’m engineering models. I’m so grateful and so excited to see what you come out with next (this library you hint at looks wonderful!)

    • @arp_ai
      @arp_ai  4 years ago +1

      Thanks Jack!

  • @kazimafzal
    @kazimafzal 1 year ago +1

    You, sir, are an amazing teacher! I'm absolutely flabbergasted by how well you've explained this. To think it's all mathematics at the end of the day! Thank you for taking the time to put together such a concise yet complete guide to transformers!

  • @OslecVardeven
    @OslecVardeven 7 months ago

    Jay, I recently took an AI course, but you presented NLP very well, in a didactic way... I learned a lot from you.
    Thank you. Keep being this wonderful guy.

  • @tachyon7777
    @tachyon7777 2 years ago +7

    It would be nice to have a step-by-step walkthrough of the training process, and of why each of those steps makes sense intuitively.

  • @jesuslopez3306
    @jesuslopez3306 1 year ago

    Definitely it is easier to understand in a vertical way. Thanks for everything!

  • @jacakopl
    @jacakopl 3 years ago +1

    This is the best video I have seen by far in this domain. You strike a perfect balance in assuming the audience's level of understanding :)

    • @arp_ai
      @arp_ai  3 years ago

      Awesome! Glad you found it useful!

  • @ishandindorkar2846
    @ishandindorkar2846 10 months ago

    Jay, many thanks for your work. These videos help me a lot in understanding key concepts in the NLP domain through visualization.

  • @ultraviolenc3
    @ultraviolenc3 2 years ago +1

    I’ve just read your “The Illustrated Transformer” article and I wanted to say that you made very smart and simple visual representations. It seems you put a lot of thought into that.

  • @Halterofilic
    @Halterofilic 6 months ago

    2024, and this is still a great reference on Transformers. A million thanks for the amazing work!

  • @stephenngumbikiilu3988
    @stephenngumbikiilu3988 2 years ago

    Your blog was recommended to me by my lecturer, Julia Kreutzer of Google Translate. It's just an amazing piece of work. It has really helped me in my understanding of these concepts. Thanks.

  • @sudzam
    @sudzam 1 year ago

    Wow! One of THE best explanations of Transformers... Thanks @Jay!!

  • @nisalbandara
    @nisalbandara 3 years ago

    I'm doing a Twitter sentiment analysis and I couldn't wrap my head around BERT, and then I came across this video. Perfectly explained. Thanks a lot.

  • @IyadKhuder
    @IyadKhuder 1 year ago

    I ended up here to familiarize myself with NLP transformers. Your video was the optimal choice for me, as it explains the concept in an understandable, scientific manner. Thanks.

  • @abugigi
    @abugigi 2 months ago

    Great video, and perhaps just as important, great selection of albums

  • @raminbakhtiyari5429
    @raminbakhtiyari5429 3 years ago

    I don't know how to say thank you; I can only say please continue uploading your amazing videos. I live in a constrained country and this video is my only hope for learning like other people. Yours sincerely,
    Ramin Bakhtiyari.

  • @NarkeEmpire
    @NarkeEmpire 11 months ago

    You are a great teacher!!! If you check the EQ settings and lower the music at the beginning, the video will be perfect!!! Thanks a lot for sharing your knowledge in this very understandable way.

  • @rsilveira79
    @rsilveira79 4 years ago +4

    Nice collection of albums, man! Miles Davis, Radiohead, John Coltrane, very classy! 👏👏👏

    • @kumarvikas_134
      @kumarvikas_134 4 years ago +1

      Spot-on observation. Kind of ironic to be listening to OK Computer and teaching about artificial intelligence :D

  • @1Kapachow1
    @1Kapachow1 3 years ago

    Really enjoyed your blog post and video, super clear - thank you very much for this amazing resource :)

  • @niundisponible
    @niundisponible 2 years ago

    I see a Miles Davis vinyl, Kind of Blue. Awesome album, and thanks for the video!

  • @tiborsaas
    @tiborsaas 10 months ago

    This video really aged well. It came out just after GPT-3 and before ChatGPT. I love how it gives massive insights into how current generative AI works behind the scenes (but obviously in a simplified way).

  • @nmstoker
    @nmstoker 4 years ago +2

    Watching it now, thanks so much! It's really helpful to go through these kinds of things with clear examples and explanations.
    My only preference would've been to reduce the volume of the background music in the intro. So many podcasts do this and it's an annoying trend!

    • @arp_ai
      @arp_ai  4 years ago +1

      Thanks Neil! Noted on the audio!

  • @exxzxxe
    @exxzxxe 10 months ago

    Maybe the best video on this subject.

  • @a.e.5054
    @a.e.5054 4 years ago

    The best explanation of the Transformer and GPT model !!

  • @studmatze958
    @studmatze958 1 year ago

    Thank you so much for your work on attention and transformers. Your posts and videos are the best I have encountered so far in terms of visualization and explanation. And you did it way better than my professor. Again, thank you :)

  • @jpmarinhomartins
    @jpmarinhomartins 3 years ago

    Dude, I freakin' love your blog. Keep up the good work! Thanks for everything!

  • @o_felipe_reis
    @o_felipe_reis 4 years ago +1

    Great video! Best regards from Brazil!

  • @gergerger53
    @gergerger53 4 years ago +1

    Amazing video. Have to admit that every time I heard the wrong pronunciation of "Shawshank" it did feel a bit like nails on a blackboard but easily forgivable. Jay, your resources and videos are phenomenal :) Thank you for putting in the work to help us all out.

    • @arp_ai
      @arp_ai  4 years ago +1

      Haha! Wrong how? Am I overpronouncing the shaWshank? Thank you!

    • @gergerger53
      @gergerger53 4 years ago +1

      @arp_ai The "Shaw" is pronounced like "sure/shore" but in the video you use the vowel that's in "how/cow". Anyway, I only meant this as a tiny point :) The take-home message is that you are an incredible ML / NLP teacher!!

  • @Opinionman2
    @Opinionman2 2 years ago

    Awesome stuff. Your blog really helped clarify my deep learning class.

  • @AdityPai
    @AdityPai 4 years ago +1

    Thank you for writing the blog. It has helped me.

  • @tsadigov1
    @tsadigov1 1 year ago

    I am trying to understand how the transformer works, and you explain it in a very accessible way. One small thing: I wish the video had fewer transitions between the two cameras.

  • @romulodrumond3526
    @romulodrumond3526 3 years ago

    One of the best videos on the subject

  • @zongmianli9072
    @zongmianli9072 1 year ago

    Thanks for the very clear and concise explanation, Jay!

  • @thecutestcat897
    @thecutestcat897 1 year ago

    Thanks, your blog is so clear!

  • @HelenTueni
    @HelenTueni 1 year ago

    Amazing video. Thank you very much for making this topic accessible.

  • @jemmaj2919
    @jemmaj2919 1 month ago

    This is amazing. One thing I didn't understand is the matrix: how it is generated and how it is used in the processing to return the probability (how "the" turns into a big array of inputs).

  • @Udayanverma
    @Udayanverma 1 year ago

    Loved it, thanks. This video created some new neurons in my head.

  • @javierechevarria1548
    @javierechevarria1548 3 years ago

    You are really good (excellent) at explaining a complex topic in a simple way. Congratulations!!!!

  • @sachinr3823
    @sachinr3823 3 years ago

    Omg, thanks a lot for these amazing videos. Your lectures and blogs are so easy to understand.

    • @sachinr3823
      @sachinr3823 3 years ago

      Small request: please pin the BGM you used in the video.

  • @damonandrews1887
    @damonandrews1887 3 years ago

    I found this very helpful visual explainer, thanks so much for your time, and thanks for chopping it up into sections for easy revision 🤓!

  • @tehseenzia3135
    @tehseenzia3135 3 years ago

    Amazing illustration. Keep going Jay.

  • @armingh9283
    @armingh9283 3 years ago

    Thanks for the explanation. Good taste in background music, by the way 👍

    • @arp_ai
      @arp_ai  3 years ago

      Thank you!

  • @sharkeyryan
    @sharkeyryan 2 years ago

    Thanks for creating this content. Your explanation is quite easy to follow, especially for someone like me who is just beginning to explore these areas of AI/ML.

  • @KlimovArtem1
    @KlimovArtem1 3 years ago +5

    14:15 - so, the Self-Attention layer is actually the thing that’s trying to understand the meaning of the whole sequence? How does it work and how can it be trained? How long a sequence can it analyze?

  • @yudiguzman8926
    @yudiguzman8926 3 years ago

    I really appreciate your explanation of this topic. Once again, I confirm that DL is my new passion. Thanks a lot.

  • @KlimovArtem1
    @KlimovArtem1 3 years ago +2

    27:56 - this explains a lot, thank you so much!

  • @omarsultan827
    @omarsultan827 2 years ago

    Thank you for this awesome introduction!

  • @FabioAlmeida-k6t
    @FabioAlmeida-k6t 5 months ago

    Excellent explanation, Thanks!

  • @pypypy4228
    @pypypy4228 7 months ago

    A huge thank you for this explanation!

  • @utsavshukla7516
    @utsavshukla7516 3 years ago

    Great explanation! Also love all the pop culture references in your room :p

  • @yoonyamm
    @yoonyamm 1 year ago

    Thank you for sharing such wonderful insight!

  • @NilaMasrourisaadat
    @NilaMasrourisaadat 1 year ago

    Amazinnnng illustration of language model transformers

  •  3 years ago +9

    Just a personal comment on the format of the videos: I personally find the constant change of scene (like in "The architecture of the transformer" section), where the camera keeps switching between you and the computer screen, extremely annoying.
    The content of the video itself was informative.

  • @junlinguo77
    @junlinguo77 2 years ago

    I like the way you teach!!!

  • @josephsueke
    @josephsueke 8 months ago

    Really clear. Amazing job!

  • @tusharkhustule3316
    @tusharkhustule3316 1 year ago

    1 minute into the video and I already subscribed.

  • @ygorgallina2691
    @ygorgallina2691 2 years ago

    Thank you so much for your work! The illustrations help to clearly understand these models!!

  • @itall9025
    @itall9025 4 years ago

    Great explanation! Please keep doing this format.

  • @amirhosseinfereidooni1798
    @amirhosseinfereidooni1798 3 years ago

    Thanks for the great explanation. MLP (at 11:35) stands for multilayer perceptron :)

  • @TusharKale9
    @TusharKale9 3 years ago

    A masterpiece explanation of NLP in a real-life scenario. Thank you.

  • @rupakgoyal1611
    @rupakgoyal1611 3 years ago

    Loved the music in the background...

  • @MsFearco
    @MsFearco 2 years ago

    I just found this now. It's super. Thanks.

  • @parmarsuraj99
    @parmarsuraj99 4 years ago +5

    ❤️ That library!!!!

    • @arp_ai
      @arp_ai  4 years ago +6

      It's been my entire focus the last few months. Stay tuned!

  • @peterkahenya
    @peterkahenya 1 year ago

    Wow! 🎉 Awesome intro.

  • @RK-fr4qf
    @RK-fr4qf 1 year ago

    Impressive. Thank you.

  • @hunorszegi4007
    @hunorszegi4007 1 year ago

    Thank you for your videos and blog posts. These were my inspiration to create a Java GPT-2 implementation for learning purposes. I can't use a link here, but as huplay I uploaded it to the biggest hosting site, and it is called gpt2-demo.

  • @mrityunjayupadhyay7332
    @mrityunjayupadhyay7332 1 year ago

    Great explanation

  • @maxbeber
    @maxbeber 4 years ago

    Thank you so much for the clear and concise explanation. Keep up the great work.

  • @jackdavidweber
    @jackdavidweber 3 years ago

    This is really great! Highly recommend!

  • @tshepisosoetsane4857
    @tshepisosoetsane4857 1 year ago

    Amazing work indeed. Thanks for simplifying things so that everyone can understand AI. Great work.

  • @WanderNatureDaily
    @WanderNatureDaily 3 years ago

    absolutely amazing video

  • @hasanb2312
    @hasanb2312 3 years ago

    Great video Jay, thank you so much!

  • @evertonlimaaleixo1084
    @evertonlimaaleixo1084 3 years ago

    Amazing!
    Thank you for sharing!

  • @ankitmaheshwari7310
    @ankitmaheshwari7310 2 years ago +1

    Helpful. You forgot to import torch in your GitHub code.

  • @Alex-oo5rt
    @Alex-oo5rt 1 year ago

    6:13 actually, GPT-2 and GPT-3 models are both composed of an encoder-decoder architecture. The encoder-decoder architecture is a common framework used in natural language processing (NLP) tasks, particularly in sequence-to-sequence models. while GPT-2 and GPT-3 have an encoder component, it is not as prominently utilized as the decoder for generating text outputs.

  • @andreysguitarmusic2661
    @andreysguitarmusic2661 1 year ago

    Great explanations!

  • @snehansughosh2111
    @snehansughosh2111 3 years ago

    Simply great, Jay. All that matters is keeping it simple while driving at the objective, and you are bang on.

    • @arp_ai
      @arp_ai  3 years ago

      Thank you! Glad you enjoyed this.

  • @hongkyulee9724
    @hongkyulee9724 3 years ago +1

    You are my hero. You give me a reason to live :D

  • @vijayko-e9f
    @vijayko-e9f 1 year ago

    Great work 👍👍👍

  • @yuchenyang4394
    @yuchenyang4394 3 years ago

    Great content! Can't wait for more.

    • @arp_ai
      @arp_ai  3 years ago

      Thank you Yuchen!

  • @hailongle
    @hailongle 3 years ago

    Fantastic teacher. Thanks Jay!

  • @mertcokelek4595
    @mertcokelek4595 4 years ago +3

    Thank you for the great explanation.
    I am new to this topic, and I wonder why the word "shawshank" is tokenized into 3 pieces; "sh" and "ank" are meaningless. Is it the result of a learned model, or is the tokenization hand-crafted?
    Thanks in advance.

    • @arp_ai
      @arp_ai  4 years ago +4

      That is the result of training the tokenizer using BPE: en.wikipedia.org/wiki/Byte_pair_encoding
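
      A toy sketch of how byte-pair encoding can end up with pieces like "sh" and "ank": the tokenizer repeatedly merges the most frequent adjacent pair of symbols in its training text, so the resulting pieces reflect frequency, not meaning. The function names and the tiny corpus below are made up for illustration; this is not GPT-2's actual tokenizer, which works on bytes and a much larger corpus.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE)."""
    # Each word starts as a tuple of single characters; the Counter stores word frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        vocab = Counter({apply_merge(s, best): f for s, f in vocab.items()})
    return merges


def tokenize(word, merges):
    """Split a word into pieces by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


# Hypothetical training words; a rare word like "shawshank" gets split into
# whichever frequent pieces the merges happen to produce.
corpus = ["she", "show", "shop", "bank", "tank", "thank", "saw", "law"]
merges = train_bpe(corpus, num_merges=6)
print(tokenize("shawshank", merges))
```

      Because the merges are learned from counts, the split of a rare word depends entirely on the training text, which is why the pieces need not be meaningful on their own.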

  • @akshikaakalanka
    @akshikaakalanka 1 year ago

    Thank you very much! This is awesome and easy to understand.

  • @Nereus22
    @Nereus22 2 years ago

    Great video, thank you!

  • @mahdiamrollahi8456
    @mahdiamrollahi8456 2 years ago

    Thanks, very intuitive…

  • @giofou711
    @giofou711 3 years ago

    Great video!

  • @vslobody
    @vslobody 4 years ago +1

    Jay, I think this question was asked somewhere else, but I cannot find a good answer.
    From the article:
    > In the decoder, the self-attention layer is only allowed to attend to earlier positions in the output sequence. This is done by masking future positions (setting them to -inf) before the softmax step in the self-attention calculation.
    In other words, the output logits (i.e. word translations) of the decoder are fed back into that first position, with future words at each time-step masked.
    I'm not quite sure how it all flows, b/c with several rows representing words all going through at once (a matrix), it seems like you would need to run the whole thing forward several times per sentence, each time moving the decoded focal point to the next output word...
    Where is this loop in the decoder layer? I am struggling to figure it out on my own.
    Thanks much in advance,
    Volodimir

    • @arp_ai
      @arp_ai  4 years ago

      By "rows" I assume you mean when the model is processing a batch, and every row is an example sentence. This visual might explain that:
      jalammar.github.io/images/gpt2/transformer-attention-masked-scores-softmax.png
      from jalammar.github.io/illustrated-gpt2/

    • @vslobody
      @vslobody 4 years ago

      @arp_ai Thanks! If every row is an example sentence, then why do you only look at the first word in the first row, but at two words in the second row, and so on?

    • @arp_ai
      @arp_ai  4 years ago

      @vslobody Sorry, let me clarify. In the image, each row is for processing the same sentence with an additional word.
      The section in the article that starts with "This masking is often implemented as a matrix called..." explains in more detail.
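
      A minimal sketch of the masking quoted from the article above, in plain Python: scores for future positions are set to -inf before the softmax, so row i ends up with weights only on positions 0..i. The function name `causal_softmax` and the all-zero scores are assumptions chosen for illustration; real implementations do this on tensors, per attention head.

```python
import math


def causal_softmax(scores):
    """Row-wise softmax over a square score matrix with future positions masked.

    scores[i][j] is the raw attention score of position i looking at position j.
    Assumes all scores are finite.
    """
    weights = []
    for i, row in enumerate(scores):
        # Position i may attend only to positions 0..i; later ones are set to -inf.
        masked = [s if j <= i else float("-inf") for j, s in enumerate(row)]
        m = max(masked)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in masked]  # exp(-inf) is 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# With equal scores, row 0 sees one token, row 1 sees two, row 2 sees three.
for row in causal_softmax([[0.0, 0.0, 0.0] for _ in range(3)]):
    print(row)
```

      This is why all rows can go through in one matrix operation during training: the mask gives each row the view it would have had if the sentence had been fed in one word at a time.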

    • @vslobody
      @vslobody 4 years ago

      @arp_ai Great, thanks a lot. So this is my question: where is the loop that allows me to go through each word in the sentence? It seems I cannot find one in the code.

    • @arp_ai
      @arp_ai  4 years ago

      @vslobody I believe that would be the forward pass that generates each token. What implementation are you looking at? Huggingface?
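
      A hedged sketch of where such a loop typically sits: in many implementations it is outside the decoder layers, in the generation code, which calls the model's forward pass once per new token and appends the chosen token to the input. Everything below (`generate`, `toy_model`, the 5-token vocabulary, greedy selection) is a made-up stand-in for illustration, not code from any specific library.

```python
def generate(next_token_scores, prompt, max_new_tokens, eos=None):
    """Greedy autoregressive generation.

    next_token_scores: callable taking the token list so far and returning one
    score per vocabulary entry for the next token (stands in for a forward pass).
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)  # one forward pass over the sequence so far
        next_tok = max(range(len(scores)), key=scores.__getitem__)  # pick the top score
        tokens.append(next_tok)  # feed the output back in as input
        if next_tok == eos:
            break
    return tokens


def toy_model(tokens):
    """Hypothetical stand-in model: prefers (last token + 1) mod 5."""
    vocab_size = 5
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if t == target else 0.0 for t in range(vocab_size)]


print(generate(toy_model, prompt=[0], max_new_tokens=4))
```

      The decoder layers themselves contain no loop over output words; during generation the loop is this outer one, and during training the causal mask lets a whole sentence be processed in a single pass.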