Implement Llama 3 From Scratch - PyTorch

  • Published: 15 Nov 2024

Comments • 27

  • @nbvcxz098
    @nbvcxz098 1 month ago +4

    WOW! You are something else dude! No one provides content like you! Exceptional!

  • @Ece-kx6qk
    @Ece-kx6qk 1 month ago

    The video I have been waiting for!!! Thank you 🙏🏻

  • @binyiu5353
    @binyiu5353 1 month ago

    Many thanks for this! It gives a much better understanding before reading the paper.

  • @aykutcayir64
    @aykutcayir64 1 month ago

    As always, great job!👏🏻

  • @learntestenglish
    @learntestenglish 1 month ago

    I was waiting for a new video. Thanks for the awesome work ❤😊

  • @කැලණිකුප්පි
    @කැලණිකුප්පි 1 month ago +2

    Woowwww awesome thanks for this ❤❤

  • @gustavojuantorena
    @gustavojuantorena 1 month ago

    Awesome!

  • @abhijoy.sarkar
    @abhijoy.sarkar 1 month ago

    Let’s make llama4 before llama4 🤝

  • @flashlin1
    @flashlin1 1 month ago +1

    Data, algorithms, and computational power are the three key elements. Why hasn't anyone added more complex connection models to Transformers? We should consider increasing the algorithmic complexity of large language models (LLMs), which could be likened to the complexity of connections in the human brain. That way we wouldn't need to endlessly increase the number of parameters, especially since the number of artificial neurons already exceeds the number of human neurons. Moreover, we haven't seen designs resembling short-term-memory neuron models that operate at runtime (inference time).
    We should aim to design a model that can, like humans, quickly read relevant articles when faced with a problem. During the reading process, it could summarize related content into short-term memory and continuously update it. Then, based on this short-term memory, the model could verify the correctness of its answers, for instance by writing code to check them. Wouldn't this approach allow us to make the model smaller?

    • @uygarkurtai
      @uygarkurtai  1 month ago

      It's a very good research question. The attention mechanism can be viewed as the "short-term" memory you mention, too (see the attention sketch after this thread). I remember some papers on making neural networks work like human brain synapses, but the problem is that they didn't perform that well.

    • @flashlin1
      @flashlin1 1 month ago

      @@uygarkurtai The variety of neurons in the human brain far exceeds the range of functions used in artificial neural networks. How can we expect a single model, like the transformer, to handle everything? Shouldn't we focus on designing more diverse neural functions to better reflect the complexity of the brain?

    • @uygarkurtai
      @uygarkurtai  1 month ago

      @@flashlin1 In that case we again end up with a computationally expensive model; there's a trade-off that is difficult to overcome. You may want to check out multi-model approaches, which are the closest to what you mention: a combination of several models. If you're curious about mimicking the brain, also check out spiking neural networks.

    • @flashlin1
      @flashlin1 1 month ago

      @@uygarkurtai Why haven't we seen much progress with Spiking Neural Networks? My ideal concept of short-term memory should function during the inference phase, not be fixed during the training phase. Specifically, as the model processes an input question or reads through a large volume of articles, it should be able to summarize and store useful and relevant information in short-term memory, and only then generate an answer based on that.
      Moreover, during the process of generating an answer, the model should be able to dynamically update the short-term memory. For example, if later predictions impact the earlier generated content, the model should revise the previous answers based on the new information before producing the final result.
      Is there any model that works like this?

    • @uygarkurtai
      @uygarkurtai  1 month ago

      @@flashlin1 We haven't seen them because there are usually points where they fall short compared to regular MLPs. To me, what you describe sounds a bit like RAG applications (see the RAG sketch after this thread).
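
The "attention as short-term memory" view mentioned in this thread can be made concrete with a small PyTorch sketch. This is only an illustrative toy under assumed simplifications (a single head, no learned projections, hypothetical names like `attend`), not code from the video: every token processed so far leaves a key/value pair in a cache, and each new query reads back a weighted mixture of that cache.

```python
# Toy sketch (illustrative assumption, not the video's code): causal attention
# with a key/value cache acting as a "short-term memory" over the context.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d_model = 16                              # toy embedding size (assumed)
keys = torch.empty(0, d_model)            # cached keys of all tokens seen so far
values = torch.empty(0, d_model)          # cached values of all tokens seen so far

def attend(query: torch.Tensor) -> torch.Tensor:
    """Read from the cached context: softmax(q·K^T / sqrt(d))-weighted sum of V."""
    scores = query @ keys.T / d_model ** 0.5    # (1, t) similarity to every past token
    weights = F.softmax(scores, dim=-1)         # how strongly each past token is recalled
    return weights @ values                     # (1, d_model) mixture of the "memory"

# Simulate autoregressive decoding: each step writes to memory, then reads from it.
for step in range(5):
    x = torch.randn(1, d_model)                 # stand-in for a new token embedding
    keys = torch.cat([keys, x], dim=0)          # write the token into short-term memory
    values = torch.cat([values, x], dim=0)
    out = attend(x)                             # read: mixes the whole context so far
    print(f"step {step}: memory holds {keys.shape[0]} tokens, output norm {out.norm():.3f}")
```

Unlike a trained weight, this cache exists only for the current sequence, which is what makes it behave like short-term rather than long-term storage.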
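
The RAG comparison can be sketched in the same spirit. Everything below is an illustrative assumption (the bag-of-words `embed`, the `retrieve` helper, and the passage list are made up for the example, not a specific framework's API): relevant passages are ranked by similarity to the question and placed into the prompt as a disposable "short-term memory" before a language model would generate the answer.

```python
# Toy RAG-style sketch (hypothetical helpers, not a specific framework's API):
# retrieve the passages most similar to the question and build a prompt from them.
import torch
import torch.nn.functional as F

VOCAB: dict[str, int] = {}                # toy word -> index map, built on the fly

def embed(text: str) -> torch.Tensor:
    """Toy bag-of-words embedding; a real system would use a trained encoder."""
    vec = torch.zeros(64)
    for word in text.lower().split():
        idx = VOCAB.setdefault(word, len(VOCAB) % 64)
        vec[idx] += 1.0
    return F.normalize(vec, dim=0)

passages = [                              # stand-in document store (made up)
    "Llama 3 uses grouped-query attention and rotary positional embeddings.",
    "Retrieval-augmented generation fetches documents and feeds them to the model as context.",
    "Spiking neural networks communicate with discrete spikes over time.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank passages by similarity to the question and keep the top k."""
    q = embed(question)
    scores = torch.stack([q @ embed(p) for p in passages])
    top = torch.topk(scores, k).indices.tolist()
    return [passages[i] for i in top]

question = "How does retrieval-augmented generation work?"
context = "\n".join(retrieve(question))   # the model's disposable "short-term memory"
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)                             # a real LLM call would consume this prompt
```

The retrieved context is rebuilt per question and discarded afterwards, which matches the "summarize into short-term memory, then answer" idea described above, though updating that memory while generating remains an open question.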

  • @en-iyi-benim
    @en-iyi-benim 26 days ago

    Hi! I really enjoy your videos and the way you explain concepts. I recently implemented the Qwen-2 Vision model in pure PyTorch. There’s a small error I’m working through at the moment, but I’d love to know if you’d be open to making a video using my code to explain the process. I think it could be really helpful for others who are interested in vision-language models. Let me know what you think.

    • @uygarkurtai
      @uygarkurtai  26 days ago

      @@en-iyi-benim Hey, thank you! I may look at the Qwen-2 model in the future. You can share your repository here too when it's done.