How to Build an LLM from Scratch | An Overview

  • Published: Dec 13, 2024

Comments • 261

  • @ShawhinTalebi
    @ShawhinTalebi  1 year ago +22

    [Correction at 15:00]: The words on the vertical axis are backward. They should read "I hit ball with baseball bat" from top to bottom, not bottom to top.
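A minimal sketch of the corrected layout, assuming word-level tokens for readability (real LLMs use subword tokens): in a decoder's causal attention mask, rows are the query tokens read top to bottom in sentence order, and each token may attend only to itself and the tokens before it, which gives a lower-triangular pattern. The helper name `causal_mask` is illustrative.

```python
# Word-level tokens from the video's example sentence (real LLMs use subword tokens).
tokens = ["I", "hit", "ball", "with", "baseball", "bat"]

def causal_mask(n: int) -> list[list[int]]:
    """n x n lower-triangular mask: row i (query) may attend to columns 0..i (keys)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

mask = causal_mask(len(tokens))

# Rows are labelled top to bottom in sentence order; columns follow the same
# token order from left to right.
width = max(len(t) for t in tokens)
for tok, row in zip(tokens, mask):
    print(tok.rjust(width), " ".join(str(v) for v in row))
```

The row for "ball" prints `1 1 1 0 0 0`: it sees "I", "hit", and itself, but none of the words after it.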
    👉More on LLMs: ruclips.net/p/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0
    --
    References
    [1] BloombergGPT: arxiv.org/pdf/2303.17564.pdf
    [2] Llama 2: ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/
    [3] LLM Energy Costs: www.statista.com/statistics/1384401/energy-use-when-training-llm-models/
    [4] arXiv:2005.14165 [cs.CL]
    [5] Falcon 180b Blog: huggingface.co/blog/falcon-180b
    [6] arXiv:2101.00027 [cs.CL]
    [7] Alpaca Repo: github.com/gururise/AlpacaDataCleaned
    [8] arXiv:2303.18223 [cs.CL]
    [9] arXiv:2112.11446 [cs.CL]
    [10] arXiv:1508.07909 [cs.CL]
    [11] SentencePiece: github.com/google/sentencepiece/tree/master
    [12] Tokenizers Doc: huggingface.co/docs/tokenizers/quicktour
    [13] arXiv:1706.03762 [cs.CL]
    [14] Andrej Karpathy Lecture: ruclips.net/video/kCc8FmEb1nY/видео.html
    [15] Hugging Face NLP Course: huggingface.co/learn/nlp-course/chapter1/7?fw=pt
    [16] arXiv:1810.04805 [cs.CL]
    [17] arXiv:1910.13461 [cs.CL]
    [18] arXiv:1603.05027 [cs.CV]
    [19] arXiv:1607.06450 [stat.ML]
    [20] arXiv:1803.02155 [cs.CL]
    [21] arXiv:2203.15556 [cs.CL]
    [22] Train With Mixed Precision (NVIDIA): docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html
    [23] DeepSpeed Doc: www.deepspeed.ai/training/
    [24] paperswithcode.com/method/weight-decay
    [25] towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48
    [26] arXiv:2001.08361 [cs.LG]
    [27] arXiv:1803.05457 [cs.AI]
    [28] arXiv:1905.07830 [cs.CL]
    [29] arXiv:2009.03300 [cs.CY]
    [30] arXiv:2109.07958 [cs.CL]
    [31] huggingface.co/blog/evaluating-mmlu-leaderboard
    [32] www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf

    • @amortalbeing
      @amortalbeing 1 year ago

      Thanks a lot for the refs, Shahin Jan ❤
      Keep up the great job 👍

  • @LudovicCarceles
    @LudovicCarceles 8 months ago +38

    "Garbage in, garbage out" is also applicable to our brain. Your videos are certainly high quality inputs.

  • @seanwilner
    @seanwilner 11 months ago +59

    This is about as perfect a coverage of this topic as I could imagine. I'm a researcher with a PhD in NLP who trains LLMs from scratch for a living, and I often need to communicate the process in a way that's digestible to a broad audience without back-and-forth Q&A, so I'm thrilled to have found your piece!
    As an aside, I think the token order on the y-axis of the decoder attention mask on slide 10 is reversed.

    • @ShawhinTalebi
      @ShawhinTalebi  11 months ago +4

      Thanks Sean! It's always a challenge to convey technical information in a way that both the researcher and general audience can get value from. So your approval means a lot :)
      Thanks for pointing that out. The blog article has a corrected version: medium.com/towards-data-science/how-to-build-an-llm-from-scratch-8c477768f1f9?sk=18c351c5cae9ac89df682dd14736a9f3

    • @AritraDutta-tz4je
      @AritraDutta-tz4je 7 months ago +1

      Sir, can you tell me how you are training your LLMs?

    • @xxcusme
      @xxcusme 6 months ago

      Most people watching this video found it through some prompt about how to build an LLM, and by your logic these people are the remaining 10%: the makers and inventors.

    • @dortrox7557
      @dortrox7557 3 months ago

      Can I connect with you if possible?

  • @barclayiversen376
    @barclayiversen376 8 months ago +9

    It's pretty rare that I actually sit through an entire 30+ minute video on YouTube. Well done.

  • @lihanou
    @lihanou 8 months ago +4

    Clicked with low expectations, but wow, what a gem. Great clarity with just the right amount of depth for beginners and intermediate learners.

  • @mujeebrahman5282
    @mujeebrahman5282 10 months ago +6

    I'm typing this after watching half of the video because I'm already amazed by the clarity of the explanation. Exceptional.

    • @ShawhinTalebi
      @ShawhinTalebi  10 months ago

      Thanks, hope the 2nd half didn't disappoint!

  • @tahanadeem3717
    @tahanadeem3717 2 days ago

    Beautifully explained! I rarely watch a whole video, but you got me hooked.

  • @dauntlessRx
    @dauntlessRx 9 months ago +7

    This is literally the perfect explanation for this topic. Thank you so much.

  • @Hello_kitty_34892
    @Hello_kitty_34892 11 months ago +9

    Your voice is relaxing. I love that you don't speak super fast like most tech bros, and you seem relaxed about the content rather than having this "in a rush" energy. I'd definitely watch you explain most things LLM and AI! Thanks for the content.

    • @ShawhinTalebi
      @ShawhinTalebi  11 months ago

      Thanks for the feedback. More AI/LLM content to come!

  • @sinan325
    @sinan325 1 year ago +8

    I'm not a programmer and don't know anything about programming or LLMs, but I find this topic fascinating. Thank you for your videos and for sharing your knowledge.

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago

      Happy to help! I hope they were valuable.

  • @SamChughtai
    @SamChughtai 11 months ago +2

    Thanks, Shaw!! Great video and excellent data. I would love to be your mentee, sir!!

    • @ShawhinTalebi
      @ShawhinTalebi  10 months ago +1

      Thank you for your generosity! I don't currently do any formal mentorship, but I try to give away all my secrets on YouTube and Medium :)
      Feel free to share any suggestions for future content.

  • @chrstfer2452
    @chrstfer2452 1 year ago +3

    That was simply incredible; how the heck does it have under 5k views? Literal in-script citations, not even cards but vocal mentions!! Holy shit, I'm gonna share this channel with all my LLM-enamored buddies.

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago +2

      Thanks, I'm glad it was helpful. Your referrals are greatly appreciated 😁

  • @goldholder8131
    @goldholder8131 10 months ago +2

    This is the most comprehensive and well rounded presentation I've ever seen in my life, topic aside. xD Bravo, good Sir.

    • @ShawhinTalebi
      @ShawhinTalebi  10 months ago

      Thanks so much! Glad you liked it :)

  • @racunars
    @racunars 1 year ago +10

    The whole series on using large language models (LLMs) is really very helpful. This 6th article really helps me understand the transformer architecture in a nutshell. Thank you. 👏

  • @MairaTariq-q1q
    @MairaTariq-q1q 2 months ago

    This series is definitely the best one out there! Subscribed instantly

  • @asha328
    @asha328 8 months ago +1

    One of the best videos explaining the process and cost of building an LLM 🎉.

  • @mater5930
    @mater5930 6 months ago +1

    I became interested in creating an LLM, and this is the first video I opened. I am so grateful for it, because I see I will never be able to do it on my own. I don't have the money or resources. Thank you for the high-level overview.

  • @starman9000
    @starman9000 2 months ago

    To be frank, the subject is too hard for me to understand, but your calm, smooth explanation made me listen to the entire video. Thank you.

  • @tehreemsyed8621
    @tehreemsyed8621 7 months ago +1

    This is such a fantastic video on building LLMs from scratch. I'll watch it repeatedly to implement it for a time-series use case. Thank you so much!!

  • @qicao7769
    @qicao7769 11 months ago +1

    Best and most efficient video about the basics of LLMs!!!! I think it saved me 10 hours of reading. Thanks!

    • @ShawhinTalebi
      @ShawhinTalebi  11 months ago

      Love to hear it! Glad it helped

  • @ares106
    @ares106 1 year ago +3

    Thank you, this is infinitely more enjoyable for me than reading a paper.

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago +1

      😂😂 I’m glad you liked it!

    • @fab_spaceinvaders
      @fab_spaceinvaders 1 year ago +1

      Seconded! Keep the good work flowing all around 🎉 🙏

  • @bradstudio
    @bradstudio 10 months ago +2

    This was a very thorough introduction to LLMs and answered many questions I had. Thank you.

    • @ShawhinTalebi
      @ShawhinTalebi  10 months ago

      Great to hear, glad it was helpful :)

  • @RAAI-k8r
    @RAAI-k8r 3 months ago

    I have a little background in NLP, in BERT models actually, and I'm really fascinated by the way you describe the whole process so that it's easy for a general audience to grasp. Much appreciated, and your voice is soothing.

  • @joedigiovanni8758
    @joedigiovanni8758 9 months ago +1

    Great job demystifying what is happening under the hood of these LLMs

  • @theunconventionalenglishman
    @theunconventionalenglishman 8 months ago

    This is excellent - thanks for putting this together and taking the time to explain things so clearly!

  • @GBangalore
    @GBangalore 11 months ago +3

    Thank you so much for putting these videos together, and this one in particular. This is such a broad and complex topic, and you managed to make it as thorough as possible in a 30-ish-minute 😮 timeframe, which I thought was almost impossible.

    • @ShawhinTalebi
      @ShawhinTalebi  11 months ago

      My pleasure, glad it was informative yet concise :)

  • @Lidan0241
    @Lidan0241 18 days ago

    So useful and easy to comprehend! Thank you

  • @shilpyjain6147
    @shilpyjain6147 8 months ago +1

    Hey Shaw, thank you for coming up with this extensive video on building an LLM from scratch. It certainly gives a fair idea of how some of the existing LLMs were created!

  • @EigenA
    @EigenA 9 months ago

    Great channel, 3rd video in. You earned a sub. Thank you!

  • @lFaizaanl
    @lFaizaanl 3 months ago

    How does this channel not have a million subs?

    • @ShawhinTalebi
      @ShawhinTalebi  3 months ago

      LOL... maybe too technical for casual viewing 😅

  • @ethanchong1026
    @ethanchong1026 1 year ago

    Thanks for putting together this short video. I enjoy learning this subject from you.

  • @akshatjain4084
    @akshatjain4084 3 months ago

    Amazing and very simple explanation. Thank you for the video.

  • @rohanpujara
    @rohanpujara 59 minutes ago

    You must make a follow up video for this today!

  • @robwarner1858
    @robwarner1858 11 months ago +1

    Amazing video. Lost me through a fair bit, but I came away understanding more than I ever have on the subject. Thank you.

  • @aldotanca9430
    @aldotanca9430 1 year ago +1

    Thoroughly researched and referenced, clear explanations inclusive of examples. I will watch it again to take notes. Thanks so much!

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago +1

      Great to hear! Feel free to reach out with any questions or suggestions for future content :)

    • @aldotanca9430
      @aldotanca9430 1 year ago

      Thanks! I would have plenty of questions actually, but they are probably a bit too specific to make for a generally relevant video. I am exploring options for a few non-profit projects related to musical education and research. They need to integrate large bodies of text and produce precise references to what comes from where, so I was naively toying with the idea of producing a base model partially trained on the actual text in question, which, I understood from the video, is a non-starter. So I will look into fine-tuning, RAG, and prompt engineering. I suspect I will spend quite a lot of time watching your content, given you covered quite a lot. I also learned quite a bit more from this specific video. Right now I am studying the basics, including a bit of the math involved, and it is a bit slow going, so I am quite grateful :)

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago +1

      @@aldotanca9430 That sounds like a really cool use case (I've been a musician for over 14 years)!
      If you want to chat about more specific questions feel free to set up some office hours: calendly.com/shawhintalebi/office-hours

    • @aldotanca9430
      @aldotanca9430 1 year ago

      @@ShawhinTalebi That's very generous of you! I will book a slot; I would love to chat. I think it would help me immensely to rule out blind alleys and at least get a well-informed idea of what is feasible to attempt. I did notice the congas, piano, and Hanon lurking in the background, so I suspected the topic would be interesting to you. It is about historical research, but it is also very applicable and creative for improvisation. Perhaps I can compile a very short list of interesting resources, in case you want to check it out at some point for musical reasons :)

  • @malakamoussaka6976
    @malakamoussaka6976 1 month ago

    Very deep analysis

  • @DigsWigs2022
    @DigsWigs2022 7 months ago

    Great explanation. I will have to watch it a few times to have a basic understanding 😂

  • @bnm123z
    @bnm123z 3 months ago

    Fantastic work

  • @shih-shengchang19
    @shih-shengchang19 10 months ago

    Thanks for your video; it's awesome. You explain everything very clearly and with good examples.

    • @ShawhinTalebi
      @ShawhinTalebi  10 months ago

      Thanks for the feedback, glad it was clear :)

  • @ifycadeau
    @ifycadeau 1 year ago

    Love these videos! Keep it up Shaw!

  • @vijayakashallenki7275
    @vijayakashallenki7275 8 months ago +1

    Waiting for the complete AI/ML playlist! Sir, please!

  • @gRosh08
    @gRosh08 8 months ago +1

    Cool.

  • @farexBaby-ur8ns
    @farexBaby-ur8ns 16 days ago

    Nice vid.
    When you deploy a model, I have heard there are one or more files that go with it, which the LLM refers to whenever a prompt comes in. Can you describe that mechanism (in a palatable way 😁)?

    • @ShawhinTalebi
      @ShawhinTalebi  13 days ago

      I'm not quite sure which file this is. My guess is that it's either a "system prompt" or a prompt template that is used to augment the raw input from a user to generate more helpful responses.
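A rough illustration of that idea, assuming a made-up tag format: the application wraps the raw user input in a template that also carries a system prompt before anything is sent to the model. The tags, `SYSTEM_PROMPT`, and `build_prompt` below are hypothetical; each chat model defines its own format (for many Hugging Face models it ships as a chat template in the tokenizer configuration).

```python
# Hypothetical system prompt and tag format, made up for illustration.
SYSTEM_PROMPT = "You are a helpful assistant. Answer concisely."

TEMPLATE = (
    "<system>\n{system}\n</system>\n"
    "<user>\n{user}\n</user>\n"
    "<assistant>\n"
)

def build_prompt(user_input: str, system: str = SYSTEM_PROMPT) -> str:
    """Wrap the raw user input in the template; the model sees this full string."""
    # Only TEMPLATE is parsed for {placeholders}, so braces inside user_input are safe.
    return TEMPLATE.format(system=system, user=user_input.strip())

print(build_prompt("What is an LLM?"))
```

The model would then continue the text after the final `<assistant>` tag, so the user never sees the system prompt even though it shapes every response.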

  • @SpeakerMangoes
    @SpeakerMangoes 8 months ago

    Watching this right before my interview.

    • @ShawhinTalebi
      @ShawhinTalebi  8 months ago

      Good luck!

    • @SpeakerMangoes
      @SpeakerMangoes 8 months ago

      @@ShawhinTalebi Cleared the 1st round; the next one is on Thursday. I hope your luck brings me my dream job ❤️

  • @DavidNordfors-i5i
    @DavidNordfors-i5i 9 months ago +1

    Very very good!!

  • @ronakbhatt4880
    @ronakbhatt4880 1 year ago +1

    @17:08 Aren't the decoder weights wrong, if 0 is supposed to be the weight from a token to the tokens that come after it?

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago

      Sorry I didn't understand your question. Could you rephrase?

  • @randomforest_dev
    @randomforest_dev 9 months ago

    Awesome Video! Thanks.

  • @PorterHarris
    @PorterHarris 9 months ago

    Great content, Shaw!
    The next step I'm having trouble figuring out: is there a way to run an existing GPT locally and do prompt engineering or model fine-tuning on it with my own training data?

    • @ShawhinTalebi
      @ShawhinTalebi  9 months ago

      Thanks! While this depends on your local machine specs, the short answer is yes! My next video will actually walk through how to do this using an approach called QLoRA.

  • @kanakorn
    @kanakorn 3 months ago

    Thanks for your explanation.

  • @rezNezami
    @rezNezami 1 year ago

    Excellent job, Shawhin. Merci.

  • @funnymono
    @funnymono 9 months ago

    Exceptional material

  • @vsudbdk5363
    @vsudbdk5363 1 year ago +1

    Are there any resources on enriching prompt templates? In my case it's a difficult one to understand and implement, since an LLM returns a response based on how we define the template, including keeping out unnecessary context...

    • @vsudbdk5363
      @vsudbdk5363 1 year ago +1

      I recently began exploring generative AI and would like proper guidance on where to learn and do the coding part. I know it will be a long journey: understanding the math behind it, learning the concepts and code, staying up all night checkpointing metrics, performance, and all. Thank you.

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago +1

      Great question. The video on prompt engineering might be helpful: ruclips.net/video/0cf7vzM_dZ0/видео.html

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago +1

      That's a good mindset to have. AI is an ocean, with endless things one can learn.
      This playlist could be a good starting place: ruclips.net/p/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0

    • @vsudbdk5363
      @vsudbdk5363 1 year ago

      @@ShawhinTalebi thank you very much

  • @rajez.s7157
    @rajez.s7157 11 months ago +1

    Can Ray clusters be used here for multi-GPU training of LLMs?

    • @ShawhinTalebi
      @ShawhinTalebi  11 months ago

      I haven't used Ray clusters before, but skimming their website it seems like it was specifically made for ML workloads.

  • @romantolstykh7488
    @romantolstykh7488 9 months ago

    Great video!

  • @techdiyer5290
    @techdiyer5290 11 months ago +3

    What if you could make a small language model that maybe only understands English, can understand code, and is easy to run?

    • @ShawhinTalebi
      @ShawhinTalebi  11 months ago +1

      That is a compelling notion. If we can get there, then it would make this technology even more accessible and impactful.

    • @shrinik1969
      @shrinik1969 11 months ago

      Size = accuracy... a small model may not give you what you want.

    • @F30-Jet
      @F30-Jet 7 months ago

      NanoChatGPT

  • @muhammadali-jv1kr
    @muhammadali-jv1kr 4 months ago

    Hi, thanks for the wonderful content. Can you make a video on prompt engineering and fine-tuning, with code explanations, for an open-ended QA task?

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago

      Great suggestion. I added it to the list :)

  • @inishkohli273
    @inishkohli273 6 months ago

    Just completed the whole video; it took me 10 days. It is a good idea to provide just surface knowledge and not overwhelm students, but instead let them research and read further on their own by giving tons of references. I have a suggestion: why not create an open notebook that allows students to edit and fill in more information and learning materials? There were some points in the video where it felt like you could have elaborated more, or at least summarized a small portion of that subject in more depth. Thanks

    • @ShawhinTalebi
      @ShawhinTalebi  6 months ago

      That's a great suggestion! I've always been a fan of "open-source" textbooks and the like.
      Feel free to share any points you'd like me to discuss further in future videos of this series :)

  • @miguelangelcabreravictoria8775
    @miguelangelcabreravictoria8775 29 days ago

    Should we remove the stopwords?

  • @CurrentCache
    @CurrentCache 1 month ago

    Thanks!

  • @jackflash6377
    @jackflash6377 1 year ago +3

    I just asked GPT-4 to help me with training text. It said it is not allowed to assist in training any LLMs and would not give me anything.

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago +1

      I believe it’s now against OpenAI’s policy to use their models to train other models. You may need to look to open-source solutions, e.g., Llama 2 or Mistral.

    • @petevenuti7355
      @petevenuti7355 1 year ago

      @@ShawhinTalebi How could it possibly stop that, if the model being trained fed it the prompt and used the response for reinforcement and alignment?

  • @Syazwan9
    @Syazwan9 2 months ago

    This video, and others like it, are the first wave of riding the AI trend.

    • @ShawhinTalebi
      @ShawhinTalebi  2 months ago

      A lot has changed since I posted this.

  • @Nobody2310
    @Nobody2310 7 months ago

    What is the most basic technical artifact that is used/required to build any LLM? Is that an existing LLM such as Llama 2?

    • @ShawhinTalebi
      @ShawhinTalebi  7 months ago

      I am not quite sure of the meaning of "most basic technical artifact," but here's one perspective. There are two ways to build an LLM: from scratch and fine-tuning. When training from scratch, the essential piece is the training data used to develop the model. When fine-tuning, the essential piece is the pre-trained model you start from (e.g., Llama2).
      Hope that helps!

  • @saibhaskerraju2513
    @saibhaskerraju2513 1 month ago

    Can you do a tutorial on which model to train on a resume so that it can answer (almost) any question about it? I trained GPT-2, but its context window is just 1024 tokens, and it is pretty much useless.

    • @ShawhinTalebi
      @ShawhinTalebi  1 month ago

      If you are trying to do document QA, using any of the recent models (e.g. GPT-4o, Claude, Llama 3.2) and passing the doc in as context should work well.

    • @saibhaskerraju2513
      @saibhaskerraju2513 1 month ago

      @ShawhinTalebi Unfortunately, I don't want to use third-party hosted models; I want to train something from a base image and use it. I don't want a dependency on any cloud provider.

  • @akramsystems
    @akramsystems 9 months ago

    This is Gold

  • @LezzGoPlaces
    @LezzGoPlaces 9 months ago

    Brilliant!

  • @MegaBenschannel
    @MegaBenschannel 1 year ago

    Thanks for the great, packed exposé. 😀

  • @nobafan7515
    @nobafan7515 9 months ago

    Thank you for the video! I was wondering if you can help me. Let's say I ask GPT whether Romeo and Juliet was a comedy or a tragedy, and the only data it has was put in by people who didn't have time to fact-check it. I want my own GPT (let's say one of the tiny ones that can easily run on my laptop) that can explain the history of the play and give me the facts.
    Do I need to dive into the LLM and find that specific data to correct it? Can I fine-tune it to improve it (let's say I have a GPU big enough to train this LLM)? Or is the model fine, and I need a different GPT?

    • @ShawhinTalebi
      @ShawhinTalebi  9 months ago +1

      If I understood correctly, the question is how to ensure the LLM gives accurate responses.
      While there are several ways one can do this, the most effective way to give a model specialized and accurate information is via a RAG system. This consists of providing the model with specific information from a knowledge base, depending on the user prompt.
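A toy sketch of the retrieval step in RAG, under loud assumptions: the three-sentence knowledge base is invented for the Romeo and Juliet example above, and plain word overlap stands in for the embedding similarity a real system would typically use. The best-matching text is pasted into the prompt; the call to an actual LLM is omitted.

```python
import re
from collections import Counter

# Hypothetical knowledge base; in practice these would be chunks of your own documents.
KNOWLEDGE_BASE = [
    "Romeo and Juliet is a tragedy written by William Shakespeare.",
    "A Midsummer Night's Dream is a comedy by William Shakespeare.",
    "Hamlet is a tragedy set in Denmark.",
]

def words(text: str) -> Counter:
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def overlap(query: str, doc: str) -> int:
    """Number of shared words: a crude stand-in for embedding similarity."""
    return sum((words(query) & words(doc)).values())

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k knowledge-base entries that best match the query."""
    return sorted(KNOWLEDGE_BASE, key=lambda doc: overlap(query, doc), reverse=True)[:k]

def build_rag_prompt(query: str, k: int = 1) -> str:
    """Put the retrieved context in front of the question before calling an LLM."""
    context = "\n".join(retrieve(query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_rag_prompt("Is Romeo and Juliet a comedy or a tragedy?"))
```

The design point is that fixing a wrong fact means editing the knowledge base, not retraining the model.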

  • @siyufan1084
    @siyufan1084 1 year ago

    Arrived right on time! The quality of the video is consistently excellent as always

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago

      Great to hear! I'm glad they are helpful :)

  • @echofloripa
    @echofloripa 1 year ago

    Wow, what great content, thanks for that!! For LLM fine-tuning, is there also a suggested table relating the number of trainable parameters to the number of tokens used (dataset size)?

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago +1

      That’s a great question. While I haven’t come across such a table, a good rule of thumb is 1k-10k examples depending on the use case.

    • @echofloripa
      @echofloripa 1 year ago

      @@ShawhinTalebi Thanks for the quick reply! What about the number of trainable parameters, should we worry about that? What if my number of examples is smaller than that, let's say 100 to 200?

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago

      ​@@echofloripa IMO you've got to work with what you've got. I've heard some people get sufficient performance from just 100-200 examples, but it ultimately comes down to what is acceptable for that particular use case. It might be worth a try.
      Hope that helps!

  • @ErricN.C.
    @ErricN.C. 1 month ago

    Now do it IN Scratch. Haha JK
    Great vid, very informative.

  • @abhishekfnu7455
    @abhishekfnu7455 9 months ago

    Is there a way to use a data dictionary to train an LLM to generate SQL queries later on?

    • @ShawhinTalebi
      @ShawhinTalebi  9 months ago

      Yes, but you will likely need to transform the data a bit before it can be used for fine-tuning. I give a concrete example of this here: ruclips.net/video/4RAvJt3fWoI/видео.html

  • @wilfredomartel7781
    @wilfredomartel7781 1 year ago +1

    🎉❤❤❤amazing video

  • @Nursultan_karazhigit
    @Nursultan_karazhigit 9 months ago

    Hello, thanks. Do you know if it is possible to create our own LLM for our own startup?

    • @ShawhinTalebi
      @ShawhinTalebi  9 months ago

      Of course this is possible. However, it is rarely necessary. I'd suggest seeking simpler (and cheaper) solutions before jumping to training an LLM from scratch.

    • @Nursultan_karazhigit
      @Nursultan_karazhigit 9 months ago

      @@ShawhinTalebi thanks

  • @shaminMohammed-s9s
    @shaminMohammed-s9s 1 year ago

    Hi, I have domain-specific PDF files. How do I train on them using transfer learning? Please advise.

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago

      It depends on what you mean by transfer learning. If you simply want to extract knowledge from a PDF, I'd recommend exploring RAG or using off-the-shelf solutions like the OpenAI Assistants interface.
      Happy to clarify, if I misinterpreted the question.

  • @hashifvs519
    @hashifvs519 11 months ago

    Can you post a video on continual pretraining of LLMs like Llama?

    • @ShawhinTalebi
      @ShawhinTalebi  11 months ago +1

      Thanks for the great suggestion. I’ll be doing more content on fine-tuning so that will be a good topic to cover there.

  • @abcoflife6420
    @abcoflife6420 1 year ago +1

    Thank you so much for the rich information. My target is to DIY one from scratch 😢. For sure it won't be billions of tokens; I want to make it practical, for example for home management or a school reporting system instead of static reports, enabling it to create and run its own SQL queries 😅

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago

      Happy to help! To make something practical I'd recommend using an existing model fine-tuned to generate SQL queries e.g. huggingface.co/defog/sqlcoder

  • @YohannesAssefa-wk5oo
    @YohannesAssefa-wk5oo 11 months ago

    Thank you, bro, for your help.

  • @arpadbrooks5317
    @arpadbrooks5317 1 year ago

    Very informative, thx.

  • @huzash6977
    @huzash6977 6 days ago +2

    What if I wanted to make a really shitty LLM though? A lot of this stuff is only relevant for people with a lot of money. All I have is effort, time, a maths degree, and me.

    • @ShawhinTalebi
      @ShawhinTalebi  6 days ago +1

      I'd suggest this video: ruclips.net/video/l8pRSuU81PU/видео.htmlsi=hcEw4AMkjc09GTIS

    • @huzash6977
      @huzash6977 2 days ago

      @@ShawhinTalebi Thank you, this video and his channel are very useful. There are actually quite a few videos out there for learning about neural networks and LLMs for someone who knows a lot of coding and a little of the basic maths, but the reverse isn't nearly as common or easy to find. I only know R, since I used it a lot in my degree. I'm learning Python now.

  • @hari_madh
    @hari_madh 10 months ago

    Bro, I want to build an LLM. Will this video help me learn on my own and build an LLM myself? Is that possible? (I haven't watched it yet.)

    • @ShawhinTalebi
      @ShawhinTalebi  10 months ago

      While this video may be a helpful first step, more resources will be necessary. Here are a few additional resources I recommend.
      - ruclips.net/video/kCc8FmEb1nY/видео.html&ab_channel=AndrejKarpathy
      - huggingface.co/learn/nlp-course/chapter1/1?fw=pt

  • @nick066hu
    @nick066hu 8 months ago

    Thank you for putting together this video; it helped me a lot to understand LLM training.
    One question: with the advent of trillion-token models and beyond, I wonder where we will get all that training data from. I guess we have already consumed everything humanity has produced in the last 5000 years, and adding another 10M digitized cat videos will not make the models smarter.

    • @ShawhinTalebi
      @ShawhinTalebi  8 months ago

      Good question! I suspect there is still much content out there that hasn't been touched by LLMs i.e. non-digital text and proprietary data. Nevertheless, this content is still finite and the "just make a bigger model" approach will eventually hit a limit.

  • @amparoconsuelo9451
    @amparoconsuelo9451 1 year ago

    Can a fine-tuned LLM be repurposed and re-fine-tuned for more than one task?

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago

      Yes it can! In fact, that is what OpenAI did with their RLHF technique to create their InstructGPT models.

  • @TheIronMason
    @TheIronMason 9 months ago

    When it comes to transformers. Are you saying they're more than meets the eye?

    • @ShawhinTalebi
      @ShawhinTalebi  9 months ago

      That's a good way to put it 😂

  • @jamesmurdza
    @jamesmurdza 11 months ago

    The matrices at 16:40 don't look right to me. I think the words labelling the rows should go from top to bottom, not bottom to top.

    • @ShawhinTalebi
      @ShawhinTalebi  11 months ago

      Good catch! Yes, the word labels are inverted on the Y axis. A corrected visualization is provided in the blog: medium.com/towards-data-science/how-to-build-an-llm-from-scratch-8c477768f1f9?sk=18c351c5cae9ac89df682dd14736a9f3

  • @Therecouldbehope
    @Therecouldbehope 3 months ago +1

    The problem with all LLMs is that they lean left politically. Therefore, a platform to calibrate LLMs to absolute neutrality is where the next money train is leaving the station. LLMs cannot be allowed to be politically manipulated towards the left or the right.

  • @hayam1magdy
    @hayam1magdy 1 month ago

    How can I chat with my RDF graph?

    • @ShawhinTalebi
      @ShawhinTalebi  1 month ago

      This is a great question! I don't have experience with this. However, this resource seems helpful: www.deeplearning.ai/short-courses/knowledge-graphs-rag/

  • @catulopsae
    @catulopsae 7 months ago

    What does the number of parameters mean???

    • @ShawhinTalebi
      @ShawhinTalebi  7 months ago +1

      Good question. A model is something that takes an input (say a sequence of words) and produces an output (e.g. the next most likely word). Parameters are numbers which define how the model takes inputs and translates them into outputs.
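To make "parameters" concrete, here is a sketch that counts them for a tiny fully connected network with made-up layer sizes: every connection weight and every bias is one adjustable number, i.e., one parameter. A "7B" LLM has roughly 7 billion such numbers.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """A dense layer has one weight per (input, output) pair plus one bias per output."""
    return n_in * n_out + n_out

# Hypothetical toy network: 4 inputs -> 8 hidden units -> 3 outputs.
layer_sizes = [4, 8, 3]
total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
print(total)  # 67 = (4*8 + 8) + (8*3 + 3)
```

Training adjusts these 67 numbers so that inputs map to the desired outputs; an LLM does the same thing at a vastly larger scale.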

    • @catulopsae
      @catulopsae 7 months ago

      @@ShawhinTalebi thank you

  • @PabloPernambuco
    @PabloPernambuco 8 months ago +1

    Now I am discovering my low IQ... 0.001% of learning... 😂

  • @guerbyduval4104
    @guerbyduval4104 8 months ago

    Do you have a course on how to do it as a programmer instead of *like a ChatGPT talker*?

    • @ShawhinTalebi
      @ShawhinTalebi  8 months ago

      I don't have a from scratch coding tutorial yet. But I am a fan of the one from Andrej Karpathy: ruclips.net/video/kCc8FmEb1nY/видео.html

  • @crosstalk125
    @crosstalk125 11 months ago

    Hi, I like your content. But I want to point out that what you are calling tokenization is vectorization. Tokenization breaks documents/sentences/words into subparts, and vectorization converts tokens into numbers. Thanks

    • @ShawhinTalebi
      @ShawhinTalebi  10 months ago

      Thanks for raising that point. Here I'm lumping the two together, but these are 2 separate steps.
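The two steps can be sketched separately. This assumes a simple word-and-punctuation splitter and a vocabulary built from the input itself; real LLM tokenizers use learned subword schemes such as BPE [10] or SentencePiece [11], and the resulting IDs are then looked up in an embedding table to get vectors.

```python
import re

def tokenize(text: str) -> list[str]:
    """Step 1 (tokenization): split text into tokens, here words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Step 2 (converting tokens to numbers): map each token to its ID."""
    return [vocab[tok] for tok in tokens]

tokens = tokenize("I hit ball with baseball bat.")
vocab = build_vocab(tokens)
ids = encode(tokens, vocab)
print(tokens)  # ['i', 'hit', 'ball', 'with', 'baseball', 'bat', '.']
print(ids)     # [0, 1, 2, 3, 4, 5, 6]
```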

  • @dohua_ai
    @dohua_ai 1 year ago

    So my dreams of my own LLM are broken :( As I understand it, the only way to build a personal LLM is fine-tuning? At least until cheap ways of training appear...

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago +1

      I wouldn't give up on it! My (optimistic) conjecture is as we better understand how these models actually work we will be able to develop ones that are much more computationally efficient.

  • @Joooooooooooosh
    @Joooooooooooosh 10 months ago +1

    Wait how did we get from $180K for a 7B model to $100K for a 10B model...

    • @ShawhinTalebi
      @ShawhinTalebi  10 months ago

      This is what we Physicists call an "order-of-magnitude estimate"
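A back-of-the-envelope sketch of why these figures need not line up: a common rule of thumb from the scaling-law literature (e.g., [26]) puts training compute at roughly 6 FLOPs per parameter per training token, so cost depends on the number of training tokens as much as on model size. The throughput and both scenarios below are illustrative assumptions, not figures from the video.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule of thumb: ~6 FLOPs per parameter per training token (forward + backward)."""
    return 6 * n_params * n_tokens

def gpu_hours(total_flops: float, flops_per_gpu_per_s: float) -> float:
    """Convert total FLOPs into GPU-hours at a given sustained per-GPU throughput."""
    return total_flops / flops_per_gpu_per_s / 3600

# Assumed sustained throughput per GPU (illustrative): 150 TFLOP/s.
THROUGHPUT = 1.5e14

# Hypothetical scenarios: (parameters, training tokens).
scenarios = {
    "10B params, 200B tokens": (10e9, 200e9),
    "7B params, 2T tokens": (7e9, 2e12),
}
for name, (n_params, n_tokens) in scenarios.items():
    hours = gpu_hours(training_flops(n_params, n_tokens), THROUGHPUT)
    print(f"{name}: ~{hours:,.0f} GPU-hours")
```

Under these assumptions the smaller 7B model costs about 7 times more to train than the 10B one, because it sees ten times as many tokens.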

  • @varadacharya2802
    @varadacharya2802 6 months ago

    Can you make a series on data science and artificial intelligence topics?

    • @ShawhinTalebi
      @ShawhinTalebi  6 months ago

      Anything in particular you'd like to see?

    • @varadacharya2802
      @varadacharya2802 6 months ago

      @@ShawhinTalebi It would be nice if you made one on AI for beginners who do not know any AI algorithms, like DFS, BFS, etc.

  • @hypercoder-gaming
    @hypercoder-gaming 1 year ago +1

    When you were calculating the cost, you estimated that a 10B model would take 100k GPU hours, but Llama 2 took 180k GPU hours and that was 7B. These estimates are way off. How is it that 100B costs less than 70B?

    • @ShawhinTalebi
      @ShawhinTalebi  11 months ago +1

      The numbers from Llama 2 were only meant to give an idea of scale. More precise estimates will depend on the details of the use case.

  • @julius333333
    @julius333333 4 months ago

    The training part is really basic. I would like to see more practical, real-world concerns such as scaling duration, synchronization and communication costs, logging, etc.

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago

      Great suggestion. Noted for future videos :)

  • @Sunnyangusyoung
    @Sunnyangusyoung 9 months ago

    What if I don’t want to build my own model but want to work for someone who is building one?

  • @lyonspeterson1094
    @lyonspeterson1094 8 months ago

    Good content. But when I watch the video, there are so many ads that I'm even confused about what I am supposed to watch.

  • @aftalavera
    @aftalavera 2 months ago

    This bullshit will never end. DEI LLM!

  • @issair-man2449
    @issair-man2449 1 year ago

    Hi, hoping my comment will be seen and responded to... What I FAIL to understand:
    If a simple model learns/predicts, couldn't we prompt it to delete the trash data and train itself autonomously until the model becomes superintelligent?

    • @ShawhinTalebi
      @ShawhinTalebi  1 year ago

      LLMs alone only do token prediction, as discussed in the first video of this series: ruclips.net/video/tFHeUSJAYbE/видео.html
      While an AI system could in principle train itself, it would require much more than just an LLM to pull that off.
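To illustrate what "token prediction" means at the smallest possible scale, here is a toy bigram model that predicts the next word from counts of which word followed which in its training text. It is an illustrative stand-in, not how an LLM works internally: a transformer conditions on the whole preceding context using learned parameters, but the job (score possible next tokens, then choose one) is the same. The corpus and function names are made up.

```python
from collections import Counter, defaultdict
from typing import Optional

def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """For each word, count which words follow it in the training text."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts: dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical three-sentence training corpus.
corpus = [
    "I hit the ball with a bat",
    "I hit the ball hard",
    "She hit the target",
]
model = train_bigram(corpus)
print(predict_next(model, "hit"))  # the
print(predict_next(model, "the"))  # ball
```

Nothing in this loop can judge which training data is "trash"; it only reproduces the statistics it was given, which is the point of the reply above.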

  • @Amipotsophspond
    @Amipotsophspond 2 months ago

    2:52 Wait, how do you manage to do anything with AI and not know Navidia starts with an N? Yeah, it's a hard word to spell, I could not do it, but if you are actually pricing out how much it will cost to rent GPU time, you will see that word a lot.