Large Language Models from scratch

  • Published: 28 May 2024
  • How do language models like ChatGPT and PaLM work? A short cartoon that explains transformers and the tech behind LLMs.
    Part 2: • Large Language Models:...
    0:05 - autocomplete
    0:44 - search query completion
    1:03 - language modeling with probabilities
    1:59 - time series and graphs
    2:34 - text generation
    3:43 - conditional probabilities
    3:52 - trigrams
    4:49 - universal function approximation
    5:19 - neural networks
    6:33 - gradient descent
    7:03 - backpropagation
    7:24 - network capacity

Comments • 216

  • @triton62674 · 1 year ago · +244

    This is seriously *really* good, I've not seen someone introduce high-level concepts by example so clearly (and nonchalantly!)

  • @solotron7390 · 1 month ago · +2

    Finally! Someone who knows how to explain complexity with simplicity.

  • @somethingness · 1 year ago · +310

    This is so good. I can't believe it has so few views.

    • @adarshraj1467 · 1 year ago · +6

      Same, brilliant explanation of NNs.

    • @facqns · 1 year ago · +1

      Was just about to write the same.

    • @webgpu · 1 year ago · +1

      If you really think so, post the link to this video on your social media.

    • @itsm3th3b33 · 1 year ago

      So few views... If a Kardashian posts a brain fart, it gets more views from the unwashed masses. That is the sad reality.

    • @phil5053 · 1 year ago · +2

      Very few people study it.

  • @JohnLaudun · 8 days ago

    I have been working on ways to explain LLMs to people in the humanities for the past year. You've done it in 5 brilliant minutes. From now on, I'm just going to hand out this URL.

  • @DanielTateNZ · 1 year ago · +3

    This is the best explanation of LLMs I've seen.

  • @GregMatoga · 1 year ago · +23

    That might be the best, most concise and impactful neural network introduction I have seen to date

  • @ivocamilleri8913 · 1 year ago · +70

    This is an excellent articulation. We need parts 3, 4, and 5.

  • @SethWieder · 1 year ago · +7

    These visuals were SO HELPFUL in introducing and understanding some foundational ML concepts.

  • @TheAkiller101 · 1 year ago · +4

    If there is an Oscar for best tutorial on the internet, this video deserves it!

  • @stratfanstl · 5 months ago · +1

    I've been watching a lot of videos on LLMs and the underlying mathematics. This explanation is PHENOMENAL. Not dumbed down, not too long, and uses concepts of existing maths and graphing that cement the concept perfectly.

  • @JessieJussMessy · 1 year ago · +9

    Being able to visualize this so simply is legendary. You're doing amazing work. Subbed

  • @ApocalypsoTron · 1 year ago · +1

    Thanks for showing what a neural network function looks like

  • @marktahu2932 · 1 year ago · +1

    You have made it so easy to see and understand - it puts into place all the complicated explanations that exist out there on the net.

  • @marcusdrost8452 · 5 months ago · +1

    Clearly explained! I will use it.

  • @ChatGPt2001 · 7 months ago

    Training a large language model from scratch is a complex and resource-intensive task that requires a deep understanding of natural language processing, access to significant computational resources, and large amounts of data. Here are the general steps involved in training a large language model from scratch:
    1. Define Objectives:
    - Clearly define the objectives and goals of your language model. Decide what tasks it should be capable of performing, such as text generation, translation, question answering, etc.
    2. Collect Data:
    - Gather a vast amount of text data from various sources. This data can include books, articles, websites, and other textual sources. High-quality, diverse data is essential for training a robust language model.
    3. Data Preprocessing:
    - Clean and preprocess the data by removing noise, formatting, and irrelevant content. Tokenize the text into smaller units, such as words or subword units (e.g., Byte-Pair Encoding or SentencePiece).
    4. Model Architecture:
    - Choose a suitable neural network architecture for your language model. Popular choices include recurrent neural networks (RNNs), transformers, and their variants. Transformers, especially the GPT (Generative Pre-trained Transformer) architecture, have been widely successful for large language models.
    5. Model Design:
    - Design the specifics of your model, including the number of layers, attention mechanisms, hidden units, and other hyperparameters. These choices will affect the model's size and performance.
    6. Training:
    - Train the model on your preprocessed dataset using powerful hardware like GPUs or TPUs. Training a large language model from scratch typically requires distributed computing infrastructure due to the enormous amount of data and computation involved.
    7. Regularization:
    - Implement regularization techniques like dropout, layer normalization, and weight decay to prevent overfitting during training.
    8. Optimization:
    - Choose an optimization algorithm, such as Adam or SGD, and fine-tune its hyperparameters to ensure efficient model convergence.
    9. Hyperparameter Tuning:
    - Experiment with different hyperparameters (e.g., learning rate, batch size) and training strategies to optimize your model's performance.
    10. Evaluation:
    - Evaluate your model's performance on various natural language processing tasks to ensure that it meets your objectives. Use metrics like perplexity, BLEU score, or F1 score, depending on the specific tasks.
    11. Fine-Tuning:
    - After initial training, fine-tune your model on specific downstream tasks, if required. Transfer learning is a powerful technique that leverages pre-trained models to perform well on specific tasks with less data.
    12. Deployment:
    - Once your model performs well, deploy it in the desired application, whether it's a chatbot, language translation service, or any other NLP task.
    13. Monitoring and Maintenance:
    - Continuously monitor your model's performance in production and update it as necessary to adapt to changing data distributions or requirements.
    It's worth noting that training large language models from scratch can be resource-intensive and time-consuming, requiring access to significant computational power and expertise in machine learning. Many organizations choose to fine-tune pre-trained models on specific tasks, which can be more efficient and effective for many practical applications.
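
    To make the steps above concrete, here is a minimal toy sketch of the same loop in PyTorch. Everything in it (the corpus, the model shape, the hyperparameters) is an illustrative assumption, not code from the video or the comment; a real LLM differs mainly in scale, not in the shape of the loop.

      # Toy sketch of steps 2-10 above; all values are illustrative assumptions.
      import torch
      import torch.nn as nn

      # Steps 2-3: "collect" and preprocess data: a toy corpus with a
      # word-level tokenizer (real models use subword tokenizers like BPE).
      corpus = "the cat sat on the mat the dog sat on the rug".split()
      vocab = sorted(set(corpus))
      stoi = {w: i for i, w in enumerate(vocab)}
      ids = torch.tensor([stoi[w] for w in corpus])

      # Steps 4-5: architecture and design: an embedding plus a linear head;
      # a real LLM stacks transformer blocks between these two layers.
      class TinyLM(nn.Module):
          def __init__(self, vocab_size, dim=16):
              super().__init__()
              self.emb = nn.Embedding(vocab_size, dim)
              self.drop = nn.Dropout(0.1)              # step 7: regularization
              self.head = nn.Linear(dim, vocab_size)

          def forward(self, x):
              return self.head(self.drop(self.emb(x)))

      model = TinyLM(len(vocab))
      # Step 8: optimization: Adam with weight decay.
      opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-4)

      # Step 6: training: predict each next token from the current one.
      xs, ys = ids[:-1], ids[1:]
      for step in range(200):
          loss = nn.functional.cross_entropy(model(xs), ys)
          opt.zero_grad()
          loss.backward()   # backpropagation computes the gradients
          opt.step()        # a gradient-descent step updates the weights

      # Step 10: evaluation: perplexity is exp(average cross-entropy).
      model.eval()
      with torch.no_grad():
          ppl = torch.exp(nn.functional.cross_entropy(model(xs), ys))
      print(f"training perplexity: {ppl.item():.2f}")

    Steps 1, 9, and 11-13 (objectives, hyperparameter tuning, fine-tuning, deployment, monitoring) sit outside this inner loop and are omitted here.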

  • @AncientSlugThrower · 1 year ago · +1

    This was awesome. I don't think I could adequately explain how this all works yet, but it fills in so many gaps. Thank you for this video!

  • @mmarsbarr · 1 year ago

    How is it possible that I've watched a ton of videos trying to understand LLMs from the likes of universities and big tech companies, yet this simple video in Comic Sans explains everything in the most direct and concise manner possible!?

  • @BenKordick · 1 year ago · +1

    Awesome video! I really appreciated your explanation and representation of neural networks and how the number of nodes and weights affect the accuracy.

  • @liamcawley6440 · 1 year ago · +1

    Possibly the best explanation of LLMs I've ever seen. Accurate, pointed, and concise.

  • @laStar972chuck · 1 year ago · +1

    Stunning video of absolutely high and underrated quality!!!!
    Thanks so much for this!

  • @gigabytechanz9646 · 1 year ago · +1

    Very clear and concise explanation! Excellent work!

  • @noahnazareth8248 · 1 year ago · +16

    Great video. It says "energy function" instead of "error function", but it's a great explanation of gradient descent and backprop in a super short time. Excellent job!

  • @DimitriMissentos · 1 year ago · +1

    Amazingly insightful. Fantastically well explained. Thanks !

  • @SaidakbarP · 1 year ago · +3

    This is the best explanation of Large Language Models. I hope your channel gets more subscribers!

  • @ericsun1990 · 1 year ago · +5

    I really liked your explanation of how "training a network" is performed. It made it a lot easier to understand.

  • @user-wr4yl7tx3w · 1 year ago · +6

    Wow. This is so well presented. And a different take that gets to the real intuition.

  • @sudhindrakopalle7071 · 1 year ago · +1

    Fascinating and such wonderful explanation. Thank you very much!

  • @pierrickguillard-prevert4213 · 11 months ago · +2

    Thanks Steve, this explanation is just... Brilliant! 😊

  • @saurabhmaurya6380 · 8 months ago · +1

    Subscribed straight away... I would really love these videos in my feed daily. ❤

  • @saqhorov · 1 year ago · +26

    this is excellently done, I'm very grateful for you putting this together.

  • @looppp · 1 year ago · +2

    This is an insanely good explanation. Subscribed.

  • @funnycompilations8314 · 1 year ago · +7

    You, sir, deserve my subscription. This was so good.

  • @narasimhasriharshakanduri3325 · 1 year ago · +1

    I generally don't subscribe to any channels, but this one deserves it. It takes a lot of understanding and love for the subject to make these kinds of videos. Thank you very much.

  • @stan.corston · 25 days ago · +1

    Great way to explain a complex idea ⚡️

  • @wokeclub1844 · 1 year ago · +1

    Such a clean and lucid explanation. Amazing.

  • @darinmcbride4003 · 1 year ago

    Great video. The example of the network with too few curve functions to recreate the graph really helped me understand how more or fewer nodes affect the accuracy of the result.

  • @lfmtube · 1 year ago · +9

    Brilliant! A true example of intelligence and simplicity in explanation! Thanks a lot.

  • @go_better · 9 months ago

    Thank you so much! Very well and simply explained!

  • @A338800 · 10 months ago · +1

    Incredibly well explained! Thanks a lot!

  • @junglemandude · 1 month ago · +1

    Thanks, what a video! In 8 minutes I have learnt so much, and it is very well explained with graphics indeed.

  • @ravinatarajan4894 · 1 year ago · +2

    Very nice illustration and fantastic explanation. Thanks

  • @zhenli8674 · 1 year ago · +1

    This is awesome. Very good illustrations.

  • @marlinhowley9858 · 1 year ago · +1

    Excellent. Some of the best work I've seen. Thanks.

  • @lucamatteobarbieri2493 · 1 year ago · +1

    Simple and clear, kudos!

  • @jankshtt · 1 year ago · +2

    Nice, concise video explaining what a large language model is.

  • @DavideVarvello · 1 year ago · +2

    wonderful, thank you so much for sharing

  • @kahoku451 · 1 year ago · +2

    Incredible. Thank you

  • @RK-fr4qf · 9 months ago · +1

    Fantastic. Please teach more.
    You are a legend.

  • @LittleShiro0 · 1 year ago · +1

    Very well explained. Thank you for the video!

  • @noahlin3851 · 11 months ago · +1

    This video was ahead of its time

  • @Luxcium · 1 year ago · +3

    Wait a minute: all day I have been trying to understand what neural networks are, and you have explained every part so easily, wow 😮. It obviously 🙄 implies that I have struggled to learn all of these terms so far, but I have finally found a good explanation of backpropagation, gradient descent, error functions and such 🎉🎉🎉🎉

  • @WoonCherkLam · 1 year ago · +1

    Probably one of the best explanations I've come across. :)

  • @KayYesYouTuber · 1 year ago · +1

    Great explanation. Thank you very much

  • @govinddwivedi582 · 1 year ago · +1

    You are so good at explaining it! Please keep doing it.

  • @lopezb · 1 year ago · +1

    I loved this. Clarity = real understanding = respect for the curiosity and intelligence of the audience.
    Requests: I would like more depth on "backpropagation", and on why so many "layers", and so on!!!!

  • @samwynn3183 · 1 year ago

    I had not considered exactly how words relate to each other in automated texts, and this video explained that concept in a really clear and concise way.

  • @BeSlaying · 1 year ago · +1

    Really great explanation of LLMs! You just earned a subscriber, and I'm looking forward to more of your videos :)

  • @lipefml7200 · 11 months ago · +1

    The best content I've ever seen on the subject. Super dense and easy.

  • @coderversion1 · 1 year ago · +3

    Best and simplest explanation I have ever come across. Thank you sir

  • @mirabirhossain1842 · 1 year ago · +1

    Unbelievably good video. Great work.

  • @jonathanpersson9656 · 1 year ago · +1

    Agree with the other comments, so clear and easy to understand. I wish all teaching material was this good...

  • @KalebPeters99 · 1 year ago · +1

    Holy shit. This is one of the best YouTube videos I've seen all year so far. Bravo 👏👏👏

  • @ashleygesty7671 · 1 year ago · +3

    You are a genius, thank you for this amazing video!

  • @quantphobia2944 · 1 year ago · +1

    Simply amazing, so intuitive... omg, subscribed

  • @eriksteen84 · 1 year ago · +1

    fantastic video, thank you!!!

  • @s0meus3r · 1 year ago · +1

    🎉 I know I need to write something to promote this video. It deserves that.

  • @ammarlahori · 8 months ago · +1

    Absolutely brilliant... great examples.

  • @ryusei323 · 9 months ago · +1

    This is fantastic. Thank you for sharing.

  • @stanleytrevino6735 · 1 year ago

    This is a very good video! Excellent explanation on Large Language Models!

  • @_sudipidus_ · 3 months ago · +1

    This is so good
    I’m inspired to go back and learn Fourier and Taylor series

  • @AVV-A · 1 year ago · +4

    This is insanely good. I've understood things in 8 minutes that I could not understand after entire classes

  • @Bhuvana2020 · 1 year ago · +1

    What a fantastic tutorial! Thank you! Liked and subscribed!

  • @peregudovoleg · 1 year ago

    Even though I knew all this stuff, it is still nice to watch and listen to a good explanation of these fundamental ML concepts.

  • @darrin.jahnel · 1 year ago · +1

    Wow, what a fantastic explanation!

  • @smithwill9952 · 1 year ago · +2

    Clean and clear explanation.

  • @dylan_curious · 1 year ago · +2

    Wow, this video was really informative and fascinating! It's incredible to think about how much goes into building and training a language model. I never realized that language modeling involved so much more than just counting frequencies of words and sentences. The explanation of how neural networks can be used as universal approximators was particularly interesting, and it's amazing to think about the potential applications of such models, like generating poetry or even writing computer code. I can't wait for part two of this video!

  • @gregortidholm · 1 year ago · +1

    Really great description 👌

  • @vaguebrownfox · 10 months ago · +1

    Omgg are you serious? You have some top-notch pedagogical skills.

  • @hrishabhchoudhary2700 · 11 months ago · +1

    The content is a gem. Thank you for this.

  • @jacobwilliams4182 · 1 year ago · +1

    Great explanation of an advanced topic

  • @sumitarora5861 · 1 year ago · +1

    Simply superb explanation

  • @sumitpawar000 · 2 months ago · +1

    Wow... what an explanation, sir ❤
    Thank you 🙏

  • @shreeramshankarpattanayak7409 · 1 year ago · +1

    Brilliantly explained!

  • @davidvarela6020 · 1 year ago · +1

    This might be the highest signal-to-noise video I've ever watched.

  • @zeeg404 · 1 year ago · +8

    What an amazing lecture on LLMs! Loved the example Markov chain model with the Bob Dylan lyrics; that was actually a fun homework exercise in one of my grad school courses. This really helped me understand neural networks, which are so much more complex.

  • @berbudy · 1 year ago · +1

    This video is a must watch

  • @wastethesweat · 1 year ago · +2

    Thank you, this is brilliant

  • @mbrochh82 · 1 year ago · +1

    Wow. This is incredible!!

  • @angelatatum8642 · 1 year ago

    Easy to understand explanation of large language models 👍

  • @Larock-wu1uu · 1 year ago

    This was incredibly good!

  • @LeviNotik · 1 year ago · +2

    Very nicely done

  • @gregpavlik6474 · 1 year ago · +3

    Super clear - need to circulate this around my teams adjacent to the scientists at work

  • @yourfuneral · 1 year ago · +1

    Thanks for the good explanation

  • @YusanTRusli · 8 months ago · +1

    Very well explained!

  • @davhung8004 · 6 months ago · +1

    Seems really really cool

  • @user-vc6uk1eu8l · 1 year ago · +2

    Great video 👍!
    This reminds me of Markov chains (MC). I read in some probability book a long time ago that MCs had been used to calculate the probability of the next letter in a (randomly chosen) word.

    • @lopezb · 1 year ago · +2

      It is exactly a Markov matrix, also known as a probability matrix: all the rows are probability vectors (nonnegative real numbers that sum to 1), like the one used to define a Markov chain. If it depends only on the previous word, it's a 1-step chain (the usual kind); if on the previous k words, it's a k-step Markov chain, which can be re-coded as a 1-step chain by replacing the alphabet of symbols (words) with k-tuples of symbols (e.g., triples of words for k = 3). In fact, Markov himself used this model back in 1913 to analyze text: he applied it to the sequence of letters in Eugene Onegin, first single letters and then pairs (a 2-step version). I found that out from this great talk by a UC Berkeley CS prof, author of the basic textbook in the field, and also of the "pause" letter:
      ruclips.net/video/T0kALyOOZu0/видео.html&ab_channel=GoodTimesBadTimes
      ChatGPT is, in effect, a 32,000-step version (its context window), but they have to train it stochastically or it would be way too much computation to use actual frequencies...
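
      A minimal sketch of that k-step idea in plain Python, with k = 2 (the trigram model from 3:52 in the video); the corpus line (echoing the Bob Dylan example mentioned in other comments) and all names here are illustrative assumptions, not code from the video:

        # Toy k-step Markov chain over words (k = 2, i.e. a trigram model).
        import random
        from collections import defaultdict

        corpus = ("you may be an ambassador to england or france "
                  "you may like to gamble you might like to dance").split()

        k = 2  # condition on the previous two words
        counts = defaultdict(list)
        for i in range(len(corpus) - k):
            state = tuple(corpus[i:i + k])
            counts[state].append(corpus[i + k])  # observed continuations

        # Generate: sampling uniformly over the stored occurrences is exactly
        # frequency-weighted sampling from the estimated conditional probabilities.
        state = ("you", "may")
        out = list(state)
        for _ in range(8):
            continuations = counts.get(state)
            if not continuations:   # dead end: this state was never continued
                break
            out.append(random.choice(continuations))
            state = tuple(out[-k:])
        print(" ".join(out))

      Explicit counting like this is what stops scaling: the number of possible k-word states explodes as k grows, which is why, as noted above, a model conditioning on thousands of words must be trained stochastically rather than by tabulating frequencies.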

  • @labCmais135 · 2 months ago · +1

    Wow, just saw this 😂, it's excellent, thank you!

  • @perseusgeorgiadis7821 · 1 year ago · +1

    This was actually amazing

  • @imzhaodong · 1 year ago · +1

    Purely awesome.

  • @magunciero · 1 year ago · +1

    Really well explained!!