Large Language Models from scratch
- Published: 28 May 2024
- How do language models like ChatGPT and PaLM work? A short cartoon that explains transformers and the tech behind LLMs.
Part 2: • Large Language Models:...
0:05 - autocomplete
0:44 - search query completion
1:03 - language modeling with probabilities
1:59 - time series and graphs
2:34 - text generation
3:43 - conditional probabilities
3:52 - trigrams
4:49 - universal function approximation
5:19 - neural networks
6:33 - gradient descent
7:03 - back propagation
7:24 - network capacity
This is seriously *really* good, I've not seen someone introduce high level concepts by-example so clearly (and nonchalantly!)
What have they done? amazing stuff
Agree. I have it on one of my playlists now.
Finally! Someone who knows how to explain complexity with simplicity.
This is so good. I can't believe it has so few views.
Same, brilliant explanation of NNs
Was just about to write the same.
if you really think so, post the link to this video on your social media.
So few views... If a Kardashian posts a brain fart it gets more views from the unwashed masses. That is the sad reality.
Very few study about it
I have been working on ways to explain LLMs to people in the humanities for the past year. You've done it in 5 brilliant minutes. From now on, I'm just going to hand out this URL.
This is the best explanation of LLMs I've seen
That might be the best, most concise and impactful neural network introduction I have seen to date
This is an excellent articulation. We need part 3, 4, and 5
These visuals were SO HELPFUL in introducing and understanding some foundational ML concepts.
if there is an Oscar for the best tutorial on the internet, this video deserves it!
I've been watching a lot of videos on LLMs and the underlying mathematics. This explanation is PHENOMENAL. Not dumbed down, not too long, and uses concepts of existing maths and graphing that cement the concept perfectly.
Being able to visualize this so simply is legendary. You're doing amazing work. Subbed
Thanks for showing what a neural network function looks like
You have made it so easy to see and understand - it puts into place all the complicated explanations that exist out there on the net.
Clearly explained! I will use it.
Training a large language model from scratch is a complex and resource-intensive task that requires a deep understanding of natural language processing, access to significant computational resources, and large amounts of data. Here are the general steps involved in training a large language model from scratch:
1. Define Objectives:
- Clearly define the objectives and goals of your language model. Decide what tasks it should be capable of performing, such as text generation, translation, question answering, etc.
2. Collect Data:
- Gather a vast amount of text data from various sources. This data can include books, articles, websites, and other textual sources. High-quality, diverse data is essential for training a robust language model.
3. Data Preprocessing:
- Clean and preprocess the data by removing noise, formatting artifacts, and irrelevant content. Tokenize the text into smaller units, such as words or subword units (e.g., Byte-Pair Encoding or SentencePiece).
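As a toy sketch of the tokenization step (real pipelines use learned subword vocabularies such as BPE or SentencePiece; this hypothetical example just splits on non-letters and assigns integer ids):

```python
import re

def tokenize(text):
    # Lowercase and split on non-letter characters: a crude word-level
    # tokenizer standing in for subword schemes like BPE or SentencePiece.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def build_vocab(corpus):
    # Map each unique token to an integer id, in order of first appearance.
    vocab = {}
    for token in tokenize(corpus):
        vocab.setdefault(token, len(vocab))
    return vocab

corpus = "The cat sat on the mat. The cat slept."
ids = build_vocab(corpus)
encoded = [ids[t] for t in tokenize(corpus)]
print(ids)      # {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4, 'slept': 5}
print(encoded)  # [0, 1, 2, 3, 0, 4, 0, 1, 5]
```

A real subword tokenizer would instead learn frequent character sequences from the corpus, so rare words decompose into known pieces rather than becoming out-of-vocabulary.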
4. Model Architecture:
- Choose a suitable neural network architecture for your language model. Popular choices include recurrent neural networks (RNNs), transformers, and their variants. Transformers, especially the GPT (Generative Pre-trained Transformer) architecture, have been widely successful for large language models.
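The core computation inside a transformer layer can be illustrated with a minimal, pure-Python sketch of scaled dot-product attention (real implementations use batched matrix libraries and learned projection matrices, which are omitted here):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    # Scaled dot-product attention, the core of the transformer:
    # each query is compared against every key, and the resulting
    # weights mix the value vectors.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# One query over two key/value pairs; the query matches the first key more,
# so the output leans toward the first value vector.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
print(attention(q, k, v))
```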
5. Model Design:
- Design the specifics of your model, including the number of layers, attention mechanisms, hidden units, and other hyperparameters. These choices will affect the model's size and performance.
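To see how these choices drive model size, here is a rough back-of-the-envelope parameter count for a hypothetical GPT-style configuration (the formula ignores biases, layer norms, and positional embeddings, and the values are illustrative):

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Hypothetical GPT-style hyperparameters; the values are illustrative.
    vocab_size: int = 50000
    n_layers: int = 12
    d_model: int = 768      # hidden size
    n_heads: int = 12
    d_ff: int = 3072        # feed-forward inner size, usually 4 * d_model

def approx_params(cfg):
    # Rough parameter count for a decoder-only transformer:
    # per layer: attention projections (4 * d^2) + feed-forward (2 * d * d_ff),
    # plus the token embedding matrix (assumed tied with the output layer).
    per_layer = 4 * cfg.d_model**2 + 2 * cfg.d_model * cfg.d_ff
    embedding = cfg.vocab_size * cfg.d_model
    return cfg.n_layers * per_layer + embedding

cfg = ModelConfig()
print(f"~{approx_params(cfg) / 1e6:.0f}M parameters")
```

Doubling `d_model` roughly quadruples the per-layer count, which is why width and depth dominate the size/performance trade-off mentioned above.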
6. Training:
- Train the model on your preprocessed dataset using powerful hardware like GPUs or TPUs. Training a large language model from scratch typically requires distributed computing infrastructure due to the enormous amount of data and computation involved.
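The training loop itself is gradient descent, as in the video. Scaled down to a two-parameter toy model it looks like this (real LLM training applies the same update, via backpropagation, to billions of parameters):

```python
# Toy data generated from y = 2x + 1; the "model" is a single linear unit.
data = [(x, 2 * x + 1) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]

w, b = 0.0, 0.0          # initial weights
lr = 0.1                 # learning rate

for step in range(500):
    # Gradient of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w -= lr * grad_w     # gradient descent update
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # converges near w=2, b=1
```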
7. Regularization:
- Implement regularization techniques like dropout, layer normalization, and weight decay to prevent overfitting during training.
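Dropout, one of the regularization techniques mentioned, can be sketched in a few lines (this is the standard "inverted dropout" formulation; the activation values are made up for illustration):

```python
import random

def dropout(activations, p, training=True):
    # During training, zero each unit with probability p and scale the
    # survivors by 1/(1-p) so the expected value of each activation is
    # unchanged; at inference time, pass activations through untouched.
    if not training or p == 0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

random.seed(1)
h = [0.5, -1.2, 0.8, 2.0]
print(dropout(h, p=0.5))                  # some units zeroed, others doubled
print(dropout(h, p=0.5, training=False))  # unchanged at inference
```

Because each forward pass sees a different random sub-network, the model cannot rely on any single unit, which discourages overfitting.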
8. Optimization:
- Choose an optimization algorithm, such as Adam or SGD, and fine-tune its hyperparameters to ensure efficient model convergence.
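As a sketch of how an optimizer like Adam differs from plain SGD, here is the Adam update rule applied to a one-dimensional toy objective (the hyperparameters are the common defaults, with an illustrative learning rate):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: exponential moving averages of the gradient (m)
    # and squared gradient (v), with bias correction for early steps.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):
    grad = 2 * (theta - 3)
    theta, m, v = adam_step(theta, grad, m, v, t)
print(round(theta, 2))  # approaches the minimum at theta = 3
```

The per-parameter normalization by the squared-gradient average is what lets Adam use one learning rate across parameters whose gradients differ wildly in scale.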
9. Hyperparameter Tuning:
- Experiment with different hyperparameters (e.g., learning rate, batch size) and training strategies to optimize your model's performance.
10. Evaluation:
- Evaluate your model's performance on various natural language processing tasks to ensure that it meets your objectives. Use metrics like perplexity, BLEU score, or F1 score, depending on the specific tasks.
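Perplexity, the most common intrinsic metric, is just the exponential of the average negative log-likelihood per held-out token. A minimal sketch with made-up per-token probabilities:

```python
import math

def perplexity(token_probs):
    # Perplexity is the exponential of the average negative log-likelihood
    # the model assigns to each held-out token; lower is better.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical per-token probabilities from two models on the same text.
confident = [0.5, 0.4, 0.6, 0.5]
uncertain = [0.1, 0.05, 0.2, 0.1]
print(perplexity(confident))  # ~2.0: the model is rarely "surprised"
print(perplexity(uncertain))  # ~10: like guessing among ~10 equally likely words
```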
11. Fine-Tuning:
- After initial training, fine-tune your model on specific downstream tasks, if required. Transfer learning is a powerful technique that leverages pre-trained models to perform well on specific tasks with less data.
12. Deployment:
- Once your model performs well, deploy it in the desired application, whether it's a chatbot, language translation service, or any other NLP task.
13. Monitoring and Maintenance:
- Continuously monitor your model's performance in production and update it as necessary to adapt to changing data distributions or requirements.
It's worth noting that training large language models from scratch can be resource-intensive and time-consuming, requiring access to significant computational power and expertise in machine learning. Many organizations choose to fine-tune pre-trained models on specific tasks, which can be more efficient and effective for many practical applications.
This was awesome. I don't think I could adequately explain how this all works yet, but it fills in so many gaps. Thank you for this video!
how is it possible that I've watched a ton of videos trying to understand LLMs from the likes of universities and big tech companies, yet this simple video in Comic Sans explains everything in the most direct and concise manner possible!?
Awesome video! I really appreciated your explanation and representation of neural networks and how the number of nodes and weights affect the accuracy.
possibly the best explanation of LLM i've ever seen. accurate, pointed and concise
Stunning video of absolutely high and underrated quality !!!!
Thanks so much, for this !
Very clear and concise explanation! Excellent work!
Great video. "energy function" instead of error function, but a great explanation of gradient descent and backprop in a super short time. Excellent job!
Amazingly insightful. Fantastically well explained. Thanks !
This is the best explanation of Large Language Models. I hope your channel gets more subscribers!
I really liked your explanation of how "training a network" is performed. Made it a lot easier to understand
Wow. This is so well presented. And a different take that gets to the real intuition.
Fascinating and such wonderful explanation. Thank you very much!
Thanks Steve, this explanation is just... Brilliant! 😊
Straight away subscribed... I would really love these videos in my feed daily. ❤
this is excellently done, I'm very grateful for you putting this together.
This is an insanely good explanation. Subscribed.
you sir, deserve my subscription. This was so good.
I generally don't subscribe to any channels, but this one deserves it. It takes a lot of understanding and love for the subject to make these kinds of videos. Thank you very much
Great way to explain a complex idea ⚡️
such clean and lucid explanation. amazing
Great video. The example of the network with too few curve functions to recreate the graph really helped me understand how more or fewer nodes affects the accuracy of the result.
Brilliant! A true example of intelligence and simplicity in explanation! Thanks a lot.
Thank you so much! Very well and simply explained!
Incredibly well explained! Thanks a lot!
Thanks, what a video; in 8 minutes I have learned so much, and it is very well explained with graphics indeed.
Very nice illustration and fantastic explanation. Thanks
This is awesome. Very good Illustrations.
Excellent. Some of the best work I've seen. Thanks.
Simple and clear, kudos!
nice concise video explaining what is a large language model
wonderful, thank you so much for sharing
Incredible. Thank you
Fantastic. Please teach more
You are a legend.
Very well explained. Thank you for the video!
This video was ahead of its time
Wait a minute, all day I have tried to understand what neural networks are, and you have explained all the parts so easily, wow 😮. I have struggled to learn all of these terms so far, but I have finally found a good explanation of back-propagation, gradient descent, error functions and such 🎉🎉🎉🎉
Probably one of the best explanations I've come across. :)
Great explanation. Thank you very much
You are so good at explaining it! Please keep doing it.
I loved this. Clarity = real understanding= respect for the curiosity and intelligence of the audience.
Requests: Would like more depth about "back propagation", and on to why so many "layers" and so on...!!!!
I had not considered exactly how words relate to each other in automated texts, and this video explained that concept in a really clear and concise way.
Really great explanation of LLM! Just earned a subscriber and I'm looking forward to more of your videos :)
The best content ever I saw about the subject. Super dense and easy.
Best and simplest explanation I have ever come across. Thank you sir
Unbelievably good video. Great work.
Agree with the other comments, so clear and easy to understand. I wish all teaching material was this good...
Holy shit. This is one of the best YouTube videos I've seen all year so far. Bravo 👏👏👏
You are a genius, thank you for this amazing video!
Simply amazing, so intuitive... omg, subscribed
fantastic video, thank you!!!
🎉 I know I need to write something to promote this video. It deserves that
Absolutely brilliant..great examples
This is fantastic. Thank you for sharing.
This is a very good video! Excellent explanation on Large Language Models!
This is so good
I’m inspired to go back and learn Fourier and Taylor series
This is insanely good. I've understood things in 8 minutes that I could not understand after entire classes
What a fantastic tutorial! Thank you! Liked and subscribed!
Even though I knew all this stuff, it is still nice to watch and listen to a good explanation of these fundamental ML concepts.
Wow, what a fantastic explanation!
Clean and clear explanation
Wow, this video was really informative and fascinating! It's incredible to think about how much goes into building and training a language model. I never realized that language modeling involved so much more than just counting frequencies of words and sentences. The explanation of how neural networks can be used as universal approximators was particularly interesting, and it's amazing to think about the potential applications of such models, like generating poetry or even writing computer code. I can't wait for part two of this video!
Really great description 👌
Omgg are you serious? You have some top-notch pedagogical skills.
The content is gem. Thank you for this.
Great explanation of an advanced topic
Simply superb explanation
Wow .. what an explanation sir ❤
Thank you 🙏
Brilliantly explained !
This might be the highest signal to noise video I've ever watched
What an amazing lecture on LLMs! Loved the example Markov chain model with the Bob Dylan lyrics; that was actually a fun homework exercise in one of my grad school courses. This really helped me understand neural networks, which are so much more complex.
This video is a must watch
Thank you, this is brilliant
Wow. This is incredible!!
Easy to understand explanation of large language models 👍
This was incredibly good!
Very nicely done
Super clear - need to circulate this around my teams adjacent to the scientists at work
Thanks for the good explanation
very well explained!
Seems really really cool
Great video 👍!
This reminds me of Markov chains (MC). I read in some probability book a long time ago that MCs had been used to calculate the probability of the next letter in a (randomly chosen) word.
It is exactly a Markov matrix, also known as a probability matrix: all of its rows are probability vectors (nonnegative real numbers that sum to 1), like the one used to define a Markov chain. If the next word depends only on the previous word, it's a 1-step chain (the usual kind); if it depends on the previous k words, it's a k-step Markov chain, which can be re-coded as a 1-step chain by replacing the alphabet of symbols (words) with k-tuples of symbols. In fact, Markov himself used this model back in 1913 to analyze text. I found that out from this great talk by a UC Berkeley CS prof, author of the basic textbook in the field, and also of the "pause" letter:
ruclips.net/video/T0kALyOOZu0/видео.html&ab_channel=GoodTimesBadTimes
Markov used it to predict first single letters, and then letter pairs, in Eugene Onegin (a 2-step version). ChatGPT is a 32,000-step version, but they have to train it stochastically, or it would be way too much computation to use actual frequencies...
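The trigram model mentioned in the timestamps, and the k-step chain described in this thread, can be sketched in a few lines of Python (the sample text is two lines of Dylan lyrics, echoing the video's example):

```python
import random
from collections import defaultdict

def build_trigram_model(words):
    # Count which word follows each pair of words: a 2-step Markov
    # chain over the vocabulary, i.e. a trigram model.
    model = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        model[(a, b)].append(c)
    return model

def generate(model, seed_pair, n_words, rng):
    # Repeatedly sample a follower of the last two words; duplicates in
    # the follower lists make frequent continuations proportionally likely.
    out = list(seed_pair)
    for _ in range(n_words):
        followers = model.get((out[-2], out[-1]))
        if not followers:          # dead end: no observed continuation
            break
        out.append(rng.choice(followers))
    return " ".join(out)

text = ("how many roads must a man walk down before you call him a man "
        "how many seas must a white dove sail before she sleeps in the sand").split()
model = build_trigram_model(text)
rng = random.Random(42)
print(generate(model, ("how", "many"), 10, rng))
```

After the pair ("how", "many"), this tiny corpus has seen both "roads" and "seas", so generation branches there at random, exactly the probability-table behavior the video illustrates.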
Wow, just saw this 😂, it's excellent, thank you
This was actually amazing
Purely awesome
Really well explained!!