How Large Language Models Work
- Published: 27 Jul 2023
- Learn about watsonx → ibm.biz/BdvxRj
Large language models -- or LLMs -- are a type of generative pretrained transformer (GPT) that can create human-like text and code. There's a lot of talk about GPTs and LLMs lately, but they've actually been around for years! In this video, Martin Keen briefly explains what an LLM is, how LLMs relate to foundation models, and then covers how they work and how they can be used to address various business problems.
#llm #gpt #gpt3 #largelanguagemodel #watsonx #GenerativeAI #Foundationmodels
I don't know what is more impressive, LLMs or this guy's ability to write backwards perfectly.
the whole thing is flipped i guess. He's "writing left handed" and we all know that is impossible
It's mirrors and a screen
I have a teacher who can write backwards perfectly. It's creepy lol
There are videos that show you how people do this- it's a visual trick not a dexterity master class ;)
@djham2916 And smoke!
Very nice explanation, short and to the point without getting bogged down in detail that is often misunderstood. I will share this with others
Nicely done! You explain everything very clearly. This video is concise and informative. I will share with others as an excellent foundational resource for understanding LLMs.
Very nice and crisp explanation. Love it.. Thanks
Great video presentation! Martin Keen delivers a superbly layman friendly elucidation of what are otherwise very 'high tech talk' to people like me who do not come from a tech based professional background. These types of content are highly appreciable, and in fact motivate further learning on these subjects. Thank you IBM, Mr. Keen & team. Cheers to you all from Sri Lanka.
Martin Keen, as awesome as usual... so natural. I love his talks, and somehow I owe to him my understanding of complicated subjects in AI. Thanks...
Really really enjoyed this primer. Thank you and great voice and enthusiasm!
Hey, nice job!!! yeah, I'd like to see more of these kinds of subjects in the present and the future as well!!!
Great presentation, feels like a personal assistant, great!
Large language models like GPT-3 work by using deep learning techniques, specifically a type of neural network called a transformer. Here's an overview of how they work:
1. **Data Collection**: Large language models are trained on vast amounts of text data from the internet, books, articles, and other sources. This data is used to teach the model about language patterns, grammar, syntax, semantics, and context.
2. **Tokenization**: The text data is tokenized, which means breaking it down into smaller units such as words, subwords, or characters. Each token is assigned a numerical representation.
3. **Training**: The model is trained with self-supervised learning: it learns to predict the next word or token in a sequence based on the preceding context, with the "labels" coming from the text itself. It adjusts its internal parameters (weights and biases) through backpropagation to minimize prediction errors.
4. **Transformer Architecture**: Large language models like GPT-3 use a transformer architecture, which is highly effective for handling sequential data like language. Transformers include attention mechanisms that allow the model to focus on relevant parts of the input sequence while generating output.
5. **Fine-Tuning**: After pre-training on a large dataset, language models can be fine-tuned on specific tasks or domains. This process involves additional training on a smaller dataset related to the target task, which helps the model specialize in that area.
6. **Inference**: Once trained, the language model can generate text by predicting the most likely next tokens given an input prompt. It uses the learned patterns and context from training to generate coherent and contextually relevant responses.
7. **Continual Learning**: Some language models support continual learning, which means they can be updated with new data over time to improve their performance and adapt to changing language patterns.
Overall, large language models combine sophisticated neural network architectures, extensive training data, and advanced training techniques to understand and generate human-like text.
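For illustration, here is a toy Python sketch of steps 2, 3 and 6 (a whitespace tokenizer, bigram counts standing in for learned weights, and greedy decoding; the corpus and function names are invented, and real LLMs use subword tokenizers and transformers trained with backpropagation):

```python
from collections import Counter, defaultdict

corpus = "the sky is blue . the sky is the limit ."

# 2. Tokenization: break the text into tokens (here, whitespace-separated words).
tokens = corpus.split()

# 3. "Training": count which token follows which -- a crude stand-in for
#    adjusting weights to minimize next-token prediction error.
follows = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    follows[prev][nxt] += 1

# 6. Inference: repeatedly append the most likely next token.
def generate(prompt, steps=4):
    out = prompt.split()
    for _ in range(steps):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

print(generate("the sky"))  # -> "the sky is blue . the"
```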
Can a subsequent SFT and RLHF with different, additional, or reduced content change the character of, improve, or degrade a GPT model?
tbh, I just love his voice and ready to listen all his videos 🤗
Very elaborate explanation. Thank you
Thank you for posting this video. What are the other architectures available apart from Transformer?
perfect for learning LLMs
IBM, big thanks to you for all these videos! These videos are really helpful.
I've liked and subscribed and done it again a thousand times in my mind
Very nicely done.
What is meant by, when referring to "sequences of words", "understanding"? I mean, what does "understanding" mean in that context?
Very nice explanation. Are these foundation models proprietary? How many foundation models exist?
Great explanation ❤
Imagine a world where wikipedia no longer needs human contributors. You just upload the source material, and an algorithm writes the articles and all sub-pages, listing everything it knows about a certain fictional character because it read the entire book series in half a second. Imagine having a conversation with the world's most eminent Star Wars expert.
such a great video
Intro to LLM’s. Thanks
The term "large" does not refer to large data; to be precise, it is the number of parameters that is large. So, slight correction.
I do believe that "large" in LLM refers both to the large amount of data and to the large number of parameters, so both are correct, but there is a prerequisite that the data be large, not only the parameters.
There's a lot of params because of the huge dataset
Hi Martin, are you there around? Could you please talk about " Emerging LLM App Stack" ? Thanks in advance!
Interesting explanation
Lol. I only knew Martin Keen from Brulosophy. This is sort of mindblowing.
Thank You Sir ❤
I got a remote job offer. The duty is AI training for an LLM.
Shall I go for it? What do you think?
That's amazing! Our company has a great project that can benefit from this and then use the proceeds to benefit mankind. How can we speak more about this? I am very intrigued.
Thanks. How much does it cost to build your own LLM?
Nice to know how LLM models work, Mr. Martin Keen. Can you focus more on LLM modelling and what exact related skills (programming skills) are required? Thank you so much, it was a pleasant video and I appreciated it.
What is a quantized version of a model, and how would it be created?
A model consists of lots of numbers. In a quantized version those numbers are stored with fewer bits each.
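Roughly, something like this minimal sketch (symmetric int8 rounding with made-up sizes; real quantization toolchains use more elaborate block-wise schemes):

```python
import numpy as np

weights = np.random.randn(4, 4).astype(np.float32)     # original 32-bit weights

scale = np.abs(weights).max() / 127.0                   # map the largest magnitude to 127
q_weights = np.round(weights / scale).astype(np.int8)   # 8 bits per number instead of 32

dequantized = q_weights.astype(np.float32) * scale      # approximate reconstruction at run time
print("max rounding error:", np.abs(weights - dequantized).max())
```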
Nice explanation! But I am still missing the most important point. How does one control relevance of the produced results? E.g. ChatGPT can answer questions. So far, what you explained is a model that can predict -> generate the next word in a document, given what has already been written. However, given a set of existing sentences, there is a multitude of ways to produce the next sentence, that would be somewhat consistent with the rest of the document. How does one go from plausible text generators to desired text generators?
Statistical likelihood based on the training data. And then there is a random element in sampling (seeded differently each time), so there is a little variation in the output and the answer isn't always exactly the same for the same prompt.
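Sketching that idea in Python (the vocabulary and scores are made up; real models sample from a softmax over tens of thousands of tokens, often with a temperature knob like the one below):

```python
import math, random

vocab  = ["blue", "clear", "falling", "the"]
scores = [2.0, 1.5, 0.3, -1.0]          # the model's raw preferences (logits)

def sample(temperature=1.0, seed=None):
    rng = random.Random(seed)
    exps = [math.exp(s / temperature) for s in scores]
    probs = [e / sum(exps) for e in exps]
    return rng.choices(vocab, weights=probs, k=1)[0]

print(sample(temperature=0.2))  # near-greedy: almost always "blue"
print(sample(temperature=1.5))  # flatter distribution: more varied answers
```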
Thanks dude
Amazing!
In this presentation, there was not enough detail on Foundation Models as a baseline to then explain what LLMs are.
The foundation model is trained on a gigantic amount of general text data on a very general task (such as language modeling, which is next-word prediction). The LLM is then created by finetuning a foundation model (a specific case of "pretrained model") on a more specific dataset (e.g. source code), sometimes also for a more specific task.
The foundation model is basically a stem cell for LLMs. It does not yet fulfill a specific purpose, but since it has seen tons of data it can be adapted to (pretty much) anything. Training the foundation model is extremely expensive, but it makes the downstream LLMs much cheaper as they do not need to be trained from scratch.
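A toy illustration of that "pretrain once, fine-tune cheaply" idea (bigram counts stand in for the foundation model's weights, and both corpora are invented; real fine-tuning updates transformer weights with gradient descent):

```python
from collections import Counter, defaultdict

def count_bigrams(text, counts=None):
    counts = counts if counts is not None else defaultdict(Counter)
    tokens = text.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

general_text = "the cat sat on the mat . the dog sat on the rug ."
domain_text  = "the build failed on the test server ."

foundation = count_bigrams(general_text)              # expensive, done once
fine_tuned = count_bigrams(domain_text, foundation)   # cheap continuation on domain data

print(fine_tuned["the"].most_common(3))
```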
Does anyone know what program he uses to sketch on screen like that?
It's a glass window. He is physically writing on it. For it to show the correct way (and him not having to write backwards) they just flip the image!
Other than the physical limitation of space that any computer has, it seems to me that technology like this should be applicable to robotics and allow for the creation of much smarter and more adaptive robotics projects.
So are transformers only for language- and text-related things??
No, they're used for image processing too.
Transformer models, originally developed for natural language processing tasks, have been extended to computer vision tasks as well. Vision Transformer (ViT) is an example of a transformer model adapted for image processing. Instead of using convolutional layers, ViT uses self-attention mechanisms to capture relationships between different parts of an image.
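For instance, the first step of a ViT can be sketched in a few lines of NumPy (the sizes are arbitrary; a real ViT then applies a linear projection, position embeddings, and stacked self-attention layers):

```python
import numpy as np

image = np.random.rand(224, 224, 3)   # stand-in for an RGB image
patch = 16

# Cut the image into 16x16 patches and flatten each one into a vector,
# turning the image into a sequence of "tokens" a transformer can attend over.
patches = image.reshape(224 // patch, patch, 224 // patch, patch, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 3)

print(patches.shape)  # (196, 768): 196 image tokens, each a 768-dimensional vector
```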
Did you just mirror the screen so it looks like you can write right-to-left? Wow
See ibm.biz/write-backwards
What's the difference between large language models and text to speech
How did you learn to write backwards
Wait a minute. Did he really write in mirror handwriting?
AI was used to make it appear that he can write on your screen.
He writes it normally but the video is flipped horizontally..
Lucid, thanks
1 PB = 1024 TB
1TB = 1024 GB
1GB = 1024 MB
1MB = 1024 KB
1KB = 1024 B
1B = 8 bits
So 1 PB = 1024 * 1024 * 1024 * 1024 *1024 Bytes
Multiply it again by 8 to get the number of bits.
Guys do correct me if I'm wrong!!
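The arithmetic checks out with binary prefixes (i.e. 1 KB = 1024 B):

```python
petabyte_in_bytes = 1024 ** 5
print(petabyte_in_bytes)      # 1125899906842624 bytes
print(petabyte_in_bytes * 8)  # 9007199254740992 bits
```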
Need one use case
Something tells me “The sky is the limit” here 👀
@2:15 a different sequence. this is just for fun .
Could have been better; most of it was speculative when it came to application building, not to mention the laws governing it.
How does ChatGPT know about itself and its own behavior? If you ask questions about those topics, it will answer intelligently and accurately about itself and its own behavior. It will not just spout random from patterns from the internet. How does it know this?
To start with, ChatGPT does not "know itself" and it is not self-aware. What you are seeing when GPT answers the question "Who are you?" is a pre-programmed response that has been put there by the trainers of the model, something like a toy with prerecorded messages that you can hear when pressing a button or pulling a string.
ChatGPT does not "know" anything it simply responds to your prompts or as you see them your questions with the appropriate answers.
GPT doesn't possess genuine awareness, but it can certainly mimic it to some extent
Thank you for explaining! 🪲 Min. 3:37 is the major "bug" 🐞 within the learning system, *it does not start off with a related guess, it's random.* 🌬
I can't wait until the *brain slice chips* can last longer and get trained like a real human brain that is actually learning by feelings and repeating instead of random guessing and then correcting itself until the answer is appropriate. They could soon replace A.I. technology completely, so maybe we shouldn't hype too much about it.
After all the effort, energy and money we put into A.I. and new technology, it's no doubt that *we could have educated our children better* instead of creating a fake new world based on pseudo-knowledge extracted from the web. 👨👩👧👦👨👩👧👧 Nobody wants to be replaced without having the benefit of the machine. General taxes on machines and automated digital services could fund better education for humans.
Dear A.I.: You know what is real fun? Planting a tree in real life! 🍒
NICE VID o7
But how is it possible for an LLM to innovate when it's trained only within the boundaries of human knowledge?
I'm far from an expert on the matter, but the simple answer to your question is that it's programmed to be able to learn and adjust according to many various inputs. Arguably that's where robot technology should be headed next: having the ability to learn and react to that learning.
Is this video mirrored?
So LLM based AI is just language not ‘intelligence’? Based on what it’s read it knows or guesses what usually comes next? So zero intelligence?
From what I can tell of the subject matter it's more of a mimicked intelligence. That's why the analogy of a parrot was used, because this technology can learn, repeat back, and to a limited extent guess what's coming next. But there's a certain level of depth and nuance that a human possesses that parrots and ChatGPT-style tech do not.
How are you able to write that way
My chemistry professor does videos with one and explains it in a video: Chemistry with Dr. Steph (that's her channel); it's the featured video on her page.
I don’t even know where to begin. 😵💫
How does this presentation work? You are not mirror-writing behind a glass pane, are you?
Yea, it's a glass window! He is physically writing on it. For it to show the correct way (and him not having to write backwards) they just flip the image!
Still dont get it
Why does a gigabyte have more words than a petabyte? I am lost already!!! 1 GB = 178 million words, 1 petabyte is 1.8x10^14 words, and there are only 750,000 words in the dictionary?
I got this far, stopped the video and searched for a comment like this. Why isn't this the top comment?
It's not total unique words… basically it's text from different websites, different sentences… So let's say you want an LLM to answer you about coding: you train it on all the data on StackOverflow, LeetCode, etc., every available resource… so it knows that when users asked how to run a loop in Java, the replies were x, y, z…
It's more of a glorified, better Google search that feels like intelligence…
He said 178m words in a 1 GB sized file. And a petabyte sized file has 1 million _gigabytes_ in it. So, loosely speaking, you multiply 178m by 1 million to get the number of words in an LLM's training data. But… it's not being fed unique words. It's getting word patterns. Think about how we speak… our sentences are word patterns that we use in mostly predictable structures, and then we fill in the blanks with richer words as we get older to convey what we want to say with synonyms etc.
1 PB = 1024 TB = 1,048,576 GB
What makes knowledge so complex is not the words, but the way the words are used.
Choose any word and you will see that it is linked with hundreds of topics and contexts.
If I say draw, I could be talking about
drawing water
drawing class
drawing during class
drawing my friend
drawing a dog
drawing a long time
drawing that sold for a lot of money
I like drawing
And so on. These all code for a different idea. And it is these "ideas" or relationships that foundation models encode.
With these relationships, you now have the probabilistic weights that allow you to construct realistic and correct sounding sentences that are also likely accurate because of the enormous dataset it was trained on.
Another context example: you want to connect "fish" to "swim". That association is highly weighted in the LLM.
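A toy sketch of how such relationships become numbers (counting co-occurrences in an invented four-sentence corpus; LLMs learn far richer contextual representations, but the idea of turning usage into weights is the same):

```python
from collections import Counter
from itertools import combinations

sentences = [
    "i like drawing water",
    "i like drawing a dog",
    "fish swim in water",
    "fish like to swim",
]

cooccur = Counter()
for s in sentences:
    for a, b in combinations(s.split(), 2):
        cooccur[(a, b)] += 1

print(cooccur[("fish", "swim")])     # 2: seen together often -> strong connection
print(cooccur[("drawing", "swim")])  # 0: never seen together
```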
Getting hallucinations!
Ugh corporate videos..... the horror
How many 'parameters' does the human brain have, I wonder.
$PLTR
How is he writing backwards
See ibm.biz/write-backwards
Wow! It's a clever idea 😊
@@IBMTechnology oh yeah, then how come your tattoo is the right way round?
You could have finished the video by saying an LLM like ChatGPT could have produced the entire explanation for this video.. (I think you hinted at the same)
What do you do if a large language model, after you've put petabytes of data into it, is still talking nonsense?
Isn't it just using the most likely thing that humans defined, using patterns of what's most expected based on how humans interact and the info put in… that's not complicated. How do they not understand how it works…
This is scarrry
Hire someone next time who can explain it to the average John and Jane. Talk about 7 billion parameters and you already have John and Jane scratching their heads like crazy at what the fuck he's talking about. Oh, yeah, some in the comment section understand it... but they're not the average John and Jane... they're often familiar with coding, data, business processes, computers, etc.
Not a very good video. Really didn't explain much. You could have said so much more in 5:33 than slowly drawing things and talking about business applications.
If IBM knows that, why didn't they implement it in Watson, which was useless 😂😂😂
1 petabyte is not 1m gigabytes, it is 1,000 gigabytes.
I thought this speech was coming from an engineer, but perhaps it is just a hired actor.
1k terabytes, 1m gigabytes
It's funny how it's always the most ignorant and arrogant one who points out the mistakes of others.
You fool, 1000 GB is one TB, not one PB.
Explaining the constituent parts, the end product, is not the same as explaining how something works. Bad video.