Mr. Alammar, your post explaining GPT-2 is great; I frequently return to it because it is so detailed and visual. A lot of time has passed, and it would be awesome to see the same kind of post explaining more modern LLMs, such as Llama 2, for instance. I wish I could read an explanation of the "new" activations, norms, and embeddings used in modern foundation models. Looking forward to such a post!
I think it is unfortunate that the word 'model' is used so often, everywhere, that it becomes difficult to understand what it means. For example, is it the LLM "tokenizer foo" or the LLM "model foo"? Are they the same? Is bert-base-cased a "model" (and if so, what does that mean?), or a "tokenizer" that has N tokens in its vocabulary?

Another point that is a bit fuzzy: a "model" that uses a particular tokenizer must "know" what those tokens are, and must have a corresponding embedding for every token the tokenizer supports. So speaking of tokenizers in isolation, without the downstream "model" (?) that is tied to that tokenizer, is a bit confusing. I am still unclear on the flow: tokenizer -> embeddings -> output vector -> some decoder, etc.
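One way to see the connection (a minimal sketch, assuming the Hugging Face transformers library; the checkpoint name and API calls below are my own illustration, not taken from the video): a name like bert-base-cased refers to a checkpoint that bundles both a tokenizer and a model, and the model's input embedding table has one row per token id the tokenizer can produce.

```python
# Sketch (assumption: Hugging Face `transformers` is installed).
# "bert-base-cased" names a checkpoint that bundles BOTH a tokenizer
# (vocabulary + splitting rules) and a model (the weights).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")

# The model's input embedding table has one row per token id the
# tokenizer can emit, which is why the two are shipped together.
print(len(tokenizer))                               # size of the tokenizer's vocabulary
print(model.get_input_embeddings().num_embeddings)  # rows in the embedding matrix

# The flow: text -> token ids -> embeddings -> model output vectors
inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, number of tokens, hidden size)
```

So the tokenizer and the "model" are separate objects, but a given model is trained against one specific tokenizer's vocabulary.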
Very nice video, thanks. A video on Galactica would be awesome.
Such a fascinating topic, thank you!
Amazing video! Thanks Jay
@Jay, this is super cool, and exactly what I was waiting for. Thank you so much for this video. Please keep up the good work :)
Colab link please?
Very nice and helpful. How is ambiguity resolved? How does a tokenizer choose between, say, "t" + "abs" and "tab" + "s" (toy example)?
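One way to poke at this (a minimal sketch, assuming the Hugging Face GPT-2 tokenizer as an example; not from the video): BPE-style tokenizers are deterministic, because the merge rules learned during training are applied greedily in a fixed priority order, so a given string always splits the same way.

```python
# Sketch (assumption: Hugging Face `transformers` with the GPT-2 tokenizer).
# BPE applies its learned merges in priority order, so the split of a
# word like "tabs" is deterministic rather than ambiguous at inference time.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
for word in ["tabs", " tabs", "tokenization"]:
    print(word, "->", tokenizer.tokenize(word))
```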
Brilliant, unexpected insights!
Great video 😊
Great video. It would be great if you could explain how to tell whether a token is a name, a date of birth, and so on.
Colab link, please?
Could you share the notebook link?
So GPT-4 is the best, right?
Thanks a lot, doctor, but you are a bit too close to the screen. Would you move back a bit? 😅
Too close to the screen