If you enjoyed the video, please consider subscribing :)
Part 2: ruclips.net/video/rTz6hadM1Lg/видео.html
I'm excited to be starting this new series! NLP is the topic I feel like I have the most to say about, but I'll avoid throwing my personal opinions into these videos :p Stay tuned for the next chapter which I'll be posting next Monday!! (And the third chapter next to next week). Also, let me know what other kinds of topics you'd be interested in seeing!!!
Strange timing you got there…
3B1B published basically the same video before you.
The 🐐
so hot
Hi, one of the topics I am struggling to understand is why V is required in QKV, and why the multi-head attention outputs are concatenated rather than combined with some other operation. If you could make a video on concatenation of vectors and how it retains information better, that would be great.
@@buddhadevbhattacharjee1363 hmmm, interesting question for sure! I believe the reason concatenation is done is just because it's lossless (retains all information). This is just a standard DL practice - for example, we concatenate the positional embeddings too.
The next to next chapter will be on attention. Let me know if that addresses your questions, and if not, I'll look into what a follow-up video could contain.
Thanks for your input!
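For anyone following this thread, here is a minimal sketch of the "lossless" point above. The head outputs are made-up numbers, and `concat_heads` / `sum_heads` are hypothetical helper names used only for illustration. Summing is shown just as one possible lossy alternative, not as something any particular model does.

```python
# Made-up outputs of two attention heads for a single token (illustrative only).
def concat_heads(heads):
    """Join per-head vectors end to end, as multi-head attention does."""
    merged = []
    for head in heads:
        merged.extend(head)
    return merged


def sum_heads(heads):
    """Element-wise sum, included only as a lossy alternative for comparison."""
    return [sum(values) for values in zip(*heads)]


pair_1 = [[1.0, 2.0], [3.0, 0.0]]
pair_2 = [[2.0, 1.0], [2.0, 1.0]]

# Two different sets of head outputs collapse to the same summed vector...
print(sum_heads(pair_1), sum_heads(pair_2))

# ...while concatenation keeps every head's values recoverable.
print(concat_heads(pair_1), concat_heads(pair_2))
```

In the standard Transformer the concatenated vector is then passed through a learned output projection, so the model itself decides how to mix the heads instead of having a fixed mixing rule imposed on it.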
Just came from 3B1B and subbed, this is excellent. Thanks!
What timing for this video~
Looking forward to more!
Thanks!!
This is wonderful, excited for the next video!
Also nice choice of music :)
Thanks! Love nintendo music :)
Odd timing 🤔
ye.
Indeed, haha!
Found this by watching 3Blue1Brown
Awesome channel!
10:14 Brought to you by... 3Blue1Brown!!
Thanks a lot for the video! Now I understand that a trigram model just takes into account the last three words.
Great video!! I'm hoping you discuss some of the history in the next episodes too though
agree!
That's the plan! I'm trying to touch on key papers until 2016
brother never stop making these videos
these are very interesting
Glad you like them!
Just for general interest, here are a few examples of syntax vs semantics. Consider "The tree ate a banana". It's syntactically valid, but it doesn't mean anything. Or, "Is a dog conscious?". It's also syntactically valid, but it doesn't mean anything until we decide what "conscious" means. Or, "Does the past still exist?". It doesn't mean anything until we decide what "exist" means.
Good one! I'll be waiting for the next one
Thanks! Currently working on it - should be up on Monday
Nicely done!
You uploaded this a minute after 3b1b’s video, how?
IKR. FIRST I WAS WONDERING HOW AND THEN THIS TOO WHAT WHAT
This dude must be 3b1b's younger bro, or a buddy
I was like when did 3b1b release the video about transformers? Turns out same time as this video
:)
@@vcubingx What a troll.
We're eating good today guys.
Perfect timing
Indeed!
You'll go up boi. Just put in the effort. Make the quality content. People are looking for quality content related to ML.
What does |V| mean, and what does it represent in this context?
Hi, love the video - just one thing, the C in the probability equation throws me off. I keep reading it in my mind as "complement" - as in the complement of a set. I'm probably missing the right context for it. I can grasp from what you're saying that it probably signifies occurrences of the event, but I'm uncertain why it's "c". Is it c for condition?
C stands for count. Sorry! It can be a bit confusing - should’ve explained it. Some of the notation NLP folk use is certainly questionable
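To make the "C stands for count" reading concrete, here is a rough sketch of a count-based estimate, assuming a bigram model, i.e. P(word | prev_word) = C(prev_word word) / C(prev_word). The toy corpus and the name `bigram_prob` are made up for illustration and are not taken from the video.

```python
from collections import Counter

# Toy corpus, invented for this example.
corpus = "the cat sat on the mat the cat ate".split()

# C(w): how many times each word occurs.
unigram_counts = Counter(corpus)
# C(w1 w2): how many times each adjacent word pair occurs.
bigram_counts = Counter(zip(corpus, corpus[1:]))


def bigram_prob(prev_word, word):
    """Estimate P(word | prev_word) as C(prev_word word) / C(prev_word)."""
    if unigram_counts[prev_word] == 0:
        return 0.0  # unseen context: no counts to estimate from
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]


# "the" occurs 3 times and is followed by "cat" twice.
print(bigram_prob("the", "cat"))
```

So C is just a tally over the training text, and the probability is a ratio of two tallies.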
From 3b1b!
mfv what a great job you have done
i couldnt agree more mfc
mfc + mfe = mfce
great vid frfr
Still waiting for part 3 on neural networks
Dang, it's been 4 years already...how time flies by.
I'll try and make this my next-to-next-to-next video (After Chapter 3 of this series). Sorry for the delay, and I'm happy you're still around to wait for it :)
Goated
Damn bro, it was too good. I didn't know about n-grams
It's awesome, please remove the background music, very distracting :(
bro is basically alan turing at this point
ski fast take chances
hi
Hello!
You know what? ChatGPT is, unfortunately, the manifestation of the Chinese room paradox, and it is SO humorous that we are taking this much time to realize it
You’re stupid:
The Chinese room argument doesn't work for complex tasks beyond fixed rule-based symbolic manipulation. AI like ChatGPT goes beyond counting word co-occurrences, making decisions based on intricate feature interactions. We need to clearly define "understanding" first.
Understanding involves making functional predictions by compressing data into representations in vector space, synaptic interactions, etc. GPT-4 doesn’t store explicit symbols but extracts features from data, comprehending context rather than concrete content. Fixed translation rules lack the representational ability to demonstrate understanding.
breh
dang retired cuber comes out of the dead only to smash mohanraj's 3x3x3 PR average
you should make a video on how to get girls
Third
First
This is so Asian
Computers "understand" languages insofar as they can compute statistics. But they don't really understand like humans do. For example, can they understand the levels of meaning of poetry, or sarcasm, or cynicism?
What makes you think that human brains don’t just compute statistics?
@@panulli4 I think the difference is that LLMs compute statistics on words themselves, while humans "perform statistics" on lots of different inputs, and then transform whatever result they get into language
To be honest, it's really unclear what it even means to "understand" language. I'm fairly certain that we should be able to get to a sarcasm-detection level of humans within the next 10 years. See relevant work: scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=sarcasm+detection&btnG=&oq=sarcasm+detection
I feel like 5 years ago, the idea of being able to generate code was unfathomable. Yet, here we are, and GitHub Copilot knows C++ syntax almost perfectly. Who's to say that everything in our brain is not a type of matrix multiplication? We don't know :)
@@panulli4 And that intuition may be a consequence of analogical thinking and overlooking the subtleties involved. Not that it's "wrong", but arguments such as "the brain is definitely like a stack of LSTMs", or "the brain is just a Markov chain" etc. have always existed, and they've only focused on certain overlaps to construct a simplistic explanation.
Sure, certain submodules of the brain may operate stochastically, but it's also evident that there are a lot of other architectural complexities involved that allow for agentic behavior, continuous learning, inferring priors from observations, meta-awareness and deliberate allocation of attention and cognitive resources, and adapting to highly chaotic and out-of-distribution environments and contexts, to name a few. Qualia itself hasn't been fully explained or understood and it's unclear if it can be; however, there are good reasons to think it's a crucial mechanism that allows for agentic models to operate consistently and develop a coherent world model. It's highly likely it wouldn't simply "emerge" from scaling up statistical models. And equivalently, it's easy to conceptualize why a statistical model can achieve a high level of mastery in specific domains which are already deterministic or statistical in nature, or can at least be brute-force computed and generalized for, but a lot of things aren't like that. You can, for example, give the impression that you understand quantum mechanics by simply paraphrasing scientific articles, especially if you can do so at scale and very efficiently.
@vcubingx yes, I'm not saying it can't happen. I'm only saying that at this point it's not there and it may take a while with more tech. And when I say a while, I mean that in the most open sense.