Pod version: podcasters.spotify.com/pod/show/machinelearningstreettalk/episodes/107---Dr--RAPHAL-MILLIRE---Linguistics--Theory-of-Mind--Grounding-e20c60h (I ran out of space on the video description, this might be the longest list of references ever! 😄)
Also note that due to an editing error, about 1 minute is chopped off the end of this video. It was pretty much at the end; we were just about to say our goodbyes. It's on the audio podcast.
Version with hesitation sounds removed: share.descript.com/view/aGelyTl2xpN
Thank you for your work, Tim! This is the only AI channel which has remained true to its original style and content. Such an important and crucial contribution for the community!
Thank you!!
Agreed ❤
Thanks!
🙏
I really like the way you talked about unknown abstractions, Tim. We are getting splits in our models for concepts we may not even recognize, but back-prop doesn't lie. If a split on a node needs to be made, the training algorithm will make it.
Next token prediction is done in the feed-forward network that generates logits. The encoder that embeds tokens in multiple dimensions, along with the transformer's attention heads, creates a complex context signature to travel through the feed-forward NN along with the initial prompt -- both for training and for inference. The pyramidal hierarchical nature of NNs results in an organized high entropy model of organized abstractions. Semantic nearness allows NNs to find new solutions that have not been seen in their training sets. This is real intelligence, but it is stateless intelligence in the sense that nothing is learned. As is the case with my calculator, an LLM merely spits out a correct answer. Agents, however, can be made to be stateful. They can store input and output. They can synthesize identity, plans, goals, opinions, and emotions. Therein lies the possibility of conscious sentience.
I asked Bing Chat to make an analogy for "grading papers." It replied: "Grading papers can be like panning for gold. Just as a prospector carefully sifts through dirt and rocks to find valuable nuggets of gold, a teacher must carefully read through a student’s paper to find the valuable insights and ideas that demonstrate their understanding of the material. And just as a prospector must use their knowledge and experience to distinguish real gold from fool’s gold, a teacher must use their expertise to accurately assess the quality of a student’s work."
I was also interested in the claim about analogy. I presume they were referring to unique analogies; "panning for gold" would be in the training data. I will have to search for where this claim comes from.
Wow, the doctor is very good. Does he have workshops? I wanna meet him.
The attention heads create a context signature. Stephen Wolfram has some interesting graphics of what paths look like for given contexts. Look at his video explaining transformers. These could be used as alternate encodings. The attention heads in GPT-3 are huge. They store the context of every fragment of every length of the training set. In essence, there is a semantic search that happens for every next token. It's not done with cosine similarity, but it composes the next maximum logit as a "similarity." In other words, what is nearest to this context signature.
The best AI talk channel for me. So grateful to listen to the content. Thanks a lot!
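To make the "logit as similarity" remark concrete, here is a minimal sketch that scores a toy vocabulary against a context vector, once by raw dot product (how logits are commonly computed) and once by cosine similarity. The vectors, the three-word vocabulary, and all names are made up for illustration; nothing here is taken from GPT-3 or from the video.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    """Cosine similarity: dot product with both vector lengths divided out."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-d "context signature" and token vectors (illustrative only).
context = [1.0, 0.0]
vocab = {"gold": [0.9, 0.1], "paper": [3.0, 3.0], "rock": [-1.0, 0.5]}

logits = {tok: dot(context, vec) for tok, vec in vocab.items()}
cosines = {tok: cosine(context, vec) for tok, vec in vocab.items()}
probs = dict(zip(logits, softmax(list(logits.values()))))

best_by_logit = max(logits, key=logits.get)
best_by_cosine = max(cosines, key=cosines.get)
print(best_by_logit, best_by_cosine)
```

In this toy setup the two rankings disagree, because a dot product also rewards vector length while cosine similarity only looks at direction. That is one way to read "not cosine similarity, but still a kind of nearness".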
From Bing Chat: The phrase “on top of the world” is an idiom that means feeling extremely happy or successful. When someone says “that woman is on top of the world,” they mean that she is feeling very happy and fulfilled, perhaps because she has achieved something important or overcome a significant challenge. It’s a way of saying that she is at the peak of her happiness or success.
What happens when you use it in context rather than ask for a meaning? The meaning is no doubt in the data it's using.
@@exhibitD79 what body of land is on top of the world?
Searching for: what body of land is on top of the world
Generating answers for you…
The highest point on Earth is Mount Everest, which is located between Nepal and Tibet. It is considered the tallest point on Earth with a height of 8,849 meters (29,032 feet).
Bravo, Dr. Millière, your analysis of compression as the basis of generalization seems entirely on the money. Abstraction weights are accumulated, not replicated. A good model would necessarily be sparse. It would be deep and wide, but not every path would be connected. In other words, its encoding would be maximally compressed. An improved model would be deeper still and sparser.
Brilliant lecture.
Wonderful, Tim. Another belter!
Thank you Mike! 😍
There are ways of interpolating in higher dimensions involving manifolds. This requires following along paths of highly correlated data while ignoring large voids.
This is very well done, very interesting, you are very inspiring, dude. You make me want to create my own channel in French.
You can implement a NAND gate in a NN, and NAND gates are functionally complete.
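The NAND remark above can be sketched with a single artificial neuron. This is a minimal illustration, assuming a step activation and hand-picked (not learned) weights of -2, -2 with a bias of 3; the function names are invented for the example. XOR is then built from NAND neurons alone to show what functional completeness buys you.

```python
def step(x: float) -> int:
    """Heaviside step activation: outputs 1 when the input is positive."""
    return 1 if x > 0 else 0

def nand_neuron(a: int, b: int) -> int:
    """One neuron with hand-picked weights (-2, -2) and bias 3 acting as NAND."""
    return step(-2 * a - 2 * b + 3)

def xor_from_nand(a: int, b: int) -> int:
    """XOR composed purely of NAND neurons (four gates)."""
    n = nand_neuron(a, b)
    return nand_neuron(nand_neuron(a, n), nand_neuron(b, n))

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
truth_table = {pair: nand_neuron(*pair) for pair in inputs}
xor_table = [xor_from_nand(a, b) for a, b in inputs]
print(truth_table)
print(xor_table)
```

Only the input (1, 1) drives the weighted sum below zero, so the neuron outputs 0 there and 1 everywhere else, which is exactly NAND.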
Exactly. So much complication... You are right.
Yes, words are abstractions, but vision and touch are also abstractions, starting at the first layer, whether retina or skin. Was Helen Keller multimodal? As for consciousness, conscious behavior can be modeled using computation or emergent within a system, so we can't know if a system is conscious from its output alone.
*Suggestions to spread the word:*
0. Edit your transcription to be readable
1. Make it CC BY-SA
2. Create a dedicated Wiki (with Recent Changes and Links) to make the conversation and comments refactorable, compressible, operational, ...
3. Integrate several AIs as partners into y/our discussion.
Thank you. ❤
1:20:54 “But that’s phenomenology…”
My guess, and it’s only a guess, is that that isomorphism refers to RGB values and the _names_ of the colors that the language model learns, so there’s no phenomenology involved.
Super interesting aspect of intelligence. Self, other, general abstract. Categories or labels or functions?
Don’t you think AI is a perfect example of no free will? Look at us constructing these tools and not even knowing how to safeguard them. Crazy times.
You've made me change my mind, Tim, about a model being conscious. I think it may be conscious for the brief time it processes a prompt. Have we created an ephemeral life that "lives" over and over again for seconds or milliseconds?
The great minds of AI have (possibly) created milliseconds of being conscious in a silicon medium. What created our own human consciousness that has arrived at this point after xxxxx years of evolution? Atoms are condensed electricity and light (electromagnetic) is the outer aspect of the inner faculty of thinking.
Consciousness needs internal feedback loops, i.e. 'thoughts' based on other thoughts or feelings. Our consciousness is obviously an emergent effect, and systems without feedback loops don't reach emergence. As long as LLMs haven't got internal feedback loops, they cannot be conscious. They are, however, intelligent.
A transformer attention repeller space. So training it to avoid a result is not quite the same as detecting intrusion into a result and applying a matrix multiply (affine map) to reposition the state to a more logical place. Should the falsity be trained as the same set of parameters as the truth? Or more exactly should truth be made worse to prevent false detection accuracy and vice versa?
These experts argue that AI doesn't think like humans do, without knowing how humans think. Remarkable.
😂 ✅️. ChatGPT is causing philosophers to reassess what human intelligence is. Maybe human intelligence doesn't mean we truly understand meaning. Instead, we simply learn patterns and regurgitate like the AI is apparently doing. With additional layers of networks, perhaps the AI will be better than us.
@@farmerjohn6526 Of course we understand meaning, because we have a heart and we desire to procreate (most of us), and we can see beauty and sing with joy. AI will do that also, but not yet. We've given them brains and rules, but no heart or intuition.
@Van Hovey Some people understand meaning, but not all. Look around. As for heart and intuition, well, that's partly programming and partly inherited. But it, too, is not consistent across humans. I would argue that many humans are heartless. Intuition is more complicated. But it's not magic. We have inborn instinctual intuition, and then we have a form of learned intuition. I see nothing here that artificial beings can't acquire in time.
@@farmerjohn6526 You talk like an atheist. You ever wonder how everything came from nothing, or why we are here and aware? Maybe there's a deeper truth involving our soul! Do you have one?
BTW, every human has a heart, and all hearts beat at the same resonant frequency. So although some people may ignore their connection to humanity, nevertheless it still exists. Resonance, BTW, is not constrained by space or time.
@@MrVanhovey no soul, no spirit, we are body and language, nothing more
I think it understands the language, its structure, how words connect, but its vision of the real world the words represent is vague.
I hope we're not wondering what AIs are doing. I mean, we wrote them! They are executions of our code. And if we're unsure what paths are taken, add some trace statements (outside of its control).
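As a rough sketch of the "trace statements outside of its control" idea, the toy forward pass below records every layer's activations into a list owned by the caller, not by the network. The two-layer network, its weights, and the `forward` helper are all hypothetical; a real model would have far more values to log, and the trace records which numbers flowed through, not what they mean.

```python
import math

def forward(x, layers, trace=None):
    """Run a tiny tanh network; if `trace` is a list, append (layer_index, activations)."""
    activations = list(x)
    for index, (weights, biases) in enumerate(layers):
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
        if trace is not None:
            # The trace lives outside the network: it only observes values.
            trace.append((index, list(activations)))
    return activations

# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]

trace = []
output = forward([1.0, 1.0], layers, trace)
for index, values in trace:
    print(f"layer {index}: {values}")
```

Passing `trace=None` runs the same computation with no logging, so the observer does not change the result.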
All right, so which is it, stochastic parrots or stochastic chameleons....
The stochastic elephants in the room? LOL - it's all fascinating...thanks again - cheers.
At the very least, consciousness requires an outer loop. LLMs don't need to have an outer loop.
It's hilarious to watch all these intellectuals getting their rationalisations and pet theories blown out of the water.
💓
Banerries all the way down
I am not sure how scientific the view is that a machine "has" a world view. AI means human-created intelligence. We have to program into a machine what the world is and what a view is. Who performed this operation, and when? I love the podcast and discussions, but when all of it is done with the seriousness of infinite wisdom, it's funny. "Processes induced" during the learning process?? Come on! All this voodooisation of a machine is no different from the old 1970s movies about automobiles coming alive. If any LLM were "conscious", every computer must be alive considering how much programming went into its construction. Apple - one word that describes an apple - you do not need to hold a trillion words in your head to know it. It means less is more in intelligence. True? :-)
What’s an apple?
@@oncedidactic it’s a banerry
@@anonymous.youtuber 😁
Great content but a tough listen, choppy sentences.
I didn't really notice myself, but if you prefer I uploaded a version with hesitation sounds removed - share.descript.com/view/aGelyTl2xpN
Stop with the ..ah.. ah ...ah lol
unwatchable
Try this version share.descript.com/view/aGelyTl2xpN
@@MachineLearningStreetTalk Thank you so much for saving my sanity! Oh, and also for the great work you're doing with your channel of course