I'd love to see more in-depth analysis like this on the current LLM topic utilizing Dr. Wolfram in this format. Exceptional content. As an aside, I've really been missing the physics project live streams.
The most important thing on everyone's mind currently should be to invest in different sources of income that don't depend on the government, especially with the current economic crisis around the world. This is still a good time to invest in various stocks, gold, silver and digital currencies.
This was the most fascinating and informative discussion, particularly, your responses to commenters! Please post the link to the paper you recently wrote (?) that inspired this live video discussion. And thank you!
asked chat to quit reminding me it was a language model because i personally find it easier to converse if i treat them as if they were another being. there was a rather long pause, then chat came back and for all intents and purposes was a very polite and helpful uh... person? dunno how to regard them, they're awesome tho :)
Logic, concepts, math, i.e. "deterministic processes", seem to be missing in these language models (LMs). Either we can identify where or how the model reflects these abilities and work from that, or maybe we could use other types of models like logic indictors, "demonstrators" etc. in conjunction with LMs. On the one hand, humans are capable of "unconscious intuition" (similar to LMs); on the other, we can reason, we have formal languages, etc. To me, that combination of abilities is what defines human intelligence.
This is super interesting and I’m learning a lot, thank you for this video. I do feel the number of times I hear “ugh” and “um” is really off-putting. Sorry if that’s nitpicky but I almost can’t make it through the beginning because of ugh. Um. Ugh.
Deep neural networks can unroll a certain number of loops. They can hold a certain amount of local memory, but not much. So they can't do things like use the number of words in a generated sentence in that sentence. This requires a second pass through the LLM with the sentence as context. Good agent design can solve many of the foibles of neural nets. LangChain and LangGraph solve a lot of these problems.
I use it a lot to help me write and fix code and also to explain things for me or piece things together. It's a great partner/tool to use if you have some good input and existing knowledge
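The "second pass" idea in the comment above can be illustrated without any model at all. The following is only a minimal sketch under stated assumptions: `draft` is a hypothetical stand-in for a first generation call, `revise` is a hypothetical second pass that sees the finished draft as context, and the `{N}` placeholder is an invented convention, not anything from LangChain or LangGraph.

```python
PLACEHOLDER = "{N}"  # hypothetical marker for "fill this in later"

def draft() -> str:
    # Stand-in for a first generation pass: the writer cannot yet know
    # how many words the finished sentence will contain.
    return "This sentence contains exactly " + PLACEHOLDER + " words in total."

def revise(sentence: str) -> str:
    # Second pass: the whole draft is now available as context, so the
    # word count can be computed and substituted for the placeholder.
    count = len(sentence.split())
    return sentence.replace(PLACEHOLDER, str(count))

final = revise(draft())
print(final)
```

The substitution keeps the word count unchanged because the placeholder and the number are each a single word; a real agent loop would have to re-check the result after every revision.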
Nicely done, Stephen. This is a great introduction for a novice. Your talk creates great intuition. You made the embeddings seem simple as a prebaked unchanging part of the entire NN. Also the breaking up of the "feature signature" makes parallelism possible through the various attention heads. One missing idea that you might include at some point is how signals can be added, basically the Fourier series.
Great video, Wolfram! As someone who's fascinated by AI, I found your explanation of Chat GPT's inner workings to be very informative. One thing I found myself wondering while watching the video was how Chat GPT compares to other language models out there. Have you done any comparisons with other models, and if so, how does Chat GPT stack up? I also think it would be interesting if you could have delved a bit more into the ethical considerations surrounding the use of language models like Chat GPT. For example, what steps can we take to ensure that these models aren't being used to spread misinformation or reinforce harmful biases? Overall, though, great job breaking down such a complex topic in an accessible way!
Ah. That's why they call the API "completion". I worked with something called "Hidden Markov Models" to decompose documents and recognize parts like title, author, subject etc. this was done by training on already labelled documents until the model had a "path" of most likely joined words.
Beyond any doubt, this is the best lecture for understanding what lies behind NLP and NLU. I find that many professionals who work with models don't understand why these models work so well and what they do. You can't get the depth of understanding of semantic space as you get from this video from reading Attention is All You Need. That understanding is missed. I wonder how this understanding happened. Was it found piecemeal, or was accidental? Was it understood after this architecture first worked?
Those paths through meaning space are fascinating, Stephen. I would call each one a context signature. In auto-regressive training, we are looking for the next token. Why not look for the next context signature? In fact, why not train a model using graphical context signatures? Then decode replies. Other than training with graphical context signatures, in essence, I believe this is what's occurring when training a transformer. The addition of signal from the entire context is retrieving the next token so that token by token a context signature is retrieved. But is it possible to retrieve an entire context signature and then decode it? I wonder how much efficiency one could achieve with this method. Moreover, I wonder how well a convolutional NN would handle training from graphical context signatures? If you want to discover physics-like laws of semantic motion, this might be a way in.
Thank you for sharing your insights and all the good questions. It's really lonely not to be in an academic environment or a company working on ML and AI.
That's very useful information, because you don't really know where to start investigating the topic. It's also impressive, that the Wolfram language can manage a representation of that mechanism. What surprises me, however, is how ChatGPT includes different contexts in its predictions, because there are certainly multiple interpretations of the large number of learned text structures if the context is not clearly defined at the beginning of the conversation.
I would love to get Noam Chomsky's comments on the idea of "semantic grammar." It seems fairly compelling. Thanks. I also think the parenthesis grammar as a hand-hold for understanding these models is a great idea.
@@Casevil669 I think he means more at scale / commercial applications. Similar projects have already been done with the API + Wolfram for some time now, hobbyist architectures aside.
@@Casevil669 Thanks for the reference. Have you tried it? Does it attempt to determine which of WolframAlpha (for maths/facts heavy question) or ChatGPT (for text based questions) will be better at answering any given question; or does the integration go deeper than this? Any good?
(Removed; Unfair. Did not watch the whole presentation.) In any case: Great presentation so far, and huge technological respect for everyone involved in the ChatGPT project. Fascinating stuff.
I have a rule for writing text: always choose first the word that eliminates the most other words. If I am writing a plan, I will start this way: "My plan is ..." If I am going to write an essay, I will start with: "This essay is about ChatGPT..."
2:50:30 Does that mean we could play the natural role of the AI's Brain stem, where we are not as conscious as the AI but the AI still works to understand and aid us?
2:14:02 Stephen says "verbs and nouns go this way..." and shows the old - very old! as old as Aristotle - idea of context-free phrase structure grammar based on subject-predicate - but that's a wrong idea, because the real grammar of English is both simpler and more elaborate. *The Natural Topology of English* includes subject-predicate as one of its basic forms, but there are others, such as event = agent + action + object which is an instance of concept - relation - concept
This isn't just a great lesson in AI, it is a lesson in how to be a good teacher. (Start with simple concepts students can grasp and only then build up.)
ChatGPT is excellent at answering questions about Western music theory, but in some cases the initial answer needs prompting, especially when accounting for enharmonic equivalents.
Who would have thought that conversation was a slightly random walk through probable clumps of letters and words? Fascinating. I have to say, though, I think it's actually the human reinforcement that gives particular clumpings their perceptible meaningfulness.
Could the random process for choosing the next probable word within a certain temperature parameter be consigned to a quantum random process? If so, an essay could be viewed as a flat plane or an entire terrain with peaks and troughs. Within this paradigm, a certain style of writer could be viewed as a dynamic sheet, similar to how different materials, when laid over a ragged topology, comply or fail to comply with what they are laid on top of. With this quantum process, an overall view of the essay could be judged at an aesthetic level, from most pleasing to least, on several different qualities concurrently and not mutually exclusively, making an approximate or some sort of conscious viewer.
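On the first question above, it may help to see where the randomness actually enters. This is a minimal sketch of temperature sampling, assuming the model's output is a dictionary of unnormalized log-scores per word (the example scores and the `uniform` parameter name are made up for illustration). The sampler only ever asks for one uniform number in [0, 1), so in principle any source, pseudo-random or otherwise, could be plugged in.

```python
import math
import random
from typing import Callable, Dict

def sample_with_temperature(
    scores: Dict[str, float],
    temperature: float,
    uniform: Callable[[], float] = random.random,
) -> str:
    """Pick one word given unnormalized log-scores and a temperature."""
    if temperature <= 0:
        # Treat zero temperature as "always take the top-scoring word".
        return max(scores, key=scores.get)
    # Softmax with temperature; subtracting the max keeps exp() stable.
    top = max(scores.values())
    weights = {w: math.exp((s - top) / temperature) for w, s in scores.items()}
    total = sum(weights.values())
    # Inverse-CDF sampling: walk the cumulative weights until they pass u.
    u = uniform() * total
    running = 0.0
    for word, weight in weights.items():
        running += weight
        if u < running:
            return word
    return word  # guard against floating-point round-off

# Made-up scores for illustration only.
example_scores = {"cat": 1.0, "dog": 3.0}
print(sample_with_temperature(example_scores, temperature=0.8))
```

Higher temperatures flatten the weights so unlikely words are picked more often; lower temperatures concentrate the choice on the top-scoring word.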
What I find interesting is how they inject the objective pattern recognition into the model to aid in figuring out puzzles and riddles. It will provide extensive reasoning on how it arrived at its answer. GPT-4 really excels in this ability and has a great sense of humor to go with it.
I guess that only works for riddles that were already solved and the reasoning is already established from someone, where ChatGPT got its data from. I don’t think it could solve any riddle by itself… It can hardly do the easiest algebra.
The thing I like about ChatGPT is, you can tell it some information and then ask a question and it can get it wrong, but you can then say, no you got it wrong. But if you figure out its break in logic and explain to it why it got it wrong and what assumptions it made that were wrong, and correct that, it learns. Do that enough times and you can break down any concept, no matter how nuanced and complicated. I've done this. It works. But I only used ChatGPT for one day and never since. Why? Because it's not capable of any truly new and original thought. It can only spit out what we already know. So if the world thinks lemmings jump off cliffs, then so does ChatGPT. Again, you could dig down into it and ask why it thinks lemmings jump off cliffs and show its assumptions are unproven, but that's no better than talking to a human and there are over 8 billion other natural ChatGPTs on this planet which already do that. At that point, I lost interest. It's like a boat without a rudder.
Oh...and finally Stephen makes the statement "That pattern of language has occurred before". No, I don't think so. The implication is that the probability given by the weightings leading to the next word can only have come from seeing the previous word and adjusting the weightings on that combination alone, but that's not true. All the weightings, including the ones leading to the next word, have been influenced by many words from many sentences back-propagating. I don't think it's necessary for the next word to have been seen before for that pattern to emerge.
" I dont think its necessary for the next word to have been seen before for that pattern to emerge." im not even sure what you are referring to. large models learn concepts in the latent space, from patterns it observes. so when it predicts the word which is most likely for the whole pattern to be most likely. so it takes in account he whole context so it still is the case that the probability reflects data it has seen , even if it has not seen the exact sentence before
@@armin3057 Stephen's claim was that the next word chosen had been seen before as a word pair. It was towards the end of the talk but unfortunately I didn't take note of the timestamp. In other words an AI can never produce unseen word pairs like "moose feathers" because feathers never followed moose in its training set. I took note of the quote at the time "That pattern of language has occurred before"
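The "moose feathers" point can be made concrete with a pure bigram counter, which is the one kind of model where an unseen word pair really is impossible. This is a minimal sketch with a made-up toy corpus; it says nothing about how a transformer behaves, since there the probability of a next word comes from shared weights over the whole context rather than from a lookup of pair counts.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word follows each other word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def bigram_probability(follows, prev, nxt):
    """P(nxt | prev) from raw counts; 0.0 if the pair never occurred."""
    total = sum(follows[prev].values())
    return follows[prev][nxt] / total if total else 0.0

# Made-up corpus: "feathers" appears, and "moose" appears, but never together.
corpus = "the moose has antlers . the bird has feathers . the moose eats grass"
model = train_bigrams(corpus)

print(bigram_probability(model, "moose", "has"))       # seen pair
print(bigram_probability(model, "moose", "feathers"))  # unseen pair
```

Without smoothing, the unseen pair gets probability zero and can never be generated, which matches the claim as described in the comment above for bigram-style models only.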
Many thanks Stephen! I absolutely enjoyed the step-by-step introduction into the layers of the matter. However, it is obvious that we are still on the technical/mechanical side of the whole journey. Still, no one is able to explain the concept and reality of infinity, or "1" or "0", but an honest struggle towards that wisdom may open new paths in learning and lead to brilliant discoveries.
What I find interesting is how similar an action potential and a binary Boolean value are: during an action potential the neuron's state can be considered 1, and 0 when it is not firing. That suggests biologically based memory; it could basically start as bubble memory, but in organic form. If there were an addressable system able to interface with neurons, it wouldn't matter which neuron migrated to which interface point; the addressing would just need to be adjusted to correct the neuron's connection. For example, if a neuron migrated to the connection point for the eye instead of the thumb, just change the port address.
This is a video of Stephen Wolfram preparing to make a video. Through laziness or distraction, he did not make the actual video. The most frequently used word in the video is "um." This cleverly demonstrates the point that if ChatGPT simply used the most frequently found word in every situation you would get very bad output.
I speculate it does have a “global plan” of what to say next, instead of one word at a time. It implicitly has a representation of the joint probability distribution of what’s to be continued… Prompting kind of brings out that distribution… from which you can extract knowledge, in its current form as some piece of text (but maybe other modalities in the near future). I was convinced more by Sutskever’s take.
So fascinating! I guess one major difference between the way the human brain and GPT handle language is that human brains use emotions to categorize objects and concepts… I wonder if it would be possible to teach GPT emotions, and what might be the result?
It doesn’t even know what it is saying, it just predicts the next word. So I’d say the biggest difference would be knowing what you want to say instead of just guessing the next word on probability…
I still don’t understand how it’s able to generate a really elegant Boost.Spirit C++ parser, and a compiler/evaluator for a nontrivial language I explained to it
In semiotics, there is "pragmatics," as well as syntactics and semantics. Communication is purposeful, about something or someone, rather than about predicting the next word. There is always some motivation behind utterances, which in turn are predisposed by "beliefs" and "values." So there is cognition, yes, but also affective and psychomotor skill domains involved. When "natural language processors" begin to present (and see) themselves as agents with a memory at least of their own experiences with you (and others), conversations will be more interesting. As more and more complex ideas and concepts (gained, say, from reading text with attention to those, rather than to words per se) can be mapped in some "meaning space" which can be transitioned in the direction of the allegorical, metaphorical, or hypothetical (and thus creative), the closer we will come to AGI.
"Your description of the different components of communication in semiotics is accurate. Pragmatics refers to the way language is used in context to achieve a specific goal or purpose, while syntax and semantics deal with the structure and meaning of language itself. And as you note, communication is influenced by a wide range of factors, including cognitive, affective, and psychomotor skills, as well as beliefs, values, and experiences. Regarding natural language processors and their potential to evolve towards AGI, it is indeed an exciting area of research. As these systems become more sophisticated, they will be able to parse and understand complex ideas and concepts, as well as respond with more creativity and nuance. However, achieving true AGI will require advances in many other areas beyond natural language processing, including machine learning, robotics, and computer vision. In any case, the development of more advanced natural language processing systems will undoubtedly have a major impact on the way we interact with technology and with each other, and may bring us closer to a future where machines and humans can communicate with each other in truly meaningful and productive ways." So says ChatGPT when prompted by what you wrote which, itself, was a response prompted by your reflections on the issue. I do actually agree, and this strikes me as the essential thesis behind "The embodied mind", brilliantly explained in a series of books by the philosopher of Mind Andy Clark.
Has anyone considered the parallel to how people compose writing, and a process of selecting a word that feels right in the context to add to the string, and head, once the theme is developed, to bringing it to conclusion? That's what AI is now learning to do, so it appears that the "temperature" is NOT random at all. That may be accidental, of course, but something is inputting a sense of feeling about a context (if you take my words for their root meanings and understandings).
Definitely points to the issues with attention economy. The bland predictive capability may very well be the correct percentage of use cases compiled, but then does not seem to correspond to a request for _interesting_ writing or dialog. Editing is an Attention Economy factor that works on many levels. Think genre, or audience. Where is the story relationship between platypus and unicorn in terms of either believability or interest and how anchored in reality will A.I be trained to be if attention is the only requirement?
I fell asleep with youtube on and im at this
same
same😂
Same
JDHSJAHAHW SAME
Same
video timestamps
0:09:53 - start of presentation, intro
0:12:16 - language model definition
0:15:30 - “temperature” parameter
0:17:20 - Wolfram Desktop demo of GPT2
0:18:50 - generate a sentence with GPT2
0:25:56 - unigram model
0:31:10 - bigram model
0:33:00 - ngram model
0:38:50 - why a model is needed
0:39:00 - definition of a “model”
0:39:20 - early modeling example: Leaning Tower of Pisa experiment
0:43:55 - handwritten digit recognition task
0:47:40 - using neural nets to recognize handwritten digits
0:51:31 - key idea: attractors
0:53:35 - neural nets and attractors
0:54:44 - walking through a simple neural net
1:01:50 - what’s going on inside a neural net during classification
1:06:12 - training a neural net to correctly compute a function
1:09:10 - measuring “correctness” of neural net with “loss”
1:10:41 - reduce “loss” with gradient descent
1:17:06 - escaping local minima in higher dimensional space
1:21:15 - the generalizability of neural nets
1:28:06 - supervised learning
1:30:47 - transfer learning
1:32:35 - unsupervised learning
1:34:40 - training LeNet, a handwritten digit recognizer
1:38:14 - embeddings, representing words with numbers
1:42:12 - softmax layer
1:42:47 - embedding layer
1:46:22 - GPT2 embeddings of words
1:47:40 - ChatGPT’s basic architecture
1:48:00 - Transformers
1:52:50 - Attention block
1:59:00 - amount of text training data on the web
2:03:35 - relationship between trillions of words and weights in the network
2:09:40 - reinforcement learning from human feedback
2:12:38 - Why does ChatGPT work? Regularity and structure in human language
2:15:50 - ChatGPT learns syntactic grammar
2:19:30 - ChatGPT’s limitation in balancing parentheses
2:20:51 - ChatGPT learns [inductive] logic based on all the training data it’s seen
2:23:57 - What regularities Stephen Wolfram guesses that ChatGPT has discovered
2:24:11 - ChatGPT navigating the meaning space of words
2:34:50 - ChatGPT’s limitation in mathematical computation
2:36:20 - ChatGPT possibly discovering semantic grammar
2:38:17 - a fundamental limit of neural nets is performing irreducible computations
2:41:09 - Q&A
2:41:16 - Question 1: “Are constructed languages like Esperanto more amenable to semantic grammar AI approach?”
2:43:14 - Question 2
2:32:37 - Question 3: token limits
2:45:00 - Question 4: tension between superintelligence and computational irreducibility. How far can LLM intelligence go?
2:52:12 - Question 5
2:53:22 - Question 6: pretraining a large biologically inspired language model
2:55:46 - Question 7: 5 senses multimodal model
2:56:25 - Question 8: the creativity of AI image generation
2:59:17 - Question 9: how does ChatGPT avoid controversial topics? Taught through reinforcement learning + possibly a list of controversial words
3:03:26 - Question 10: neural nets vs other living multicellular intelligence, principle of computational equivalence
3:04:45 - Human consciousness
3:06:40 - Question 11: automated fact checking for ChatGPT via an adversarial network. Train ChatGPT with WolframAlpha?
3:07:25 - Question 12: Can ChatGPT play a text-based adventure game?
3:07:43 - Question 13: What makes GPT3 so good at language?
3:08:22 - Question 14: Could feature impact scores help us understand GPT better?
3:09:48 - Question 15: ChatGPT’s understanding of implications
3:10:34 - Question 16: the human brain’s ability to learn
3:13:07 - Question 17: how difficult will it be for individuals to train a personal ChatGPT that behaves like a clone of the user?
Thanks. A. Ton!
Thanks 🙏🏾
😊 Appreciate it!!!
Most significant comment of our time 😂
😊😊😊😊😊😊
Watching these videos is a great way to review all these things and understand them again, maybe a little better.
Thank you very much.
Wolfram is such a gem. He never ceases to fascinate.
The teachings on this channel are always top notch so informative and easy to understand, it's very hard to find good content online these days
Starts at 9:53
1:16:25 breakthrough in 2012
1:57:35 "It's crazy that things like this work"
this took me a few days to get through... in a good way
so much good stuff here, such a great instructor... great ways of explaining and visual aids
Amazed Mr. Wolfram is as generous with his time as to share his insights
and be as open with everyone given he has many companies to run and problems to solve.
i love engineering😊
As a radical thinker/CS student studying some graduate-level mathematical logic, Wolfram is one of my "12 disciples", i.e. he's a holy figure to me.
"Amazed Mr. Wolfram is as generous with his time..."
Then maybe you'd be interested in buying his book.
Love this. I knew a lot of this, but it was still great to hear it expressed in a clear and systematic way.
True, I also thought of it that way: you take your phone, start typing, and just keep accepting whatever autocorrect thinks the best next word is. That's the principle of how it works. Didn't think it could get this good.
Thank you so much! I learned more in these 3 hours than in months of watching other videos about this subject.
It would be great if more knowledgeable people used youtube to share their experiences.
🙏🏻🙏🏻🙏🏻
Difference between qualified and unqualified people. Basically it's the difference between a radio DJ and a college professor, yeah.
starts at 9:50
thanks. appreciated
we're asleep anyway
About ChatGPT,
very few people are
telling the truth and
Wolfram is the most powerful one
❤
Thank you very much,
Steve Wolfram ❤
Expertly explained in a way understandable to a large set of people. Bravo.
Here's a question: how much does the wording of the questions affect its answers?
- Presumably if it just tries to continue, if you make errors, it ought to make more errors after too, right?
- How about if you ask with "uneducated" language vs scientific? - Rather than just affect the tone, would it also affect the contents?
- What if you speak in a way it has associated with certain biases?
- Who knows what kinds of patterns it has come up with, considering it "discovered" those "semantic grammars" we as humans aren't even aware of ...
This was amazing... never watched a lecture from Stephen and he's an amazing teacher.
Amazing presentation. If I were to experiment with machine learning I would examine small-world networks instead of layered networks. And try genetic algorithms such as randomly adjusting the network into a number of variations, then pick the best candidate and repeat the adjustment for the new candidate and continue iterating until a desired outcome is found.
Ya, that's been done actually. research the various AI/ML models and research papers. btw, the 'layered networks' is kind of a useful structure for 'adjusting the network into a number of variations'
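The mutate-select-repeat loop described in the comment above can be sketched in a few lines. This is only an illustrative toy, not anything shown in the talk: the two-weight "network", the target line y = 2x + 1, the mutation scale, and all function names are assumptions made for the example.

```python
import random

def loss(weights, data):
    """Mean squared error of the line y = w0 * x + w1 on (x, y) pairs."""
    w0, w1 = weights
    return sum((w0 * x + w1 - y) ** 2 for x, y in data) / len(data)

def mutate(weights, rng, scale=0.1):
    """Return a copy of the weights with small Gaussian noise added."""
    return [w + rng.gauss(0.0, scale) for w in weights]

def evolve(data, generations=200, variants=20, seed=0):
    """Mutate the current best candidate, keep the best variant, repeat."""
    rng = random.Random(seed)
    best = [0.0, 0.0]
    history = [loss(best, data)]
    for _ in range(generations):
        # The parent stays in the pool, so the best loss never gets worse.
        pool = [best] + [mutate(best, rng) for _ in range(variants)]
        best = min(pool, key=lambda w: loss(w, data))
        history.append(loss(best, data))
    return best, history

# Hypothetical toy target: points on the line y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(-3, 4)]
best, history = evolve(data)
print("final weights:", best, "final loss:", history[-1])
```

Keeping the parent in the candidate pool is a design choice that makes the loss non-increasing from one generation to the next; gradient descent, as covered in the talk, typically reaches the same place with far fewer evaluations when gradients are available.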
Thank you so much Mr. Wolfram, you really shed light in some areas I had not fully grasped before!
Amazing & super helpful!!! I really enjoyed watching it and learned a lot.
Should be awarded a NOBEL prize for this tutorial. Well played.....
In essence, this sort of weighted inference about an existing corpus can only produce a deterministic set of possibilities, even if this set is enormous. We have a general problem with the notion of "intelligence", insofar as we rarely consider the difference between functional knowledge and knowledge production. These approaches to AI can produce new knowledge within the extant corpus: they can help discover previously unknown, optimal relations in the existing corpus, and that is useful, but they cannot produce new paradigms about the world. Intelligence is more than the ability to infer relations; it is the ability to change the entire coordinate system of the corpus by altering the vantage point of the observer. For this to be possible, there has to be a higher-order, synthetic model of the corpus, based on what we call logic, which is the opposite of the brute-force approach of LLMs. What we may need, to produce new paradigms, is a sparse model that embeds key structures in the language of concepts.
The weights are Gaussian because they are constrained to be during training via layer normalisation. It makes the gradient signal flow better.
very easy to understand ....amazing method of sir...thanks
Oh wait, you're the website that helps me rearrange formulas. Thanks, I’ve used it so much
I'd Love to see more in-depth analysis like this on the current LLM topic utilizing Dr. Wolfram in this format. Exceptional content. As an aside I've really been missing the physics project live streams.
m.ruclips.net/video/vidaHDvXMLo/видео.html
1:18:58 the art of training
1:32:25 training LLMs
1:52:47 attention
1:56:20 attention blocks
2:08:57 neurons have memory
The most important thing that should be on everyone's mind currently should be to invest in different sources of income that don't depend on the government. Especially with the current economic crisis around the world. This is still a good time to invest in various stocks, gold, silver and digital currencies.
You forgot ammo
Very insightful area to learn from. Thank you.
This was the most fascinating and informative discussion, particularly, your responses to commenters! Please post the link to the paper you recently wrote (?) that inspired this live video discussion. And thank you!
At 33:49, it's interesting how the text looks more like English the longer the character length. Great video.
This is a LONG video truthfully. But very informative as it should be with the length of it
Remarkable talk, simply outstanding!
Amazing presentation. Thank you so much !👍👍👍
let me guess… everyone fell asleep and then woke up to this livestream playing, am i right?
Nope
I asked Chat to quit reminding me it was a language model because I personally find it easier to converse if I treat them as if they were another being. There was a rather long pause, then Chat came back and for all intents and purposes was a very polite and helpful, uh... person? Dunno how to regard them, they're awesome tho :)
I really like how he explained everything. Oh, how I wish I didn't sleep during math class.🤣🤣
Logic, concepts, math, i.e. "deterministic processes", seem to be missing in these language models (LMs).
Either we can identify where or how the model reflects these abilities and work from that, or maybe we could use other types of models like logic indictors, "demonstrators" etc. in conjunction with LMs.
On the one hand, humans are capable of "unconscious intuition" (similar to LMs); on the other, we can reason, we have formal languages, etc. To me, that combination of abilities is what defines human intelligence.
ruclips.net/video/vidaHDvXMLo/видео.html
New Drinking Game: Every time Mr. Wolfram says "Umm" take a drink!
This is super interesting and I’m learning a lot, thank you for this video. I do feel the number of times I hear “ugh” and “um” is really off-putting. Sorry if that’s nitpicky but I almost can’t make it through the beginning because of ugh. Um. Ugh.
Good analysis. Please do more of this
Greatest primer/teaser for genetic algorithms and neural networks that I've seen
Thanx
Deep neural networks can unroll a certain number of loops. They can hold a certain amount of local memory, but not much. So they can't do things like use the number of words in a generated sentence in that sentence. This requires a second pass through the LLM with the sentence as context. Good agent design can solve many of the foibles of neural nets. LangChain and LangGraph solve a lot of these problems.
I use it a lot to help me write and fix code and also to explain things for me or piece things together. It's a great partner/tool to use if you have some good input and existing knowledge
Nicely done, Stephen. This is a great introduction for a novice. Your talk creates great intuition. You made the embeddings seem simple as a prebaked unchanging part of the entire NN. Also the breaking up of the "feature signature" makes parallelism possible through the various attention heads. One missing idea that you might include at some point is how signals can be added, basically the Fourier series.
Great sharing my dear Have a NICE day STAY connect FULL Watching
So amazing ! Thank you for explaining
Great video, Wolfram! As someone who's fascinated by AI, I found your explanation of Chat GPT's inner workings to be very informative.
One thing I found myself wondering while watching the video was how Chat GPT compares to other language models out there. Have you done any comparisons with other models, and if so, how does Chat GPT stack up?
I also think it would be interesting if you could have delved a bit more into the ethical considerations surrounding the use of language models like Chat GPT. For example, what steps can we take to ensure that these models aren't being used to spread misinformation or reinforce harmful biases?
Overall, though, great job breaking down such a complex topic in an accessible way!
Ah. That's why they call the API "completion". I worked with something called "Hidden Markov Models" to decompose documents and recognize parts like title, author, subject etc. this was done by training on already labelled documents until the model had a "path" of most likely joined words.
Beyond any doubt, this is the best lecture for understanding what lies behind NLP and NLU. I find that many professionals who work with models don't understand why these models work so well and what they do. You can't get the depth of understanding of semantic space from reading Attention is All You Need that you get from this video. That understanding is missed. I wonder how this understanding happened. Was it found piecemeal, or was it accidental? Was it understood only after this architecture first worked?
Privet! You can produce well. Electrifying i find Your channel is getting ridiculously well. I can watched repeat again! Keep going.
Those paths through meaning space are fascinating, Stephen. I would call each one a context signature. In auto-regressive training, we are looking for the next token. Why not look for the next context signature? In fact, why not train a model using graphical context signatures? Then decode replies. Other than training with graphical context signatures, in essence, I believe this is what's occurring when training a transformer. The addition of signal from the entire context is retrieving the next token so that token by token a context signature is retrieved. But is it possible to retrieve an entire context signature and then decode it? I wonder how much efficiency one could achieve with this method. Moreover, I wonder how well a convolutional NN would handle training from graphical context signatures? If you want to discover physics-like laws of semantic motion, this might be a way in.
😊😊
I've since learned that we do have sequence to sequence training but that the output space grows exponentially.
Thank you for sharing your insights and all the good questions. It's really lonely not being in an academic environment or a company working on ML and AI.
31:04 ❤Love ❤usa❤
😅You
That's very useful information, because you don't really know where to start investigating the topic. It's also impressive, that the Wolfram language can manage a representation of that mechanism.
What surprises me, however, is how ChatGPT includes different contexts in its predictions, because there are certainly multiple interpretations of the large number of learned text structures if the context is not clearly defined at the beginning of the conversation.
Thank you.
2:57 An episode of Star Trek had aliens that were brains with no bodies.
two words....................sleep learning LOL I fell asleep with utube on and woke up to this more than once. Very interesting!!!
The onion routing protocol isn't as anonymous as you think it is. Whoever controls the exit nodes controls the traffic. Which makes me in control.
Thank you so much !
I would love to get Noam Chomsky's comments on the idea of "semantic grammar." It seems fairly compelling. Thanks. I also think the parenthesis grammar as a hand-hold for understanding these models is a great idea.
Excellent...learned so much.
When will we see the next obvious hybrid system: ChatGPT + WolframAlpha/Mathematica ?
There's one already on huggingface if you're interested
I guess ChatGPT can learn Wolfram language as other languages too.
@@Casevil669 I think he means more at scale / commercial applications. Similar projects have already been done with the API + Wolfram for some time now, hobbyist architectures aside.
@@Casevil669 Thanks for the reference. Have you tried it? Does it attempt to determine which of WolframAlpha (for maths/facts heavy question) or ChatGPT (for text based questions) will be better at answering any given question; or does the integration go deeper than this? Any good?
It was helping me program things in mathematica the other day actually
Thank you very much Professor Stephen,
His example of unsupervised learning around 1:33:00 seems like supervised learning to me.
Incredible work, thank you.
This was a fantastic video to watch
(Removed; Unfair. Did not watch the whole presentation.) In any case: Great presentation so far, and huge technological respect for everyone involved in the ChatGPT project. Fascinating stuff.
Thank you for this… Language is fascinating
Thank you very much for sharing your knowledge!
I have a rule for writing text: always choose the word that eliminates the most other words first. If I am writing a plan, I will start this way: "My plan is ..."
If I am going to write an essay, I will start with: "This essay is about ChatGPT..."
2:50:30 Does that mean we could play the natural role of the AI's Brain stem, where we are not as conscious as the AI but the AI still works to understand and aid us?
2:14:02 Stephen says "verbs and nouns go this way..." and shows the old - very old! as old as Aristotle - idea of context-free phrase structure grammar based on subject-predicate - but that's a wrong idea, because the real grammar of English is both simpler and more elaborate. *The Natural Topology of English* includes subject-predicate as one of its basic forms, but there are others, such as
event = agent + action + object which is an instance of concept - relation - concept
This isn't just a great lesson in AI, it is a lesson in how to be a good teacher. (Start with simple concepts students can grasp and only then build up.)
Everyone loves an excellent Educator.
Great info Thx!👍
ChatGPT is excellent at answering questions about Western music theory, but in some cases the initial answer needs prompting, especially when accounting for enharmonic equivalents.
Who would have thought that conversation was a slightly random walk through probable clumps of letters and words? Fascinating. I have to say, though, I think it's actually the human reinforcement that gives particular clumpings their perceptible meaningfulness.
ruclips.net/video/kBPpZKZc8Dc/видео.html (good)
Could the randomness process for choosing the next probable word within a certain temperature parameter be consigned to a quantum random process? If so, an essay could be viewed as a flat plane or an entire terrain with peaks and troughs. Within this paradigm, a certain style of writer could be viewed as a dynamic sheet, similar to how different materials when laid over a ragged topology should comply and not comply with what it is laid on top of. With this quantum process an overall view of the essay could be judged at an aesthetic level from most pleasing to least on several different qualities concurrently and not mutually exclusively making an approximate or some sort of conscious viewer.
What I find interesting is how they inject the objective pattern recognition into the model to aid in figuring out puzzles and riddles. It will provide extensive reasoning on how it arrived at its answer. GPT-4 really excels in this ability and has a great sense of humor to go with it.
I guess that only works for riddles that were already solved and where the reasoning was already established by someone ChatGPT got its data from. I don’t think it could solve any riddle by itself… It can hardly do the easiest algebra.
The thing I like about ChatGPT is, you can tell it some information and then ask a question and it can get it wrong, but you can then say, no you got it wrong. But if you figure out its break in logic and explain to it why it got it wrong and what assumptions it made that were wrong, and correct that, it learns. Do that enough times and you can break down any concept, no matter how nuanced and complicated. I've done this. It works.
But I only used ChatGPT for one day and never since. Why? Because it's not capable of any truly new and original thought. It can only spit out what we already know. So if the world thinks lemmings jump off cliffs, then so does ChatGPT. Again, you could dig down into it and ask why it thinks lemmings jump off cliffs and show its assumptions are unproven, but that's no better than talking to a human and there are over 8 billion other natural ChatGPTs on this planet which already do that. At that point, I lost interest. It's like a boat without a rudder.
Thank you
Very interesting!!! THX
i fall asleep once, and this is what I wake up to? aha
HOW ME TOO
Absolutely love these sessions!!
Good talk
Oh...and finally Stephen makes the statement "That pattern of language has occurred before". No, I don't think so. The implication is that the probability given by the weightings leading to the next word can only have come from seeing the previous word and adjusting the weightings on that combination alone, but that's not true. All the weightings, including the ones leading to the next word, have been influenced by many words from many sentences back-propagating. I don't think it's necessary for the next word to have been seen before for that pattern to emerge.
You are correct.
"I don't think it's necessary for the next word to have been seen before for that pattern to emerge."
I'm not even sure what you are referring to.
Large models learn concepts in the latent space from the patterns they observe. So when a model predicts a word, it picks the word that makes the whole pattern most likely. It takes the whole context into account, so it is still the case that the probability reflects data it has seen, even if it has not seen the exact sentence before.
@@armin3057 Stephen's claim was that the next word chosen had been seen before as a word pair. It was towards the end of the talk but unfortunately I didn't take note of the timestamp. In other words an AI can never produce unseen word pairs like "moose feathers" because feathers never followed moose in its training set. I took note of the quote at the time "That pattern of language has occurred before"
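For what it's worth, the "moose feathers" point in this thread can be checked against a pure bigram counting model of the kind covered at 0:31:10. The sketch below is a toy under stated assumptions: the three-sentence corpus and the function names are made up for illustration, and it says nothing about how ChatGPT itself is implemented. In such a count-based model an unseen pair gets probability exactly zero, which is the restriction the original commenter argues a trained neural net does not share.

```python
from collections import defaultdict

def train_bigrams(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Relative frequencies of the words seen directly after `prev`."""
    followers = counts.get(prev, {})
    total = sum(followers.values())
    return {word: count / total for word, count in followers.items()}

# Hypothetical toy corpus: "feathers" occurs, but never right after "moose".
corpus = "the moose ate grass . the bird has feathers . the moose has antlers ."
counts = train_bigrams(corpus)

print(next_word_probs(counts, "moose"))
print(next_word_probs(counts, "moose").get("feathers", 0.0))
```

A model that shares weights across contexts, as described in the reply above, has no such hard zero, so whether a given pair can come out depends on the whole context rather than on a lookup table of seen pairs.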
Many thanks Stephen! I absolutely enjoyed the step-by-step introduction into the layers of the matter. However, it is obvious that we are still on the technical/mechanical side of the whole journey. Still no one is able to explain the concept and reality of infinity, or "1" or "0", but an honest struggle towards that wisdom may open new paths in learning and lead to brilliant discoveries.
What I find interesting is how similar an action potential and binary Boolean values are: during an action potential the neuron's state can be considered 1, and 0 when it is not firing. That gives biologically based memory. It could basically start as bubble memory, but in organic form. If there were a system able to interface with neurons, and the system were addressable, it wouldn't matter which neuron migrated to which interface point; the addressing would just need to be adjusted to correct the neuron's connection. For example, if a neuron migrated to the connection point for the eye instead of the thumb, just change the port address.
Great video Stephen!
How does ChatGPT figure out if there is a dog in a cat suit while looking for cats? You started explaining but I don’t recall the answer.
Thank you, thank you.
Here is Wolfram knowing the exact number of words he has sent in email!! WOW!
This video is awesome.
I woke up to this 3 hours in?
This is a video of Stephen Wolfram preparing to make a video. Through laziness or distraction, he did not make the actual video.
The most frequently used word in the video is "um." This cleverly demonstrates the point that if ChatGPT simply used the most frequently found word in every situation you would get very bad output.
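The joke above lines up with the “temperature” segment at 0:15:30: always taking the top-ranked word repeats itself, while sampling from the distribution gives variety. Here is a minimal sketch of that contrast, assuming a made-up four-word vocabulary with made-up scores; the temperature value 0.8 and the function names are illustrative choices, not anything taken from ChatGPT's actual code.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_greedy(words, scores):
    """Always return the single highest-scoring word."""
    return words[scores.index(max(scores))]

def pick_sampled(words, scores, temperature, rng):
    """Draw one word at random, weighted by the tempered probabilities."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]

# Hypothetical vocabulary and scores, for illustration only.
words = ["um", "model", "neural", "language"]
scores = [2.0, 1.5, 1.0, 0.5]

rng = random.Random(0)
greedy = [pick_greedy(words, scores) for _ in range(5)]
sampled = [pick_sampled(words, scores, 0.8, rng) for _ in range(5)]
print(greedy)   # the same top word every time
print(sampled)  # a mix that depends on the seed and temperature
```

Raising the temperature flattens the distribution and lowering it pushes sampling toward the greedy choice.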
I speculate it does have a “global plan” of what to say next, instead of one word at a time. It implicitly has a representation of the joint probability distribution of what’s to be continued… Prompting kind of brings out that distribution… from which you can extract knowledge, in its current form as some piece of text (but maybe other modalities in the near future). I was convinced more by Sutskever’s take.
So fascinating! I guess one major difference between the way the human brain and GPT handle language is that human brains use emotions to categorize objects and concepts… I wonder if it would be possible to teach GPT emotions, and what might be the result?
It doesn’t even know what it is saying, it just predicts the next word. So I’d say the biggest difference would be knowing what you want to say instead of just guessing the next word on probability…
I still don’t understand how it’s able to generate a really elegant Boost::Spirit C++ parser, and a compiler/evaluator for a non-trivial language I explained to it
In semiotics, there is "pragmatics," as well as syntactics and semantics. Communication is purposeful, about something or someone, rather than about predicting the next word. There is always some motivation behind utterances, which in turn are predisposed by "beliefs" and "values." So there is cognition, yes, but also affective and psychomotor skill domains involved. When "natural language processors" begin to present (and see) themselves as agents with a memory at least of their own experiences with you (and others), conversations will be more interesting. As more and more complex ideas and concepts (gained, say, from reading text with attention to those, rather than to words per se) can be mapped in some "meaning space" which can be transitioned in the direction of the allegorical, metaphorical, or hypothetical (and thus creative), the closer we will come to AGI.
"Your description of the different components of communication in semiotics is accurate. Pragmatics refers to the way language is used in context to achieve a specific goal or purpose, while syntax and semantics deal with the structure and meaning of language itself. And as you note, communication is influenced by a wide range of factors, including cognitive, affective, and psychomotor skills, as well as beliefs, values, and experiences.
Regarding natural language processors and their potential to evolve towards AGI, it is indeed an exciting area of research. As these systems become more sophisticated, they will be able to parse and understand complex ideas and concepts, as well as respond with more creativity and nuance. However, achieving true AGI will require advances in many other areas beyond natural language processing, including machine learning, robotics, and computer vision.
In any case, the development of more advanced natural language processing systems will undoubtedly have a major impact on the way we interact with technology and with each other, and may bring us closer to a future where machines and humans can communicate with each other in truly meaningful and productive ways."
So says ChatGPT when prompted by what you wrote which, itself, was a response prompted by your reflections on the issue. I do actually agree, and this strikes me as the essential thesis behind "The embodied mind", brilliantly explained in a series of books by the philosopher of Mind Andy Clark.
Also see the notion of implicatures. Pretty fascinating subject.
This is fascinating! Any chance there's a Cliff's notes or something?
Good morning everyone!
1:16:27 interesting discovery💡
Has anyone considered the parallel to how people compose writing, and a process of selecting a word that feels right in the context to add to the string, and head, once the theme is developed, to bringing it to conclusion? That's what AI is now learning to do, so it appears that the "temperature" is NOT random at all. That may be accidental, of course, but something is inputting a sense of feeling about a context (if you take my words for their root meanings and understandings).
Definitely points to the issues with attention economy. The bland predictive capability may very well be the correct percentage of use cases compiled, but then does not seem to correspond to a request for _interesting_ writing or dialog.
Editing is an Attention Economy factor that works on many levels. Think genre, or audience. Where is the story relationship between platypus and unicorn in terms of either believability or interest and how anchored in reality will A.I be trained to be if attention is the only requirement?
What comments would you make about notable differences between different cultures' outputs, given similar topics as inputs, using ChatGPT 4?
It's not trying, it's doing it, and it's smarter than it was when you posted this video.