To try everything Brilliant has to offer, free for a full 30 days, visit brilliant.org/bycloud/. You’ll also get 20% off an annual premium subscription!
(I reuploaded this video cuz there was a pretty big mistake at 6:13, sorry notifications!)
What was the mistake? I watched the original one so I don't want to rewatch this just to know what's changed. So could you please say what the mistake was?
@@ghulammahboobahmadsiddique8272 They had the wrong graphic/text for Class 4 Complex
@@ghulammahboobahmadsiddique8272 6:13 i showed class 3 chaos twice, i needa catch some sleep lol
@@ghulammahboobahmadsiddique8272 6:16 Here class 3 and 4 had the same captions and images. But now he fixed it
Your editor still missed it in like two spots lmao
Such beautiful knowledge, I will not use it anywhere or talk about it with anybody.
I will! Gotta make my own AGI somehow.
@@rmt3589 that would be easier than actually making friends
Ironically, I feel the same... There is absolutely no one I know who would even understand the beauty in this concept compared to human brain functions, the sheer realization of how we perceive our own reality in the middle of chaos. It's not total chaos, it's just complex order.
These videos are so nice for someone like me with no technical background in machine learning. Thank you and please keep making more!
Careful, last time I said "pls never change" he changed a week later.
I fell asleep while listening to this video and dropped my phone on my wife's head and now she's mad.
Wow
Wow
Wow.
Wow
Wow
This feels somehow similar to how physics is based on a seemingly simple set of rules, yet creates impossibly complex situations/states.
There must be a limited set of core rules a base model needs to learn to become an effective reasoner.
It is very similar. These rules are called elementary cellular automata, and the naming scheme was developed by the physicist Stephen Wolfram. He has a theory that looks for something analogous to explain complex physical phenomena. It has something to do with hypergraphs (I think Sabine Hossenfelder has a video on it). The connections between complexity theory, physics, machine learning, and intelligence are extremely interesting.
Wolfram would love this
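For anyone who wants to play with these, here is a minimal sketch of an elementary cellular automaton in Python (numpy assumed, function names are made up). The rule number is just Wolfram's 8-bit lookup table, so rule 30 (class 3 chaos) and rule 110 (class 4 complex) are one argument apart:

import numpy as np

def eca_step(state, rule):
    # Wolfram's numbering: bit i of `rule` is the new cell value for neighborhood pattern i
    table = [(rule >> i) & 1 for i in range(8)]
    left, right = np.roll(state, 1), np.roll(state, -1)
    idx = 4 * left + 2 * state + right            # 3-cell neighborhood encoded as 0..7
    return np.array([table[i] for i in idx], dtype=np.uint8)

def run_eca(rule=110, width=80, steps=40):
    state = np.zeros(width, dtype=np.uint8)
    state[width // 2] = 1                         # single live cell, wraparound edges
    rows = [state]
    for _ in range(steps):
        state = eca_step(state, rule)
        rows.append(state)
    return np.array(rows)

# swap in 30 for chaos, 110 for complex behavior at the edge of chaos
print("\n".join("".join("#" if c else "." for c in row) for row in run_eca(110)))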
Bro cover “Fourier Heads” or the belief state transformer. Fourier head research is interesting, I see a lot of value in integrating Gaussian mixture model principles into LMs to better handle complex distributions.
To be honest, one of my core principles is disentanglement. There's a reason why we don't see the expected performance gains with multimodal data and reasoning in general: the model treats it all as a single continuous sequence. The solution I've been working on is multivariate next-token prediction, where each modality is considered separately, and yes, everything can be treated as a distinct modality, even reasoning via structured reasoning tokens. Instead of a sequence of length T, it would be N x T, where N is the modality count, almost like a time series problem. It obviously increases memory for the sequence, but I've seen clear benefits and think it's the future. That's also why I don't expect legit breakthroughs from any of the top players. No new ideas. Rather, no divergent ideas. AGI will be created by divergent thinkers. Someone already released Entropix, I believe it's called, which recreates o1-preview style outputs lol, just needs DPO to really get that juice out. We need to fund our divergent thinkers.
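Not from any paper, just a rough sketch of how that N x T idea could be wired up, with one prediction head per modality over a shared backbone (PyTorch assumed, all names and sizes here are made up):

import torch
import torch.nn as nn

class MultivariateNextTokenHead(nn.Module):
    # hypothetical sketch: one head per modality over a shared hidden state,
    # so each of the N streams gets its own next-token distribution per position
    def __init__(self, d_model, vocab_sizes):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, v) for v in vocab_sizes)

    def forward(self, h):                          # h: (batch, T, d_model) from any backbone
        return [head(h) for head in self.heads]    # N logit tensors of shape (batch, T, vocab_i)

# toy usage: 3 "modalities" (text, image tokens, reasoning tokens), made-up sizes
head = MultivariateNextTokenHead(d_model=512, vocab_sizes=[32000, 8192, 1024])
logits = head(torch.randn(2, 16, 512))
print([l.shape for l in logits])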
very carefully, and with compassion and wisdom
You did not mention that rule 110 is Turing complete. It may not be because of the edge of chaos, but because of Turing completeness.
All Turing complete systems behave roughly like what they define as the edge of chaos, although you can construct some which hide this under apparent noise.
GPT-3 is already Turing complete. It's a bad test.
Edit: I mixed up the Turing Test with Turing complete. The above post makes no sense in its context.
@@rmt3589 I believe you are confusing the Turing test aka the imitation game with Turing completeness. Turing completeness refers to whether something can be used to simulate a Turing machine, which makes it computationally universal.
@@owenpawling3956 nothing "passes" the Turing test, as it depends on the participants. But LLMs are somewhat Turing complete if you assume infinite context to use as "tape".
@@owenpawling3956 I was. You are correct. I'm also technically not wrong, but did not convey what I wanted to.
I will go fix my post.
@@4.0.4 LLMs long passed the Turing Test, back when there was a widespread conspiracy that Replika, which ran on GPT-3, was real people pretending to be AI.
Now with so many fake AIs, people literally cannot tell what's human and what's AI, and keep getting surprised when one turns out to be the other. This is a perfect and natural Turing Test, and it is being passed with flying colors.
2:31 Lack of words like "skibidi"
Multi-step prediction has been known for a while to perform poorly. It's best to either predict probabilities and sample or predict a single timestep, recursing for more. LLMs are doing both.
It makes a lot of sense - to predict five steps in advance you’d need to predict one step 5 times in a row, and you only run the model once, so it’d have to make more shortcuts with each step’s prediction given how it has to fit it five times in the same space, accumulating errors in the process.
@@ZeroRelevance What you said is true, but I think it's still possible to implement multi-step prediction in a performant manner. It would depend on the specific problem, but generally I can see a lot of instances where timestep 5 does not rely on timestep 4 or maybe even 3, so there is no error to accumulate from those steps.
Currently, one of the big drawbacks of predicting multiple steps (this is true of predicting multiple values for one step as well) is that the loss associated with each predicted value is only weakly accounted for and chasing the gradient for an average increase in performance is likely to make some of the predicted values worse.
What we need are better feedback mechanisms and more channels. MoEs are a sort of rudimentary solution to channels, but we're still relying on SGD over the whole network and in some instances manual freezing, which I don't really like from a technical standpoint. We need to be able to decompose problems into multiple losses, but that might not even be possible depending on the problem.
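A toy sketch of the two strategies being contrasted in this thread, plus the per-horizon loss terms the last comment asks for (PyTorch assumed; the model/backbone is a stand-in, not anything from the video):

import torch
import torch.nn as nn

def rollout_one_step(model, x, k):
    # strategy 1: predict a single next step, feed it back in, repeat k times (errors compound)
    seq = x
    for _ in range(k):
        nxt = model(seq)[:, -1:, :]               # prediction for the next position only
        seq = torch.cat([seq, nxt], dim=1)
    return seq[:, -k:, :]

class KStepHead(nn.Module):
    # strategy 2: one forward pass emits k future steps at once
    def __init__(self, d_model, k):
        super().__init__()
        self.proj = nn.Linear(d_model, k * d_model)
        self.k, self.d = k, d_model

    def forward(self, h):                          # h: (batch, T, d_model)
        return self.proj(h[:, -1]).view(-1, self.k, self.d)

def per_horizon_loss(preds, targets, weights):
    # keep a separate loss term per horizon so one bad step can't hide behind the average
    return sum(w * nn.functional.mse_loss(preds[:, i], targets[:, i])
               for i, w in enumerate(weights))

# toy usage with a stand-in "model" just to show the shapes
dummy = nn.Linear(8, 8)
x = torch.randn(4, 10, 8)
print(rollout_one_step(lambda s: dummy(s), x, k=3).shape)   # (4, 3, 8)
print(KStepHead(8, 3)(x).shape)                             # (4, 3, 8)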
The "long tail" really explains why AI slop is so mid - it is literally the middle of the distribution of language. And you can see it in most models, even if different wording is used.
Great video! I first heard about the brain being on the edge of chaos from Artem Kirsanov, who has a great channel (of the same name) on computational neuroscience. I'm thinking, those models that are trying to predict 5 steps at once might be ultimately better, but they would require much longer training (and maybe larger size), and therefore more computational resources, to start learning some complex patterns. It could probably be tested with models that try predicting 2 steps ahead...
It seems almost obvious that just chasing complexity horizons will lead to increasingly complex output potentials also, but to see how this can be done in practice, and related back to OG cell automata is very cool.
What you mentioned reminds me of curriculum learning from RL. Start off training easy then gradually make it harder.
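Roughly what that looks like in code, as a minimal sketch (the task buckets and names are made up, nothing from the video):

import random

def curriculum_sample(tasks_by_level, progress, batch_size=32):
    # progress in [0, 1]: early training draws only from the easiest buckets,
    # later training unlocks progressively harder ones
    unlocked = int(progress * (len(tasks_by_level) - 1)) + 1
    pool = [t for level in tasks_by_level[:unlocked] for t in level]
    return random.sample(pool, min(batch_size, len(pool)))

# toy usage: 4 difficulty buckets of fake task ids
levels = [[f"easy_{i}" for i in range(50)],
          [f"medium_{i}" for i in range(50)],
          [f"hard_{i}" for i in range(50)],
          [f"expert_{i}" for i in range(50)]]
print(curriculum_sample(levels, progress=0.1)[:5])   # mostly easy tasks early on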
If you train a model to predict sin waves from discrete data points it will approximate sin. The more training it gets the closer the approximation. The model does not learn the sin function. The benefit of this is that with missing or incorrect data the model can still approximate the correct sin wave where a real calculation would be completely wrong.
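A quick toy version of this point, fitting noisy samples of sin(x) with a tiny MLP (PyTorch assumed; sizes and step counts are arbitrary):

import torch
import torch.nn as nn

x = torch.linspace(-torch.pi, torch.pi, 200).unsqueeze(1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)          # noisy / imperfect samples

model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

# the net never "knows" sin, it only approximates it, which is why it still gives
# a reasonable value even where individual training points were noisy or missing
print(model(torch.tensor([[0.5]])).item(), torch.sin(torch.tensor(0.5)).item())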
very cool and interesting. I had some similar intuition and I'm glad you discussed this paper. Great work.
Would love for you to cover the things going on around Entropix!
why do i perfectly understand some of your videos, and at the same time get absolutely confused by others?? 😭😭
Essentially, because the model learns to incorporate past states in its decision making, it becomes capable of better reasoning. AKA, this is just another case where transfer learning is truly an important key. Transfer learning aka generalization is also the reason why sample efficiency improves with training.
Stephen Wolfram's A New Kind of Science and its computational theory applied to training models is the way to go, methinks
Read it almost 13 years ago and felt I was accessing some seriously Arcane shit that I was not ready to know. Amazing seeing it applied to transformers.
Thank you for your hard work. ❤ 🙏
This is basically just information theory…
Looking it up. Ty.
I have a playlist on my channel that has info about this, there are some good lecture sets linked in it, it's not in order so look through to find stuff, it's more of a directory. it's called advanced apaSiddhanta @@rmt3589
@@rmt3589 I made a playlist about information theory, complexity, emergence etc
not rly
@ yeah, it’s exactly the same thing, read through the original information theory paper from like 70 years ago
Great video as always, edge of chaos where understanding ends)
Kinda reminds me of how some fighter-jets are designed to be on the edge of instability allowing for more extreme maneuverability by controlling when to lose control
It's all about the "day daa" 😂
Need someone to train an LLM on NBA scores. Pretty simple, but also trends like scoring effects, comeback runs, and momentum
The collective human endeavor needs to mature before AGI can emerge in classical data; it's a simple fundamental that people don't understand
I think, AI should constantly assess the state of play and pick an appropriate strategy, but in order to do that, it should be able to self-reason. Something like o1 but with an extra degree of freedom.
Intelligence is the edge of chaos in a map territory feedback loop
Been there, done that
fascinating ideas 💡
So in the end, training from small problems to big problems like us humans is the best way to get better? Would the patterns be equivalent to personality in human terms?
great video thx
One extra point for the "critical brain hypothesis" to become a factual theory.
great vid
The proper way to train AGI is to put it in the square hole
?
@lionelmessisburner7393 everything goes in the square Hole
are you stupid
I believe AIs should be taught similarly to humans. First they need to understand simpler artificial cases to perfection, sometimes giving them real-world cases. Like give them spinning cubes, then switch to concave meshes, and only then to compound scenes
Whichever supports my interests.
I always say AI is evolving fully backwards; vision is one of the basics, coming after the brain but before language and logic. We are doing it completely in reverse.
We’ll get there though. Not TOO far away I think
fascinating
What about an infinite language model
Working on the entire output at once and constantly improving it
Like it was looking at the Library of Babel with all possible word combinations and looking for the best one using a hiking 7z algorithm or smth, but it'll need a beefy evaluation AI
You could then use debuggers or language engines to prune combinations that are certain to fail
I have not seen a single AGI yet.
How so
No one has and those that claim so are deluded.
Mistaking advanced algorithms for reasoning or sentience is a problem among techbros and data scientists, although for very different reasons.
Well, AGI folks. That's our future, doodling chaotically like a baby... Good news: chaos was the answer and emergence is awesome. Bad news: that embryo can beat you at chess...
The year is 2029 and the first AGI was raised Catholic
LLMs meet game of life 🤯
Garbage in, garbage out
Give me a pen and paper and i'll teach AGI anything and everything.
Intelligence is fragile, that's why it took so long to emerge
I'm hella sus about Intelligence at the Edge of Chaos; somehow you make a violin graph for complex rules despite having 2 data points.
The Pandoras box of programming?
RLHF is the thing that makes them slop. base models are still way better at good writing
what did training on rule 30 do?
not Sierpinski's triangle
Where's the 34th rule?
Kazuma licks Aqua holes, by the way Colette from brawl stars have tasty legs 😋
Or you can just understand intelligence down to a fundamental level.
IN PUBLIC
Self-reflexive knowledge graphs... LLMs are a piece, the reasoning needs to be in a separate non-black-boxed system. Neuro-symbolic ftw
What does rule 46 do? 🐾
I can't understand
whoa whoa dude slow down.... need another video please this time slower.... please
You don’t…😂
dream
Am I alone in noticing that A.I. learns a lot like bio brains? Study, reflect and connect, repeat incrementally to make REAL gains in comprehension and knowledge!
Keep trying to break the wall set by God. I can’t wait to see the next Magic
why does this video sound like it's AI generated tho
AI reaches bycloud level of intelligence. - 2024
🐢🐢🐊🦖
Thumbs down for the singing voice. It’s annoying. Ending with a high pitch like if Becky is gossiping with Amanda all day long