SLIDES SO YOU CAN FOLLOW ALONG DURING THE TALK: www.dropbox.com/scl/fi/sys3iasc63lgj8lm5t0ld/JONAS_SLIDES.pdf?rlkey=ak6ir61a2pyhrfuwyvgrdvq66&st=9cloopv9&dl=0
@@MachineLearningStreetTalk 🤩 thanks!
A very keen and interested young man who clearly expressed his passion for providing a solution. I really hope he continues to share his thoughts and beliefs with others and that he fulfills the potential he shows so fluently.
Local and context-based learning is an area where more and more people should work. It may solve the huge energy demand of running LLMs and could have applications ranging from query retrieval in DBMSs to edge learning devices.
😂 I was literally geeking out about this paper two weeks ago. Entropy-based everything is clearly the answer. Data selection, pretraining sampling, fine-tuning, inference… everything should be driven by entropy, because in real life there are no gold labels, only best guesses based on evidence and accepted axioms.
Color me shocked that embedding a model in something closer to real time, instead of training it and then freezing that state, was an effective way for that model to learn better.
I mean, it is cool that this approach is being taken seriously and actively researched, but within the circles debating AI sentience, consciousness, reasoning, etc., this is not a novel point of speculation about the difference between machine intelligence and organic intelligence.
I have been suggesting that, for a model, training is analogous to the continuous experience of organic agenthood.
I don't find it surprising that there is something like a hyperwebster scaling pattern that seems to emerge at a more "intuitive" level.
@@memegazer Nicely put!
@@memegazer is it a fair summation that it's attempting to solve extrapolation, generalisation, abstract reasoning, with more of the same, more interpolation?
What about the chips as well? Oh yeah thermodynamic-computation is becoming a thing.
Why do you think it is not entropy-based right now? Entropy is tied to the process of prediction, even if it's not explicitly named.
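To make the point about entropy and prediction concrete: the usual next-token training loss is a cross-entropy, and the Shannon entropy of a predictive distribution measures how uncertain that prediction is. Below is a minimal sketch with made-up probabilities; the function names and numbers are illustrative assumptions, not anything taken from the talk or the paper.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the outcome that actually occurred."""
    return -math.log(probs[target_index])

# Two hypothetical predictive distributions over four possible next tokens.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

# The uniform prediction has the maximum entropy for four outcomes, log(4).
h_confident = entropy(confident)
h_uncertain = entropy(uncertain)

# If token 0 is what actually comes next, the confident model pays a lower loss.
loss_confident = cross_entropy(confident, 0)
loss_uncertain = cross_entropy(uncertain, 0)
```

In this toy case, minimising the loss and reducing predictive entropy go together, which is one sense in which prediction is already "entropy-based" without being named that way.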
Superb presentation, but it would be better to show the slides while the presenter is talking, maybe in a separate window.
Do agree, we need the slides, we don’t need to watch the guy. Or maybe putting the guy on a reduced window on a corner would be better
Oh man, my eyes lit up when I saw that oil and paint magnetic ferrofluid animation! I literally saved that exact videoclip from YouTube a few months ago because it was just sooooo 4k, awesome!!!
Yesss I was waiting for this. Specifically looked for it in the channel last night and realized the paper only came out a week ago
Warping the state space to experience, such as by K-means clustering, is how knowledge is applied efficaciously ("system 2" thinking). Interesting talk, thank you for sharing.
This TTT is very exciting. Only a matter of time (or coincidence) until the OS community achieves capability as good as internal OAI. Were you able to see the robotics they've been working on, the anymal bot? I like the technical video updates they put out, but always wish there was an accompanying interview.
Top material as usual.
This is connected to the power consumption of AGI. Scaling laws of progress will be based on electricity flowing through systems. Limited power is the main factor now
Wow, the time stamps with info are really helpful.
Some of the techniques he's talking about I've tried even on very small data and have seen some interesting results!
where can I learn more about test time training? I am not familiar with this concept
It's not going to be easy to aggregate multiple queries into a batch and run inference with this technique. The TTT would need different fine tuning data for each query. It might be very good for small models running on personal desktop machines.
Does that mean it can be unrelated and non-redundant data for local training? It’s hard to see how that can help with inference. If not, then are you not imposing an inductive prior by your selection of data? And your bet effectively becomes the inductive prior.
It's kinda hard to follow when we don't see what he's talking about. Show the slides instead of the presenter. 🙏
When he says that the learned information becomes part of its beliefs, technically how does that work?
I’m really stuck on how providing new data “gets into” the model.
Now this is scientific research!
can this be used with entropix sampler ?
good we are getting at the person of interest level soon :)
At 00:15 this is Geneva not Zurich 😂😂
Same city, one is written in French the other in German 🫠
@pedrogorilla483 are you joking ? Not sure what you mean?
Working on ARC, Francois gave us some hints about this approach in his University tour. Now he's left Google to work on ....? Something that we learned in ARC-AGI Challenge we hope!!!
Such a cool atmosphere!!
Is the answer to solving extrapolation and generalisation to interpolate better? Seems like this will get us to GenAI saturation faster, not take us beyond?
not about extrapolation / interpolation, so much about local vs global
@MachineLearningStreetTalk local manifold interpolation rather than global manifold interpolation? I think I must be missing something?
That quote from Vapnik sounds like engineering at best and hacking at worst.
the video quality is amazing, thx, but plz make the slide first class citizen for this kind of talk. what's the point of 99% speaker headshot?
I can't concentrate on what this dude is saying because he's so annoyingly handsome, jesus christ.
Nice complements to my recent active learning course :)
I'm feeling the AGI while watching this...
Really? Why? I’m not asking as an “AGI can’t happen” kind of thing; I don’t see any fundamental limit to prevent it.
But I don’t see why these results, as neat as they are, would make you feel that way.
Edit: Ah, were you responding to like 28:08 in particular ? About selecting the most informative data in order to inform belief about the situation at hand?
@@drdca8263 he's probably new-ish to the machine learning sphere. I was once like that in the past, every time a new technique or algorithm reported a 'performance improvement'.
@@macchiato_1881 nah, I've been around since 2016, if that counts as long to you
@@drdca8263 Yeah I think similar methods will likely play a large role in further progress towards more general agents
@@drdca8263 yes, that is one point I'm happy to see. One of the biggest obstacles to AGI is how little existing systems can adapt during test time.
How much is it really about just teaching what it needs to know at the last minute so that it can perform well on the expected question? Are we not simply engineering an answer, given that the result will be worse if the local training is done on unrelated data?
Historically, betting against technological advancement just because you couldn't imagine exactly how it could work was always a bad idea.
Two serious problems for this paper: 1. How do you know that embedding nearest neighbors can find the nearest data for training? Embedding is not the right measurement, especially for complex multi-dimensional similarity requirements. 2. What are the business application scenarios? This method is complicated, and most business applications have pretty stable tasks, so why would I need to hold a huge distributed index for highly flexible tasks?
I do not see much potential in this paper.
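For readers unsure what "embedding nearest neighbors" measures in the first objection: the plain version ranks stored items by the cosine similarity between their embedding vector and the query's embedding. A minimal sketch follows, using hand-written two-dimensional vectors in place of real embeddings; the corpus, vectors and function names are hypothetical, and this shows only the baseline being questioned, not the paper's selection method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbours(query, corpus, k):
    """Return the names of the k corpus items most similar to the query."""
    ranked = sorted(corpus, key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy "embeddings": doc_a and doc_b point roughly the same way, doc_c does not.
corpus = [
    ("doc_a", [1.0, 0.0]),
    ("doc_b", [0.9, 0.1]),
    ("doc_c", [0.0, 1.0]),
]

top = nearest_neighbours([1.0, 0.05], corpus, k=2)
```

Note that doc_a and doc_b are near-duplicates here, so a pure similarity ranking returns both; whether a single similarity score is an adequate notion of "nearest" is exactly what the comment disputes.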
eli5 please
There is no wall
I am Compute poor
Me too; however, about 1 gigabyte of DNA is all the sperm needs to swim to the egg and build a "brain", and "dumb animals" like deer are superior to humans in learning to walk in hours instead of months.
Machine learning used to be about maths. Now it is about hardware, hype, opinions, and moving big software blocks around. A generation has come into being that lacks the basic ability to make the kinds of jumps in theory made in the 1980s and 1990s.
OK boomer
I disagree, scaling does make something emerge.
Um.. no.. it really isn't. Take a read of any of the major papers in the last 10 years.. you will see some pretty serious math in play.
😂exactly. Spent time listening to the presentation and I heard nothing important. I’m trying to understand the mathematical framework behind what he was presenting. The easiest way to convey your message in machine learning is to use mathematics, algorithms etc. If I don’t hear any of that, I can’t take the presentation seriously.
@@marilynlucas5128 I found the git using google. and both papers are linked there. go for it.
For me, the presentation had so much jargon that it was difficult to follow and understand.
Exactly. I really didn't get what he said all this time
dawg you gotta be more clear about ETH-- I didn't know it was a university. I was about to call everyone here a grifter lmao
The market messed up in anthropomorphizing AI, when AI is really better framed as a space. We got that wrong, and it has led, and is leading, us to bad places.
Much of the analysis uses our human thinking process to imagine how the language model works or should work, so what is your point?
@ ai isn’t thinking or doing anything with knowledge, it’s closer to a search. So all the comparisons of consciousness are because of this projection we put onto it. You would never think the internet or a computer is conscious, but for some reason we spend a lot of effort and time discussing how evil AI could be once it’s smart enough. Also the design of the products and uses are limited by this human like role we put onto ai.
editing is ridiculously bad
Yeah.. they really needed to cut to the equations when he was referring to them. A bit of highlighting wouldn't go astray either.
Sorry, we are expanding editing team and upskilling new starters - would be super helpful if you didn’t mind giving more detailed feedback.
@@MachineLearningStreetTalk When he is talking about and gesturing to a slide we should see it more-or-less in the same time frame.
@@KevinKreger Yeah that's good feedback, we felt that the video would be more engaging if it didn't have powerpoint presentation vibes and tried to make it more "real" and make you feel like you were in the room with the speaker. We tried various ways of showing the slide and the speaker at the same time and the camera angles didn't work unfortunately (it was better on the AGI conference talks). We will take this feedback onboard
@@KevinKreger www.dropbox.com/scl/fi/sys3iasc63lgj8lm5t0ld/JONAS_SLIDES.pdf?rlkey=ak6ir61a2pyhrfuwyvgrdvq66&st=9cloopv9&dl=0 here are the slides