Q-Star 2.0 - AI Breakthrough Unlocks New Scaling Law (New Strawberry)

  • Published: 22 Dec 2024

Comments • 454

  • @avi7278
    @avi7278 1 month ago +369

    Mon: AI is doomed, Tues: AGI is here! .... My god.

    • @SimonNgai-d3u
      @SimonNgai-d3u 1 month ago +64

      AI news is basically crypto trading

    • @drwhitewash
      @drwhitewash 1 month ago +37

      AGI isn't here, just hype for investors.

    • @itsARSX
      @itsARSX 1 month ago +15

      yeah all these youtubers are just spitting out every bit of news they see 🤢

    • @ImranHossain-by6nk
      @ImranHossain-by6nk 1 month ago +22

      The only thing scarier than AI companies is AI YouTubers

    • @とふこ
      @とふこ 1 month ago +3

      2010s "bitcoin to moon"
      2020s "AI to moon"

  • @alcott122
    @alcott122 1 month ago +471

    The concept in the book "Mastering the AI Money Game" completely explains the market. The trend is changing with AI

  • @ethans4783
    @ethans4783 1 month ago +79

    If the entirety of the internet is not enough data to improve your model, then maybe it's an architecture problem. And this looks to be a novel approach to try and overcome that, awesome!

    • @antonystringfellow5152
      @antonystringfellow5152 1 month ago

      The aim here is to create an AGI.
      Using more data during training does not move a model towards AGI at all. It increases the knowledge base, and knowledge is not intelligence; it's just information. Put another way, you can no more create intelligence using data stored in binary code than you can by carving 1s and 0s onto rocks. Intelligence is the ability to understand, not to recall stored information, and this requires the ability to form useful models and make predictions based on those models.

    • @mrd6869
      @mrd6869 1 month ago +17

      It is an architecture problem. Does your brain need the entire internet to train? Nope. It's more capable and super energy efficient. They just need to research a new layout

    • @TravisChalmers
      @TravisChalmers 1 month ago

      "MAYBE" 😂

    • @guilhermehx7159
      @guilhermehx7159 1 month ago +3

      Exactly. It's funny how no one talks about architecture. Only scaling, energy, and data

    • @codelapiz
      @codelapiz 1 month ago

      It's a bit more complicated. Your brain is sort of pretrained on billions of years of labeled data (reproduced or not). It's only finetuned on your experiences in your life. And even those are very abundant and diverse. And you got billions of extremely well labeled datapoints by watching people's emotions as you grew up.
      Let's not pretend humans magically get this intelligent. A lot of us don't even get intelligent at all. And if we lived in isolation our whole life we would not be nearly as smart as even GPT-3, no matter how much internet we read.

  • @gnosis33-real
    @gnosis33-real 1 month ago +42

    I don't think we really have an equivalent until we create a model with online learning vs. pretraining. It needs to be able to continue to update the network with every interaction.

    • @mikeschwarz4588
      @mikeschwarz4588 1 month ago +6

      It feels like this is a step in that direction… right?

    • @adolphgracius9996
      @adolphgracius9996 1 month ago +1

      Chill out, that's how you get Skynet, hell naah

    • @gnosis33-real
      @gnosis33-real 1 month ago

      @adolphgracius9996 😅

    • @gnosis33-real
      @gnosis33-real 1 month ago

      @@mikeschwarz4588 I'd agree with that

    • @Mihalk777
      @Mihalk777 1 month ago +2

      connecting AGI to the internet
      what could go wrong

  • @NoHandleToSpeakOf
    @NoHandleToSpeakOf 1 month ago +114

    This relies on a set of transformation rules that preserve the underlying concept. Coming up with such a set is far from trivial. They tricked the benchmark with a hand-crafted set, but this does not scale.

    • @GrindThisGame
      @GrindThisGame 1 month ago +9

      I was wondering how they "generated" similar test set examples without knowing exactly how each one is solved. Seems like overfitting.

    • @MaciejTrebacz
      @MaciejTrebacz 1 month ago +11

      This is true, and it feels like cheating. At the same time the idea of fine-tuning the model on the fly based on the inputs sounds very promising and I bet someone will come up with a more generalized way of doing that.

    • @d_b_
      @d_b_ 1 month ago +1

      @@MaciejTrebacz If the state was held, maybe? If it's not catastrophic to the rest of the model's abilities. I'm still not understanding what the use of it is if the weights are reset after inference.

    • @wurstelei1356
      @wurstelei1356 1 month ago +3

      I think the reason you have to open-source is so they can see if you cheat. BTW, OpenAI o1 and Claude shouldn't be on the list for that reason.

    • @heyman620
      @heyman620 1 month ago +3

      @@GrindThisGame Oh, they know. For example, they know how you can rotate it. Embarrassing hacking, zero value for this work. This benchmark does not work, although the idea is nice. The last winner was some stupid search and heuristics if I recall correctly, zero "intelligence". This idea clearly works only for toy problems.

  • @tristanreid5770
    @tristanreid5770 1 month ago +75

    While I'm excited about the idea of TTT, the whole idea of the benchmark is that it's trying to test the ability to generalize - specifically the ability to solve a problem you haven't seen before and therefore don't know the domain of. What this method is doing is generating new test data by applying geometric transformations - but that's because the writers of this paper knew that those puzzles remain solvable under those geometric transformations! They're thereby leveraging domain knowledge to give the LLM an edge. I'm not saying I don't think it's a valuable approach - just that this paper exposes a weakness in the assumptions of the benchmark more than it definitively demonstrates that this approach leads to generalization.

    • @Ikbeneengeit
      @Ikbeneengeit 1 month ago +8

      Thanks for explaining this. I was indeed wondering how the model can produce correct synthetic data if it doesn't know the solution to the question already.

    • @HansKonrad-ln1cg
      @HansKonrad-ln1cg 1 month ago +4

      Here's a conversation I just had with ChatGPT:
      i have a simple problem here: peter has 8 apples. sue has twice as many apples as peter. how many apples does sue have? i want you to list all the things that could be changed in this problem such that it remains the same problem.
      chatgpt:
      Sure, let's explore the variables that can be modified while keeping the core structure of the problem intact:
      Names of the People: Change "Peter" and "Sue" to any other names.
      Example: John has 8 apples. Mary has twice as many apples as John.
      Number of Apples: Change the number of apples Peter has, and adjust Sue's apples accordingly.
      Example: Peter has 10 apples. Sue has twice as many apples as Peter.
      Item: Change the apples to any other item.
      Example: Peter has 8 oranges. Sue has twice as many oranges as Peter.
      Ratio: Change the ratio from twice to another multiplicative relationship.
      Example: Peter has 8 apples. Sue has three times as many apples as Peter.
      Context: Change the context or setting in which the problem takes place.
      Example: Peter found 8 seashells. Sue found twice as many seashells as Peter.
      Time Element: Add or change the time element without altering the fundamental relationship.
      Example: Yesterday, Peter collected 8 apples. Today, Sue collected twice as many apples as Peter.
      By adjusting these elements, the basic structure of the problem (one person having a certain number of items and another person having a multiple of that number) remains unchanged.
      Anything else you'd like to explore?
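
      As a rough sketch, the kind of concept-preserving variant generation ChatGPT lists above could be automated like this; all names, templates, and numbers below are illustrative assumptions, not anything from the paper.

      import random

      NAMES = ["Peter", "Sue", "John", "Mary"]
      ITEMS = ["apples", "oranges", "seashells"]
      RATIOS = {2: "twice", 3: "three times"}

      def make_variant():
          # Vary names, item, count, and ratio; the problem structure is unchanged.
          a, b = random.sample(NAMES, 2)
          item = random.choice(ITEMS)
          n = random.choice([6, 8, 10])
          ratio = random.choice(list(RATIOS))
          question = (f"{a} has {n} {item}. {b} has {RATIOS[ratio]} as many "
                      f"{item} as {a}. How many {item} does {b} have?")
          return question, n * ratio  # the answer stays derivable from the template

      for _ in range(3):
          q, answer = make_variant()
          print(q, "->", answer)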

    • @corvox2010
      @corvox2010 1 month ago

      @@HansKonrad-ln1cg Just because I don't understand the patterns I see doesn't mean I can't solve something I've never solved, given similar examples. Sooner or later I will spot the pattern.

    • @HansKonrad-ln1cg
      @HansKonrad-ln1cg 1 month ago

      The benchmark doesn't assume anything; it has no weakness. If you come up with anything that solves the private, closed set, you are done. There are no other rules. You can put in as much domain knowledge as you think you have, no problem. Your solution doesn't even have to be intelligent or AI or smart or anything. It just turns out that it is not so easy.

    • @Sven_Dongle
      @Sven_Dongle 1 month ago +5

      Yeah, they used specific sets of geometric transforms to game the system as described in the appendix, hardly a "general intelligence" approach it seems:
      Table 3: We provide the augmentations used in our TTT procedure with their function signature and description.

      Augmentation Name                           Description
      Rotate(90)                                  Rotates a grid 90 degrees.
      Rotate(270)                                 Rotates a grid -90 degrees.
      Rotate(180)                                 Rotates a grid 180 degrees.
      Flip(0)                                     Flips a grid horizontally.
      Flip(1)                                     Flips a grid vertically.
      Reflect(0, reverse=True)                    Flips a grid horizontally and prepends it to the left of the original grid.
      Reflect(1, reverse=True)                    Flips a grid vertically and prepends it above the original grid.
      Reflect(0, reverse=False)                   Flips a grid horizontally and appends it to the right of the original grid.
      Reflect(1, reverse=False)                   Flips a grid vertically and appends it to the left of the original grid.
      RandomTranslateXY()                         Shifts a grid randomly in both the horizontal and vertical directions. The maximum shift size is 4.
      Transpose()                                 Reflects a grid on the diagonal.
      IncreaseResolution(2)                       Upscales the grid by interleaving elements in both the horizontal and vertical directions.
      IncreaseHeight(2)                           Upscales the grid by interleaving elements in the vertical direction.
      IncreaseWidth(2)                            Upscales the grid by interleaving elements in the horizontal direction.
      Chain([Rotate(90),IncreaseResolution(2)])   Sequential application of Rotate(90) and IncreaseResolution(2).
      Chain([Rotate(270),IncreaseResolution(2)])  Sequential application of Rotate(270) and IncreaseResolution(2).
      Chain([Rotate(180),IncreaseResolution(2)])  Sequential application of Rotate(180) and IncreaseResolution(2).
      Chain([Flip(0),IncreaseResolution(2)])      Sequential application of Flip(0) and IncreaseResolution(2).
      Chain([Flip(1),IncreaseResolution(2)])      Sequential application of Flip(1) and IncreaseResolution(2).
      Chain([Transpose(),IncreaseResolution(2)])  Sequential application of Transpose() and IncreaseResolution(2).
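
      A rough sketch of a few of these augmentations, assuming a grid is a small numpy array; the function names mirror the table, but the implementations are one plausible reading of the descriptions, not the authors' code.

      import numpy as np

      def rotate(grid, degrees):
          # np.rot90 rotates counterclockwise; the table doesn't state a direction.
          return np.rot90(grid, k=degrees // 90)

      def flip(grid, axis):
          return np.flip(grid, axis=axis)

      def increase_resolution(grid, factor=2):
          # One reading of "interleaving": repeat each cell into a factor x factor block.
          return np.kron(grid, np.ones((factor, factor), dtype=grid.dtype))

      def chain(grid, fns):
          # Sequential application, as in the Chain([...]) rows above.
          for fn in fns:
              grid = fn(grid)
          return grid

      g = np.array([[1, 2], [3, 4]])
      print(chain(g, [lambda x: rotate(x, 90), increase_resolution]))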

  • @nilsmach6399
    @nilsmach6399 1 month ago +73

    What they’re doing is simply adjusting a generalized model on the fly to fit the problem domain. After this, the model has seen thousands more geometric figures, and that’s enough to solve these toy problems. But this doesn’t take us any further toward solving a more complex problem, especially if there’s more to it than just understanding the domain language. The result will be essentially the same as training the model specifically for the test problems beforehand.

    • @leonwinkel6084
      @leonwinkel6084 1 month ago +24

      I agree, but the main point here is the technique, in my opinion. Adjusting an LLM on the go is exactly what our brain does. So the technique used here is surely one step further towards AGI.

    • @consciouscode8150
      @consciouscode8150 1 month ago +9

      I think it's more of a first step. "Training" using synthetic data from geometric transformations is a discrete approximation of visual reasoning in the mind's eye, so the next step would be to take this technique and train a model to create the sorts of transformations which yield better predictions.

    • @cyberprotec
      @cyberprotec 1 month ago +3

      This technique sounds like fine-tuning to me, though.

    • @nilsmach6399
      @nilsmach6399 1 month ago +1

      @@consciouscode8150 This is my point: If they could somehow determine which transformation leads to better results given the problem, then both of you would be right, and this would be a significant step toward AGI. However, what they do instead is simply multiply the relevant area of the model without actually changing this area to direct it toward a solution.

    • @pon1
      @pon1 1 month ago +4

      You are right, but the toy problems have been a real pain in the bottom. It can do the most advanced stuff, but stumbles on the most basic. It's a step in the right direction; now we just need to make it learn permanently, and that is much more difficult, because the weights already there can't simply be replaced with the "LoRA" that it has learned, because that would break other things. It's a difficult problem, but I believe it can be solved.
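
      A toy numpy illustration of why that merge is delicate: a LoRA adapter adds a low-rank update to the base weights (W' = W + (alpha/r) * B A), so making it permanent shifts every output that passes through W, not just the behavior it was trained for. The shapes and scaling below are illustrative assumptions.

      import numpy as np

      d, r, alpha = 8, 2, 16
      W = np.random.randn(d, d)          # frozen base weight
      A = np.random.randn(r, d) * 0.01   # trained low-rank factors
      B = np.random.randn(d, r) * 0.01

      W_merged = W + (alpha / r) * B @ A  # "fixing the LoRA into the model"

      x = np.random.randn(d)
      print(np.linalg.norm(W @ x - W_merged @ x))  # nonzero for essentially any input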

  • @vio_tio12
    @vio_tio12 1 month ago +5

    love these paper reviews with a simplification of concepts. Keep doing them please!

  • @Justin_Arut
    @Justin_Arut 1 month ago +9

    I can see this utilized as a scaled-up MoE. With multiple agents, each of which has self-trained in this manner in its own domain, a central manager could call up the appropriate expert for a given task, for increased overall efficiency.

  • @alfredomoreira6761
    @alfredomoreira6761 1 month ago +14

    This is just doing a LoRA for each test and calling it TTT. It's too much compute, and they don't publish the TTT time to train and adjust the LoRA parameters. It's a setup that only works to pass ARC. If ARC scored based on time to infer (like IQ tests), this technique would get bad results.
    In the end it's not practical.

  • @evidenceX
    @evidenceX 1 month ago +52

    Soon we're going to have insane open-source models

  • @Tetsujinfr
    @Tetsujinfr 1 month ago +14

    Cool, but how does the model know which transform to apply at training/test time? Looks to me like the model developers knew which transforms are relevant to achieve good results for ARC, so in a way they supervised the training. Now maybe those transforms generalise beyond problems like those in the ARC benchmark, but I do not think they have proved this in the paper. Am I missing something?

    • @rolandwfleming
      @rolandwfleming 1 month ago +1

      I was wondering exactly the same ...

    • @ppbroAI
      @ppbroAI 1 month ago

      I think they use the vector representation of the problem and apply transformations, trying to hit a sweet spot

    • @Tetsujinfr
      @Tetsujinfr 1 month ago +2

      I would probably need to read the paper though, tbh, since the on-the-fly LoRA training at inference time seems very powerful. Like maybe the model could infer dynamically the best type of transforms to apply at inference-training time depending on the input problem. Probably the core of the training would actually be to best predict those types of transforms to apply depending on the problem. I would need to look at the inference run time as well; it looks extremely costly to train and to run inference, at first glance.

    • @jtjames79
      @jtjames79 1 month ago +2

      ​@@Tetsujinfr I can see the value in a super slow, super expensive AGI. It's not like human experts are cheap or fast.
      I used to wait several hours just for boobies at 14.4 kbps.
      In fact, it taught me patience and the value of using your time wisely, which kids just don't have these days.
      Since I prefer to be involved in every step of any process, even if I just have to trust that the AI knows what it's doing, going slower works for me.
      First it was compiling, then it was rendering, now it's inferring. Any excuse for a Nerf gun fight.

    • @xthesayuri5756
      @xthesayuri5756 1 month ago +1

      The really smart models like o1 or Sonnet might be able to come up with augmentations themselves and then train themselves.

  • @Jayk-56
    @Jayk-56 1 month ago +7

    Notes on this 'breakthrough': they only improved results slightly over a paper published Nov 4th, 2024 by Cornell University. Both of those papers are part of paper submissions to win cash prizes (you don't need to beat ARC to get the prize, just have the best paper/method). Like others mentioned in the comments, this is not easily generalized, since they had to use some domain knowledge in crafting the LoRA adapters. Progress for sure, but contextually not unique

    • @alfredomoreira6761
      @alfredomoreira6761 1 month ago

      There should be a penalty in the ARC challenge for long inference time. These methods would not see the light of day

    • @entreprenerd1963
      @entreprenerd1963 1 month ago

      @@alfredomoreira6761 - there's a time cut-off on specified hardware. The folks who did this paper didn't fit within those limits, which is why they don't appear on the actual ARC leaderboard at the ARC website.

  • @Curlack
    @Curlack 1 month ago +3

    One thing they need to incorporate: when the input isn't understood with high confidence, or could be ambiguous, the AI shouldn't feel the need to decide what is meant and just answer. It should ask questions to clarify and, when the confidence is high enough, perform the task. That's how a teacher would determine what a student doesn't understand and adapt their input accordingly.

  • @WaltWhite71100
    @WaltWhite71100 1 month ago +1

    Test-time training sounds like an important step towards in-context training. What we need is the ability for people to explain, during runtime, how a model can improve its performance. When I'm using my car's autopilot, for example, I'd like to be able to vocally tell the autopilot what I like and don't like about its performance, similar to what I would do with a human student driver. This new information should get incorporated into the autopilot's future actions immediately. This type of test-time training, as well as the use of reasoning and 'guard rails', is what humans do every day as we learn new skills.

  • @LukeKendall-author
    @LukeKendall-author 1 month ago +1

    I think one aspect of animal intelligence is spatial reasoning applied to perceived images (e.g. used in navigation for exploring to find food and returning to the nest). This mode of pattern recognition and prediction is a foundation layer for us, but only laboriously and indirectly approximated by LLMs etc.

  • @AV_YOUTUBE_202X
    @AV_YOUTUBE_202X 1 month ago +1

    It's a great idea. For a while we've had the situation where we've got these transformer networks, but they're not being used to the max of their potential capabilities. _"Let's just put the problem in K/V context and pull the next token probabilistically and be done"_ has been the status quo. Well, it turns out that's a pretty sub-optimal way to use these networks.
    The ARC benchmark is a "follow these examples" challenge. So these guys are saying _"per problem, let's generate a LoRA from the problem examples, then use that LoRA."_ It's a great idea.
    In general it's following the principle that _"minds should focus specifically on the problem at hand."_ When a human being works on a hard problem, they forget about other things and lock in! That's what this LoRA is doing. It is de-prioritizing broad abilities and "focusing in" on solving the problem at hand.
    It's just a good thing to notice in general: AI ought to "focus" while working.

  • @jason.arthur.taylor
    @jason.arthur.taylor 1 month ago +1

    This is brilliant research. Hats off to Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, and Jacob Andreas.

  • @HansKonrad-ln1cg
    @HansKonrad-ln1cg 1 month ago +2

    The ability to continuously learn is crucial for real intelligence. The model can adapt and have a short-term memory outside of its context window. Very well presented paper. The fact that they produced such a huge jump in the ARC challenge needs to be brought to public attention so everyone can draw their own conclusions from it.

  • @En1Gm4A
    @En1Gm4A 1 month ago +3

    Thx for paper breakdown. Much appreciated

  • @tomdfluter
    @tomdfluter 1 month ago +1

    Great breakdown of the paper and I love your enthusiasm, thanks again 👍🙏

  • @WanderingJoy
    @WanderingJoy 1 month ago +1

    Yes - very excited to see this paper. Glad you’re getting the word out

  • @robmaiale652
    @robmaiale652 1 month ago

    Super helpful, thanks a bunch for explaining the concepts

  • @d_b_
    @d_b_ 1 month ago +2

    Feels counterintuitive that this would work better than other methods, if I am understanding it. So it's like in-context learning, but the model parameters change with its own self-generated samples? How does this fare against a self-loop, or what I guess is called reflection?

  • @VanSocero
    @VanSocero 25 days ago

    Great vid. I'm on DeepSeek now, testing it against some of my use cases. Keep up the great info. Happy 🦃 day..

  • @jimbo2112
    @jimbo2112 1 month ago +4

    The state of AI reminds me of a spoiled rich kid at Christmas. They open presents and barely play with each one before being presented with a new one. They don't know how the previous toy worked nor did they explore all the options... 1/2hr goes by without getting anything new and they question the validity, fun and usefulness of all their presents.
    We have never seen a cadence of tech innovation like this and we are barely exploiting any of it before something new comes along to catch our eye.
    We need to be more realistic about the situation, stop doom-mongering about the slowed pace of AI innovation and accept that we already have enough toys to change our lives forever - knowing that there will be more significant advances in the near future.

  • @SuperYapayZeka
    @SuperYapayZeka 1 month ago

    It's amazing that more than a hundred thousand people watch such technical topics.

  • @justfoundit
    @justfoundit 1 month ago +2

    I have a feeling that we'll need a more definite score of response-goodness to achieve AGI. But this direction is promising.
    Math problems are easier, probably coding too, where you can define goodness of your response.

  • @antoniobortoni
    @antoniobortoni 1 month ago +4

    Instead of relying on constant calculations, these systems work by memory, much like how a person recalls math answers rather than solving them every time. Imagine software that repeatedly performs the same calculations: why not replace those with precomputed memory? Using a chart or lookup table with all the answers stored, the computer could retrieve results instantly instead of performing the operations each time. The chart doesn't even need to be perfectly precise; approximations are often good enough to solve real-world problems efficiently. This approach could revolutionize computing by reducing complexity while maintaining functionality, especially for applications where 'good enough' physics or outcomes suffice.
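
    A small sketch of that precomputed-table idea: trade memory for compute by tabulating a function once (coarsely) and caching repeated calculations. The step size and the example functions are arbitrary choices for illustration.

    import math
    from functools import lru_cache

    # Precompute a coarse sine table once; lookups replace repeated math.sin calls.
    STEP = 0.01
    SINE_TABLE = [math.sin(i * STEP) for i in range(int(2 * math.pi / STEP) + 1)]

    def fast_sin(x):
        # Nearest precomputed entry: approximate, but "good enough" for many uses.
        return SINE_TABLE[int((x % (2 * math.pi)) / STEP)]

    @lru_cache(maxsize=None)
    def expensive(n):
        # Stand-in for any calculation a program keeps redoing.
        return sum(math.sqrt(i) for i in range(n))

    print(fast_sin(1.0), math.sin(1.0))  # approximate vs. exact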

  • @technocorpus1
    @technocorpus1 1 month ago +7

    ah, just sipp'n my coffee and watch'n some ai news.

  • @saisrikaranpulluri1472
    @saisrikaranpulluri1472 1 month ago

    Yes it is fascinating, good work 👍

  • @francknouyrigat9809
    @francknouyrigat9809 26 days ago

    Fascinating. If I got it correctly it still qualifies as synthetic data, but things like inversion mean there should be some kind of conservation of information / symmetry present in the data for it to work. Really cool!

  • @JBLU7
    @JBLU7 1 month ago

    Great Video!

  • @libiroli
    @libiroli 1 month ago +10

    This is not a breakthrough. It's just more of the same. We are stuck in the current paradigm until LLMs are paired with a UNIVERSE model (math, physics, spatial), which will enable actual reasoning.

  • @aviralapurva
    @aviralapurva 1 month ago +3

    Love the content!

  • @TheodoreGelber
    @TheodoreGelber 1 month ago

    When you modify parameters to solve certain kinds of problems after standard training, won't that make the model worse at other kinds of problems? The fine-tuner may not care about that, but it seems you can only use that tuned model on problems related to the fine-tuning set.

  • @patrickzupanc1795
    @patrickzupanc1795 1 month ago

    Great video, thank you!

  • @williamal91
    @williamal91 1 month ago +1

    you are a treasure Matthew

  • @andrewlewin6525
    @andrewlewin6525 1 month ago +1

    I’m excited, that you’re excited 😆

  • @David-tp7sr
    @David-tp7sr 1 month ago

    Yes, this is important. Thank you for covering it in depth.

  • @luisalfonsohernandez9239
    @luisalfonsohernandez9239 1 month ago

    Amazing video Matt, very promising approach indeed. Ultimately, what most people think of as AGI will probably require a spatial-first approach, like what Dr. Fei-Fei Li proposes, but TTT/active inference will definitely be one component for adaptability

  • @En1Gm4A
    @En1Gm4A 1 month ago +2

    Finally understood LoRA - like I saw its effects and a visual representation in front of me

  • @ScottLahteine
    @ScottLahteine 1 month ago +1

    It’s not exactly recognizing and applying analogies, but it’s getting closer!

  • @JamesRogersProgrammer
    @JamesRogersProgrammer 1 month ago

    You can build a workflow to take a prompt, brainstorm on it, review the ideas, and then generate a response based on all that.
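
    One way such a workflow could be wired up, sketched with a stubbed-out call_llm; the function and the prompts are placeholder assumptions, to be swapped for a real chat-completion client.

    def call_llm(prompt: str) -> str:
        # Stub so the sketch runs end to end; replace with a real model call.
        return f"[model output for: {prompt[:40]}...]"

    def answer_with_brainstorm(question: str) -> str:
        ideas = call_llm(f"Brainstorm several approaches to: {question}")
        review = call_llm(f"Critique these ideas and pick the strongest:\n{ideas}")
        return call_llm(
            f"Question: {question}\n"
            f"Ideas considered:\n{ideas}\n"
            f"Review:\n{review}\n"
            f"Write the final answer based on the review."
        )

    print(answer_with_brainstorm("How should we cache LoRA adapters?"))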

  • @pon1
    @pon1 1 month ago +1

    This is what we needed. I also think that we need memory, maybe some way to store the real-time learning that TTT achieves. It seems wasteful to just drop the insight, but maybe updating the weights to make good insights permanent (fixing the LoRA into the model itself) is more difficult than it seems? Or it could mess with the already good weights for other things. But yeah, we need to find the solution to that to progress even further. We do have the answer, our brains, but we don't fully understand them yet, or how to make them compatible with our digital version. We need to draw insight from neuroscience. It won't be the same, they are two different architectures (our brains and AI), but I believe that there is a way to make them compatible with each other. That will not only spark superintelligence, but make US superintelligent as well.

    • @ronilevarez901
      @ronilevarez901 1 month ago +1

      Some paper declared that LoRA overrides pretraining knowledge.
      But I'm sure they'll figure it out.

    • @ronilevarez901
      @ronilevarez901 1 month ago

      Btw, we won't become superintelligent unless we merge with AI.

    • @pon1
      @pon1 1 month ago

      @@ronilevarez901 Yeah, but that's what I think will happen: first we will make it compatible with our brains, then merge with it. That would also solve all the problems that could arise with AI, like alignment.

  • @Suro_One
    @Suro_One 1 month ago +1

    @MatthewBerman The ARC prize can be seen as a challenge of predicting the next pixels. You can scale it up to have it generate the next logical move in a game of chess, a task on your PC, or a realistic game or AI OS.

    • @ScottzPlaylists
      @ScottzPlaylists 1 month ago +1

      "Predicting the next pixels" is nothing like the ARC puzzles.
      For each one, you have to come up with the correct set of rules to transform the inputs into the outputs -- and find the right blocks of "pixels" that a given rule should transform into the output.
      Every puzzle is a different set of rules you have to figure out.
      Sometimes a rule you make for Example 1 won't work for Example 2 or 3 or 4, and you have to modify your rules to fit all the examples as you go through them. It's easy to make a bad rule that won't work for the other examples.
      I've manually written Python code that solves about 20 problems so far, and I've built up a library of primitives to re-use code. There are always new custom functions you have to create for each problem, because each puzzle set is unique.
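
      For illustration, the kind of small primitive library this describes might look like the following; the two primitives and the sample task are invented examples, not the commenter's actual code.

      def recolor(grid, old, new):
          # Replace one color with another, cell by cell.
          return [[new if c == old else c for c in row] for row in grid]

      def crop_to_content(grid, background=0):
          # Crop to the bounding box of all non-background cells.
          rows = [i for i, row in enumerate(grid) if any(c != background for c in row)]
          cols = [j for j in range(len(grid[0]))
                  if any(row[j] != background for row in grid)]
          return [[grid[i][j] for j in range(min(cols), max(cols) + 1)]
                  for i in range(min(rows), max(rows) + 1)]

      def solve_hypothetical_task(grid):
          # A made-up rule set: crop to the shape, then recolor 1 -> 2.
          return recolor(crop_to_content(grid), 1, 2)

      print(solve_hypothetical_task([[0, 0, 0], [0, 1, 1], [0, 0, 0]]))  # [[2, 2]]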

  • @akhildraksharapu
    @akhildraksharapu 1 month ago +24

    Bro is back with Namaste 🙏

  • @tom-et-jerry
    @tom-et-jerry 1 month ago

    Given that in the ARC Prize tests there are few training tasks per category, during the inference test the model has every interest in carrying out micro-training on the 3 examples presented in order to make a good-quality inference; that's why TTT is much better than other competitors.

  • @Let010l01go
    @Let010l01go 1 month ago

    Yes, you are absolutely right, but what is the underlying logic/algorithm or whatever of the test-time training on synthetic data? Great ep ❤

  • @andrewnewman398
    @andrewnewman398 1 month ago +1

    Test data is held out to provide an unbiased estimate of model generalization. TTT effectively creates a custom model for each test query, “…we first generate training data 𝒟 TTT (d input) from the test inputs…” It takes 12 hours/100 tasks. Isn’t this just overfitting with more steps? How is this generalised? It seems only useful as an optimisation for a known test.

  • @En1Gm4A
    @En1Gm4A 1 month ago +1

    This should be trending

  • @A_Me_Amy
    @A_Me_Amy 1 month ago

    Scale up with prompt time too, "think step by step, and offer a counter to the best solution you come up with. Weight these two options." This kind of thing.

  • @deepskyfrontier
    @deepskyfrontier 1 month ago

    The way this method might enable better smaller models is big, but the opportunity to take this further, to audit and act upon the insights won thereby, to intelligently drive feedback ever closer to the native structure of the model: that's the really big kind of gigantic.
    A year from now, AI using TTT to iterate on its own instantaneous requirements may have a claim on a kind of consciousness.

  • @SeregaZinin
    @SeregaZinin 1 month ago

    It's funny that they called it LoRA, since LoRa is most commonly known from networking, where LoRa and LoRaWAN define a low-power, wide-area (LPWA) networking protocol designed to wirelessly connect battery-operated devices to the Internet. :)

  • @matt.stevick
    @matt.stevick 1 month ago +1

    i've been having some unexpectedly profound conversations with my chatgpt lately that make time seem to disappear as i get so into the conversations. we delve into complex topics, and it's incredible how it analyzes and remembers details from our discussions, sometimes understanding me better than i understand myself. it feels like i'm living in the movie 'her.' lol
    thx for being on top of everything going on matthew b.!

  • @frun
    @frun 1 month ago +1

    I doubt they achieved AGI. ARC is clearly not the definitive test for AGI, although I hope it has some potential.

  • @u2b83
    @u2b83 28 days ago

    Fine-tuning during test time simply acts as a prior on the test distribution lol
    ...but the surprise is that it resets back to the base model after prediction.

  • @mickidydee
    @mickidydee 1 month ago

    I'm curious whether this sort of opens a tremendously huge back door to new techniques for jailbreaking the models, and how that will affect the time frames for model releases trying to implement this new technique.

  • @justindressler5992
    @justindressler5992 1 month ago

    It's about pre-activation based on the problem domain. This is why prompting works somewhat: if you feed several examples related to the domain problem into the network, this activates the neurons through attention. A LoRA is a patch with these activations predefined. I think the problem with this approach is synthesizing accurate domain-specific examples. But I believe OpenAI is working on this approach internally, because this was something hinted at by Sam Altman. The community uses LoRAs for diffusion models; the problem is the models only function with very specific prompts. But if these patches are removed each time and reloaded to fit the context of the prompt, it could work. It seems you still need a model capable of creating the examples; a foundation model such as a 405B might be able to create examples for training a small model. I think long term a two-model solution could be used with an adversarial design: one model could generate an answer while another proposes improvements to the answer, then prompts the first model again. These models would have to be trained differently.

  • @frankjohannessen6383
    @frankjohannessen6383 1 month ago +1

    But how can it generate training data with the correct solutions if it can't solve the problem to begin with?

    • @drdca8263
      @drdca8263 1 month ago

      Did you look at what was on-screen, or just listen to what he said?
      The input problem consists of 3 input/output pairs, followed by an additional input, and the expected answer is the corresponding output.
      The inference-time generated training set is obtained as follows:
      1) Pick a random transformation, such as reflecting everything across some axis, and apply it to each of the 3 input/output pairs.
      2) Then randomly select one of the 3 pairs, and produce as a training sample the problem which has the other two input/output pairs plus the input from the selected pair as its input, and the corresponding output as its output.
      Then it trains on all of these.
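
      A compact sketch of that generation scheme, assuming a task is a list of (input grid, output grid) demonstration pairs; the specific transformations and grids below are illustrative, not the paper's exact set.

      import numpy as np

      TRANSFORMS = [
          lambda g: g,             # identity
          lambda g: np.rot90(g),   # rotate 90 degrees
          lambda g: np.fliplr(g),  # horizontal flip
          lambda g: np.flipud(g),  # vertical flip
      ]

      def make_ttt_samples(demo_pairs):
          # Leave-one-out + transform: the held-out pair supplies a known-correct label.
          samples = []
          for t in TRANSFORMS:
              pairs = [(t(np.array(i)), t(np.array(o))) for i, o in demo_pairs]
              for k in range(len(pairs)):
                  context = pairs[:k] + pairs[k + 1:]  # remaining demonstrations
                  query_in, query_out = pairs[k]       # held-out pair = training target
                  samples.append((context, query_in, query_out))
          return samples

      demos = [([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
               ([[0, 0], [1, 0]], [[0, 0], [0, 1]]),
               ([[0, 1], [0, 0]], [[1, 0], [0, 0]])]
      print(len(make_ttt_samples(demos)))  # 4 transforms x 3 leave-one-out = 12 samples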

  • @interestedinstuff
    @interestedinstuff 1 month ago

    Makes sense. Seems to me that's what we humans do. I see a set of puzzle/result pairs and I look at them for a while till I figure out a pattern of transformation. Giving an LLM the ability to do the same, to look at and build a working model of the transformation before being confronted with the novel questions, makes sense to me, and I can see why it works.

  • @ShaunShibu-oz8yn
    @ShaunShibu-oz8yn 1 month ago

    When it generates samples similar to the new problem at hand, it needs to solve those similar samples correctly as well before fine-tuning

  • @Thedeepseanomad
    @Thedeepseanomad 1 month ago

    Instead of AGI we might end up with a so-so general model that we load up with what is basically similar to a LoRA for our current task need.
    Llama 8.5 in Optimus 6: "...I know Kung Fu!"

  • @theaugur1373
    @theaugur1373 1 month ago

    Do we know for sure that the test data is not in the training data?

  • @dr.mikeybee
    @dr.mikeybee 1 month ago

    It seems to me that the couple of examples given for the problem can be used for supervised learning.

  • @ramakrishna5480
    @ramakrishna5480 1 month ago

    All these superintelligent models, and no AI can give me a new mnemonic to remember some simple facts for my exam

  • @baroianandrei7883
    @baroianandrei7883 1 month ago

    It would have been useful to distinguish between ARC-AGI and ARC-AGI-Pub. We won't know the leaderboard for the latter until December 6th

  • @leegaul8250
    @leegaul8250 1 month ago +5

    They trained on the public dataset. This isn't the real ARC-AGI challenge.

    • @leegaul8250
      @leegaul8250 1 month ago +2

      Also, is the leaderboard at 0:05 photoshopped? The leaderboard doesn't have Test-Time Training on it.

    • @ronilevarez901
      @ronilevarez901 1 month ago

      Feels like cheating, right?
      But we have trained on the public datasets since the beginning.
      As long as they don't train on the test set, it's all good.

    • @leegaul8250
      @leegaul8250 1 month ago

      @@ronilevarez901 Yes, but there is a note in the paper that specifically states that they cannot guarantee that the model has not seen the test set (which is public).

  • @jawadmansoor2456
    @jawadmansoor2456 1 month ago

    Test-time compute post-training means more training. It's just a training method; it says train twice.

  • @BrettWrightsPage
    @BrettWrightsPage 1 month ago

    How do you train a model on an answer that you don’t have?

  • @drdca8263
    @drdca8263 1 month ago

    Did you mention an actual scaling law for this inference-time training?

  • @gpsx
    @gpsx 1 month ago

    I haven't read the paper and don't fully understand what they are doing, but I don't see how this process generalizes. It just seems like they found a technique that works well on this test by essentially putting the question into the training set.

  • @ppbroAI
    @ppbroAI 1 month ago

    This seems simple enough to be implemented very soon. I think it's kinda like a DSPy vector technique baked into each query.

  • @gd7163
    @gd7163 1 month ago

    Ok, so two very basic questions:
    1/ How do they generate extra problems that preserve the "spirit" of the initial problem?
    2/ What objective function do they minimise on each of those "similar" problems, for which presumably they don't have the solution, given they are generated from an unsolved problem of the same kind in the first place?
    Without this it sounds like a cheap prestidigitation trick to me.
    Could be wrong though

  • @MeinDeutschkurs
    @MeinDeutschkurs 1 month ago

    Good, now we train the reflection process into the model. Well, let's hope that any prompt is covered. (TTT, test-time training) - btw, live-LoRA is dangerous: the model gets silly over time, because the user is silly.

  • @alexmipego
    @alexmipego 1 month ago

    This is at least one step short of the real big deal, which is to use this not to simply update your AI parameters, but to enable/disable other networks. For example, imagine if ChatGPT could just ignore all the Japanese-translation parts for this reply… this would mean the core could be much smaller and focused on high-level knowledge that triggers/enables submodels as needed - a much more efficient approach to the whole problem, with the extra bonus that now you can also start reusing and parallelizing the development of each "module."

  • @GregoryMcCarthy123
    @GregoryMcCarthy123 1 month ago

    It’s hard to call it AGI when the entire premise of this technique requires fine tuning on similar tasks beforehand. They’re just priming it to solve similar tasks.

  • @rul1175
    @rul1175 1 month ago

    Tomorrow: AI is doomed. Sunday: AI is not dead. Monday: rinse and repeat.

  • @fynnjackson2298
    @fynnjackson2298 1 month ago

    The only ceiling is us using our infinite imagination to imagine the lack of infinity. - This will continue forever.

  • @telotawa
    @telotawa 1 month ago +2

    This is still too specific and unlike how actual reasoning works - it only works because you get to see examples first. True reasoning can explore uncharted territory, and idk if we can make benchmarks for that

  • @riffsoffov9291
    @riffsoffov9291 1 month ago +1

    Instead of "just erases all of that" before going to the next problem, should it send the data somewhere, because maybe it's more useful than other synthetic data - or maybe send just the problem and the solution? Only when it doesn't break privacy etc. Just my amateur thoughts, likely way off.

    • @ronilevarez901
      @ronilevarez901 1 month ago +1

      Nah, it's right on point. We can store the LoRA as a document connected to the problem it solves, and next time the LLM recognizes the same problem it will pull the LoRA and use it.
      That's gonna be the next thing added to knowledge graphs. You'll see.
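
      A toy sketch of that retrieval idea: cache adapters keyed by a signature of the problem and reuse them on recognition. The hash-based signature and the byte-string adapter are stand-ins; real recognition would need embeddings or a knowledge-graph lookup.

      import hashlib

      lora_store: dict[str, bytes] = {}

      def problem_signature(task_text: str) -> str:
          # Stand-in for real problem recognition.
          return hashlib.sha256(task_text.strip().lower().encode()).hexdigest()

      def train_lora(task_text: str) -> bytes:
          return b"adapter-weights"  # placeholder for the expensive TTT run

      def get_or_train_adapter(task_text: str) -> bytes:
          key = problem_signature(task_text)
          if key not in lora_store:
              lora_store[key] = train_lora(task_text)  # train once, reuse after
          return lora_store[key]

      get_or_train_adapter("rotate each shape 90 degrees")   # trains
      get_or_train_adapter("Rotate each shape 90 degrees ")  # cache hit
      print(len(lora_store))  # 1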

    • @riffsoffov9291
      @riffsoffov9291 1 month ago

      @@ronilevarez901 Thanks for the info.

  • @WaveOfDestiny
    @WaveOfDestiny 1 month ago

    Now we are cooking!! Humans can reflect and learn. This is a step forward

  • @SethCohn23
    @SethCohn23 1 month ago +1

    The 'average' human score on ARC is widely overblown. They even claim 'kids' do well. Dig into the actual tests and the very limited pool they use: it's scary how small the pool is, and it's not the test suite used for the AI... I do these tests well, and I was shocked at the claims of even 60%... 60% of Mechanical Turk users is not 60% of the general public, nor 60% of kids. I'm convinced the ARC human score is skewed too high (and thus harder to achieve and pay out)

  • @ferencgaborsimon245
    @ferencgaborsimon245 1 month ago

    So we achieve AGI and it runs in a datacenter - who will decide if it needs to be powered down? Would that be like mansla....er? What will it do to block its power-down? Would it blackmail you, or switch off traffic lights and end you?

  • @wurstelei1356
    @wurstelei1356 1 month ago +1

    Why TF is o1 on that list? Where is the source code?

  • @jmlbeaujour
    @jmlbeaujour 1 month ago

    1. How does the LLM decide that it needs to perform test-time training? i.e. how does it know it does not have the capacity to respond accurately to the new task?
    2. The model generates its own training data: there is no guarantee that the generated training data has no errors (bad labels)

    • @drdca8263
      @drdca8263 1 month ago

      In this case, the leave-one-out version is sound; it will not produce an inaccurate label, because it is using the original label.
      The other data augmentation strategies may or may not be valid for a given task. They appear to be based on assuming certain symmetries in the problem.
      Which, the validity kind of depends on what class of tasks you want it to be able to recognize…
      But because the entire collection of inputs and outputs is transformed in the same way, I think it should generally be fine?
      If e.g. you flip everything in some task, this corresponds to a different task which has everything flipped,
      and, as long as there is enough in the examples to be able to distinguish between the task and the flipped task, I think this should be fine?

  • @garyshi
    @garyshi 1 month ago

    This is interesting. So we could also take a small model that's solid in logic and common sense but doesn't have a lot of specific knowledge about a domain, give it some articles, books, or what have you to "fine-tune" on in real time, and let the model generate more data from those materials (it's like "reflect on the knowledge you just learned"), and it could give better answers than, say, the current Perplexity approach, which is purely context-based? Essentially this is System 2 happening partially in model weights vs. purely in context.

    • @garyshi
      @garyshi 1 month ago

      This may not be *the AGI* someone was looking for, but could still be quite potent.

  • @vaidphysics
    @vaidphysics 1 month ago

    I think what should be worked on first is getting an LLM to understand the "environment", to build a physical model of the whole rather than just looking at the parts or correlations between those parts.

  • @jsebastianmunch
    @jsebastianmunch 1 month ago

    Wow, so they just train on the benchmark - amazing…

  • @diaitigai9856
    @diaitigai9856 1 month ago

    Great video on 'Test Time Training'! 🤖✨ I appreciate how you simplified the complex concepts, making them accessible to a wider audience. However, I think it would have been helpful to dive a bit deeper into the potential limitations or challenges of this approach. For instance, how might 'Test Time Training' affect the overall efficiency of AI models in real-world applications? It’s exciting to see AI evolving, but understanding the drawbacks is just as important. Looking forward to your next video and more discussions on this topic! 🚀

  • @taichikitty
    @taichikitty 1 month ago +1

    If you want a rubric for AGI: when an AI can solve the bicycle stability conundrum, it will have discovered something that humans have perplexingly been unable to solve for themselves. Unlike the Collatz conjecture or other mathematical problems, this is physical and should have a solution, or at least a cogent explanation.

    • @Alistair
      @Alistair 1 month ago +1

      What is the bicycle stability conundrum? Isn't it fairly well understood by now how rake etc. affect bicycle self-centering, or am I missing something?

  • @fotisj321
    @fotisj321 1 month ago

    Great video and great paper. Isn't it strange that they then reset the model to its original state?

  • @kait3n10
    @kait3n10 1 month ago

    12:55 They already beat the average human score; doesn't this count as AGI according to this test? Or do they need to beat the best human score? It would be ASI at that point.

  • @henrytuttle
    @henrytuttle 1 month ago +1

    One of the biggest problems with AI is that it doesn't learn from its mistakes. Generally, the most effective way for humans to learn is trial and error. And the error doesn't come from an internal chain of thought but rather from the outside world demonstrating our error. AI doesn't do this. I've noticed that in "discussions" with AI, I can point out a flaw and it will sometimes integrate that into a new solution.
    Test-time training seems to somewhat solve this problem, but it's not integrated. The problem is that it doesn't actually "learn" from this mistake. It does not integrate this lesson into its "intelligence". It merely uses it momentarily and then forgets it. If AI has already combed all known knowledge from the internet, the next source is obviously both real-time information and information it gains from interacting with the world.
    I don't believe that we'll have true AGI until it can integrate lessons it learns from interaction with an external source into its structure.

  • @nikhil_jadhav
    @nikhil_jadhav 1 month ago +1

    AGI is here... Breakthrough... this is it... blah blah blah... 10% improvement!! Let me know in the comments if you want to test it out...

  • @SimeonRadivoev
    @SimeonRadivoev 1 month ago +1

    I should just train my AI mainly on benchmarks, so I can ace them and get on the leaderboards

  • @3thinking
    @3thinking 1 month ago

    I can see how this approach might be applied to creating a super-coder LLM. Get an AI to generate billions of programs; some will compile and produce an output, some will generate compile or runtime errors. Use these as synthetic data to fine-tune an LLM aimed at leet coding and solving new, unseen software problems.
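
    A miniature version of that generate-and-filter loop: keep only candidate programs that actually run cleanly, and collect the survivors as (program, output) pairs for fine-tuning. propose_program is a stand-in for a generator model.

    import subprocess, sys, tempfile

    def runs_cleanly(source: str, timeout: int = 5):
        # Execute a candidate program in a subprocess; a nonzero exit code or a
        # timeout marks it as a failed sample.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(source)
            path = f.name
        try:
            result = subprocess.run([sys.executable, path], capture_output=True,
                                    text=True, timeout=timeout)
            return result.returncode == 0, result.stdout
        except subprocess.TimeoutExpired:
            return False, ""

    def propose_program() -> str:
        return "print(sum(range(10)))"  # stand-in for a sampled candidate

    dataset = []
    for _ in range(3):
        src = propose_program()
        ok, out = runs_cleanly(src)
        if ok:
            dataset.append({"program": src, "output": out})
    print(len(dataset))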

  • @guillaumevermeillesanchezm2427
    @guillaumevermeillesanchezm2427 1 month ago

    By tuning a model on ARC-like stuff you get better results on ARC. What a non-surprise.

  • @tomaszzielinski4521
    @tomaszzielinski4521 1 month ago

    Okay, so now AI is able to learn to solve a new problem. This was one missing feature for AGI no one quite knew how to tackle, but here is the breakthrough.
    Now, how can I do the same stuff at home?

  • @Barrel_Of_Lube
    @Barrel_Of_Lube 1 month ago

    Pairing this up with Mixture-of-LoRAs on top of an MoE architecture: huge potential for tiny, fast models