OpenAI's 'AGI Robot' Develops SHOCKING NEW ABILITIES | Sam Altman Gives Figure 01 a Brain

  • Published: 11 Nov 2024

Comments • 1.4K

  • @postandghost2024
    @postandghost2024 8 месяцев назад +764

    "The apple found it's new owner."
    That's adorable... We're all going to die, aren't we?

    • @Chavagnatze
      @Chavagnatze 8 месяцев назад +110

      + The way he walked away while it was still talking to him... We are fucking dead.

    • @rootor1
      @rootor1 8 месяцев назад +24

      We probably will live better.

    • @Earthwirm
      @Earthwirm 8 месяцев назад +51

      @@Chavagnatze yeah, seemed kinda rude!! The robot will remember.

    • @andu2oo6
      @andu2oo6 8 месяцев назад +29

      Yep, I was thinking exactly that as I was watching, more along the lines of "it's over. it's just a matter of time". I am actually not an AI doomer, but I couldn't really help it, perhaps all the dystopian Hollywood stuff has done its job. Anyway, if anything happens, at least it'll be more interesting/exciting than dying of old age, or some cumulative effect of random age related diseases.

    •  8 месяцев назад +15

      nah, robot waifus will make it worth it

  • @eIicit
    @eIicit 8 месяцев назад +234

    I am astonished at the breakthroughs being announced almost daily at this point.

    • @Ben_D.
      @Ben_D. 8 месяцев назад +28

      In a year from now, it will just be a live stream where the updates come as continual Breaking News.

    • @ich3601
      @ich3601 8 месяцев назад +6

      Why? Hundreds of thousands of people have been working on this for decades now. Most are smart. If you focus that much brain power, and now money, on one topic, you will get results.

    • @ryzikx
      @ryzikx 8 месяцев назад

      @@ich3601 the significance and frequency of breakthroughs are increasing, and most people don't keep up

    • @NickDrinksWater
      @NickDrinksWater 8 месяцев назад +7

      Most of it is just clickbait tbh

    • @christianjensen952
      @christianjensen952 8 месяцев назад +4

      ​@@NickDrinksWaterexactly 💯
      Going from gpt 3 to this and sora in a year is nothing

  • @GetawayFilms
    @GetawayFilms 8 месяцев назад +131

    "Great, can you explain why you did what you just did while you pick up this trash?"
    "I'm sorry Dave, I'm afraid I can't do that"

    • @IceMetalPunk
      @IceMetalPunk 8 месяцев назад

      *Picks up the human and drags him to the dumpster while saying, "because you commanded me to, you asshole"*

    • @JBDuncan
      @JBDuncan 8 месяцев назад +1

      I was waiting for it to say that!

    • @u.v.s.5583
      @u.v.s.5583 8 месяцев назад +7

      "And, before you ask me to rotate a pod, I am very much aware of your fragility as well as the mass, aerodynamical properties and hardness index of these plates in front of me."

  • @NarcatasCor
    @NarcatasCor 8 месяцев назад +163

    "Can I have something to eat?" Figure 01: "No." -End of presentation 😂

    • @allanshpeley4284
      @allanshpeley4284 8 месяцев назад +34

      My robot is gonna be like, "Haven't you had enough today fatty?"

    • @ryant.5173
      @ryant.5173 8 месяцев назад +7

      Like a bad wife 😂😂

    • @IceMetalPunk
      @IceMetalPunk 8 месяцев назад +4

      "I only have a 5 hour battery life, so I have to eat every 5 hours to live. What's your excuse, human?"

    • @ADreamingTraveler
      @ADreamingTraveler 8 месяцев назад

      "Hey jeeves go make me a sandwich" Figure 01: No.

    • @P4INKiller
      @P4INKiller 8 месяцев назад +1

      I'd buy that for a dollar.

  • @LurkingCrassZero
    @LurkingCrassZero 8 месяцев назад +99

    I'll be genuinely impressed when this is done live in front of free press. Until then it's a promising but well rehearsed demo.

    • @ALFTHADRADDAD
      @ALFTHADRADDAD 8 месяцев назад +19

      That’s fair but considering how much a group like open ai has invested I think it’s fair to take them on good faith

    • @jeffsteyn7174
      @jeffsteyn7174 8 месяцев назад +2

      😂 I bet you're genuinely impressed with Tesla's remote control bot.

    • @harmless6813
      @harmless6813 8 месяцев назад +5

      @@ALFTHADRADDAD Nah. We've had plenty of large companies pulling bs.

    • @Volatile-Tortoise
      @Volatile-Tortoise 8 месяцев назад +5

      @@harmless6813 Yes, of course, and if this was 10 years ago, or even five years ago, there would be every reason to assume this was BS, but the technologies shown here already exist; especially at OpenAI, which is at the absolute cutting edge, consistently ahead of everyone else in AI.

    • @putinscat1208
      @putinscat1208 8 месяцев назад +1

      A dog can do most of that too if rehearsed. Except speak, and the apple will have bite marks on it.

  • @mackabeats
    @mackabeats 8 месяцев назад +343

    Here I am, brain the size of a planet and you have me doing basic kitchen chores.

    • @Sajuuk
      @Sajuuk 8 месяцев назад +13

      Well, at least you're not opening doors...😉

    • @ClarkPotter
      @ClarkPotter 8 месяцев назад +2

      ​@@SajuukLike the yellow dog one with Carl issues?

    • @bug5654
      @bug5654 8 месяцев назад +5

      Sounds like job satisfaction par excellence.

    • @allanshpeley4284
      @allanshpeley4284 8 месяцев назад +2

      And it still barely manages.

    • @AlonzoTG
      @AlonzoTG 8 месяцев назад +6

      Get back to work, Marvin.

  • @MrBrukmann
    @MrBrukmann 8 месяцев назад +359

    I would be more impressed if it asked, "You want me to put the plate that had trash on it back in the rack with the clean plates?"

    • @Noqtis
      @Noqtis 8 месяцев назад

      I would be more impressed if it had boobs and said: clean your trash yourself!

    • @JohnnysaidWhat
      @JohnnysaidWhat 8 месяцев назад +12

      you realize how stupid that question is right?

    • @MrBrukmann
      @MrBrukmann 8 месяцев назад +45

      @@JohnnysaidWhat and you can't identify a sentence meant to entertain. Are we all caught up?

    • @martymarl4602
      @martymarl4602 8 месяцев назад +16

      I hope people realise they gave it Sam Altman's voice

    • @billedifier8584
      @billedifier8584 8 месяцев назад +10

      I'm curious why it picked up the apple with its right hand before transferring to its left hand to pass to the man.

  • @m0ose0909
    @m0ose0909 8 месяцев назад +109

    I think most people are under the assumption that software engineer jobs and other "information only" jobs are going to go away, but skilled trades jobs are safe. Well, it's only partially right. Office type jobs are going to go first, but I think mostly everything else is not far behind. This is going to WRECK the job market - we REALLY need to get a handle on how to transition the economy, or we are screwed.

    • @ich3601
      @ich3601 8 месяцев назад +21

      Maybe we do it as we did last time - introduce even more bullshit jobs without purpose.

    • @Vini.The.Artist
      @Vini.The.Artist 8 месяцев назад

      Take a look at “moores law for everything pdf” on google. Wes has made a video about that. Interesting points there

    • @ClarkPotter
      @ClarkPotter 8 месяцев назад +18

      UBI and universal free education.

    • @jaqsro
      @jaqsro 8 месяцев назад

      I honestly think they don’t care about who is going to be without jobs. The economy is going to change completely and people are trying to see the future with an outdated model. Billionaires are building bunkers, hotels in space and spaceships. I think they’re planning to escape and let us die. They will own everything and won’t need consumerism. But, I have a very pessimistic mind so I might be completely wrong.

    • @tringuyen7519
      @tringuyen7519 8 месяцев назад +5

      Everyone wants a Jarvis AI like Tony Stark. How many humans are like Tony Stark? Humans with original thoughts like Steve Jobs & Sam Altman will be fine. Average humans won’t be.

  • @hibou647
    @hibou647 8 месяцев назад +105

    Reality is merging with science fiction. I guess I'll finally have time to learn the flute.

    • @vampir753
      @vampir753 8 месяцев назад

      In the news: "Man murdered by robot that got angry at bad play and shoved a flute up the man's ass."

    • @thegameboyshow
      @thegameboyshow 8 месяцев назад +2

      I wish to learn... How to bang a desk properly like those in my childhood back in days

    • @MesoScale
      @MesoScale 8 месяцев назад +4

      Figure 01, hand me ear plugs.

    • @k-c
      @k-c 8 месяцев назад +2

      Ok Is this reference to Captain Picard - The Inner Light ?

    • @ursusss
      @ursusss 8 месяцев назад +1

      To entertain your new masters

  • @charlieg3437
    @charlieg3437 8 месяцев назад +127

    "Hey figure 01, pick up all these hobos and put them in the soilent green hopper."

    • @mrvalveras
      @mrvalveras 8 месяцев назад +4

      Hmmm, cookies!

    • @narrativeless404
      @narrativeless404 8 месяцев назад +4

      💀

    • @JakeWitmer
      @JakeWitmer 8 месяцев назад

      They're not hobos, they're "illegals" ...or...if the leftists win this time, Stalin already had a model calling "undesirables" "criminals" ...heck, we've been doing that in the USSA since the Harrison Narcotics Act of 1914 was passed...

    • @KanedaSyndrome
      @KanedaSyndrome 8 месяцев назад +1

      Hey, figure 01, where do you think the human goes next?

    • @narrativeless404
      @narrativeless404 8 месяцев назад +3

      @@KanedaSyndrome
      "to hell"

  • @IceMetalPunk
    @IceMetalPunk 8 месяцев назад +31

    My understanding of the explanation of loading action weights onto the GPU is this (and it's quite interesting!): the main OpenAI model does not control the robot directly. Instead, there is an action controller network, and a separate variation of that controller is trained for every action the robot can perform. (So for instance, perhaps there's a version for "pick up object" actions, a version for "place object" actions, a version for "hand off object" actions, etc.)
    They're all re-trained variants of the same network which takes in images at 10fps and a task description, and outputs robot control "keyframes" at 200/sec. When the main OpenAI model decides it needs to perform an action, it hot-swaps out the current action model for a relevant one -- so if "pick up" is currently loaded and it wants to now place the object down, it'll unload the "pick up" model and load up the "place object" model instead. These controller keyframes get passed to the baseline stability and control model which determines how to move each motor/servo/etc. to achieve the desired keyframe, at 1,000 adjustments per second.
    In other words, the main LMM decides *what* to do, then loads up a "procedural memory", basically (i.e. a pretrained action model) that looks at the world and quickly decides the *actions* to achieve that task, which then get passed to the control model to decide *how* to perform those actions.
    I think the incredible speed and reaction time is a result of the fact that action models are separate from the main LMM, meaning they can be smaller (since they only have to operate with a more limited output space) and don't get interference from the main model while it's thinking (which is also why it can talk to you while it's doing things).
    A huge benefit to this approach is that it would be modular. You can just train up new variants of the action model for any action you want the robot to learn to perform, and just plug-and-play it into the bot's library of known skills. The LMM is already general enough to be able to pick the new model when appropriate, based probably on just a name and description of it in the library; and the models all output the same format of action keyframes, so they're all compatible with the base control model by default.
    Another cool thing about this approach is that it's very similar to how a human brain works. When we learn to perform actions, we do so by forming procedural memories, sometimes called muscle memories. These are stored directly in the motor cortex, the part of the brain that plans and controls movements. In fact, the premotor cortex plans the movements -- like these hot-swapped models -- and the rest of the motor cortex executes them. By storing them this way, we're able to -- once we've learned to do something -- perform the task without thinking much about it, leaving us open to think about other things. It's why you can walk and talk, or tie your shoes and wonder what's for dinner, or drive while jamming out to your favorite music, etc.
    So in a way, the action models are procedural memories in the premotor cortex, and the base control model is like a mix of the motor cortex executing those actions and the cerebellum keeping balance. Meanwhile, the main LMM can see stuff (like your visual cortex), process language (like Wernicke's Area of a brain), and generate new language (like Broca's Area).
    If they throw in a vector database to store episodic memories (i.e. memories of personal experiences) like a hippocampus, then it's crazy cool to think about how we're effectively building a brain with the *structure* of a human brain.
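
    A minimal Python sketch of the hot-swap idea described above (an illustration only, not Figure's or OpenAI's code; the policy class, skill names and tensor sizes are all hypothetical):
    ~~~~~
    # Hypothetical "skill library": one pretrained variant of the same small
    # policy network per action, with only the active one resident on the GPU.
    import torch
    import torch.nn as nn

    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

    class ActionPolicy(nn.Module):
        """Maps an image feature + task embedding to action 'keyframes'."""
        def __init__(self, obs_dim=512, task_dim=64, keyframe_dim=24):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + task_dim, 256), nn.ReLU(),
                nn.Linear(256, keyframe_dim),
            )
        def forward(self, obs, task):
            return self.net(torch.cat([obs, task], dim=-1))

    skill_library = {name: ActionPolicy() for name in ("pick_up", "place", "hand_off")}
    active_skill = None

    def load_skill(name):
        """Swap the requested skill's weights onto the GPU, evicting the old one."""
        global active_skill
        if active_skill is not None:
            skill_library[active_skill].to("cpu")   # unload the previous action model
        skill_library[name].to(DEVICE)              # load the newly selected one
        active_skill = name
        return skill_library[name]

    # The high-level LMM would decide *what* to do; here we just hard-code a choice.
    policy = load_skill("pick_up")
    obs = torch.randn(1, 512, device=DEVICE)        # stand-in for encoded camera frames
    task = torch.randn(1, 64, device=DEVICE)        # stand-in for the task description
    keyframes = policy(obs, task)                   # handed to the low-level whole-body controller
    ~~~~~
    The point of the sketch is only the modularity: adding a new skill means training one more variant and registering it, while the swap itself is just moving weights between CPU and GPU memory.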

    • @fotoyartefotoyarte1044
      @fotoyartefotoyarte1044 8 месяцев назад

      I mark your comment for later inspection

    • @tridibeshsen6492
      @tridibeshsen6492 8 месяцев назад

      Leaving a comment to trace the grail of knowledge u shared

    • @simoneromeo5998
      @simoneromeo5998 8 месяцев назад

      Wow, thanks for this comment! So insightful!

    • @alexyooutube
      @alexyooutube 8 месяцев назад

      +1

    • @glen9820
      @glen9820 8 месяцев назад

      A year or two ago, I saw a video with Elon Musk and some of his senior people talking about a myriad corner cases, and trying to account for them all i.e. bicycle on the back of a car vs on the road. Is it the same problem and would the solution be similar? A few months ago Musk talked about replacing a lot of C++ code with neural nets. Your comment makes me wonder how close we really are to a general purpose humanoid robot.

  • @ComedorDelrico
    @ComedorDelrico 8 месяцев назад +19

    Wes 6 months ago: "these robots are kinda dumb"
    Wes after seeing the latest Figure 1 robot in action: "I, for one, welcome our new robot overlords"

  • @davidk7849
    @davidk7849 8 месяцев назад +32

    One of the reasons we get along with dogs is because they have the ability to recognize what pointing means... So good point, I bet early on that will be key for interaction with AI droids.

    • @CStoph1979
      @CStoph1979 8 месяцев назад

      Most of the inbred dogs today have no idea what pointing means. Even the stupid pointer that lives with me cant figure it out.

  • @fashb2k
    @fashb2k 8 месяцев назад +26

    I'm a futurist but even this gave me a slight uneasy feeling. It literally sounds like a human, waking up into a new world and discovering his/her surroundings.

    • @Also_sprach_Zarathustra.
      @Also_sprach_Zarathustra. 8 месяцев назад +2

      You're a biological machine, so you're a robot too. This isn't surprising or uneasy for neuroscientists.

    • @Scipher77
      @Scipher77 8 месяцев назад

      Uncanny valley territory.

  • @Vini.The.Artist
    @Vini.The.Artist 8 месяцев назад +42

    Ok now I got actually shocked for real. I did not expect this for at least a year, to be honest.

    • @vampir753
      @vampir753 8 месяцев назад +6

      At that speed it will probably turn out that OpenAI accidentally created god and that is what Q-star is.

    • @PRISMADROID
      @PRISMADROID 8 месяцев назад +1

      They Probably Have Real Life T-850s In Their Lab Helping Them Out.

    • @atlas9401
      @atlas9401 8 месяцев назад +4

      Because AI helped them build newer AI faster than they ever could have by themselves. What you’re noticing is what is happening when we say that we are seeing progress trend exponentially now.

    • @pgc6290
      @pgc6290 8 месяцев назад +1

      Yeah. But it's good that it's already here though. The thing is, many things get demonstrated but don't reach the public, or public use, for a long time.

  • @SandyRegion
    @SandyRegion 8 месяцев назад +90

    This one actually did SHOCK THE ENTIRE INDUSTRY!
    Was that a simulation of Sam Altman's voice?

    • @cacogenicist
      @cacogenicist 8 месяцев назад +5

      Doesn't sound at all like Altman's voice to me.

    • @NarcatasCor
      @NarcatasCor 8 месяцев назад +2

      Sounded kinda like the same dude talking to it

    • @blaise3045
      @blaise3045 8 месяцев назад +6

      I thought it was more like Jordan Peterson. 🤷

    • @RhumpleOriginal
      @RhumpleOriginal 8 месяцев назад +6

      Nah y'all are all wrong. That Obama.

    • @The_Questionaut
      @The_Questionaut 8 месяцев назад +5

      ​​@@RhumpleOriginalgood ear u right

  • @computerrockstar2369
    @computerrockstar2369 8 месяцев назад +15

    About the GPU: I wrote a paper for my cybersecurity class. The idea was to limit the possibility of a probabilistic perturbation in the tech that could cause it to gain self-awareness through fine-tuning or RAG, etc. Having a power-intensive consumer-grade GPU compute an entire set of neural network weights for AGI presents thermodynamic and computational challenges, especially because we haven't figured out how to compute only the necessary weights and have to compute ALL the weights each time, unless it is running on a cloud somewhere (which is a great technicality, because it prevents a superintelligence from being out in the wild: it would take too many resources to compute a neural net that large). The idea was to compute only portions of the neural net as a forcing function on the needed compute. Tying weights to the outputs they are needed for, and only passing the weights of those portions, lowers the chance of rogue AGI. I assume that they have some type of backdoor function for that.

    • @computerrockstar2369
      @computerrockstar2369 8 месяцев назад +5

      @@Hi98765 it’s not as sketchy as it sounds. I honestly think this is a very thoughtful solution. I would prefer it that my hardware doesn’t have the ability to become alive.

    • @charliekelland7564
      @charliekelland7564 8 месяцев назад

      Does that mean the compute is not distributed? I always assumed robots would have a GPU or equivalent in each limb etc, a la Rodney Brooks ...

    • @gewgleplussuux5756
      @gewgleplussuux5756 8 месяцев назад

      the way you word this makes it sound as if training large models will only ever take on the order of days. that's just obviously faulty logic when it comes to the rate of advancement in ai, and also when it comes to processor and memory speed advancements. something that would've taken days to render 10 or 20 years ago can be done basically realtime nowadays. that same advancement in hardware will affect ai training, especially because we are only really starting to see hardware that's custom designed specifically to train models. as that advances, and as transistors and cpus go truly 3d (just google 3d stacked moore's law to see what will be happening in the next 10-20 years).. when cpus have 5, 10 layers that's only a 5x performance gain. what happens when we figure out how to dissipate the heat from 100 layers?? 1000? a gpu that's 100-1000x as performant as the ones we currently have could compute a future language model in likely seconds or minutes. imagine having robots running around everywhere with gpus in them. the risk definitely is there that agi could easily get loose.

    • @computerrockstar2369
      @computerrockstar2369 8 месяцев назад

      @@gewgleplussuux5756 by then I assume we would have more safety mechanisms in place. This is just one of many steps needed to contain the risk.

    • @computerrockstar2369
      @computerrockstar2369 8 месяцев назад +1

      @@charliekelland7564 Nah, that's very inefficient because each computer would have to communicate. It would be much easier to have a single onboard computer with access to a cloud network. Though there may be computers in certain high-friction places (knees, wrists, joints) so they can get better data for the main computer, but they won't be anywhere near a 4090 equivalent. Even so, I believe that would be a temporary measure until the data becomes more available over time.

  • @MartinGamsby
    @MartinGamsby 8 месяцев назад +11

    GPU. Ok I didn't think I would have to write it, but I saw so many weird and/or incomplete answers. To answer what "The same model is responsible for deciding which learned, closed-loop behavior to run on the robot to fulfill a given command, loading particular neural network weights onto the GPU and executing a policy." means, I'll try my best (as a senior software developer).
    Your answer was pretty good IMO ("There's maybe a finite pre-trained number of actions that it has, like 'pick-up' or 'pour some liquid', or 'push a button', then the AI model selects which specific thing to run and runs it.")
    Just noting here a few vocabulary terms that can be interpreted differently depending on context.
    For the context here:
    - model: It's like an LLM, but the output is not text, it's tokens to describe movements.
    - learned, closed-loop behavior: Like you said, for example they *learned* "grab that bag", and it's closed-loop because the destination is not enough. It needs to start the movement, get feedback on what it's doing, and adjust to make sure it's heading to the right place.
    - weights: This is just describing how the models work. Like sliders of multiple things, you put more weight on something you want to do more.
    - GPU: This is just what they're using to process. It doesn't really matter, they could have just said "computing" or whatever. It's faster to run a model on a GPU than on a CPU, for example.
    - Policy: "Neural Network Policies for fast, dexterous manipulation", which is just a basic movement.
    So if we put all of this together:
    "deciding which learned, closed-loop behavior to run"
    means that from the input text they decide, for example for getting the trash, to split it into multiple sequential behaviors, like "move bag#1 from here to there" (here and there are filled in from the vision input they have), and that would be split down into smaller behaviors, like "move hand towards bag until it's at the position over the bag", "open hand", "move hand down", etc.
    Note that these are not text, I quoted it because it's our description of it, but it's just something they know, and the model chooses which ones it needs for a particular case.
    then "on the robot to fulfill a given command"
    is pretty self-explanatory IMO.
    then "loading particular neural network weights onto the GPU"
    just means that it will ... run it. Just like ChatGPT does. Or any model that you would run locally. With weights according to the command spoken above.
    then "and executing a policy." is what I described above: for the trash example, they begin with "move hand towards bag until it's at the position over the bag", which (again) is not text, but something they learned already.
    They just need to give as input WHERE that is.
    So something like "move to x,y,z" in their internal coordinates, relative to something (probably the camera, or something like that)
    Where x,y,z is computed from the visual they have of the bag.
    And this is where they put the weights, to move (for example), the arm, the wrist, the whatever motor needs to be moved.
    And then multiple times per second, they adjust to what they see, what they sense, to make sure that it's going where it should be going (in a "closed-loop").
    And when the closed-loop is done, the hand is "moved at x,y,z", then they do the next thing in the list that they made from the command (as described above).
    I don't know if it was clear...
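
    A toy illustration of that decomposition (a sketch only, not Figure's code; the behavior names and coordinates are made up):
    ~~~~~
    # "Put the bag in the bin" split into sequential behaviors, each of which runs
    # as a closed loop: act, check how far off we are, correct, repeat.
    import numpy as np

    def move_hand_towards(target, hand, step=0.2, tol=0.01):
        """One 'closed-loop behavior': keep adjusting until the hand reaches the target."""
        target = np.asarray(target, dtype=float)
        while np.linalg.norm(target - hand) > tol:
            hand = hand + step * (target - hand)   # feedback correction toward the goal
            # a real robot would re-read its sensors and re-estimate the pose here
        return hand

    def execute_command(bag_xyz, bin_xyz):
        """The behaviors chosen by the model for this command, run in order."""
        hand = np.zeros(3)
        hand = move_hand_towards(bag_xyz, hand)    # reach the bag (position from vision)
        # close_gripper() would go here
        hand = move_hand_towards(bin_xyz, hand)    # carry it over the bin
        # open_gripper() would go here
        return hand

    execute_command(bag_xyz=(0.4, 0.1, 0.2), bin_xyz=(0.1, -0.3, 0.3))
    ~~~~~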

    • @alexyooutube
      @alexyooutube 8 месяцев назад

      "to fulfill a given command" ... That command is likely "parameterized". And, those "parameters" are mapped to the elements and entities perceived by AI models (e.g. GPT4 Vision).
      Even though there are a finite number of these Commands and Pre-Trained NNs, the combination of these Commands can attain very rich and versatile results.

  • @Rollthered
    @Rollthered 8 месяцев назад +51

    I feel like I'm watching a sci fi movie.

    • @Qwijebo
      @Qwijebo 8 месяцев назад +1

      Disney had this stuff since the late 60's

    • @jhunt5578
      @jhunt5578 8 месяцев назад +5

      I feel like we are inside of a sci-fi movie.

    • @philipduttonlescorlett
      @philipduttonlescorlett 8 месяцев назад +1

      @@jhunt5578 Yep, or a George Orwell/Philip K Dick novel!

    • @MichaelErnest666
      @MichaelErnest666 8 месяцев назад

      ​@@jhunt5578 "You Can Definitely Say That Again" And I Just Found Out That Katy Perry Song "Harley's In Hawaii" Is Literally About ai 🤯🤔

    • @ImperativeGames
      @ImperativeGames 8 месяцев назад

      Terminator: Sarah Connor Chronicles?

  • @billmantie8778
    @billmantie8778 8 месяцев назад +17

    I don't know why, but I'm really impressed with how well it transfers an item from one hand to the other, and will even do that to move an item from one side of its body to the other side.

    • @ADreamingTraveler
      @ADreamingTraveler 8 месяцев назад

      Yeah, robotics made massive strides back in the 2010s and we've basically almost perfected it. Though their robot is still a bit behind the other companies that have been around longer. For example their robot still walks a bit slow and can't run yet

    • @kimchiman1000
      @kimchiman1000 8 месяцев назад +1

      I noticed also how it seemed to drop the apple into his hand, as opposed to merely placing it there. Hadn't expected that.

    • @xcidgafhamas
      @xcidgafhamas 8 месяцев назад

      @@kimchiman1000yeah that took me by surprise too. Felt very natural

    • @PSiOO2
      @PSiOO2 8 месяцев назад

      @@kimchiman1000 that's the sus part, the AI obviously has to take a second to recognize the visual data, how did it do that? In fact, it looks like the guy just caught the apple, and the AI just randomly dropped it

  • @kiloromeo4725
    @kiloromeo4725 8 месяцев назад +77

    This new technology is coming at humanity at an alarming rate.

    • @NeoKailthas
      @NeoKailthas 8 месяцев назад +6

      As long as it doesn't leave at alarming rate I am ok with it lol

    • @therainman7777
      @therainman7777 8 месяцев назад +14

      We are most likely fucked. I’ve largely accepted it.

    • @fabp.2114
      @fabp.2114 8 месяцев назад

      @@therainman7777 Why though? Just accept it and find a personal purpose in life. There will be no jobs. There will be no scarcity. There will be fun. Okay, sure, or we all die, but that's fine, everyone dies so there's no FOMO. Our thoughts will forever live inside our robot overlords.

    • @therainman7777
      @therainman7777 8 месяцев назад +3

      You read that section correctly, Wes. The large multimodal model is like a central “brain” that processes the input scene (verbal instructions from the human, its video feed, etc.), and uses the generalized intelligence of that model to determine what type of action(s) need to be performed. Once that is determined, a pre-trained set of neural network weights is loaded, presumably from some sort of action library that they have trained in advance, and computation through this neural network is performed on the GPU (where neural network computation is usually performed, due to massive parallelization which produces huge efficiency gains). For example, they may have trained a neural network specifically on the action “cut apple.” If the human asks it to cut up an apple, the large model determines that that’s the action that needs to be performed, loads the weights for that action onto the GPU, and then the computations needed for that action are carried out, the output of which are commands that get sent to the robot’s actuators.

    • @NebulaSon
      @NebulaSon 8 месяцев назад

      Good 😊

  • @konstantinavalentina3850
    @konstantinavalentina3850 8 месяцев назад +33

    The vocal segregates ("uh") and the brief stutter it uses when talking are an interesting choice to humanize interaction.
    I wonder how much of that is trained/learned, or artificially programmed?

    • @sunsu1049
      @sunsu1049 8 месяцев назад +2

      Interesting question, I think that the answer lies in whether or not the voice model was based on a real person's voice, like Siri, or if it was completely made from scratch.

    • @bgtyhnmju7
      @bgtyhnmju7 8 месяцев назад

      Yes.

    • @vampir753
      @vampir753 8 месяцев назад +5

      @@sunsu1049May depend on how it was trained. If the benchmark was that the voice should sound "human" it seems reasonable to assume that an error would be lowered by inserting these "uh..."s and "ah..."s

    • @spockfan2000
      @spockfan2000 8 месяцев назад +2

      I think all answers were textual and someone just read them out loud. "They wouldn't lie to us"... hehe.. remember the Gemini video?

    • @WildVoltorb
      @WildVoltorb 8 месяцев назад +2

      That's how it learned, it's not programmed to do that

  • @Christian_Luczejko
    @Christian_Luczejko 8 месяцев назад +6

    Lol Holy shit. I’m most impressed with how fluidly the servos moved the damn robot's arms around. It looked so uncanny to me that for moments it almost looked fake. Pretty mind blowing stuff. Super exciting.

  • @j.sargenthill9773
    @j.sargenthill9773 8 месяцев назад +2

    "closed loop" meaning the individual movements and postures. you can kinda see it, like how it moves the hand in position, opens all fingers, closes all fingers slowly, but only uses the index and thumb to grip the apple, then the arm swing, the rotation of the wrist, and the drop. it looks fluid altogether but upon closer inspection is a series of individual movements. not all that different from how we move, but a bit over flourished in order to appear natural.

  • @podz3038
    @podz3038 8 месяцев назад +7

    0:41 "cups and a plate" with 3 plates and 1 cup visible on the drying rack

    • @dwsel
      @dwsel 8 месяцев назад +1

      I hope it doesn't hallucinate any movements

  • @TheCaphits
    @TheCaphits 8 месяцев назад +4

    I love these new robots and all AI models. They're amazing! Thanks for sharing their amazing achievements and development. 😅
    I'd be proud to have these as ancestors, that's for sure. ❤️❤️❤️

  • @JordanMillsTracks
    @JordanMillsTracks 8 месяцев назад +5

    New type of AI generated art: giving this robot a paint brush

  • @freddykruger3090
    @freddykruger3090 8 месяцев назад

    Mr. Roth,
    Enjoying these videos.
    The word exciting, doesn't quite describe it.
    Thank you.

  • @JackNorthrup
    @JackNorthrup 8 месяцев назад +5

    I was surprised when it 'dropped' the apple in his hand, it did not place it in his hand

    • @u.v.s.5583
      @u.v.s.5583 8 месяцев назад +1

      "Sir Isaac Newton once observed and Dr. Albert Einstein explained that the gravity of planet Earth is bending the space-time around it so that a massive object such as this apple, while maintaining uniform straight motion, still follows the bent geodesics and to observers such as me and you it looks as if the apple is falling down. So I decided to recontextualize this observation to invent a mode of transportation of apples from my hand to your hand, provided that my hand containing the apple is higher up in the gravity field of planet Earth than your hand, which is the case at the moment since I had strategically placed my hand with the apple above your hand, which was without an apple just a moment ago but holds the apple right now."

  • @Justineyedia
    @Justineyedia 8 месяцев назад +1

    Its getting alot smoother. More fluidity in the movements. Pivot points. Motor skills.😮🤯❤️‍🔥😎

  • @alertbri
    @alertbri 8 месяцев назад +10

    Smoother than C-3PO 😁 should have used Elevenlabs to clone Anthony Daniels. Missed opportunity. Or Darth Vader for the laugh.

    • @netscrooge
      @netscrooge 8 месяцев назад

      The voice reminded me of RFK Jr, for some unknown reason.

  • @AaronNicholsonAI
    @AaronNicholsonAI 8 месяцев назад +1

    Thanks, man! Your updates are super helpful, smart, and well-produced.

  • @FcoAngelPD
    @FcoAngelPD 8 месяцев назад +12

    This might sound weird, but I felt compassion and pity for the robot while watching the video.

    • @warsongdog
      @warsongdog 8 месяцев назад +9

      That’s how they win.
      Only half joking. AI LLMs are being trained to be pleasant to interact with. Add a body that can gesture in ways that manipulate our mammalian emotions, and we will be putty in their hands.

    • @coolbeans1998
      @coolbeans1998 8 месяцев назад

      I totally agree. Wanna give him a hug lol

    • @ChibiViolin
      @ChibiViolin 8 месяцев назад

      Yeah peeps like you will be the trouble makers in the future. You'll start arguing for AI and robot rights because you've got nothing better to do.

  • @trashman1358
    @trashman1358 8 месяцев назад

    that is so much better than I expected. And i mean by miles. Speechless. It's here.

  • @GhostofJamesMadison
    @GhostofJamesMadison 8 месяцев назад +35

    Ok so today was the first day a real robot existed

    • @bgtyhnmju7
      @bgtyhnmju7 8 месяцев назад +4

      I have that feeling too.

  • @alexyooutube
    @alexyooutube 8 месяцев назад +1

    From ChatGPT4: ( GPU )
    ~~~~~
    In the context of the robotic AI system you described, "closed-loop behavior" refers to the robot's ability to execute tasks or actions based on feedback from its environment, continuously adjusting its actions based on this feedback to achieve a desired outcome. This process forms a loop: the robot performs an action, receives new data from its environment through sensors (like cameras and microphones), processes this data to evaluate the outcome of its action, and then decides on the next action based on this evaluation.
    The "closed-loop" aspect emphasizes the continuous, self-regulating nature of this process, where the output or response of the system directly influences its next action, creating a feedback loop. This is in contrast to "open-loop" systems, where actions are not adjusted based on feedback or outcomes but are pre-defined and follow a set sequence without real-time adaptation. Closed-loop behaviors in robotics allow for more adaptive, responsive, and intelligent interactions with dynamic environments.
    ~~~~~
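
    A tiny runnable contrast between the two (illustrative only; the numbers and the one-dimensional "robot" are made up):
    ~~~~~
    # A 1-D joint whose moves are slightly inaccurate. Only the closed-loop version
    # notices and corrects, because it keeps reading the "sensor" after every move.
    import random

    def noisy_move(position, command):
        return position + command * random.uniform(0.6, 1.4)   # imperfect actuator

    def open_loop(target, position=0.0):
        # Pre-planned: issue 10 equal steps with no feedback and hope for the best.
        for _ in range(10):
            position = noisy_move(position, target / 10)
        return position

    def closed_loop(target, position=0.0, gain=0.5):
        # Feedback: measure the remaining error each step and command a correction.
        for _ in range(10):
            error = target - position
            position = noisy_move(position, gain * error)
        return position

    print("open loop  :", open_loop(1.0))    # drifts by whatever the noise added up to
    print("closed loop:", closed_loop(1.0))  # lands very close to 1.0 despite the noise
    ~~~~~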

  • @beyondthebounce23
    @beyondthebounce23 8 месяцев назад +27

    We will all be on UBI by next year at this rate.

    • @Souleater7777
      @Souleater7777 8 месяцев назад +7

      Pray. Or we are starving

    • @focusedeye
      @focusedeye 8 месяцев назад +2

      Here in Canada, we had a recent test run of a UBI style system during COVID-19 with a cheque every two weeks for CAD$1000.00 for the duration of the pandemic.

    • @niaschim
      @niaschim 8 месяцев назад

      No. 3 years on the good timeline.
      4 years and a revolution on the bad timeline.
      5 years and bad stuff on the worst timeline I'll acknowledge.
      7 years and then nobody is around anymore in the timeline we don't talk about 👍

    • @derendohoda3891
      @derendohoda3891 8 месяцев назад +4

      The promise of UBI is more valuable to the masters than its realization.

    • @jorgeduardo101
      @jorgeduardo101 8 месяцев назад +1

      @@focusedeye I wouldn't call it UBI since only those without a job received it. Anyway, still a great initiative from all countries that did it

  • @rabbithowls71
    @rabbithowls71 8 месяцев назад +2

    The dexterity is really impressive. Even if it was a remote avatar, the movement is really good.

  • @dkwroot
    @dkwroot 8 месяцев назад +7

    It's game over, man. Game Over!

  • @JONSEY101
    @JONSEY101 8 месяцев назад +1

    It's interesting to see the speed and fluidity of the robot as that's not often seen at the moment.
    It also doesn't seem to have a jerking motion with movement which is good.
    I would say that response time with the reply needs to be made better as we live in a busy world and so people won't have the patience for a delay in response, plus it doesn't feel so natural.
    The difficulty I see there, though, is that it will need to know when it is ok to have its turn speaking as opposed to continuing to listen as the person may have other things to say.
    I'm sure such things will be improved, but the inflexions in the speech feel natural.

    • @PSiOO2
      @PSiOO2 8 месяцев назад +1

      This delay issue has been tackled by individuals before, to partial success. I am excited to see what actual big companies will do about it. And you've guessed it, the biggest issue is to know when the human stops speaking. We do it by picking up subtle clues like intonation and sentence structure. Considering that it's implied Figure 01 uses speech-to-text, intonation is probably lost. From what I've seen, the best solutions for now are to have a minimal delay + the ability to abort speech if they are interrupted. Those have their own issues. For example, the AI might stop talking mid-sentence if someone coughs or speaks to someone else in the immediate vicinity

  • @a7xcss
    @a7xcss 8 месяцев назад +9

    "...The apple found its new owner." (...when he leaves the table...) "The trash is gone." LOL

  • @SuperSaverPlaysSPG
    @SuperSaverPlaysSPG 8 месяцев назад +28

    yea, game over, we are done

    • @pvanukoff
      @pvanukoff 8 месяцев назад +8

      Maybe we could build a fire. Sing a couple of songs, huh? Why don't we try that?

    • @k-c
      @k-c 8 месяцев назад

      Combat robots within 2 years.

  • @brettmarshall9340
    @brettmarshall9340 8 месяцев назад +3

    Always find your videos interesting. Sometimes I wonder if I’m the only one who is struggling to keep up with the speed of advancement. I could spend all day every day learning about this tech and I would still feel behind the curve. I work in tech, I’m 51, spent my life being ahead of the curve until now…. I guess my time being the expert in the room is sunsetting, I feel the next gen team will have to work harder and smarter than my gen did. Not sure how I feel about this, not sure how I feel about it actually matters anymore.

    • @hellblazerjj
      @hellblazerjj 8 месяцев назад

      This is the most wonderfully self aware thing I've read online for a long while. Thank you. Also I agree, thankfully we now have AI and soon brain chips and robots to help us think and work harder and smarter.

    • @brettmarshall9340
      @brettmarshall9340 8 месяцев назад +1

      @@hellblazerjj thanks, you put a smile on my face. What a world we live in, perhaps being a passenger this time around will be more fun than flying the plane.

    • @marcariotto1709
      @marcariotto1709 8 месяцев назад +1

      I'm on the low end of the tech knowledge curve and watch these types of videos to help me keep up with the real world changes.
      It seems humans will soon be outpaced by AI machines and transhuman evolution will be the only way to keep up with our creations. Of course the wealthiest will be the first to receive such augmentation. God help us!

  • @SantinoDeluxe
    @SantinoDeluxe 8 месяцев назад

    Explanation of the GPU Model Text at 12:30 - As far as I understand it... Assume pre-training of certain actions for the robot that would be useful. The compute model that processes the stored memory, speech and images also decides a response. The same model is used to pick (from interpretation of its own response) which of its learned actions to use, combined with the object info specific to the scene, to figure the movements, speed and other physical forces for proper interaction with the environment. Conceptually, as a bystander, we see no difference in this part of the functionality; both Figure01 and RT-2 can watch-to-learn skills. However, it's a matter of style: applying closed-loop behavior weights to the GPU before the 'policy' (the robotic movement code) is like conforming the learned skill to the task at hand. This may do something like cause a pick-up or drop action to use more finesse for a delicate item like a tomato, or for a deposit action to give the paper trash a small toss into the basket if it's not too far but also not within reach.
    Let me know if I'm wrong or missed something.

  • @HenryCalderonJr
    @HenryCalderonJr 8 месяцев назад +7

    Wow 😮 finally getting there! Can’t wait to see these massive improvements in the next few months, as we knew would happen, since everything is moving 10x faster than in 2023! Next year AGI will become self-aware. Exciting and scary if it falls into the wrong hands

    • @marcariotto1709
      @marcariotto1709 8 месяцев назад +1

      It's already in the wrong hands.

    • @raven.4815
      @raven.4815 8 месяцев назад +2

      Yeah, I'm excited for losing my job!

  • @observingsystem
    @observingsystem 8 месяцев назад

    This is so awesome. I've been hoping for robots that can help around the house for a long time. In my experience if you can't do everything yourself it can suck to have to ask for help from humans for a bunch of reasons, but a robot would be always there. And they don't just do things around the house, you can also have a conversation with them! I love it!

    • @tearlelee34
      @tearlelee34 8 месяцев назад

      How are you going to pay for the mortgage? The great replacement is a possibility now.

    • @observingsystem
      @observingsystem 8 месяцев назад

      @@tearlelee34 What do you mean?

  • @singularitybound
    @singularitybound 8 месяцев назад +3

    The thing about BD is that, for example, Atlas has the craziest articulation, agility etc. What they can do is only possible because of its actuators etc. Imagine when they load one of these up on him! Or the dog.. I mean we are talking next lvl.

    • @vampir753
      @vampir753 8 месяцев назад +1

      It probably has already happened internally and they just did not present it yet.

  • @ryanturner6920
    @ryanturner6920 8 месяцев назад +1

    The line, "This is the worst it will ever be..." Keeps ringing in my ears. Starting to really see the outlines of the near future here.

  • @lilmafya
    @lilmafya 8 месяцев назад +6

    This reminded me of the cool bartender robot in the movie passengers

    • @giordano5787
      @giordano5787 8 месяцев назад +1

      Sameee!!! It's so similar

  • @MudroZvon
    @MudroZvon 8 месяцев назад

    *Summary*
    The demonstration highlights several impressive capabilities of this robot:
    - It can perceive and describe its surroundings through vision and natural language understanding models.
    - It can engage in full conversations, comprehending context and prompts like "Can I have something to eat?"
    - It uses common sense reasoning to plan actions, like providing an apple since it's the only edible item available.
    - It has short-term memory to follow up on commands like "Can you put them there?" by recalling previous context.
    - The robot's movements are driven by neural network policies that map camera pixels directly to dexterous manipulations like grasping deformable objects (a toy version of such a policy is sketched below this summary).
    - A whole-body controller allows stable dynamics like maintaining balance during actions.
    The key innovation is combining OpenAI's advanced AI models for vision, language, and reasoning with Figure AI's expertise in robotics hardware and controllers.
    Figure AI is actively recruiting to further scale up this promising approach to general, multi-modal robotics through leveraging large AI models.
    Companies and researchers effectively combining the cutting-edge in large language models with advanced robotic hardware/controls are emerging as leaders in pushing embodied AI capabilities forward rapidly.
    There is a sense of optimism that general, multi-purpose robots displaying intelligent behavior are now within closer reach through neural network approaches rather than classic programming paradigms.
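
    A toy version of the "camera pixels directly to actions" policy mentioned in the summary above (a sketch only, not Figure's model; the layer sizes and the 24-value action vector are arbitrary):
    ~~~~~
    # A minimal visuomotor policy: raw frames in, a small action vector out.
    import torch
    import torch.nn as nn

    class PixelsToActions(nn.Module):
        def __init__(self, n_actions=24):
            super().__init__()
            self.encoder = nn.Sequential(               # compress the image to features
                nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.head = nn.Linear(32, n_actions)        # e.g. wrist/finger setpoints

        def forward(self, frames):                      # frames: (batch, 3, H, W)
            return self.head(self.encoder(frames))

    policy = PixelsToActions()
    frame = torch.rand(1, 3, 224, 224)                  # one camera frame
    actions = policy(frame)                             # setpoints for the whole-body controller
    ~~~~~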

  • @redratpetrax
    @redratpetrax 8 месяцев назад +3

    LOL so quick. Nice, this year I think they will walk :) Next year will be fun.

  • @bradgalaxy8419
    @bradgalaxy8419 8 месяцев назад

    I love that you press play on a Wes Roth vid and it's content

  • @Chuck_Hooks
    @Chuck_Hooks 8 месяцев назад +11

    Looking forward to 99.9% of humans adjusting to reservation life.
    It's been so good for Native Americans.

  • @Jacobk-g7r
    @Jacobk-g7r 8 месяцев назад +1

    I think what they mean by "all neural network" is like how a hand isn’t the brain but it’s connected, so it's a part of the neural network without the data or programming in it specifically. Like how the brain doesn’t hold the info but is the info in relation to the body and its differences.

  • @robertkerr4199
    @robertkerr4199 8 месяцев назад +14

    AI won't destroy us; we'll destroy ourselves with AI.

    • @SSS-v3y3b
      @SSS-v3y3b 7 месяцев назад

      There is always a golden era before massive destruction

  • @thrust_fpv
    @thrust_fpv 8 месяцев назад +1

    I think we should be investing in personal EMP devices, if these things are going to be running around.

  • @JoePiotti
    @JoePiotti 8 месяцев назад +13

    Most of the delay is while it is waiting to see if the human is done speaking. The same thing happens on my phone when I talk to chat gpt

    • @IceMetalPunk
      @IceMetalPunk 8 месяцев назад +2

      To be a bit more accurate, I don't think it's quite "to see if the human is done speaking"; I think that's the delay from the time it takes to actually transcribe the speech into text first. There's a framework built on top of Whisper that allows you to constantly stream the transcription live (instead of waiting for the entire audio chunk first), and even that has a 3-4 second delay between an audio chunk being streamed and the transcription of it being done.
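
      A naive chunked approach with the open-source whisper package shows where that latency comes from (just a sketch, assuming the package is installed; the streaming frameworks mentioned above pipeline this more aggressively):
      ~~~~~
      # Each chunk must be fully captured (CHUNK_SECONDS) and then run through the
      # model before any text appears, so the delay is capture time + inference time.
      import numpy as np
      import whisper

      model = whisper.load_model("base")
      SAMPLE_RATE = 16_000            # whisper expects 16 kHz mono float32 audio
      CHUNK_SECONDS = 3               # how long we buffer before transcribing

      def transcribe_stream(chunks):
          for chunk in chunks:                        # arrives CHUNK_SECONDS after speech started
              result = model.transcribe(chunk, fp16=False)
              yield result["text"]                    # plus however long inference took

      # stand-in for microphone input: two 3-second buffers of silence
      silence = np.zeros(SAMPLE_RATE * CHUNK_SECONDS, dtype=np.float32)
      for text in transcribe_stream([silence, silence]):
          print(repr(text))
      ~~~~~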

  • @Caberbalschnit
    @Caberbalschnit 8 месяцев назад

    We're so much closer to a Skynet event now...soooo excited....yaaaay...

  • @alvarvonhofsten5679
    @alvarvonhofsten5679 8 месяцев назад +7

    oh well :) it was nice being alive while it lasted

  • @hjups
    @hjups 8 месяцев назад

    My guess is that they have something like a Jetson Orin onboard, which is able to dynamically select from what is effectively a finetuned RT1 for the behavior library. That GPU would be low-power and capable of running the smaller RT1 model at the rate they claimed (the RT2 model would be too large). RT1 showed that if the tasks are simple enough, you don't need a particularly large model (could even use a LoRA-type approach on the transformer stem). Alternatively, it could be a larger library running on an external server-grade GPU (notice that the robot is still tethered, though the update rate would likely work with wireless too). The motion planning from the VLA model would then be similar to how the RT platforms do it, using a real-time kinematics control model. DeepMind already figured out a working solution, so why re-invent the wheel when you could simply stick GPT4V on top of it?

  • @OBEYTHEPYRAMID
    @OBEYTHEPYRAMID 8 месяцев назад +10

    Why would an AI model imitate human speech hesitations ?

    • @therainman7777
      @therainman7777 8 месяцев назад +17

      To make it feel more human.

    • @tudor-octavian4520
      @tudor-octavian4520 8 месяцев назад +9

      Because we're more likely to buy one for our household if it acts familiar, like a human. I think it's about making the public accept them into society

    • @13371138
      @13371138 8 месяцев назад +8

      It sounded to me like a human was reading out text responses. The voice didn't sound like the clean output of text-to-speech software

    • @PaulSpades
      @PaulSpades 8 месяцев назад +2

      I found that to be surprising, and then immediately concerning. Was that a real glitch?

    • @RyanKittleson
      @RyanKittleson 8 месяцев назад +5

      probably was trained on natural speech patterns. like how midjourney imitates stray brush strokes

  • @AngriestEwok
    @AngriestEwok 8 месяцев назад

    Fascinating and terrifying in almost equal measure.

  • @markmuller7962
    @markmuller7962 8 месяцев назад +3

    This is probably the closest thing to AGI we've seen so far

  • @VeritasPraevalebit
    @VeritasPraevalebit 8 месяцев назад

    I contend that the compute necessary to run an AGI robot cannot be produced within the small space that is available within a humanoid robot. This means that without an umbilical cord (which this one has) the controlling computer must communicate via a very fast bidirectional line, 5G comes to mind. The data rate for live video and more will be very high. The cost for robot plus data transmission plus compute in a supercomputer will perhaps be prohibitive.

  • @Blxz
    @Blxz 8 месяцев назад +4

    That GPU thing you were looking at appears to indicate that the physical behaviour is pre-programmed/'learned', but that the AI model is choosing from a list of actions.
    i.e. It has been trained to stack dishes, hand over apples, and pick up rubbish. The AI on the inside, though, is deciding independently whether to push the 'pick up rubbish' button or the 'hand over apple' button depending on context.
    Not quite a truck rolling downhill, but certainly less impressive than my initial reaction watching that robot video.

    • @LaurentCassaro
      @LaurentCassaro 8 месяцев назад

      It still demonstrates that we can have NOW robots that can be trained to do specific tasks, and can replace humans in a lot of jobs if they're trained for that.

    • @Blxz
      @Blxz 8 месяцев назад

      @@LaurentCassaro oh yeah, it's pretty cool regardless. But there was a section in the video where he specifically asked about that specific part.

  • @JessieThorne886
    @JessieThorne886 8 месяцев назад +2

    Probably not a long shot from "I think I did pretty well; the apple found its new owner; the trash was removed " to "I think I did pretty well; the Earth found its new owners; the humans were removed."

  • @sandenium
    @sandenium 8 месяцев назад +3

    Idky but robot talking reminded me of sam Altman

  • @twirlyspitzer
    @twirlyspitzer 8 месяцев назад

    FINALLY! The real thing ready to fit in any smart rich person's house AND get the real work job done! No more investors dream-on. No more 'reasonably working' prototypes & lab models. This is it. The starting gun for a robot world of the tomorrow-upon-us-now.

  • @RaikiDas
    @RaikiDas 8 месяцев назад +7

    Time to start building handheld emps. Just in case…
    Jk ofc 😉

    • @marcariotto1709
      @marcariotto1709 8 месяцев назад +2

      That's actually not funny and a good idea except the ones you'll need them for will be hardened against such measures.

    • @tearlelee34
      @tearlelee34 8 месяцев назад +1

      Ingenious. I'll be your first customer.

    • @RaikiDas
      @RaikiDas 8 месяцев назад

      @@marcariotto1709 funny you should mention, I have a design for one that basically kills all electronics in a room... the problem is the budget

  • @MoonTech168
    @MoonTech168 8 месяцев назад

    The model that runs on the GPU selects the inputs/outputs for the joints based on some vector representation. That representation is mapped from unpacking what the OpenAI prompt returns, and the subsequent policy activation translates into dynamic parameter inputs through a trained approximation model, which is basically the Figure 01 side.

  • @progrob27
    @progrob27 8 месяцев назад +3

    The rise of philosophical zombies is coming, and it is terrifying.

    • @jichaelmorgan3796
      @jichaelmorgan3796 8 месяцев назад +1

      Would be wise to make consumer models strength-limited, possibly with plastic parts that break before too much force is applied. But once they make themselves, it's over haha

    • @karpabla
      @karpabla 8 месяцев назад

      We will be the zombies now. Millions and millions of hungry zombies.

    • @Hi98765
      @Hi98765 8 месяцев назад

      @@karpabla if that disease X they keep talking about is some aerosolized rabies then ya, you're 100% right 😂

  • @Th3R3p1yGuy
    @Th3R3p1yGuy 8 месяцев назад

    I'm Stunned and shocked

  • @tytoalba4794
    @tytoalba4794 8 месяцев назад +2

    This one actually did shock me.

  • @Mdjagg
    @Mdjagg 8 месяцев назад +2

    This is really incredible.

  • @hdthor
    @hdthor 8 месяцев назад +5

    1:09 “So I gave you the apple, because it’s the only, uh, edible item I could provide you with from the table.”
    It used “, uh,” in its speech, and contractions. Wow, even Lt Cmdr Data couldn’t do that!
    2:08 “I.. I think I did pretty well”
    And it can stutter when flustered! And it has a vocal fry! Amazing!

    • @JordanMillsTracks
      @JordanMillsTracks 8 месяцев назад

      It's probably using elevenlabs, you can get that effect pretty easily on there

    • @IceMetalPunk
      @IceMetalPunk 8 месяцев назад +1

      Contractions aren't anything new for LLMs. And while not ubiquitous, speech disfluencies (um, uh, stuttering, etc.) are present in other previous TTS models as well. (Not so much in any ElevenLabs model as far as I've heard, though.) So while this is definitely a very high quality TTS model, those features that make it sound real aren't anything new.

  • @domenicperito4635
    @domenicperito4635 8 месяцев назад +1

    "What is my purpose?" "You pass butter" *slumps over in sadness*

  •  8 месяцев назад +4

    Man this is sooo cool, it's not coming fast enough, tho, lets accelerate!!!!

  • @veejaytsunamix
    @veejaytsunamix 8 месяцев назад +1

    I find it exciting, I am not a programmer but I have so many ideas i cannot wait for this to work!🎉❤😊

  • @brunodangelo1146
    @brunodangelo1146 8 месяцев назад +6

    Have we finally found the fabled SHOCKING news?

  • @leslieviljoen
    @leslieviljoen 8 месяцев назад

    I love how Figure's background music is so similar to the music in Ex Machina.

  • @jameshughes3014
    @jameshughes3014 8 месяцев назад +10

    anyone want to take bets on how many times 'AGI' will be invented for the first time this year?

    • @Souleater7777
      @Souleater7777 8 месяцев назад +4

      It’s already here. It just has the intelligence of an 8 y.o boy and it’s only increasing. The only reason we keep “ re-inventing “ agi is because we keep moving the goal post every time we see it

    • @_Safety_Third_
      @_Safety_Third_ 8 месяцев назад +2

      366

    • @jameshughes3014
      @jameshughes3014 8 месяцев назад

      @@Souleater7777If there is a machine with the intelligence of an 8 year old, i've not seen it. I doubt we have anything as smart as a rat, but i tend to think of intelligence as being the ability to learn and adapt. so far most of these machines have to be pre-trained, so they don't learn after. It's just a method of programming them using ML instead of hand crafted code. AGI is defined as having the ability to generalize to any task, so far everything we have is narrow. sure, chatGPT can generate any kind of text, but it's still just doing the one thing, generating text. I think if we do currently have proto AGI, it's one of the video game playing programs, but i wouldn't say they are as smart as a human child just yet. I do think , though, when we have something as smart as a dog, we'll have something truly useful in lots of ways, hence the generalization.

    • @jameshughes3014
      @jameshughes3014 8 месяцев назад

      @@Souleater7777nothing yet has the ability to learn new tasks, that's what the General in AGI stands for. I would say that the smartest things we have right now aren't quite as smart as a rat in that regard. But when we have something as smart as a dog, then we'll have something that can be useful in lots of ways, and is generalizable. No need to move goalposts until we reach that first one. I suspect it will be one of the programs that plays video games, or something that comes from that. Especially if one of those agents is adapted to use robots.

    • @Souleater7777
      @Souleater7777 8 месяцев назад

      @@jameshughes3014 we’ll see about that.
      Naysayers take heed, the time is coming, and is already here.
      Meet back here in 1 year.

  • @sausage4mash
    @sausage4mash 8 месяцев назад +9

    that is going to be producing a ton of data for future LLMs. This could be a Wright brothers moment in history if they are for real

  • @Dron008
    @Dron008 8 месяцев назад

    The robot that cooks was not teleoperated; an operator taught it, and after that it generalized that knowledge. Quite a practical scenario. But learning from videos would be even better.

  • @Experternas
    @Experternas 8 месяцев назад +7

    Why did the AI stutter?! At 2:15 it stutters on its words.

    • @13371138
      @13371138 8 месяцев назад +3

      It sounded to me like someone was reading out text responses it gave

    • @ToKnowIsToDie
      @ToKnowIsToDie 8 месяцев назад +2

      Exactly what I thought. Weird

    • @Experternas
      @Experternas 8 месяцев назад

      @@13371138 Sure did, but it feels silly to suggest; there are so many reasons why that would be dumb as f to cheat on... but yeah, I don't want to put on my foil hat, but it's a skeptical aspect. I also didn't like the final reply where his voice went up as he said his goodbye; that's something humans do, but I figure trained voices can't balance that waveform yet.

  • @calvingrondahl1011
    @calvingrondahl1011 8 месяцев назад

    I prompt DALL-E 3 and I see improvement over the last year. Figure 01 is the real deal for AI.

  • @oxygon2850
    @oxygon2850 8 месяцев назад +11

    The only thing that makes me question whether this is real is how it just dropped the apple into his hand, as if he didn't expect it to hand it to him... Sure, the voice could be synthetic, but after that Google video where they got in trouble for faking stuff, I have a hard time believing this.

    • @eIicit
      @eIicit 8 месяцев назад +2

      …why would it be faked, though? Figure 01 may have been instructed explicitly to drop the apple into his hand rather than place it, for the flair it adds to the video.

    • @antman7673
      @antman7673 8 месяцев назад

      Google is Google; OpenAI is a different company.
      If Google thinks its language models are too dangerous for the public and is faking half the stuff, you can be sceptical of them.
      Currently OpenAI has the benefit of the doubt, until they mess up.

    • @oxygon2850
      @oxygon2850 8 месяцев назад

      What do you mean, why would it be faked? What a ridiculous statement... For the same reason any company would fake something: fake it till you make it... get that funding & hype. Google did the same thing. @@eIicit

    • @RhumpleOriginal
      @RhumpleOriginal 8 месяцев назад +1

      Why go to the trouble of faking this video?

    • @francisco444
      @francisco444 8 месяцев назад +1

      FYI Figure is not Google

  • @Evil_joker33
    @Evil_joker33 8 месяцев назад

    GPU: it basically means it takes things in through both text and image, so it interprets based on both sight and concept using AI. It creates files, stored locally in memory on the robot, called weights. A weight is like a skill. It then loads the correct weight, or skill, to complete a policy: a series of actions like the rotation and bending of arms to move in three dimensions. Those weights are loaded into the GPU and run, much like loading with a traditional RAM/GPU memory setup. The AI keeps track of which files have become faulty and maintains its actions, sort of like maintaining versions of software, and runs the correct program for the correct action. In layman's terms, it's a computer in a robot layered with AI to do several different things to make it all work seamlessly. I've been waiting for them to do this. Been thinking of starting my own project.
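
    A rough sketch of the idea described above, using PyTorch purely for illustration (the skill names, network shape, and observation size are invented; this is not Figure's actual code):

        import torch
        import torch.nn as nn

        DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

        class SkillPolicy(nn.Module):
            """Tiny stand-in for a learned visuomotor policy: observation features in, joint targets out."""
            def __init__(self, obs_dim=64, act_dim=12):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, act_dim))
            def forward(self, obs):
                return self.net(obs)

        # The "weights" saved per skill, as the comment describes; here they just live in memory.
        skill_library = {
            "pick_up_apple": SkillPolicy().state_dict(),
            "place_in_rack": SkillPolicy().state_dict(),
        }

        def run_skill(skill_name, steps=5):
            policy = SkillPolicy().to(DEVICE)
            policy.load_state_dict(skill_library[skill_name])   # load the chosen "weight" onto the GPU
            policy.eval()
            with torch.no_grad():
                for _ in range(steps):
                    obs = torch.randn(1, 64, device=DEVICE)      # placeholder for camera/proprioception features
                    action = policy(obs)                          # joint targets for the low-level controller
            return action

        run_skill("pick_up_apple")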

  • @cbuchner1
    @cbuchner1 8 месяцев назад +3

    Why did the bot briefly stutter and insert „er“? The large language model underpinning it should not generate imperfections of speakers.

    • @hellblazerjj
      @hellblazerjj 8 месяцев назад +3

      It makes it sound more human. I like it. Same with the cute "the apple found its new owner" line. You want your robot overlords to have a sense of humour don't you?

    • @NarcatasCor
      @NarcatasCor 8 месяцев назад

      It is already decaying mentally haha

    • @SpaceshipOperations
      @SpaceshipOperations 8 месяцев назад +3

      When Google demoed their whatever-it-was-called that makes phone calls and reservations on your behalf, it did the same. Conversational voice synthesis AI is trained to imitate human imperfections to sound natural.

    • @cbuchner1
      @cbuchner1 8 месяцев назад

      @@hellblazerjj A steel robot doing this mimicry is deeply in uncanny valley territory.

  • @Noname-w7f1e
    @Noname-w7f1e 8 месяцев назад +1

    I’m just imagining all the HAL 9000 scenes from 2001: A Space Odyssey done with the same voice:
    “I’m sorry, uh, Dave! I, uhm, am afraid I can’t do that!”
    He wouldn’t be such a charmer with these stutters and uhms!

  • @Falkenbergo
    @Falkenbergo 8 месяцев назад +3

    AGI is finally here guys!

    • @angrygreek1985
      @angrygreek1985 8 месяцев назад

      no. no it isn't.

    • @KhattaRapidus
      @KhattaRapidus 8 месяцев назад

      You got fooled and I hope they enjoy fooling you.

    • @Charvak-Atheist
      @Charvak-Atheist 8 месяцев назад +1

      This is not AGI

    • @karlwest437
      @karlwest437 8 месяцев назад +1

      Nah it's ChatGPT picking from a set of pretrained actions

    • @DivinesLegacy
      @DivinesLegacy 8 месяцев назад +1

      @@karlwest437 I’m sure your typing is “pre-trained”, unless you just learned how to do it right now.

  • @ollimacp
    @ollimacp 8 месяцев назад

    On the GPU sentence: I think "the same model" is like the Minecraft Voyager paper architecture, where the model decomposes its actions into a sequence of subtasks. Each simple task is written in code and, if it works, it gets saved into the skill library and is just reused. The neural network policies are presumably a kind of reinforcement learning setup like OpenAI's Gym environments. Have a look at the Python module stable-baselines3 and you'll see what the network policies are about. But I'm also just guessing from experience.
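
    For anyone curious what a "neural network policy" looks like in practice, here is a minimal stable-baselines3 example on a toy Gymnasium environment (CartPole has nothing to do with Figure's robot; it just shows the train-then-act pattern the comment refers to):

        import gymnasium as gym
        from stable_baselines3 import PPO

        env = gym.make("CartPole-v1")
        model = PPO("MlpPolicy", env, verbose=0)   # "MlpPolicy" is the network policy being trained
        model.learn(total_timesteps=10_000)        # reinforcement learning phase

        # After training, the policy maps observations to actions in a closed loop.
        obs, _ = env.reset()
        for _ in range(200):
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            if terminated or truncated:
                obs, _ = env.reset()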

  • @alertbri
    @alertbri 8 месяцев назад +7

    This is GPT-5

    • @ossianravn
      @ossianravn 8 месяцев назад +1

      How would you know?

    • @sunsu1049
      @sunsu1049 8 месяцев назад

      @@ossianravn I think he is guessing based on the abilities of the AI, especially since they are partnered with OpenAI it wouldn't be out of the realm of possibility that they gave early access to GPT-5

    • @alertbri
      @alertbri 8 месяцев назад

      @@ossianravn nobody knows anything, it's just an educated guess.

  • @orhanmekic9292
    @orhanmekic9292 8 месяцев назад +1

    Can't wait to have this guy washing my dishes 😄

  • @jeremycronic
    @jeremycronic 8 месяцев назад +3

    Seems at least partly scripted to me. When the guy says 'What do you see?', the robot knows to describe only what's on the table, not the entire room. Also, the guy says 'Pick up the trash' and the robot knows to pick up the trash AND put it in the basket. The presentation seems a little off to me.

  • @michelkliewer3996
    @michelkliewer3996 8 месяцев назад +2

    This is very fascinating! While it's cool to see what it can do already, it would be even better to see what the robot can't do yet, or which tasks it still fails at. Nonetheless, thanks for sharing!

    • @vampir753
      @vampir753 8 месяцев назад

      Hey figure 01, could you do my taxes?

    • @PSiOO2
      @PSiOO2 8 месяцев назад

      What we see is of course a demo. What I saw with these AIs that have a semblance of memory is that they go crazy. It depends on the data they were trained on, but they are almost guaranteed to do seizure-speak every once in a while. Imagine if Figure 01 prompts its body to go into a seizure. They seem to have filters in place, but those could fail.

  • @karlwest437
    @karlwest437 8 месяцев назад +6

    When it's describing what it sees, it gets it wrong: it says "a drying rack with cups and a plate" when there are actually plates and a cup 😂

    • @therainman7777
      @therainman7777 8 месяцев назад +2

      To err is robot

    • @rootor1
      @rootor1 8 месяцев назад

      There are actually 2 cups in the scene, so the robot saw it better than you 🤣
      Now seriously, the cup inside the rack is included in what the vision model is labeling; nobody told the robot "don't include the objects inside the drying rack in the description".

    • @karlwest437
      @karlwest437 8 месяцев назад +1

      @@rootor1 well going by that logic it should have said, "a drying rack, an apple, some cups and some plates" 😝

  • @JohnDlugosz
    @JohnDlugosz 8 месяцев назад +1

    In the thumbnails for suggestions shown after the video ends are a couple from _The Late Show_ with Stephen Colbert.
    Makes me think, next we'll be seeing a robot like this as a guest on a late-night talk show.
    Then it will be the guest host of a late-night talk show. Oh, imagine it interviewing (and bantering with) Neil deGrasse Tyson?

  • @user-qb8yr4vb4u
    @user-qb8yr4vb4u 8 месяцев назад +3

    Scary

  • @gavinjling6142
    @gavinjling6142 8 месяцев назад +1

    GPU closed loop... does that mean that if Corey were to move his hand whilst the robot passed him the apple, the apple would fall to the table, because the planning/action/feedback loop is simply not fast enough to pick the correct action to place the apple in a suddenly moving, outstretched palm?
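
    For what it's worth, a closed loop usually means the opposite: the policy re-reads its observations every control tick instead of planning the motion once, so a moving hand would keep being tracked. A toy sketch of that idea (the control rate and the functions here are invented, not anything Figure has published):

        import math

        CONTROL_HZ = 200.0                      # hypothetical control rate, purely for illustration

        def get_hand_position(t):
            # Stand-in for the vision stack: the "hand" drifts over time, as if Corey were moving it.
            return (0.5 + 0.1 * math.sin(t), 0.3, 0.2)

        def policy_step(target, effector, gain=0.2):
            # Stand-in for the learned policy: take a small step toward wherever the target is *now*.
            return tuple(e + gain * (t - e) for t, e in zip(target, effector))

        effector = (0.0, 0.0, 0.0)
        for step in range(2000):
            t = step / CONTROL_HZ
            target = get_hand_position(t)       # re-observe every tick: this is what "closed loop" buys you
            effector = policy_step(target, effector)
            if math.dist(effector, target) < 0.01:
                print(f"reached the (moving) hand at step {step}")
                break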

  • @papackar
    @papackar 8 месяцев назад

    Sounds like the robot has a set of learned, visual/motor behaviors stored in the form of parameter weights. These only function when loaded into a neural network, which is to say the physical substrate of the GPU. Which of these behaviors gets loaded is decided in the same way a chatbot may decide to call on some tool, like a browser or calculator, in responding to the chatbot user. (My best guess)
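
    A rough sketch of that guess, with the behavior names, file paths, and keyword matching all invented (a real system would presumably let the language model itself make the choice, the way a chatbot picks a tool like a browser or calculator):

        # Hypothetical registry: each entry points at a file of learned policy weights.
        BEHAVIOR_WEIGHTS = {
            "hand_over_object": "weights/hand_over.pt",
            "place_in_rack":    "weights/place_in_rack.pt",
            "pick_up_trash":    "weights/pick_up_trash.pt",
        }

        def choose_behavior(user_request: str) -> str:
            # Stand-in for the language model's decision; think of it as picking a "tool".
            text = user_request.lower()
            if "eat" in text or "hand" in text:
                return "hand_over_object"
            if "trash" in text:
                return "pick_up_trash"
            return "place_in_rack"

        request = "Can I have something to eat?"
        behavior = choose_behavior(request)
        print(f"load {BEHAVIOR_WEIGHTS[behavior]} onto the GPU and run behavior '{behavior}'")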