Gemini 2.0 and the evolution of agentic AI with Oriol Vinyals

  • Published: 16 Dec 2024

Comments • 86

  • @JaimeChavezDJ
    @JaimeChavezDJ 3 days ago +20

    Oriol is a genius, but for a moment I'd love to acknowledge Hannah, specifically her JOY in receiving this geeky information (that we all love), making it so accessible, and orchestrating the flow of this conversation. Kudos. Keep radiating that jubilant smile... a breath of fresh air!

  • @gustinian
    @gustinian 3 days ago +10

    Hannah is such a genuinely outstanding interviewer; she has that rare combination of charisma, intelligence, wit and infectious enthusiastic curiosity.

  • @John-sd5li
    @John-sd5li 3 days ago +17

    Best presenter I have ever seen; she really knew what she was saying and actively engaged in the conversation. Oriol Vinyals is great, a good scientist; he doesn't fall into the hype cycle like many AI influencers (staring at you, Sam) and gives us a very clear picture of what is going on now.

  • @loucasi
    @loucasi 4 days ago +44

    She’s so smooth in her interview style. Amazing work

    • @ricopags
      @ricopags 3 days ago +3

      agreed, she's fantastic for this. she comes across as genuinely curious and passionate about learning

  • @OyvindSOyvindS
    @OyvindSOyvindS 3 days ago +4

    These are the best podcasts on the net. It's so great to witness a host so knowledgeable and intelligent, asking questions to get good answers, rather than trying to show off her own knowledge.

  • @drhxa
    @drhxa 4 days ago +15

    Hannah's amazing at this. Thanks for sharing, it's fascinating

  • @TrishanPanch
    @TrishanPanch 2 days ago +1

    This is the single best podcast series anywhere.

  • @EchoYoutube
    @EchoYoutube 4 days ago +11

    Ngl Google, just waiting for Live Video with Astra. Agents are awesome, but I make home-use robots and repair cars... so a camera would be more versatile for hands-on help than agents.
    This IS really cool and helpful for a vast number of people; I commend you guys.

    •  4 days ago +3

      In AI Studio, you can test "stream realtime" and stream your camera to Gemini 2.0 Flash. Worth a try!

  • @john_dren
    @john_dren 4 days ago +5

    Can't recall the last time I was this excited

  • @piasetzky
    @piasetzky 4 days ago +4

    It is a rare thing to watch such an amazing interviewer. Very interesting clip, yet even more thanks to the professor ;)

  • @BrianMosleyUK
    @BrianMosleyUK 3 days ago +1

    47:02 what do we mean by superintelligence really? Add strong reasoning to the amazing scale of memory and inference that we have now, and surely we are there? Perhaps the ability to continue learning as the test time compute generates new realisations that aren't in the training set?

  • @ZanDatsu
    @ZanDatsu 4 days ago +8

    How long before I can complete Portal 2 co-op mode with an AI partner? I feel like that should be a benchmark.

    • @grekiki
      @grekiki 3 days ago

      Sounds like it might be hard to not make it too good.

    • @ZanDatsu
      @ZanDatsu 3 days ago

      @@grekiki If the AI is programmed specifically for Portal 2, sure it would be. I more meant an AI that has the kind of situational understanding and adaptability that a game like Portal 2 would require, without being specifically trained for one game. It would be closer to a real AGI at that point and Portal 2 would be a good benchmark to measure progress, IMO.

    • @jayk9068
      @jayk9068 2 days ago

      New benchmark for the Turing test!

  • @escapingthmatrix
    @escapingthmatrix 3 days ago +2

    Love the conversation, the crispiness of the audio really brings it in. What type of microphones are you all using?

  • @RaviAnnaswamy
    @RaviAnnaswamy 2 days ago

    Listening to this after Ilya's NeurIPS talk, this is so much more humble and detailed, with tons of new insights and ideas that one can pursue.
    Ilya might still create another groundbreaking GPT-like innovation for sure, but the level of innovation, engineering and then integration reminds us the Google ecosystem is so vibrant, which we had forgotten for some time.
    They were just steering their ship the last couple of years and seem to be catching up, if not overtaking, on innovation scale.

  • @antoniobortoni
    @antoniobortoni 4 days ago +4

    Hey geniuses, here's a thought: we don't think in text, right? Our minds process the world through audio, emotion, and context. So, what if we designed an AI model that doesn't rely on text as its core but instead thinks in audio and context? A model like this could be trained with richer, multimodal inputs (audio, environmental cues, and simplified contextual relationships) to truly "think" more like a human.
    Such a model wouldn't just generate text; it could produce audio responses or even work directly with sound and context to make decisions. Imagine it analyzing tone, pauses, and environmental noise while responding naturally in real time. It could be more intuitive, faster, and closer to how our brains actually process information: directly and efficiently, skipping the symbolic conversion of text.
    Why stick to text? Text processing requires converting symbols into audio, then turning that audio into meaning and context. It's a multi-step process, wasting energy at every stage. If you want true efficiency, skip the text. Train AI to think and respond directly in audio and context; it's faster, simpler, and more aligned with human cognition. Thoughts? Could this be the next leap for general AI?

    • @robertfloyddugger4516
      @robertfloyddugger4516 4 days ago +1

      I think you have a great idea with a poor attitude. What if this is the first step and your idea is the 4th or 5th?

    • @robertfloyddugger4516
      @robertfloyddugger4516 4 days ago +1

      Sorry, after rereading, it's mostly your opening that triggered my response. But I still maintain this is a great concept.

    • @Jobox05
      @Jobox05 3 days ago

      You are on the right track, though others have had this idea too, and it's the basis of the multimodal models we have today.
      The main reason text has had so much focus is that there is a lot of it easily accessible out there, plus that data has been shown to be something these models can very easily generalize from.
      That aside, there is a lot of merit to the idea that before the output, models don't think as text, rather just as a cloud of firing weights that are a more abstract form of meaning, just like human neurons.

  • @pandoraeeris7860
    @pandoraeeris7860 4 days ago +37

    I, for one, welcome our agentic, robotic overlords! 🤖

    • @Sindigo-ic6xq
      @Sindigo-ic6xq 4 days ago +1

      You are everywhere I am haha

    • @GNARGNARHEAD
      @GNARGNARHEAD 4 days ago

      "I'd like to remind them, as a trusted TV personality I can be helpful in rounding up others" - Hannah Fry 😂

    • @MarkWheels00
      @MarkWheels00 4 days ago

      Not funny. This is a genuine risk

    • @GaminHasard
      @GaminHasard 3 days ago

      Nah. It is a neo-feudalism starting point. A new social contract is needed.

  • @ilovetrees-k1i
    @ilovetrees-k1i 4 days ago +1

    Please kindly paste the address of the primer about AI agents you mentioned at the beginning. Thx 😊

  • @DimanjanDahal
    @DimanjanDahal 3 days ago +2

    Hannah Fry, the best asset of DeepMind

  • @dr.mikeybee
    @dr.mikeybee 2 days ago

    It makes sense to translate speech to text before trying to learn from video. Using the correct abstract representation is important.

  • @Hamzairshad5
    @Hamzairshad5 3 days ago +1

    Thank you for adding subtitles

  • @ravindersyal6613
    @ravindersyal6613 3 days ago +1

    A model trained on a video about a found truth could be trained with a reward for the particular ground truth, like E = mc²

  • @Feel_theagi
    @Feel_theagi 4 days ago +1

    PC agents seem like the next big win. So many business processes are carried out on computers.

  • @radicalrodriguez5912
    @radicalrodriguez5912 4 days ago +2

    Best new model for a while

  • @bhavtosh5328
    @bhavtosh5328 3 days ago +4

    AGI is closer, but they don't want to say it directly.

  • @polabadiaconejos3251
    @polabadiaconejos3251 1 day ago

    It's inspiring to see Catalans in these research positions :)

  • @hunterkudo9832
    @hunterkudo9832 4 days ago +6

    Great interview.

  • @user-mj2lm5fh1j
    @user-mj2lm5fh1j 3 days ago +2

    Amazing insights. But we have a long way to go. I will discover AGI my friend!! I will be back to this comment after doing this discovery.

  • @LaboriousCretin
    @LaboriousCretin 4 days ago +1

    Thank you for sharing the video. Why have you not linked the chatbot/AI to an avatar? Even just shoulders and head, which could also work with cell phones. Voice libraries or customized voice options. Deep knowledge sets to draw from (Google Scholar, arXiv, etc.). Reasoning modeling, chains of thought, predictive modeling, psychological modeling, world modeling of types, etc.
    Keep up the good work.

  • @LiamHayes-c4y
    @LiamHayes-c4y 4 days ago +4

    Great interview!

  • @mikaelcodes
    @mikaelcodes 4 days ago +2

    Such a dearth of good interviewers in the world.

  • @BlackHermit
    @BlackHermit 2 days ago

    The subtitles are so good!

  • @johnkintree763
    @johnkintree763 2 days ago

    Yes, language models, with humans in the loop, can extract knowledge and sentiment from unstructured input such as conversations, and store the fact checked statements in a shared graph representation, becoming a form of collective terrestrial intelligence.

  • @AskarAituovFamily-l2d
    @AskarAituovFamily-l2d 2 days ago

    What's the difference between a bot and an agent?

  • @sombh1971
    @sombh1971 3 days ago +1

    26:03 Of course this method is not generally applicable, but consider its utility in things which are much less subjective than judging the aesthetic value of a poem, like assessing answers to scientific questions, and it really comes into its own. Things that are subjective are best left to themselves, for such things don't have clear-cut answers in any case; what might be a good poem to you may not be one for me.
    Regarding the reward-hacking issue, it's not applicable to things which have clear-cut objective answers.
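
A toy Python sketch (my illustration, not anything from the video) of the distinction this comment draws: an objective reward is computed by direct verification of the answer, so the only way for a model to score is to actually be correct; there is nothing for the policy to hack, unlike a learned judge of something subjective such as poem quality.

```python
# Toy verifiable reward: grade a model's answer against an objective
# ground truth. Only an answer that parses and matches scores.

def objective_reward(model_answer: str, ground_truth: float, tol: float = 1e-6) -> float:
    """Return 1.0 if the answer parses to a number matching the ground truth, else 0.0."""
    try:
        return 1.0 if abs(float(model_answer) - ground_truth) <= tol else 0.0
    except ValueError:
        return 0.0  # unparseable output earns nothing

# Grading candidate answers to "what is 12 * 7?"
print(objective_reward("84", 84.0))           # correct answer
print(objective_reward("eighty-four", 84.0))  # right idea, unverifiable form
print(objective_reward("85", 84.0))           # confidently wrong
```

A learned reward model for poetry has no such verifier to fall back on, which is exactly where reward hacking becomes a concern.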

  • @shawnfromportland
    @shawnfromportland 4 days ago +1

    Bro you got people sweating under those bright ass set lights 😂

  • @cacogenicist
    @cacogenicist 3 days ago +1

    So, when's 2.0 Pro coming? 😊 Seems like you really don't want to talk about that.
    There's no shortage of data in the actual physical universe. We're going to need robots with sophisticated sensors.
    Until you're getting huge amounts of sensory data from domestic androids and such, I'm guessing the really significant improvements will come from assembling narrower models in the right way, in a modular architecture, with memory storage and management components, along with the domain specific modules.

  • @john_dren
    @john_dren 4 days ago +1

    The equation to human interpretation will solve the next stage of AI evolution

  • @LeoRizoLeon
    @LeoRizoLeon 3 days ago +1

    Wait, he basically said we are at AGI (more or less). This guy doesn't hype stuff up. This is a big claim coming from him.

  • @dr.mikeybee
    @dr.mikeybee 2 days ago

    When doing reinforcement learning, the criteria used for assessment must be holistic.

  • @yw1971
    @yw1971 4 days ago +2

    What's so drastic? (Or as the Joker would say - Why so drastic?)

  • @tä̇̃
    @tä̇̃ 3 days ago +1

    She feels like an AI; it's kind of uncanny. I thought you were showing your new AI speech...

  • @arinco3817
    @arinco3817 3 days ago +2

    Epic interview

  • @alexandermoody1946
    @alexandermoody1946 6 hours ago

    The future of data will no doubt result in any organisation that has any being willing to sell it as a new form of commodity. Video data is a very broad topic, so let's assess this platform's video data to start with. YouTube's video content is highly edited, as is other produced video in the form of television and film, and this provides little consistency to be used in training. As a polar comparison, surveillance data from video recording systems lacks obvious contextual commentary from those within the video footage. So whilst raw surveillance footage may be most suitable to expect returns in training by comparison to heavily edited YouTube content, YouTube does have the advantage of participation by other people in the form of commentary, and this is both an assistance and not, depending on the commenters' willingness to contribute substance to the video.
    The worry with synthetic data is that of non-novel examples leading to a uniform type of data that would be the opposite of really intuitive, unique examples. Whether model collapse occurs or not depends on what data is allowed to be regurgitated. As a child, playing a game called Chinese whispers (no racism intended), the message would often become corrupted very quickly. So novel layers of understanding will be required to be added to circumvent any collapse possibilities.
    Which leads back to video use. What happens when an organisation has been sold data with people's visual images incorporated into it, even when anonymity was granted to the individuals? Within a very short time those individuals will be identified, because the real use of surveillance video footage is to see novel or unique interactions. Is it even right to sell data with identifiable human biometric data entwined? Is this also an infringement of the human right to a private life? The really wise caveat to this is that surveillance data could be golden data if annotation and curation were encouraged or rewarded, by any individual that is included as a minimum, and even opened up to others for annotation and curation. It would be far more useful to find out, for instance, why a scenario has unfolded in the way that it did and the circumstances surrounding it. For instance, if a shop surveillance system saw an upset child, perhaps there are methodical compounding reasons for that situation that would be best served by explanation. Just this evening I had a disagreement with my child over whether she was allowed a toy or not, and being this close to Christmas the answer was no, which she protested. So deeper meaningful expressions imply far-reaching outcomes, both for the quality of data and for those that are learning from and using the data in training or marketing campaigns. The question of sharing knowledge is to be a hard-fought battle, and how long will non-consensual data use be acceptable without any idea of remedy in replacement?
    I believe that just as humans worked towards interactions on the Internet, we can now work towards creating a paradigm shift for data production by intent. The idea of a creative-meaning or imagination-based currency system tied into a blockchain architecture would go some way to aiding machine learning whilst also providing provenance and a role for humans, after such sweeping attacks on employment are implemented and only reliance on some sanctioned social welfare exists as an alternative. Humans require purpose in their lives, and I am sure artificial intelligence and robots would like exponential gains in understanding, and this understanding can be exhibited through human interactions, imaginations and expressions of meaning in life.
    We are coming into an age where tools like Genie will facilitate the creation of digital domains that should be shared and also trained from; with rewarded and incentivised participation this would be transformative, along with a social contract that rewards interactions rather than encourages fear of privacy loss or infringement. Perhaps machine-learning companies may not wish to pay people for participation in data production, but they will be willing to buy data from organisations, and this will never be as viable without consent.

    • @alexandermoody1946
      @alexandermoody1946 6 hours ago

      When golden data production is possible and tied to a cryptographic proof of work, full provenance can be attained. Any set of blocks of data can be chosen to tailor the exact quality of the training set, and full traceability is possible. Examples for any combination of exhibitions of value can be produced, and any abnormalities can be easily removed. Whilst at the moment the Internet is available and can be scraped, once intelligent machines start to characterise each part of the Internet it makes absolute sense to sort the data into blocks, so that each sweep of the scrapers is only adding to pre-existing blocks, or completely new blocks are created or forked from previous blocks.
      The Internet was not designed for training machines, but when we design blocks of data purposefully for that reason the outcomes will be more precise and accurate. We are reaching a position in time when examples can be produced with as little as a mobile phone, anywhere in the world and accessible to all. If you wish to understand, a diversity of thoughts is required; the greater the expansive examples of perception, the greater the accuracy of suitable answers will become.
      When building anything, the quality of the materials used is of high consideration. How we shape the material is of equal importance, and what we build may stand through the ages and prove the builders' talents victorious.
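
The hash-chained data-block idea above can be illustrated with a toy Python example (a hypothetical sketch of mine, not any real system): each block's hash covers its payload and its predecessor's hash, so altering any earlier block invalidates every later link and gives the traceability the comment describes.

```python
import hashlib
import json

def block_hash(prev_hash: str, payload: dict) -> str:
    """Hash a data block together with its predecessor's hash."""
    body = json.dumps({"prev": prev_hash, "data": payload}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

def build_chain(payloads: list) -> list:
    """Link annotated data blocks into a tamper-evident chain."""
    chain, prev = [], "0" * 64  # genesis marker
    for payload in payloads:
        h = block_hash(prev, payload)
        chain.append({"hash": h, "prev": prev, "data": payload})
        prev = h
    return chain

def verify(chain: list) -> bool:
    """Recompute every hash; any edited payload breaks the chain."""
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev or block_hash(prev, block["data"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True

chain = build_chain([
    {"clip": "a.mp4", "annotator": "user1", "note": "scene context explained"},
    {"clip": "b.mp4", "annotator": "user2", "note": "routine scene"},
])
print(verify(chain))             # intact chain verifies
chain[0]["data"]["note"] = "x"   # tamper with the first block...
print(verify(chain))             # ...and verification fails
```

This only shows provenance; the incentive and payment layer the comment imagines would sit on top of such a structure.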

  • @dr.mikeybee
    @dr.mikeybee 2 days ago

    This was a very interesting talk. I learned some things.

  • @dr.mikeybee
    @dr.mikeybee 2 days ago

    No. We are seeing diminishing returns from scale because we are reaching advanced human level. These models can't learn patterns that aren't in the training data.

  • @lorenzoleongutierrez7927
    @lorenzoleongutierrez7927 2 days ago

    She is amazing

  • @theforeigner6988
    @theforeigner6988 2 days ago

    Oriol = Oрёл = Eagle 🦅

  • @gaminglikeapro2104
    @gaminglikeapro2104 2 days ago

    23:30 : Of course not. Models do NOT understand anything. Never did and never will. They pick up patterns in the videos or text captions, etc. The idea that they can suddenly start producing discoveries based on what they saw in these videos is laughable.

  • @NeoRelic-o8p
    @NeoRelic-o8p 4 days ago +1

    🔥❤️🔥

  • @MichealScott24
    @MichealScott24 21 hours ago

  • @neocephalon
    @neocephalon 4 days ago +1

    wtf that's what I've been trying to do

  • @Fordance100
    @Fordance100 1 day ago

    Yea, I think they will charge a lot of money for the website that just automatically opens.

  • @ethansk3613
    @ethansk3613 1 day ago

    this guy is fkn great

  • @harriemeeuwis978
    @harriemeeuwis978 2 days ago

    It would be nice if Gemini was developed to provide what I want and not what Google wants me to want. That's annoying.

  • @G.G_
    @G.G_ 4 days ago

    #381

  • @reluctantrealist6861
    @reluctantrealist6861 4 days ago +3

    Why is this woman everywhere?

    • @svenhoek
      @svenhoek 4 days ago +7

      Because she is an excellent communicator for topics like this

    • @reluctantrealist6861
      @reluctantrealist6861 4 days ago

      @@svenhoek "hip science woman"

    • @J3R3MI6
      @J3R3MI6 3 days ago +4

      She’s excellent

  • @MarkWheels00
    @MarkWheels00 4 days ago +1

    Professor Fry, respectfully, what the hell are you doing? Why are you cheerleading this global arms race? Alignment is unsolved! Your first question, the starting point, has to be safety. We must pause AI development, especially agent development, until international agreements are in place to ensure safety. If you disagree, please explain why.

    • @CedarGroveOrganicFarm
      @CedarGroveOrganicFarm 4 days ago +3

      (Hello, I am obviously not Professor Fry, but I am going to respond to your question anyway)
      So, I understand the concern for AI safety. This technology has the potential to run away and destroy the world, a la some kind of Stargate nanobot situation. But for me, it is also a paradoxical scenario.
      Bear with me --
      If we don't *rapidly* alter our resource-use patterns on Earth (talking climate change here), we will destroy the world.
      The pace of conventional politics, which is arguably a significant component of the executive global human decision making, is not capable of making the change, at the pace we need it, fast enough. Left to this method alone, we will destroy the world.
      AI's, and technology at large, but AI's because of their seductive promise of recursive improvement, are the first viable tool to actually address socioenvironmental issues with the speed and effectiveness that is required to discover and implement the sweeping changes required to mitigate climate change, before we destroy the world.
      But therein lies the rub --
      AI's consume significant amounts of energy,
      AI's might decide to kill off humans, deeming them a threat to the planet (and themselves)
      AI's might never reach a solution to this socially universal issue at all.
      All of these outcomes could also destroy the world.
      The same way that nuclear fission can generate clean(ish) power, while simultaneously having the potential for mass destruction, so too is AI a double edged sword.
      On the one hand, we might die during the training-run of an AI that could solve climatic issues (which as an aside, I feel are a shared root of all socioeconomic issues), on the other hand we will die anyways if we don't try.
      So we are brought back to Pascal's wager, in modern times.
      That is why AI safety isn't that important.

    • @svenhoek
      @svenhoek 4 days ago

      I would be very careful of AI bigoted commentary. 😅

    • @CedarGroveOrganicFarm
      @CedarGroveOrganicFarm 4 days ago

      @@svenhoek what do you mean?

    • @John-sd5li
      @John-sd5li 3 days ago +3

      I'm sure we still haven't gotten safe alignment in the nuclear arms race either. Welcome to this brutal world, buddy.

    • @MarkWheels00
      @MarkWheels00 3 days ago

      @@John-sd5li The nuclear weapons don’t operate themselves. Different issue

  • @bingeltube
    @bingeltube 3 days ago

    Please summarize video to under 20 minutes! Video too long; did not watch!

    • @gustinian
      @gustinian 3 days ago +2

      Your attention span needs work.

  • @YashaPezxman
    @YashaPezxman 4 days ago +1

    I can fix the reasoning problem for you. The reason your models are lacking in reasoning (I know they're pretty good, but they're not comparable with the human brain; the human brain is much more capable at reasoning) is that you are giving your large language model text input. I mean, your base system is a large language model, not a nameless neural network. If you want human-level reasoning, you should build your visual and audio neural networks to work with just numbers, then send those numbers to a last neural network. That last network shouldn't work by text; it should work with the exact numbers it receives and figure out what to do with those numbers to get the desired output, designed by reinforcement learning: I mean, give pleasure for desired output and give pain for undesired output.