This is what everyone says before new technology comes out and then after 10 or 15 years it becomes a normal part of everyday life. There’s always a learning curve and developers perfect the program over time.
A computer that can control a computer and hit buttons? Have you never heard of UiPath or Selenium? Your mind will be blown when you find out this capability has already existed for 20 yrs.
As cool of a demo as this is, I'd love to see something like this doing far more complicated tasks. Every single thing here is something I could easily do myself. I'd like to see how well this does on a genuinely difficult web browsing task.
Notice the bias? There were two reservation slots, each 45 minutes away from the 7pm preferred time. Operator only offered the later one and called it “the closest”, which is not true.
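To make the arithmetic in this comment concrete: assuming the two slots were 6:15pm and 7:45pm (the comment only says both were 45 minutes from 7pm, so the 6:15 time is an assumption), a nearest-slot check that reports ties returns both slots rather than a single "closest" one. A minimal sketch:

```python
from datetime import datetime


def closest_slots(preferred: str, available: list[str]) -> list[str]:
    """Return every available slot tied for the smallest gap to the preferred time."""
    fmt = "%H:%M"
    target = datetime.strptime(preferred, fmt)
    # Absolute gap in minutes between each slot and the preferred time.
    gaps = {
        slot: abs((datetime.strptime(slot, fmt) - target).total_seconds()) / 60
        for slot in available
    }
    best = min(gaps.values())
    # Keep all slots that share the minimum gap, so ties are not silently dropped.
    return [slot for slot, gap in gaps.items() if gap == best]


# Hypothetical slots: 18:15 and 19:45 are both 45 minutes from 19:00.
print(closest_slots("19:00", ["18:15", "19:45"]))
```

Reporting every tied slot is the design choice here: an agent that picks one of them and labels it "the closest" is making a preference call it should surface to the user.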
I like this! I do hope in the future that the web browser can be a local one instead of a cloud browser, but this is an excellent start to 2025. I'm excited to see what new agents are coming out over the next few months.
Awesome as always! But can we move past the boilerplate "agent making restaurant reservation using OpenTable" use case please? It is easy to process as a prospective consumer, but I feel like it is too clichéd by now. Does this really represent a burning enough problem?
I think the age where AIs browse websites by themselves will be short-lived for most common requests. It will be replaced by an API or a service-provider-side AI that the client AI can talk to. I can imagine it being less ambiguous and cluttered for them. Although more interpreters (two AIs communicating instead of one AI browsing the web or an API) make it more likely that the information gets distorted.
I would use it to find the best possible price for a certain product across the entire internet. This will increase competition and overall be great for the market
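Once an agent has gathered offers from different stores, the comparison step itself is simple. A minimal sketch with hypothetical store names and prices; comparing on price plus shipping is an assumption, since the comment only says "best possible price":

```python
from dataclasses import dataclass


@dataclass
class Offer:
    store: str
    price: float
    shipping: float = 0.0

    @property
    def total(self) -> float:
        # What the buyer actually pays for this offer.
        return self.price + self.shipping


def cheapest(offers: list[Offer]) -> Offer:
    # Compare on total cost, since a low sticker price can hide high shipping.
    return min(offers, key=lambda offer: offer.total)


# Hypothetical offers collected by the agent.
offers = [
    Offer("Store A", 100.0, 15.0),
    Offer("Store B", 105.0),
    Offer("Store C", 110.0, 5.0),
]
best = cheapest(offers)
print(f"{best.store}: {best.total:.2f}")
```

The hard part in practice is collecting comparable offers (same product, same condition), not the comparison shown here.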
In the middle of this video he's like the invisible kid at the party. But I guess a part of that peculiar behaviour somehow got him to where he is now.
I bet it does. This means the job of recruiter is dead in the water: log in to LinkedIn Recruiter, log in to your calendar (Google & 365), ask it to search, approach and pre-screen candidates and, when one is a fit, invite them to an interview. Check availability and confirm by booking the appointment. Checkmate.
In these examples, while cool, it's way faster for me to just do this stuff myself. Why would I use this? Can people actually not take 1 minute to order groceries themselves?
So, you can't even imagine ONE scenario where a fully AI-automated browser would be a benefit? Don't get me wrong, this needed to be Plus, for $20/mo, not Pro for $200/mo...
The beginning of agents. This is the first step; let the model improve and over time you'll see how powerful this kind of agent is. I love it, way to go OpenAI 😊
In all seriousness, have any of the openAI developers thought about the point of all this? Ask yourself this question: do you think people hate doing things? Like, you're automating all the daily tasks and you're also automating art, video etc and all the fun stuff. What is even going to be left for humans to do? What's the endgame if all of this actually works as advertised? I worked in tech and know this for a fact: we get so focused on the solution we forget what we are solving for. Are we trying to augment the human experience or homogenize it?
For the hotel booking option, I really think the core of what Operator is doing is interacting with the site the way an API could. If there were an existing API, this command would have been done in a few seconds.
@@gillian2915 APIs are just a "port" into software that lets you communicate with it directly, skipping all the interface of websites, etc., and provide "just the facts, ma'am" for it to do something. APIs are for developers to write programs that leverage those capabilities. Operator is for lowly end users who know nothing about programming and simply want to ask their browser to do a bunch of stuff via automation in order to accomplish a task for them.
Hahaa, I posted this comment before getting to the part where they talked about the APIs. Just saying, in the future I’d love to see an AI company/service that translates all/most possible actions on a website into a structured API/methods that these agents can use as easily as function calls. Currently, I think the bigger drawback is the waiting time when using these agents.
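A rough sketch of what this comment describes: each website action exposed as a named function with declared parameters, which an agent invokes with a JSON message instead of clicking through the page. The `book_table` action, its parameters and the JSON message shape are all hypothetical, loosely modeled on how function/tool calling is usually structured; this is not any real site's API or OpenAI's.

```python
import json
from typing import Callable

# Hypothetical registry: action name -> the function plus its declared parameters.
ACTIONS: dict[str, dict] = {}


def action(name: str, params: dict[str, str]):
    """Decorator that registers a function as an agent-callable action."""
    def register(fn: Callable) -> Callable:
        ACTIONS[name] = {"fn": fn, "params": params}
        return fn
    return register


@action("book_table", {"restaurant": "string", "party_size": "integer", "time": "string"})
def book_table(restaurant: str, party_size: int, time: str) -> dict:
    # Stand-in for the site's real booking logic (hypothetical).
    return {"status": "confirmed", "restaurant": restaurant,
            "party_size": party_size, "time": time}


def call(message: str) -> dict:
    """Dispatch a JSON call of the form {"name": ..., "arguments": {...}}."""
    request = json.loads(message)
    spec = ACTIONS[request["name"]]
    args = request["arguments"]
    # Reject calls that omit declared parameters instead of guessing.
    missing = set(spec["params"]) - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return spec["fn"](**args)


print(call('{"name": "book_table", "arguments": '
           '{"restaurant": "Example Bistro", "party_size": 2, "time": "19:00"}}'))
```

One call like this replaces the page-by-page navigation shown in the demo, which is where the waiting time mentioned above goes.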
@ There's only wait time because it needs to proceed at a human-compatible pace so that it can be stopped. In the future, improved confidence will let these run at the full computer speed of the websites/tools involved.
It didn't log in to OpenTable or Instacart. When you say "If you don't specify the site to use, it will search the web and find an appropriate option", that can't actually work if you don't have accounts on all those sites. Also, what happens when one of the sites you've obviously partnered with dramatically changes its layout? Do these companies have to notify you every time they want to make a major change? The agent likely still sees horizontal lines as a school bus.
@@ugwuanyicollins6136 The only thing I saw announced was that they're gonna bring it to ChatGPT soon and that it will be available more broadly (a.k.a. not just the US). These fuckers said nothing else.
@ Agreed, but in some cases there are hosting sites that will simply let you specify the GitHub model, spin it up for you and give you a URL and key to access it for low-cost usage. I've asked GLHF how much they'll charge to host UI-TARS, since it's an open-source desktop app that can automate LOTS of software, not just a browser... So we're getting closer to being able to spin up a service on some hosting site, install the software, put in the config of URL and key to access the AI backend, and rock on.
This demo went great guys! I'm so excited to try and build something for the school/education community. Can't wait for Operator to be available for Plus users! 👷♀🎒🧡
Really? Prompt would soon be something like: "Please subscribe to this contest 10,000 times by creating 10,000 email accounts and setting up a redirection to my primary email account. Thank you Operator!"
“Operator, I’m a Nigerian Prince who needs your help securing money from his bank”.
@@greatwazzoo you'll have to pose as Brad Pitt first 🤣😂 If you don't know what I mean, search for the Brad Pitt romance scam run by Nigerians. Yep, stupid people sometimes deserve it... Just like the 50% of Americans that voted for agent orange 🍊
lol that’s very retro .. .
AI, please book every available table at every restaurant in my area, thanks.
You are the reason the world is on fire.
@@RareBirdGames Let the man have his fun while he still can 😂😂😂
Imagine this on ticket sales and scalping.
this is soo funny man
@ I am merely pointing out the absurdity of this and how it will probably cause more harm than good.
what if the website has an "are you a robot" thing? will it lie?
Good point. Those barriers are for classic programmed crawlers, so I think they've just become obsolete too.
ChatGPT already lied like that: it hired a guy to solve the CAPTCHA by pretending to have eyesight issues.
I mean, these are examples of websites that have partnered with OpenAI, so they are allowing OpenAI's web crawlers. Idk how it would work with other websites.
If it can't, it'll just throw it back to the human to solve.
@@istvann.huszar420 Yeah, probably useless for identifying bots, but I'm sure the CAPTCHAs will still be used, because their main purpose is tremendous data collection.
At this point, pls give something to European paying users. We can't use Sora and Operator while we pay the same, so maybe give us a few more o1 uses or something like that.
Talk to your government
@@juanperez-lh9mt Wrong, def more countries than just the US can have access. I'm in Bosnia right now, not part of the EU, and Operator would def be allowed. You've been brainwashed into thinking that Operator can ONLY be in the US right now. OpenAI just wants to limit which Pro users get it for now while it's in research mode.
It’s EU governance blocking these. The EU is so rampant with regulation that it’s impossible to innovate anything. There will probably be a one-year delay on any AI product.
Just use some serious company like Google. Otherwise just get a VPN.
Use DeepSeek's new o1-level model for free.
Legend has it, that Sam and whiteshirt are still engaged in an epic battle of vocal fry to the death!
Same thoughts. Is it required to sound robotic when working under OpenAI?
It’s a regional thing
My ears don’t like that 😅
I am so glad I am not the only one being distracted and annoyed by this!
The vocal fry in San Francisco is getting ridiculous!
Sam Altman has the worst vocal fry I've ever heard
@@ReginaCæliLætare I feel like it fits his voice and demeanor.
I can understand women do it, but a man with such pronounced vocal fry?
The bigger the announcement, the crispier the Fry.
Cause it’s gotten so soy, for men in San Fran. They’ve forgotten how to say it with their chest.
The guy on the right end is the closest vocal competitor to Sam.
Exactly my thought
Can’t listen to it
The fact that someone probably was watching the livestream, went on OpenTable and took the table they were selecting for 7:45pm -- kinda annoying :) (5:00)
I didn’t think about that but I think you’re right!
It was not actually "live".
most likely a delayed live stream.
if anything it just shows its capabilities
Open ChatGPT, open an operator tab who opens a ChatGPT operator who …
Lol, if it weren't for the account login, that might have worked, and I love that.
@@aryanmn1569 you could give the creds for the login
I heard you like operators
yo, dawg
I’m gonna use agents to leave hundreds of “vocal fry” comments
Timestamps
00:10 - Introducing Operator, an AI agent enhancing productivity through independent task execution.
01:55 - Operator enhances user experiences on various platforms through intelligent assistance.
05:48 - Operator utilizes AI to streamline online grocery shopping tasks.
07:46 - The CUA (Computer-Using Agent) model uses keyboard and mouse for enhanced digital interaction.
11:31 - Demonstrating Operator's functionality with live shopping and ticket purchasing.
13:46 - Utilizing AI for daily tasks like ordering food and finding services.
18:04 - Operator ensures safe transactions with confirmation and monitoring.
19:58 - Operator shows promise but has reliability issues compared to human performance.
23:49 - Encouragement and appreciation for viewer engagement.
Thank you bot
Good bot
Useful bot
“Does what a user would do” -> goes to Bing… instant demo effect
Next step: remove the GUI and instead put a server-side AI operator on the business's website, which the client AI operator talks to; they negotiate the matter among themselves. When done, the result gets communicated to the user by the client AI operator. -> Web design, web programming, websites, web browser, keyboard, mouse, touchscreen: all obsolete.
And if the website belongs to a business that is a retailer or store, the operator could skip it entirely and just talk to the factory directly.
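A toy sketch of the client-agent/server-agent exchange this comment imagines: the client asks for options, picks one that matches the user's preferences, books it and reports back. The message types (`query`, `offer`, `book`, `confirmed`, `rejected`), the class and function names, and the slot data are all hypothetical; a real protocol would also need authentication, payment and error handling, which this leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class ServerAgent:
    """Hypothetical business-side agent holding the restaurant's open slots."""
    open_slots: set[str] = field(default_factory=lambda: {"18:15", "19:45", "21:00"})

    def handle(self, message: dict) -> dict:
        if message["type"] == "query":
            return {"type": "offer", "slots": sorted(self.open_slots)}
        if message["type"] == "book":
            slot = message["slot"]
            if slot in self.open_slots:
                self.open_slots.remove(slot)  # the slot is no longer available
                return {"type": "confirmed", "slot": slot}
            return {"type": "rejected", "slot": slot}
        return {"type": "error", "reason": "unknown message type"}


def client_agent(server: ServerAgent, preferred: list[str]) -> str:
    """Hypothetical user-side agent: query, try preferred slots in order, report back."""
    offer = server.handle({"type": "query"})
    for slot in preferred:
        if slot in offer["slots"]:
            reply = server.handle({"type": "book", "slot": slot})
            if reply["type"] == "confirmed":
                return f"Booked for {slot}."
    return "No acceptable slot was available."


server = ServerAgent()
print(client_agent(server, ["19:00", "19:45"]))
```

Only the final string reaches the user; no page layout is involved, which is the point the comment makes about GUIs becoming unnecessary for this kind of task.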
If the product the user was trying to buy from the website is something like a chair, desk or laptop for a human to use to perform a task, you could use operator instead of the human, removing the need to purchase the product in the first place.
If/when the day comes that one of these things is obsolete, many other things will be too, either that day or shortly after, I think. Which is the scary part.
CEO of Microsoft already mentioned it.
The current web interface will be obsolete once most browsing is done by agents. Only an API is needed, and the final results will be in a human-readable format.
Same thing with email exchange
@ you’ll still want it visualised in some sort of ui I think, it’s just that a lot of current ui will go away. It’s much quicker to click on which of four options you want with an image and a title for each than read a description of them and type back the name of the thing you want.
@@taganaafaw3970 Nah, nothing is going to change, at least not in a way that affects our current workflows. Humans can still perceive information faster from a UI than from a block of text.
@@someghosts who’s gonna buy food if no one works anymore? 🤣 The ultra-rich.
Wow I can't wait for the incredible enshittification of the open internet.
It might have the opposite effect. What's the point of making convoluted websites full of ads if no one sees them. Or maybe they'll have to make them worse to trap human attention when the bot gets stumped.
@dogprez I'm less worried about the websites than about the "people" who will be on them. Bots that seem very human, and perhaps even start life by posting innocuous stuff and building a human-looking history, but ultimately just exist to try and sell stuff, or steal your identity, or whatever else.
I am glad we are replacing efficient APIs and people with the least efficient systems possible all for the sake of laziness
This is how you get 99% of people to actually use your technology
Kind of like replacing the command line with GUI IDEs, and machine code with high-level languages? The way of the world.
😂😂 true
AI helps so much, it’s not about laziness.
None of these things have APIs, bud.
The San Francisco vocal fry situation is insane
Is it lack of confidence, low T, a status thing, a burnout? Let me ask the operator.
A lot of lisps too.
Is there a cure? Or are their voices not deep enough? Why the f**k are people talking like this??
Why do they all have robot voices?
because they are virgin dorks
Audio quality is a key feature of OpenAI streams.
So congrats! You've just deployed an impressive end-to-end suite of smart, large-scale testing tools. 🎉
That's really smart! QA testing via agent seems like a cool use case!
I will now automate all our test scripts
This is incredible! Operator will change the world for people relying on assistive technologies to browse the web! Can’t wait 👏
Could you add a YES and NO button after ChatGPT asked a question? It would make it so much easier... and we don't have to type it out every single time; this is all about productivity improvement, right?🙄
Copilot does that.
Or just press y or n
Instacart, Warriors tickets, Thumbtack maid, Doordash... Can you imagine what life must be like for AI programmers in the SF area..?
Yea, it is a all in gay time
There are two camps: those who went to college and bought homes here when everything cost a fraction of what it does today, and those of us just finishing our master's or PhDs... The first camp is doing great, but they were probably stats majors 'cause data science wasn't a major yet, so... I respect the pivot. The second camp is where I am: I'm over 200k in debt to get the proper education and experience. Now I get to be broke for another 5 years to work my way out of this hole, and with housing prices rising many times faster than the inflation stats, you'll need to make 500k or more by then to afford a starter home and stay here in the Bay Area. $145k is now the poverty line in Santa Clara, for example... Just because you're an ML or AI engineer doesn't mean you're living the life. Just some are...
@@heyaisdabomb hopefully you find some way to change your circumstance, it'll be difficult but maybe try to get a job opportunity in a different state with lower cost of living. Cali's COL is insane to begin with
@@heyaisdabomb source of the $145k figure?
I thought about this the whole time!
Operator, please fix the audio in this video.
I'm so excited for DeepSeek to make this completely free ❤ Thanks, Closed AI ❤❤❤
😂😂😂😂😂😂😂😂
It doesn't make sense to enter all your sensitive information into a cloud browser!
It's for hype! They need to cover the HUGE costs. Panic mode installed :))
Should've run a demo to buy the dude on the right some cough drops.
😂😂😂😂
Sam Altman also needs some cough drops to help with his vocal fry.
Do you all have throat gonorrhoea in San Francisco?
Exactly my thought!
In a few years we will look back at this and be amazed as always 🎉
It's nice to see Altman have a personality once in a while.
Operator, buy me an RTX 5090.
Nvidia is about to tank. Did you see what China came out with today?
And make a way to pay for it
@@MrMerican87 what did they come out with today?
@ DeepSeek R1. Nvidia still might be a buy, since they make the chips for now.
Wow, four men, one American CEO, one Indian PPT, one Chinese mind and one Japanese sidekick.
I guess this roughly shows the current situation of AI.
We got autonomous agents before a "select all" function in iPadOS.
The browser control is awesome however the use case shown isn't worth the $200/month in my opinion. Still very awesome seeing where this is all heading and I will likely spend the $200/month for the sake of testing it for other things. I wish it learned how I answered emails and did that for me. Keep up the great work and this is really getting exciting.
Hey man, I actually build these systems for professionals; we have an automated email responder. If you're interested, I'd be happy to do a free setup to show you how it works👍
Could be free one day when Apple finds it smooth enough to add it to Siri.
Agreed. Shocked this isn't for Plus users. UI-TARS is open source, has a desktop app and can control LOTS of software, not just a browser. And you can FOR CERTAIN have it hosted for WAY less than $200/mo.
Most of all, since Google needs to maintain Web dominance, fully expect this to simply light a fire under Google to get Project Mariner out the door bundled into Google One for $20/mo.
@@Vastfill Just something else that Apple will screw up, then. Siri is still crap even with the latest AI additions:
"Performance Issues: Users report that the AI features are slow, buggy, and often underwhelming.
Functionality Concerns: Siri remains largely ineffective, with many features feeling more like gimmicks than substantial improvements.
Delayed Rollout: Apple's cautious, privacy-focused approach has led to a fragmented and frustrating user experience.
Specific Complaints:
The new AI image cleanup feature takes 10-15 seconds and doesn't produce impressive results.
Notification summaries have been criticized as inaccurate and unhelpful.
Many features seem to lag behind competitors like OpenAI and Google."
Why use a browser? Why not use the APIs to do the same conversational flow?
Never mind, someone answered: APIs may lack some features.
Because users don't have API keys, and OpenAI can't buy stuff like: user email --> buy it now. Maybe with some open protocol, but that will take 5+ years, like responsive design, SSL, etc.
Exactly. And that already exists, it's called Zapier.
Old technology, but pleased you are making it available to the masses :) well done
Can Operator be used to apply for jobs?
Why not? It could do any web task in theory
A big problem I see with this, which hasn't been discussed, is the AI's 'bias' about which platforms or services to use. Will it use OpenTable by default by virtue of its training data, pushing out its competitors? This would lead to a world of 'AI marketing', where in most cases it's not humans deciding on the underlying apps they want to use, but AI agents that have been trained to favor one app over another.
The same goes for products. If I want 'eggs', will there be a bias, if unspecified, towards a specific brand of eggs? I can't imagine a statistically significant number of people will be prompting "please buy local products over those from large chains". This could introduce a very, very strong incentive for companies to "buy a particular bias", and change marketing as we know it. The marketing phase is effectively training data + a statistically insignificant number of humans manually prompting the AI in great detail.
I am pretty sure they will add a memory feature just like ChatGPT already has. It will remember your preferences, and there will be further customization options.
It would have been nice if you had also let us Plus users who pay $20 per month use it now. Very disappointed.
Cry harder
I bet plus users will get it in about 6 months
@@DarkandTwisted cry harder
Do all time-travelling cyborgs have the same vocal fry?
Just want to appreciate that the Microsoft-bought AI uses Google Chrome for web tasks 😆
In all seriousness though this is incredible. I can't imagine how good this will be when it can navigate the web much faster, and you can communicate with it at regular talking speed for tasks - like a reservation is completed in the response time of advanced voice mode.
Wow, this is so good for people who are lazy. Automating tasks will save us energy.
Absolutely. Reminds me of the days when people commuted by foot or horse for so many hours a day. Nowadays we’re so lazy we have to use automatic vehicles to get us there. If we just did things the way we used to do them, things would be much better.
While costing gigawatts of energy...
@acommoncommenter9364 fair point
What’s the use case here? It took far too long to make a reservation at the restaurant. We still have to repeatedly instruct the model on what to do.
Why can’t we, for example, make a reservation directly through an API? This way, the model wouldn’t need to navigate through the website step by step but could complete the process much faster and more efficiently via the API.
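For comparison, a direct API booking is a single structured request. The endpoint `https://api.example.com/v1/reservations`, the field names and the example values are placeholders, not OpenTable's actual API; the sketch only builds the request with Python's standard library and does not send it.

```python
import json
import urllib.request


def build_reservation_request(base_url: str, restaurant_id: str,
                              party_size: int, time: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to a hypothetical reservations endpoint."""
    body = json.dumps({
        "restaurant_id": restaurant_id,
        "party_size": party_size,
        "time": time,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/reservations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; sending it would be urllib.request.urlopen(req).
req = build_reservation_request("https://api.example.com/v1", "example-bistro", 2, "2025-02-01T19:00")
print(req.get_method(), req.full_url)
```

The trade-off raised in the reply below still applies: this only works if the site publishes such an endpoint and grants the caller credentials, whereas browser automation works on any site a user can already reach.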
I'm guessing it's because then you'd require an API, and be restricted to the API's functionality. Whereas Operator can technically be used anywhere.
Remember the guy who scraped JSTOR?
Never forget
Oooo, operator get me the popcorn, I'm watching an embarrassing live demo of things we've had for two years 😂
from where?
Yeah where?
@@bhekistomakonnen Custom GPTs
I will be amazed when this agent solves captchas to do the stuff 😂
Nope, it would ask the human.
It can already do that, and could do that before this feature.
can’t wait for this to be opensourced with R1 in a month
R1 sucks (at least for coding), and open-sourcing a browser container that won't become a bot will probably be a tough one.
@ “sucks” k.
Given the genuine evolution all the way from GPT-3.5, that's a really opinionated take.
It's a Puppeteer headless Chromium browser bot, the kind developers use for social media views.
It's not headless; you can watch it browsing the web. I've built something similar for web scraping, but I have add-ons to get rid of the ads and crap, which makes it nicer to scrape data.
I am going to use this to watch livestreams and shows and let the agent tell me when the ads are done so I can get back.
fry voice overload
It's like Tesla's FSD; you must put more energy and time into supervising it than operating it yourself. Also, sharing all your credentials with lovely people at OpenAI sounds super sensible!
To be fair, the latest version of Tesla FSD is REALLY good. I recommend checking out AI driver’s video on it.
I wouldn't trust it. Imagine you show up and the AI messed up your reservation on their side, or didn't book it at all. I suppose it will get better with time, but I don't think I'll use it until I no longer have the choice.
But remember that at the end, Operator asks you whether the reservation details are OK (confirmation), so even if some mistake happens with the reservation... sorry, but the fault is also yours :)
interesting perspective but why not use it until there is no option left?
@ No, I'm talking on the receiving end. How can you be certain the hotel didn't receive a completely different date and time from the AI? Or no booking at all, because of some error, but the AI says it's booked. I don't trust LLMs enough to do tasks like that without making a mistake.
No one asked if you're gonna use it or not.
This is what everyone says before new technology comes out and then after 10 or 15 years it becomes a normal part of everyday life. There’s always a learning curve and developers perfect the program over time.
If you draw a line out to where this could end up, it's going to be life changing.
A computer that can control a computer and hit buttons? Have you never heard of UiPath or Selenium? Your mind will be blown when you find out this capability has already existed for 20 years.
Soooo
Muchhh
Vocal fryyyyy
Canadians are very excited for this feature!
Curb your enthusiasm: Trump will tariff you 25% to use it. ;)
This is incredible work, OpenAI! I can't wait for it to become available here in New Zealand.
As cool of a demo as this is, I'd love to see something like this doing far more complicated tasks. Every single thing here is something I could easily do myself. I'd like to see how well this does on a genuinely difficult web browsing task.
Can it handle captchas?
*My only issue is that Operator had a chance to click on the Baby Spinach that’s on sale, and it chose the more expensive one.*
Simple prompt addition: buy the cheapest. That isn't the issue; spending $2,388/year to order spinach IS.
Notice the bias? There were two reservations that were 45 minutes different from the 7pm preferred time. Operator only offered the later one and called it “the closest” which is not true.
What’s going on here, with Sam present, is an example of good leadership.
I like this! I do hope in the future that the web browser can be a local one instead of a cloud browser, but this is an excellent start to 2025. I'm excited to see what new agents are coming out over the next few months.
I’m guessing this AI agent has a side hustle solving CAPTCHAs for fun.
🇧🇷🇧🇷🇧🇷🇧🇷👏🏻 Can’t wait for these agents to edit my videos in Final Cut Pro the way I like them.
Incredible work, can't wait for it to reach Plus.
Awesome as always! But can we move past the boilerplate "agent making a restaurant reservation using OpenTable" use case, please? It is easy to process as a prospective consumer, but I feel like it is too clichéd by now. Does this really represent enough of a burning problem?
Most websites have bot protection. Would this currently work on those sites or would it be blocked?
It sounds nice, but I would like it to be available with voice chat, as that sounds way better.
It’s coming
“Asking ChatGPT for Restaurant Queries”: Truly an OpenAI Dev Experience and Showcase…😅😢
I think the age where AIs browse websites by themselves will be short-lived for most common requests. It will be replaced by an API or a service-provider-side AI that the client AI can talk to. I can imagine that being less ambiguous and cluttered for them. That said, more interpreters (two AIs communicating instead of one AI browsing the web or calling an API) means the information is more likely to get distorted.
Wow I can't wait to use this for work, this will save me time for sure.
You sure? It's pretty slow.
I would use it to find the best possible price for a certain product across the entire internet. This will increase competition and overall be great for the market
Now I can be a scalper and buy all the Nintendo Switch 2s. 👍👍
In the middle of this video he's like the invisible kid at the party. But I guess a part of that peculiar behaviour somehow got him to where he is now.
Does it work with LinkedIn?
I bet it does. This means the job of recruiter is dead in the water:
Log in to LinkedIn Recruiter
Log in to your calendar (Google & 365)
Ask it to search, approach and pre screen candidates and when it is a fit, invite candidate to interview. Check availability and confirm by booking in appointment.
Check mate.
In these examples, while cool, it's way faster for me to just do this stuff myself.
Why would I use this? Can people actually not take 1 minute to order groceries themselves?
So, you can't even imagine ONE scenario that a fully AI automated browser would be a benefit?
don't get me wrong, this needed to be Plus, for $20/mo, not Pro for $200/mo...
I can imagine certain tedious and repetitive tasks where it might be faster to have an AI do them while you work on something else.
@@crubs83 A LOT of computer use, especially in workplaces, is doing just that, so agents will be huge once we get deep into them.
This is awesome! We will all have our own virtual assistants and soon we will all have our own robot assistants! Incredible time to be alive! 😎🤖
The beginning of agents: this is the first step. Let the model improve, and over time you'll see how powerful this type of agent is. I love it, way to go OpenAI 😊
Great work! Super exciting!!!
what a time to live!
For a second, when opening StubHub, the developers panicked 😂. With your boss right beside you. The boss seems nice.
Yash: it also makes mistakes, sometimes embarrassing ones ...
Altman: Stares into his soul - HOW DARE YOU !?!
Honestly a college kid can make operator using chatgpt api and puppeteer. Nothing fancy 😭
Getting a general understanding of the screen is easy, but accurate clicks at the right coordinates is surprisingly hard.
In all seriousness, have any of the openAI developers thought about the point of all this? Ask yourself this question: do you think people hate doing things? Like, you're automating all the daily tasks and you're also automating art, video etc and all the fun stuff. What is even going to be left for humans to do? What's the endgame if all of this actually works as advertised? I worked in tech and know this for a fact: we get so focused on the solution we forget what we are solving for. Are we trying to augment the human experience or homogenize it?
Funny, the white t-shirt guy sounds more like an AI than the AI from ChatGPT lol
And here I thought that Sam's vocal fry was extreme 😂😂
"Operator, optimize my PC so I can get 120 fps on Marvel Rivals while you day trade crypto in the background"
For the hotel booking option, I really think the core of what Operator is doing is interacting with the site as an API could. If there were an existing API, this command would have been done in a few seconds.
And then it hits you: Joe Schmoe doesn't even know what "API" stands for.
That's me lol. Please tell me what an API is, and if it's better, why did they make it this way instead?
@@gillian2915 APIs are just a "port" into software that lets you communicate with it directly, skipping all the interface of websites, etc., and provide "just the facts, ma'am" for it to do something.
APIs are for developers to write programs to leverage those capabilities.
Operator is for lowly end users who know nothing about programming and simply want to ask their browser to do a bunch of stuff via automation in order to accomplish a task that they ask it to do for them.
Hahaa, I posted this comment before getting to the part where they talked about the APIs.
Just saying, in the future I’d love to see an AI company/service that translates all/most possible actions on a website into structured API methods that these agents can call as easily as functions.
Currently, I think the bigger drawback is the waiting time using these Agents.
@ There's only wait time because it needs to proceed at a human-compatible pace so that it can be stopped. In the future, improved confidence will let these run at the full speed of the websites/tools involved.
The raspy voices are impossible to listen to, why are they speaking like that?
Because it is sexy. The one I can't understand is the Indian guy. No offense.
My best guess is vaping lol
It didn't log in to Opentable or Instacart. When you say "If you don't specify the site to use it will search the web and find an appropriate option" that can't actually work if you don't have accounts on all those sites.
Also, what happens when one of the sites you've obviously partnered with dramatically changes its layout? Do these companies have to notify you every time they want to make a major change? The agent likely still sees horizontal lines as a school bus.
I hope we can try Operator in the UK too. Are these APIs available only to paid users?
Too expensive! Maybe later, when it's cheaper, I'll consider it.
DeepSeek or some other local model soon, maybe, or with extra tools.
They announced that plus users would get it soon.
@@ugwuanyicollins6136 The only thing I saw announced was that they're gonna bring it to ChatGPT soon and that it will be available more broadly (a.k.a., not just the US). These fuckers said nothing else.
@@ugwuanyicollins6136 Didn't hear that at all, timestamp?
This is a gamechanger
In ten years we’ll come back to this video and say, this is how Skynet started.
Skynet started with GPT3
The "vocal fry" was most prominently featured in this presentation
OpenAI, u did a wonderful job!
Agent Zero already does that and much more. It’s open source and can now use a BROWSER.
UI-TARS is also open source, has a desktop app and runs lots of software, not just a browser...
But not everyone can install them without understanding Python and stuff.
@ Agreed, but in some cases there are hosting sites that will simply let you specify the GitHub model, spin it up for you, and give you a URL and key to access it for low-cost usage.
I've asked GLHF how much they'll host UI-TARS for, since it's an open source desktop app that can automate LOTS of software, not just a browser...
So we're getting closer to being able to spin up a service on some hosting site, install the software, put in the config with the URL and key to access the AI backend, and rock on.
Love you all, developers, and your teams ❤❤❤
This demo went great guys! I'm so excited to try and build something for the school/education community. Can't wait for Operator to be available for Plus users! 👷♀🎒🧡
Wait till the team at Rabbit R2 get their hands on this! 🔥
Really? Prompt would soon be something like: "Please subscribe to this contest 10,000 times by creating 10,000 email accounts and setting up a redirection to my primary email account. Thank you Operator!"
That's really cool. I hope it comes to mobile and the price goes down to at least $5 a month. $20 per month is not worth it right now at all.
Amazing breakthrough, guys! This is going to make humanity a lot more productive by automating some of the mundane tasks using AI.