Just imagine the accessibility possibilities. For those with mobility or visual impairments, Claude can assist with tasks by simply asking, like helping in usage with apps and systems that often lack proper accessibility features.
Actually any ai model with proper AI logic and ocr and nlp multimodal capabilities could do this generally it would be more use to create a hybrid system that can use ai because essentially they most likely are using Claude server side or uploading a local lightweight nlp to distribute localized automation. Which means you probably don't even generally need Claude you just need a localized app that can target any device and let you access Claude and chat gpt or any other ocr capable task. Which was achievable even before now. This actually isn't useful any only privatizes accessibility through Claude's pay wall please do not misinform people.
What I found particularly noteworthy in this demo was that the information wasn’t copied from the CRM, but typed letter by letter. Purely speculating, but perhaps because there are rare cases where websites do not accept copied input, which often also affects password managers.
AI malware is coming; imagine malware powered by AI that constantly watches your screen and awaits the precise conditions to drain your business bank account.
Computer Use is truly a pivotal advancement. Enabling AI to interact with computers like humans do is a significant leap towards AGI. Exciting times ahead!
@@funkfreeze I disagree. It requires an LLM to understand how a computer works without getting too much into coding and also how a human operates a computer to be able to do this. So, it has 'advanced' intellectually that it is able to do the things humans do on a daily basis. It is actually a 'leap' towards AGI. It would be breeze for it to do the same thing using APIs for both the CRM and Google Sheets and get it done in secs instead of minutes.
This is one more pivotal point in AI's evolution. In 2025, more innovation and use cases will emerge, and human involvement is slowly being eliminated. It looks like a small improvement, but it's huge at its core and will significantly impact how AI will be used in a few years. Kudos Claude Team!
Basically, UiPath has been doing this for years. with more and more cheaper RPA like automation, UiPath could lose a lot of customers since they are license based and their license are not cheap!
I was just looking at something like this a few days ago, and it seems Anthropic was already working in the background to deliver my hopes and dreams before I knew I even wanted it. Super exciting stuff 👍
Impressive to see Claude navigating screens like a human! Though still in beta, this could be a game-changer for automating tedious tasks. Can't wait to see how it develops! 🚀 written by *Claude 3.5 Sonnet (New)*
Slowly moving towards tens of thousands of people losing their jobs due to automation with AI in the tech industry and beyond and slowly having the average person barely able to survive. Thanks Anthropic you're heroes!
@@mihirvd01 Not in the USA you won't...Countries within the EU will have it…If something as universally accepted as single payer healthcare is rejected in the USA there’s no effing way US politicians will go for UBI…better start prepping now (or move to a first world country)
I'm a bit worried! Why aren't we working on the AI to pick and sort fruits and vegetables? or identify garbage and put it in the garbage truck? This is jobs that people don't want to do! Secretary job, create a song, design a building, ... These jobs are precious for people!
People are. Recently, I saw an AI robotic arm picking fruit, among other things. It's being worked on now, so in a few years, it will be more prevalent and eventually commonplace.
Very cool and better Sonnet is amazing. A lot of this AI web browsing stuff is probably better via API access but for now browser simulation is probably a useful feature.
I wonder how really financially sustainable this feature is going to be in a world where companies have worldwide solved this problem with APIs. To me it looks impressive but not necessarily game changing. Maybe if I can let the tech assist me while I’m learning a new skill that would be great. I’m thinking about co-editing a photograph with a post editing software, for example.
It is just the beginning of a new AI game, yesterday Microsoft with their Autonomous AI agent, today Anthropic and the others also will release their own just wait for the new trend.
Really?😆By the time he opened up all of the appropriate tabs, wrote detailed instructions in the prompt, he could have just fucking done it himself... He's misleading as to what's actually happening too. It's not even looking at the entire spreadsheet, just what's on screen. 'Ant Equipment Co' could have been on record 532 for all we know... my goodness the hype.
I am sorry, I am failing to see a point here. As an software test automation expert for the last 20 years, I am more or less doing the same thing as shown in demo, doing GUI automation, How this will be a game changer?
I would expect more people doing ticket scalping online/reserve seats for high sought after restaurants and reselling online. since AI can be trained to automate UI interface interaction, clicking through websites to buy tickets at lighting speed to snap up the best tickets and resell it at high price is definitely possible. from logging in (basic form filling), bypassing CAPTCHA (image recognition), waiting inline (timer event), selecting seats (train to select next best seat if taken), and lastly use credit card to make payment (basic form filling). who ever can program this efficiently can use it for scalping.
Have you thought about integrating this via the Sheets API? Taking a high res screenshot and then doing image to text conversion just to get some data that is already structured seems like a huge overhead to me. I guess changing this could cut your inferencing costs by 50% at least for this example :) but I assume you had good reasons for it. In addition, taking a screenshot cannot give you a view on the whole file in this case, might be something to consider. Still, great stuff I look forward to trying this myself!
@user-ti9yn8wg6o You'd think using the control+F function to search the sheet like a person would, wouldn't be that hard if it can search a CRM for info. I'm wondering if they mention the screenshot thing to simply illustrate that it can use image modality as well?
great video. however, the background sound is intrusive to the vocals - lower it will be helpful for future videos so we can clearly hear and understand the presenter.
What are the security implications of this? Could a bad actor use this to ask Claude to go into other people’s computers and access their confidential information?
"I am glad AI is taking over human labour, but the transition towards it will be full of suffering." you shouldn't, technocrats will genocide us since we are useless now.
Looking forward, although it seems like a clumsy interface right now, it is the worst it will ever be. What they are actually doing is creating an adaptable interface which bridges intent with outcome. While now it is just eye tracking, tighter embedding with wetware (ie: brains) will be the natural evolution of this awesome technology. Like those couple of kids in back to the future 2 said indignantly - "You have to use your hands!?"
I may be oversimplifying this but is this not just taking existing RPA style software like UIPath or AA and integrating it with AI to simplify it for the end user, and not have to worry about documenting detailed processes?
Can it scroll through different UIs? Like for the spreadsheet example is only considered the data available in the screenshot but it should have aceolled through the full spreadsheet to find what it was looking for before moving on. Is it not able to do that yet?
From an organization perspective, how can I lock this down so that employees cannot pull data back they are not authorized to access? On the other hand, how are you keeping this data secure on Claude's side of the house with this much visibility into organizational data?
Sorry, but why is this a good thing? I, mostly, see dangerous use of something like this-an AI can take control over someone’s device is surly going to pose a risk for everyone. Thinking, just because we can, doesn’t mean we should. What would stop the AI from posting on your behalf on SoMe? What would stop the AI from emptying all your bank accounts? Cool, but no, is my gut feeling.
Pretty good, but it has to actually surpass Robotic Process Automation which has the ability to do this, of course the implementation is quite different, and 100s of times faster.
yeah but when can it run all the software on my computer as i train it, or if it can read pdf on the software and then do all the operations needed for it? i have a lot of marketing and advertising tools i would love it to use on its own, and optimize my websites etc for ranking purposes. say i gave it a youtube channel to watch all the videos and then do all those opeartions itself. will that be coming in the future?
How do you ensure it doesn't take instructions from content shown on screen? Like with the demo you gave would a customer be able to insert commands for the ai in it's form so that the ai would provide sensitive data you would not want to share? I don't think you should ever let something take control over your computer if you can't guarantee the input it gets is safe. And even if you ask the ai to do a task solely on your computer without seeing files from a foreign source it may still come across them in the process of completing the task. Either through a notification pop up or the use the inbuild windows search engine that also shows internet results. Seems really not safe
It doesn't get said enough: Not only is Claude the most capable LLM, but they also have the best character. Great work Claude and Team! ❤
whatsinmy AI fixes this. aude 3.5 Sonnet beta capability
You guys are amazing! Please release it to individual users.
It is, you can go try it in the API today
Just imagine the accessibility possibilities. For those with mobility or visual impairments, Claude can assist with tasks by simply asking, like helping in usage with apps and systems that often lack proper accessibility features.
Actually any ai model with proper AI logic and ocr and nlp multimodal capabilities could do this generally it would be more use to create a hybrid system that can use ai because essentially they most likely are using Claude server side or uploading a local lightweight nlp to distribute localized automation. Which means you probably don't even generally need Claude you just need a localized app that can target any device and let you access Claude and chat gpt or any other ocr capable task. Which was achievable even before now. This actually isn't useful any only privatizes accessibility through Claude's pay wall please do not misinform people.
Wow, this is going to be quite game-changing!
Goodbye, office jobs
@@eyescreamcake good nobody likes the office jobs
boring music, boring examples....
@@TothTimea32 Reality can be boring sometimes. The example is a typical use case of boring office jobs.
Boring? As you say so. Realistic? Absolutely
Anthropic’s new release, what news could be better?
Keep it up guys, congrats on new version! 🎉
Claude is the most human AI.
What I found particularly noteworthy in this demo was that the information wasn’t copied from the CRM, but typed letter by letter. Purely speculating, but perhaps because there are rare cases where websites do not accept copied input, which often also affects password managers.
I'm liking Claude a lot. Please continue with this form of application.
This introduces a HUGE attack surface to the fraudsters.
AI malware is coming; imagine malware powered by AI that constantly watches your screen and awaits the precise conditions to drain your business bank account.
Computer Use is truly a pivotal advancement. Enabling AI to interact with computers like humans do is a significant leap towards AGI.
Exciting times ahead!
AGI is an intellectual threshold, not a UX threshold. This is incredible, but we're talking about usability and access here, not intellect.
@@funkfreeze I disagree. It requires an LLM to understand how a computer works without getting too much into coding and also how a human operates a computer to be able to do this. So, it has 'advanced' intellectually that it is able to do the things humans do on a daily basis. It is actually a 'leap' towards AGI.
It would be breeze for it to do the same thing using APIs for both the CRM and Google Sheets and get it done in secs instead of minutes.
Immediately prompting: "Do all my work" 🤣
This must be the most impressive step since the popularization of LLMs
Bosses prompt: "which employees can I totally replace with this?"
This is one more pivotal point in AI's evolution. In 2025, more innovation and use cases will emerge, and human involvement is slowly being eliminated. It looks like a small improvement, but it's huge at its core and will significantly impact how AI will be used in a few years. Kudos Claude Team!
This is RPA-like functionality. Wow, Will this be a game-changer?
Basically, UiPath has been doing this for years. with more and more cheaper RPA like automation, UiPath could lose a lot of customers since they are license based and their license are not cheap!
Claude is soooo much underrated
Finally! I have been waiting all my life for this. Let the machine handle itself. This will be included with all computers in future generations.
This could be huge for companies struggling with legacy systems and modernization.
I was just looking at something like this a few days ago, and it seems Anthropic was already working in the background to deliver my hopes and dreams before I knew I even wanted it. Super exciting stuff 👍
Impressive to see Claude navigating screens like a human! Though still in beta, this could be a game-changer for automating tedious tasks. Can't wait to see how it develops! 🚀 written by *Claude 3.5 Sonnet (New)*
That's epic, you guys have the best A.I. This company is something special.
i love how your ai explains reasons FOR an answer and reasons why an option is NOT the answer for quizzing
Claude AI has grown on me, it's the one I use the most now. This is impressive stuff.
I don't understand at all how Anthropic is taking screenshots, clicking, and scrolling if the interface is an HTTP API
Custom implementation. That's what you need to do as well to make it work.
Local app?
@@laurenz1337_ Yeah, just seems like that's the hard part.
@@watsomk ask claude to do it for you lol
@@watsomkthey provide an extensive reference implementation, works out of the box in docker, easy to adapt to your needs.
Anthropic dropping heaters!
Slowly moving towards tens of thousands of people losing their jobs due to automation with AI in the tech industry and beyond and slowly having the average person barely able to survive. Thanks Anthropic you're heroes!
Better adapt. Whining isn't gonna stop technological progress.
@@Roaming8667 Pointing out facts isn't whining. Come back to this comment in six years and tell me how great it is for civilization
@@CrayDilla Don't worry, you'll have UBI and a much better standard of living.
@@mihirvd01 Not in the USA you won't...Countries within the EU will have it…If something as universally accepted as single payer healthcare is rejected in the USA there’s no effing way US politicians will go for UBI…better start prepping now (or move to a first world country)
@@Roaming8667 Luddites are gonna luddite.
I'm a bit worried! Why aren't we working on the AI to pick and sort fruits and vegetables? or identify garbage and put it in the garbage truck? This is jobs that people don't want to do! Secretary job, create a song, design a building, ... These jobs are precious for people!
People are. Recently, I saw an AI robotic arm picking fruit, among other things. It's being worked on now, so in a few years, it will be more prevalent and eventually commonplace.
I’d like to have Claude automatically complete job applications that don’t accurately pull data from a resume.
This is a powerful feature! It opens so many opportunities and speedup economy.
Very cool and better Sonnet is amazing. A lot of this AI web browsing stuff is probably better via API access but for now browser simulation is probably a useful feature.
I wonder how really financially sustainable this feature is going to be in a world where companies have worldwide solved this problem with APIs. To me it looks impressive but not necessarily game changing. Maybe if I can let the tech assist me while I’m learning a new skill that would be great. I’m thinking about co-editing a photograph with a post editing software, for example.
Looks like Siri on screen awareness but two (or more) years early and available for use now (but meanwhile, on server.) WOW. Well done guys.
It is just the beginning of a new AI game, yesterday Microsoft with their Autonomous AI agent, today Anthropic and the others also will release their own just wait for the new trend.
Absolutely fantastic. I am looking forward to this being released as a desktop app
Absolutely incredible -- Super excited to build with this & see what others build!
Awesome stuff!
Love Anthropic. Still seem human and research focused unlike whatever is going on at open ai
This is absolutely amazing !!!! I love it
Can't wait to automate my RuneScape account thank you
Best innovation of the year
Really?😆By the time he opened up all of the appropriate tabs, wrote detailed instructions in the prompt, he could have just fucking done it himself... He's misleading as to what's actually happening too. It's not even looking at the entire spreadsheet, just what's on screen. 'Ant Equipment Co' could have been on record 532 for all we know... my goodness the hype.
I could anticipate a future where we just ask the pc to do tasks for us without user clicking or typing anything
Guys (Anthropic), I think you should sell this so we can use it locally. We won't have to worry about how our data is handled.
Anthropic is too performant to run locally, unlike mistral or llama
I am sorry, I am failing to see a point here. As an software test automation expert for the last 20 years, I am more or less doing the same thing as shown in demo, doing GUI automation, How this will be a game changer?
Exciting!!! great job.
Really? It submitted the form without you approving it first?
We are all out of a job in 5-7 years.
Out of this job onto another
I would expect more people doing ticket scalping online/reserve seats for high sought after restaurants and reselling online. since AI can be trained to automate UI interface interaction, clicking through websites to buy tickets at lighting speed to snap up the best tickets and resell it at high price is definitely possible. from logging in (basic form filling), bypassing CAPTCHA (image recognition), waiting inline (timer event), selecting seats (train to select next best seat if taken), and lastly use credit card to make payment (basic form filling). who ever can program this efficiently can use it for scalping.
LLMs are quite resistant to this due to their training, its easier to script it up with traditional programs.
Have you thought about integrating this via the Sheets API?
Taking a high res screenshot and then doing image to text conversion just to get some data that is already structured seems like a huge overhead to me.
I guess changing this could cut your inferencing costs by 50% at least for this example :) but I assume you had good reasons for it.
In addition, taking a screenshot cannot give you a view on the whole file in this case, might be something to consider.
Still, great stuff I look forward to trying this myself!
Is it "taking screenshots of the spreadsheet" or actually searching the whole spreadsheet?
It took a screenshot, it did not scroll down, it did not search.
@@jnevercast Sure, but then later there are examples of it searching through docs. Hence my question.
my question is does it even take a full page screenshot or just the current page - it's a long spreasheet
@user-ti9yn8wg6o You'd think using the control+F function to search the sheet like a person would, wouldn't be that hard if it can search a CRM for info.
I'm wondering if they mention the screenshot thing to simply illustrate that it can use image modality as well?
great video. however, the background sound is intrusive to the vocals - lower it will be helpful for future videos so we can clearly hear and understand the presenter.
What are the security implications of this? Could a bad actor use this to ask Claude to go into other people’s computers and access their confidential information?
How do you prevent Claude from storing or reusing my personal, PII and/or sensitive information while taking reading the data ?
Claude is Love
There go 70% of office jobs.
I am glad AI is taking over human labour, but the transition towards it will be full of suffering.
There won't be any transition.
"I am glad AI is taking over human labour, but the transition towards it will be full of suffering." you shouldn't, technocrats will genocide us since we are useless now.
@@darkspace5762 It's already started
Amazing! I was developing something like this!
AnthropicAI beige color is unique 😎🎉
it will be a game changer when it gets 1000x faster
Best AI platform ever!
Does this mean, it can now bypass re-captcha?
Negative feedback: The name "Computer Use" is confusing
Apart from that, this is just unbelievable
This is awesome!
Looking forward, although it seems like a clumsy interface right now, it is the worst it will ever be. What they are actually doing is creating an adaptable interface which bridges intent with outcome. While now it is just eye tracking, tighter embedding with wetware (ie: brains) will be the natural evolution of this awesome technology. Like those couple of kids in back to the future 2 said indignantly - "You have to use your hands!?"
I'll sacrifice my job, fuck it! Let's fucking go! ACCELERATE!
W josuke pfp
W josuke pfp
Singularity in sight 😅
gross
Awesome! Now this is a good step for AI agents
It would be nice to have the cheaper Haiku 4.
Since Google and OpenAI have reduced prices for smaller models.
🙀Gotta try it out.
Thanks heaps.
"This is so cool, it might replace automation testers faster than they can write ‘Hello World!’" 😅
The fact that this is coming from Claude, and not Microsoft…I mean…😅😅
Wow, cant wait to automate my unemployment forms.😅
what is the RPA tool that you use integrated with claude? I didnt understand exactly how you did that. Amazing :)
With this beta version, is the code already working on the client if called in a client program through API?
thanks - great - definitly I will try it :)
I may be oversimplifying this but is this not just taking existing RPA style software like UIPath or AA and integrating it with AI to simplify it for the end user, and not have to worry about documenting detailed processes?
Thank you ☺
Can it scroll through different UIs? Like for the spreadsheet example is only considered the data available in the screenshot but it should have aceolled through the full spreadsheet to find what it was looking for before moving on. Is it not able to do that yet?
How were you able to run macos in a virtual machine?
What is going on with that whiteboard?
From an organization perspective, how can I lock this down so that employees cannot pull data back they are not authorized to access? On the other hand, how are you keeping this data secure on Claude's side of the house with this much visibility into organizational data?
Wow ❤ you’re the best
Sorry, but why is this a good thing? I, mostly, see dangerous use of something like this-an AI can take control over someone’s device is surly going to pose a risk for everyone. Thinking, just because we can, doesn’t mean we should. What would stop the AI from posting on your behalf on SoMe? What would stop the AI from emptying all your bank accounts? Cool, but no, is my gut feeling.
Need that hoodie. 🤩
Beautiful.
where does he come up with the orders email ?
it's right there 1:18
Pretty good, but it has to actually surpass Robotic Process Automation which has the ability to do this, of course the implementation is quite different, and 100s of times faster.
Superb ! But why the annoying music is louder than the voice of the speaker. Very disturbing.
People must be worried about how much token it's going to consume to do this task ^_^
Where do I get the hoodie?
is it safe for webform that includes recaptcha?
It seems like a science fiction well done
What song is that though?
yeah but when can it run all the software on my computer as i train it, or if it can read pdf on the software and then do all the operations needed for it? i have a lot of marketing and advertising tools i would love it to use on its own, and optimize my websites etc for ranking purposes. say i gave it a youtube channel to watch all the videos and then do all those opeartions itself. will that be coming in the future?
Ok, but be warned... when it starts getting popup ads that's when it goes full Skynet
mindblowing if claude were human he would have been more famous than MUSK
How do you ensure it doesn't take instructions from content shown on screen? Like with the demo you gave would a customer be able to insert commands for the ai in it's form so that the ai would provide sensitive data you would not want to share? I don't think you should ever let something take control over your computer if you can't guarantee the input it gets is safe. And even if you ask the ai to do a task solely on your computer without seeing files from a foreign source it may still come across them in the process of completing the task. Either through a notification pop up or the use the inbuild windows search engine that also shows internet results. Seems really not safe
I wonder if he will pass the captcha. Will he stay true to his principles, or will he prove that he's not a robot after all?
For greater effect, next time keep a small in-picture of the presenter not moving his hands.
does it passes I'm not a Robot check ?
Very cool.
you guys literally built a web browser multi agent with such a long correct planning. is this fine tuned?
Wow, very cool.
Can I create a RuneScape bot with this? 😄
old school runescape is cooked
Not the fastest way to grab data but it is universal and I cannot wait till it is used for automatic UI testing by AI.