This Autonomous AI Agent is SURPRISINGLY GOOD | MultiOn Agents Gets Stress-Tested
- Published: Oct 2, 2024
- Learn AI With Me:
www.skool.com/...
Join my community and classroom to learn AI and get ready for the new world.
MultiOn Links:
Website: multion.ai/
Twitter: / multion_ai
Discord: / discord
GitHub: github.com/MUL...
Getting Started: docs.multion.ai
My AI Playlist:
• AI Unleashed - The Com...
It's happening!!!!
Dave Shapiro cameo!
Respect to you Dave! (I'm over a week late catching up on almost daily business!)
One of the best use cases for this, in my opinion, will be unsubscribing from all spam emails
This is the first agent that looks actually useful
SO FREAKING COOOOOOOL!!!! Ai assistants for everyone will revolutionize tedious personal tasks 😊
And work tasks as well, like automatically pulling reports without code
I'm really excited for this but it's spooky too.
A lot will come from the AI agents
Or take your personal tasks and leave you jobless 😂
The only reason this is not a colossal waste of time is that you amused everyone with the failure and got some ad dollars.
The ChatGPT bit was hilarious, I knew it would not be long before we saw one AI trolling another AI!
If the end of the world was started over some roasting it would make so much sense.
So you don't have access to the API? If these things interest you, I strongly suggest you look into using the API instead of ChatGPT for specific tasks that need more than one GPT. Since ChatGPT is just one instantiation of the model, you can create 5 of them, give each a different system instruction (defining its profile, i.e. Engineer, Project Manager, Lead Front-End Designer, Back-End and Networking Lead...) and get these guys to begin posting to a .txt file that they all constantly monitor for changes and then add to as they please... You can use a "baton"-like strategy to give one GPT the turn to speak, so that messages don't end up irrelevant because other GPTs have already moved on from a subject. (This is just an easy way to create a conversation, but there are more advanced ways I just won't get into.) Hope you have fun
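For anyone who wants to picture the "baton" idea, here is a minimal sketch, not a real setup. The agents are stand-in functions that never call a model; in practice each one would send the shared transcript to a chat API along with its own system instruction. The names `make_agent` and `run_baton_conversation` are made up for this example.

```python
from itertools import cycle
from pathlib import Path
import tempfile


def make_agent(role):
    """Return a stand-in agent. A real agent would call a chat model
    using `role` as its system instruction (assumption: not done here)."""
    def reply(transcript):
        earlier = len(transcript.splitlines())
        return f"[{role}] responding after {earlier} earlier message(s)"
    return reply


def run_baton_conversation(roles, rounds, log_path):
    """Pass a baton around `roles`; only the holder may append to the shared file."""
    agents = {role: make_agent(role) for role in roles}
    log_path.write_text("")          # start with an empty shared transcript
    baton = cycle(roles)             # fixed speaking order, repeated forever
    for _ in range(rounds * len(roles)):
        holder = next(baton)
        transcript = log_path.read_text()      # holder reads everything so far
        message = agents[holder](transcript)
        with log_path.open("a") as shared:     # holder appends its turn
            shared.write(message + "\n")
    return log_path.read_text().splitlines()


if __name__ == "__main__":
    shared_file = Path(tempfile.mkdtemp()) / "shared.txt"
    for line in run_baton_conversation(["Engineer", "Project Manager"], 2, shared_file):
        print(line)
```

Because only the baton holder writes, every agent answers the latest state of the file instead of a subject the others have already left behind.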
lol...a proof of concept of how to use AI to train AI to sabotage the data it would use to improve itself
I wonder if companies will do that to each other behind the scenes
"autonomous agent swarm, go teach competitors' AI how to fail at tasks by spamming incorrect inputs"
@@samthibodeau3511 Access to OpenAI API and Mistral and Together and a few others, Perplexity, Cohere etc. So yes I sort of do... but rarely do I spend money having one AI agent screw over another.... I think it's funny, but I sure as shit am not going to pay GPT-4 prices for it!
@@memegazer Shhhh you'll be giving folks ideas man... lol
We NEED to build something similar that’s open source. This is really cool tech
This is so easy to build. Easiest way is to just use OpenAI API and pass it an entire page's source code as context then have it write Selenium scripts to manipulate it. Chat GPT already does this under the hood. I don't get what's so out of reach about what they're doing?
@@dave3269 Problem is if you use the ChatGPT API it's gonna cost you. Not much, but it adds up for every entry one of your agents does. So what you really need is a free AI, preferably with vision. What's new with MultiOn is the teaching: you can write a "macro" to navigate stuff and share these between users as actions. If you set up such a thing yourself, you have to write all of this yourself.
Easy but very very expensive @@dave3269
@@dave3269 So you wanna start an OSS version of this then?
I'll contribute.
@@dave3269 hahaha respectfully you’ve got no idea what you’re talking about. I’ve been exploring this space for months and here’s why it’s challenging:
1. Passing down the DOM isn’t always sufficient. The DOM can get really long and isn’t built for AI scrapers so a lot of its buttons and tags aren’t named to be read and clicked.
2. These guys are using a combination of GPT 3.5 or a different smaller model reading the DOM + GPT4V looking at the page to click on things when necessary. To accomplish this they built a sophisticated router to decide which one to use at any time
3. Their memory length is extremely impressive. It’s likely they’re using memgpt or a different proprietary framework to achieve this affordably but it’s NOT easy to build this at all
4. There’s a bunch of stuff you need to do (fork chromium) to be able to preserve user logins and bypass easy scraper guardrails
FYI this is after a bunch of Stanford researchers and professors that spent the last 4 months working on this problem. The best open source has so far is just a mediocre GPT4V agent.
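To make point 1 concrete, here is a rough sketch of trimming a page down to just the elements an agent could click or type into, instead of passing the whole DOM. It uses Python's standard `html.parser` and is only an illustration, not how MultiOn works. The tag list, the label fallback order, and the `needs_vision` heuristic (hand off to a vision model when something has no readable label, loosely echoing point 2) are all assumptions for this example.

```python
from html.parser import HTMLParser

# Assumption: these tags are "interactive enough" for the sketch.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementExtractor(HTMLParser):
    """Collect interactive elements with the best label we can find."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._open = None  # <a>/<button> currently waiting for inner text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        label = (attr_map.get("aria-label")
                 or attr_map.get("placeholder")
                 or attr_map.get("name")
                 or "")
        element = {"tag": tag, "label": label}
        self.elements.append(element)
        if tag in ("a", "button"):
            self._open = element

    def handle_data(self, data):
        # Use visible text as the label only if attributes gave us nothing.
        if self._open is not None and not self._open["label"]:
            self._open["label"] = data.strip()

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self._open = None


def extract_interactive(html):
    parser = InteractiveElementExtractor()
    parser.feed(html)
    parser.close()
    return parser.elements


def summarize(elements):
    """Compact, numbered lines that are cheap to put in a prompt."""
    return [f"{i}: <{e['tag']}> {e['label'] or '(unlabeled)'}"
            for i, e in enumerate(elements)]


def needs_vision(elements):
    """Hypothetical router rule: unlabeled elements mean the DOM alone is not enough."""
    return any(not e["label"] for e in elements)
```

Feeding it a page with an icon-only link shows the problem described above: the link comes out as `(unlabeled)`, which is exactly the case where reading the DOM alone gives the model nothing to go on.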
Gmorning, thanks for the walkthrough. If there is any chance, could you try to create an eBay listing with it? If it can be reliable and SEO-smart then it's a literal game changer. Small biz resellers are frothing at the mouth for this use case to arrive and eliminate the "least liked" and "time consuming/monotonous" portions of the entire process. Thanks in advance. Have a wonderful day! Subscribed*
I have viewed this video to fulfil my current objective
spoken like a true agent
I have viewed this comment to fulfill my current objective
How do you think the CEO of Rabbit feels about this? Lol :))
Holy shit, how much does bottled water cost in the US? Our local Aldi has like 24 bottles for about €5! Well, here in Ireland I suppose 'water' is not an issue, when the tide doesn't come in it comes down here! So it's about 25 cents a liter. But $15 for a few bottles of water, friggin hell, you'd be better off buying beer!
Water will be the next gold
@@jambear7862 I'd prefer if we could make Beer the new gold!!
I live in the state of PA and water here costs $5 as well. It's not a standard rate around the U.S.
America is a salesman's dream. You don't need any special abilities to make it in sales there.
As for why... read the name of a popular brand of bottled water, "EVIAN" backwards.
That's without the service and delivery fee
Rabbit CEO said in a recent interview that LAM actually runs on CPU and is using multiple virtual machines. I'd rather wait for open source code to get out than pay for another subscription.
Feels like every other day there are titles like "X AI beats Z AI", the next day "B AI beats Z AI", a couple of days later "C AI beats B AI", then "D AI beats C AI", then "E AI beats D AI".
Whether it's an agent or an AI, it's always the same pattern, always the same marketing title.
@@MangaGamified yes, unfortunately it's all hype at the moment.
Reminds me of what Rabbit is promising with their LAM
How effectively do you think the Rabbit will be able to create an SEO-optimized and accurate eBay listing from just a photo?
@@SoSoInfinite I'd imagine you'd have to use plugins
@@Glotaku it seems at this rate that the rabbit will be outdated by its release
@@SoSoInfinite It's software which is upgradable, and rabbit can be taught so it won't be outdated
@@willbagshaw720 so why would the R1 device stay relevant since every other device can have the same style software
IMAGINE if OpenAI themselves would create something like that, but making it so it can use your computer and any app/window. This would essentially be proto AGI already. Or maybe even AGI
This is already so easy to build but using Android and Appium, and have GPT 4 run commands through it. I guess I should just build this for people huh.
That would be the ultimate app
It’s open interpreter
@@dave3269 build it, man
MSFT is already working on that.
we are paying them to train their model?
Wes can you let them know their discord link is down?
So basically by the end of 2024 many jobs will perish, nice :)
Scary when you mentioned security... Imagine a hacker teaching this thing how to hack accounts and buy bitcoin or whatever.
Also, this is such a brilliant demonstration of the near future... It would be obvious for OpenAI to bake this in to their service.
I can see MS baking it into Windows. Actually, it’s probably not even an option. This is a huge market share grab and potentially an existential threat to Windows. What if Google were to create a fully autonomous OS? Or Apple? I’m sure MS doesn’t want to find out. And Apple should be worried too.
The utility of a fully autonomous OS that can operate any software is hard to imagine. One or two people could run a massive team of accountants, animators, you name it.
Or maybe we see software companies baking this directly into their software.
Close.. so close. I’m excited and scared at the same time.
24 hours at the present pace of tech is literally starting to feel like an eternity to wait for something 🤣
Do you use an AI that detects your voice when you're talking and only uses the video segments of you talking for your videos?
They only have to wait for 20 hours and to them that's like 20 years meanwhile I am waiting for over 2 years 😂
Soon, it will become a prerequisite for any web or mobile app to release a large action model together with its installation pack, for integration purposes, or else they won't be accepted on any app store. Companies will have employees clicking around across their own apps just to teach their LAMs how to automatically use the said app. These people will be a different team than the testers but will probably work under the QA department, just like the testers. It will surely become industry standard in any software development lifecycle.
The Discord link is invalid here and on the official MultiOn AI webpage.
Same here
Yep... Maybe they've closed the Discord due to people flooding in. Please keep us updated and post a Discord link here when available :)
I ran into the same issue. Following this thread
I skipped the discord part and just downloaded the chrome extension and it worked for me.
same!
I assume the Discord link is invalid on purpose, as they are likely limiting the service based on their servers and their beta testing. Hope I can get access, any tips?
What if this got hijacked?
I want one of these, but it needs to deal with my emails perfectly. No mistakes. I want a fire and forget bot, and I just walk away from both my mails and the bot.
First like!
I need to see some tests of giving it access to my files locally and asking it to test some simple theory, let's say in physics, where it needs to write some Python, then run it and collect the results from the output file.
The Discord link is invalid
x2
When I deeply understand the concept of everything is a function, any project becomes easy with the help of AI
Thanks, I just signed up for it. Your videos are great for nonprogrammers.
Tbh.
Can we get our arms down from the sky just a few inches and cut through the hallelujah movement just a bit? Yes, AI is massive. Yes, AI will change the world. But it is still pretty borderline with regard to consistency, at least for the GPT agents (which I work with on a daily basis). I see them break and fall and fail again and again, with much wasted time and frustration for me, customers and colleagues. They struggle hard to keep themselves within the guard rails of their instructions, and the performance is somewhat meh, at least compared to how they are advertised on most AI YT channels and whatnot. I am sure they will be big at some point, given that the tech isn't going to drive right by their current usability. I don't really see the agent store being a smash hit. I see general AI interfaces becoming good enough to somewhat drive by the agents in the near future. But for now I don't really find them consistent and rigid enough to be of commercial interest.
5:33 The narration is hilarious. It would be great to watch whole YouTube videos of AI doing stuff and narrating it
“I guess that’s fine.” Is how it all ends.
Can you use this now, or in the future, via free or premium, to auto-post on platforms like LinkedIn, as in fully automated "post 4 articles a day on LinkedIn, on the power of AI, each with 200-300 words, evenly spaced out between 9am-6pm, 7 days a week for the next month"?
I want to see it get through a Captcha
Captchas are used to train AIs
Indeed, yes.@@dave3269
This is soooo ugly, impressive but ugly. I feel like you could API this and make it run in like 1/8th the time with better reliability
Perhaps have two agents: one that is looking at the backend API, another that has become familiar with the frontend
Well, what if it was perfect and ran without you... which eventually will happen. The real question is where humans fit in an autonomous environment. And that "freeing u up to do other things" talking point only goes so far. You will incrementally get squeezed out. Humans had better forecast where they are going and what they'll be doing when they get there.
Is there a tool like MultiOn that actually works? Tried it, but just like in the video it just can't complete 95% of tasks because it can't figure out where to click.
After working in automation for over 6 years, I can safely say that those small issues like not clicking on the red button are the same problems that have existed forever. There is no easy fix!
10:15 "I'm not sure what the problem is": the nightmare scenario of all black box solutions
100% Agree, 2024 is the year of the AI agent teams... especially with private local LLMs connected in a global hivemind doing all our bidding while we chill...
The new internet. I call it H.A.I. Net :D
Hi Wes, thanks for a lot of great content. You're awesome! How to join the multi on discord?
I have the same question. MultiOn sent me instructions to join their Discord server, but no invitation link. I searched for any public servers with "MultiOn" but came up with nothing.
Third comment
One question. One only: is this even remotely safe to access your Google account etc??? 😅😅
I wonder how capable it will become and what its limits will be. I mean, there is the one limit of simply only being able to do what a human can do, but that could be a lot of things, not necessarily good.
What’s with the satanist imagery, and then you produce it again and made sure it stayed in frame after the agent was messing up. Pretty weird.
Funny that people think this has some sort of "AI" touch. It's basically just writing a normal E2E test to check that everything in your application / website works, like whether an input field is accessible, or what clicking the submit button returns. As soon as it can't find the HTML element, it's fucked and gets stuck, as you saw
The Discord link is not working. Looks great.
Wes, if you ever read this, I love what you do. But, the way you skip things make it very difficult to replicate what is shown in your videos.
I know and I understand: once you use an app you do not necessarily remember and tell what needs to be done (i.e., the very first time one is using it...)
E.g., at 4:12 you say "Let's go to the new tab". Mate, I have no idea what you do there 🙈. I assume you go back and probably click "Playground". But I can't be sure... anyhow, when I click on Playground it gives me this annoying message "Playground access is currently not available". First, I reacted to this by setting up payment, but even after that I'm stuck. I wrote to the guys on Discord and am waiting.
Keep up the good work and sometimes think about this 'old' profile you mentioned the other day watching your videos.
I am so ignorant, but on my Macbook Pro, the page you show at 7:57 seems not to exist....
Is there a standalone app for this? Or is it only browser based?
Discord invite is invalid.... Door's closed already :(
It's not that good right now but I can imagine how good it will get in a few months time. Can't wait !!
It's like a screen scraper and ChatGPT had a baby
Seems to me the agent is going to be locked out of a lot of things because it will be unable to pass the "I am not a Robot" test
So does this make that Rabbit AI device already obsolete, like everyone predicted?
10:15 No, it can't, because extensions can't just click on certain buttons designed only for users, for example the upload button in your case. It requires your mouse click to be in the chain of events. It is a safety feature of Chrome. It also does not have access to your keyboard, and it fills elements via JavaScript functions. Its biggest drawback is the fact that it has to operate within the constraints of a browser extension.
11:28 Same case here. It can't press the Tab key here. Also, it can't later access the saved image even if it somehow figures out how to save it, because extensions do not have access to your files, for obvious reasons.
This is cool, but I can't think of what I would use this for.
This company is going to go under really fast. They want to charge $0.08 per request now 😂 Imagine if Google tried that crap, they'd go under just as fast.
I'm going to find a better one that charges a few bucks a month.
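For a sense of scale, here is a quick back-of-the-envelope sketch. Only the $0.08 per request figure comes from the comment above; the usage level and the budget in the usage note are made-up inputs for illustration.

```python
from decimal import Decimal

# Price quoted in the comment above; everything else here is an assumption.
PRICE_PER_REQUEST = Decimal("0.08")


def monthly_cost(requests_per_day, days=30):
    """Total cost of a steady daily request volume over `days` days."""
    return PRICE_PER_REQUEST * requests_per_day * days


def requests_within_budget(budget):
    """How many whole requests a flat monthly budget would cover."""
    return int(Decimal(str(budget)) // PRICE_PER_REQUEST)
```

With these functions, `monthly_cost(20)` gives the monthly bill for a hypothetical 20 requests a day, and `requests_within_budget(5)` shows how far a "few bucks a month" plan would stretch at the quoted price.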
😂😂 imagine the audacity of them 😂😂
Is the Discord server still up? I can't find it.
What a boring video. Nothing other than a puppeteer bot
As someone who was an automation engineer who wrote bots, this type of functionality is NOT new. We had bots running in the 2010s that could do a better job at this type of thing than the AI is doing. However, it did require a substantial amount of code and time investment in order to achieve. AI's ability to generalize will make this much simpler, eventually, but the functionality is nothing new. We were able to write "generalized" automatic crawlers/navigators/extractors already.
It's a weird concept to me, in that I think it has a limited lifespan. These concepts rely on services providing human-computer interfaces to humans, which the LAMs mimic in order to get things done.
However, as fewer and fewer humans use these interfaces in favour of LAM-enabled agents, the interfaces might eventually cease to have a reason to exist. Ergo, the future of LAMs might be in utilising APIs?
An odd thought experiment is the Tesla Optimus robot driving a Cybertruck with FSD turned off.
The success of this agent would depend on you already being logged into your account on a platform to automate some of those tasks. I'm curious if:
1. It can also help log you in from a password manager like 1Password, LastPass, etc.
2. If so, how about when you have more than one account on a platform.
3. Complete a bloody 2 step login process, and then a Captcha.
4. Figure out how to use the shitty UI & workflow to log out of a Google account!!!
Then I'll not only be impressed but amazed... 💕
How about providing permission to access your different accounts through MultiOn?
What do you think of the Arc Browser? Could be interesting to see something like MultiOn integrated there 🤔
I’ve joined the waitlist straight away. I particularly like that there’s an iOS app on the way. Most autonomous agents stuff I get excited about seems tricky/impossible from an iPad.
invalid invite for discord damn
Why can’t these hardware devices emulate Bluetooth hardware and get a usbc image feed from a graphics card like a monitor would. Then it has a clean observable image and can act as hardware inputs so it can bypass security issues or compatibility challenges? Would be really useful and portable to different current gen computer environments
Hey Wes, could u make a video about recent development in Tsetlin machines?
They look promising in reducing power usage and increasing output.
I wish more commenters would understand you do both kinds of videos. I happen to not mind the "headline" type videos even if everyone seems to want these more in depth ones for every single upload... It doesn't have to be EVERY video, guys.
This is amazing, been waiting for it for a while. Sadly the Discord link looks dead (or expired?).
This is so exciting! Exactly what I need, have so many ideas and with this tool I can see it happen. Tried the Discord link but the invite failed, tried it through their website as well, maybe overload?
can it play videogames well ?
The Discord link is invalid for me, it says it's either invalid or expired?
Yet another freemium scam
Discord is too chaotic. I don't like it. Discord is the Borg, and resistance is futile, Ugh!
“We’re so close!” 😂 you’ve spent too much time on r/singularity @Wes!
Waiting for my Rabbit R1. No monthly fees
"Invite invalid" on the Discord link :-(
I think it would have been great if it had asked grok about Sam Altman's latest post
Do you think GPT5 and Gemini Ultra will be autonomous?
GPT DaVinci could be autonomous if you make it so. Not hard.
@@dave3269 That isn’t what I asked and I’m not doing anything that either of those companies can be doing.
No, not natively. OpenAI and Google are trying to prioritize safety, and an autonomous agent is hard to control/trust, so likely not. I'm sure 3rd party developers could use their APIs to make autonomous agents
Does this mean that Rabbit’s R1 LAM approach is obsolete?
This is really awesome. I'd love to see an agent that can do government benefits sites/forms, apply for an apartment, complete insurance forms, and deal with financial and legal issues.
Question: Why did you have to pause it before checkout? Does it genuinely require your banking details and order authorization? Yep, BETA, bro! They better fix that or risk losing a significant share of the market, lol!
Can I get my money back?😢
@WesRoth the discord link is invalid for me :(
What is the discord name?
I think the issues navigating apps and web by the agent will be harder than you think.
The AI rugged you 😂🤣 Definitely does seem to be spot on decision making. Impressed!
I've tested it, but it's not there yet. I asked it to make a booking app and it doesn't understand or can't learn it somehow, but it's promising
Can it write python script to have that same automation "offline"?
everyday i eat a wes roth video with my breakfast and i get satiated with dread (and motivation). thanks for another great ai tool video
Great video. Do you have the link to the Discord server, their invite seems invalid.
If AI tried to buy me Arrowhead water, we might have to have a talk.
/imagine prompt: …
Does anyone have an updated Discord link?
Thanks for sharing man. It really keeps us in the loop of what is to come if it has not already by the time I press send… at the speed things are going. I appreciate what you do man.
this is super impressive! i'm signing up. ~SWARM!