WebVoyager
- Published: Oct 19, 2024
- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
WebVoyager is a new vision-powered web-browsing agent that uses browser screenshots and “Set-of-mark” prompting to conduct research, analyze images, and perform other tasks.
In this video, we will show you how to build WebVoyager using LangGraph, an open-source framework for building stateful, multi-actor AI applications.
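To make the idea concrete, here is a minimal illustrative sketch (not the actual WebVoyager code) of the core observe → annotate → decide → act loop a set-of-mark web agent runs. The element names and the stand-in "LLM" call are placeholders; in the real agent the decision comes from a large multimodal model looking at a marked-up screenshot.

```python
# Illustrative sketch of a set-of-mark browsing loop.
# All names here are placeholders, not the real WebVoyager API.

from dataclasses import dataclass, field

@dataclass
class AgentState:
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)

def annotate(screenshot: str, elements: list) -> str:
    # Set-of-mark prompting: overlay a numeric label on each
    # interactive element so the model can refer to "[1]" instead
    # of raw pixel coordinates.
    marks = ", ".join(f"[{i}] {el}" for i, el in enumerate(elements, 1))
    return f"{screenshot} with marks: {marks}"

def decide(observation: str) -> str:
    # Stand-in for the multimodal LLM call: it would pick an
    # action referencing one of the numbered marks.
    return "click [1]"

def step(state: AgentState, screenshot: str, elements: list) -> AgentState:
    obs = annotate(screenshot, elements)
    state.observations.append(obs)
    state.actions.append(decide(obs))
    return state

state = step(AgentState(), "homepage.png", ["search box", "login button"])
print(state.actions)  # -> ['click [1]']
```

In the real agent this loop is expressed as a LangGraph state graph, which adds persistence and branching on top of the plain loop shown here.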
Links:
Python Code: github.com/lan...
WebVoyager Paper: arxiv.org/abs/...
Set-of-mark Paper: arxiv.org/abs/...
Developing AI applications is easier with LangSmith. Create a free account at smith.langchai....
New to LangGraph? Check out the intro video: • LangGraph: Intro
My left ear enjoyed this video very much
LOL I thought my headphones were broken
@Jakolo121 I kept mine for charging 😂
Sorry about that... not sure why!
On mac: system settings > accessibility > audio > play stereo audio as mono. Just remember to switch it back to off after this video
I greatly appreciate the thorough, simple and easy to understand explanations, especially surrounding LangGraph
Please, let's start a crowdfunding to get him a better microphone. His videos are really good and he deserves it. Thanks for the amazing contribution to the community!
It's so cool that you guys make videos about different use cases. Please improve the sound quality and describe the topics in more detail. 🙂
Is there a way to do this using other LMMs such as Gemini Pro Vision or LLaVA 1.6?
Can it be pointed at any URL to do a kind of functionality testing? I tried changing the URL but it didn't work.
Creative and clean! the sound could be improved though. Still great value
I read the example code before I came here and was understanding it a little bit, but once I watched the LangGraph video I felt so confused, because the pace of the video is so fast.
Finally someone can relate
How do you run this as a Python script and not in a Jupyter notebook? I am getting an "Event loop is closed" error, perhaps related to asyncio.
Did you get it solved? If so, can you help?
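A common cause of "Event loop is closed" when moving notebook code into a script is creating or closing the event loop more than once. A minimal sketch, assuming a hypothetical `run_agent` coroutine in place of the real agent code: in a plain `.py` script, enter the event loop exactly once via `asyncio.run()`.

```python
# Minimal sketch: a notebook already has a running event loop
# (so you just `await`), but a plain script should enter the loop
# exactly once via asyncio.run(). Re-creating or reusing a closed
# loop is a typical source of "Event loop is closed".

import asyncio

async def run_agent(task: str) -> str:
    # Placeholder for the agent's async browsing loop.
    await asyncio.sleep(0)
    return f"done: {task}"

def main() -> None:
    # One asyncio.run() per process; inside Jupyter, use
    # `await run_agent(...)` instead of asyncio.run().
    result = asyncio.run(run_agent("search arxiv"))
    print(result)

if __name__ == "__main__":
    main()
```

If the error persists, it may come from library cleanup code touching the loop after `asyncio.run()` returns, in which case keeping all async work (including browser teardown) inside the single top-level coroutine usually helps.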
Can we use the LLaVA model from Ollama here?
We want an agent with a local open-source LLM and a memory implementation 😊
This is great, ty!
Is anyone else getting a "prompt must be 'str'" error with this code?
very interesting idea!
prompt error on the hub
Phenomenal
This is very Cool.😃
These are good, but I'm looking for JavaScript support.
I would like to implement a "Learning Mode" for this WebVoyager agent, to teach it an action by recording a manual navigation through the browser and then saving it as a "Tool" or a succession of steps.
Could you please give me some references or some clues on how I can achieve this?
If you got a solution, please do share. I'm working on something similar.
Perhaps use RAG for this purpose... every set of actions can be added to a vector database along with its result, and before taking any step the agent can do a quick vector search to see if that action has been done before and retrieve the successful series of steps taken.
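The idea above can be sketched in a few lines. This is a toy illustration, not a real implementation: the bag-of-words cosine similarity stands in for proper embeddings plus a vector database, and the class and method names are made up for the example.

```python
# Toy sketch of an "action memory": store each successful sequence
# of steps keyed by a task description; before acting, retrieve the
# steps from the most similar past task. Bag-of-words similarity is
# a stand-in for real embeddings + a vector store.

from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ActionMemory:
    def __init__(self):
        self.entries = []  # (embedding, task, steps)

    def record(self, task: str, steps: list) -> None:
        # Save a successful run for later reuse.
        self.entries.append((embed(task), task, steps))

    def recall(self, task: str, threshold: float = 0.5):
        # Return the stored steps of the closest past task, if any.
        query = embed(task)
        best = max(self.entries, key=lambda e: cosine(query, e[0]),
                   default=None)
        if best and cosine(query, best[0]) >= threshold:
            return best[2]
        return None

memory = ActionMemory()
memory.record("search flights to Paris",
              ["open site", "type 'Paris'", "click search"])
print(memory.recall("search flights to Rome"))
```

The similarity threshold is the key design choice: too low and the agent replays steps from unrelated tasks; too high and it never reuses anything.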
Did anyone try this with a local model? (Llava for example)
Awesome
Nice, but it seems to have some glitches that need to be ironed out. Nevertheless, great work!
Awesome project, but he is only speaking to my right ear.
You have your headphones on backward.