Building Agents: Visualize a Multi-Agent Workflow that Outperforms a Single SOTA Prompt
- Published: 3 Jun 2024
- In this demo, I combine several agentic patterns - reflection, planning, and multi-agent workflows - to replace a complex prompt. I was able to match results from GPT-4 by combining multiple steps utilizing only GPT-3.5 and Claude Haiku.
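The extract-then-reflect loop described here can be sketched in a few lines. This is a minimal illustration, not the code from the repo: the two step functions are stand-ins for real LLM calls (e.g. GPT-3.5 for extraction, Claude Haiku for reflection), and the convergence check is one simple way to stop the loop.

```python
# Sketch of a multi-step agentic workflow: extract -> reflect -> revise.
# The step functions are stubs standing in for real LLM calls; swap in
# your own model client for each step.

def extract_memories(message: str) -> list[str]:
    """Placeholder for a cheap-model extraction step."""
    return [part.strip() for part in message.split(",") if part.strip()]

def reflect(memories: list[str]) -> list[str]:
    """Placeholder for a reflection step that critiques the extraction.
    Here we just drop duplicates; a real reviewer model would return
    critiques and suggested fixes."""
    seen, kept = set(), []
    for m in memories:
        if m.lower() not in seen:
            seen.add(m.lower())
            kept.append(m)
    return kept

def run_workflow(message: str, max_rounds: int = 2) -> list[str]:
    memories = extract_memories(message)
    for _ in range(max_rounds):
        revised = reflect(memories)
        if revised == memories:  # converged, stop early
            break
        memories = revised
    return memories

print(run_workflow("likes apples, likes apples, allergic to peanuts"))
# → ['likes apples', 'allergic to peanuts']
```

The point of the structure is that each step can be prompted, evaluated, and swapped out independently, which is what makes cheaper models viable.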
This video was inspired by Andrew Ng's recent work on agentic workflows, in which he demonstrates that agentic workflows can exceed the performance of single state-of-the-art prompts. Ng showed that non-SOTA models, like GPT-3.5, can outperform even GPT-4 when utilized within an agentic framework.
I recommend you watch Andrew's talk ( • What's next for AI age... ) or read his article (www.deeplearning.ai/the-batch...); they're both excellent.
This demo builds on a previous demo I shared, where I explored creating an agent to extract long-term memories. You can view that demo here: • Build an Agent with Lo...
Interested in talking about a project? Reach out!
Email: christian@botany-ai.com
LinkedIn: linkedin.com/in/christianerice
Follow along with the code on GitHub:
github.com/christianrice/ai-d...
Timestamps:
0:00 - Intro
0:27 - Basic Demo
1:16 - Why Add Agentic Reasoning?
2:43 - Agentic Reasoning Design Patterns
4:41 - Improvements from Agentic Reasoning
5:42 - System Design
7:58 - Demo
11:22 - View the Prompts
13:30 - Considerations
14:04 - Code Explanation
15:10 - Closing Thoughts
I did something similar,
- generate triplets from the information
- check / review triplets (if bad refine, if good go to next step)
- save to neo4j as knowledge graph
Awesome! How did that approach work for you? If this were going to production, I'd definitely compare a few different workflow approaches.
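The triplet pipeline described in this exchange could be sketched as below. The extraction and review steps are stubs for LLM calls, and instead of writing to a live database, the final step just renders Cypher MERGE statements you could run against Neo4j (the `Entity` label and relationship names are illustrative choices, not a fixed schema):

```python
# Sketch of the triplet -> review -> knowledge-graph pipeline:
# generate triplets, gate them through a review step, then emit
# Cypher statements for Neo4j.

def extract_triplets(text: str) -> list[tuple[str, str, str]]:
    """Placeholder: a real version would prompt an LLM for (s, rel, o)."""
    return [("Alice", "LIKES", "apples"), ("Alice", "LIKES", "apples")]

def review_triplets(triplets):
    """Placeholder review gate: keep only well-formed, unique triplets.
    A real reviewer model would also refine bad triplets and retry."""
    good = []
    for t in triplets:
        if len(t) == 3 and all(t) and t not in good:
            good.append(t)
    return good

def to_cypher(triplets):
    """Render MERGE statements so repeated saves stay idempotent."""
    return [
        f'MERGE (a:Entity {{name: "{s}"}}) '
        f'MERGE (b:Entity {{name: "{o}"}}) '
        f"MERGE (a)-[:{rel}]->(b)"
        for s, rel, o in triplets
    ]

statements = to_cypher(review_triplets(extract_triplets("...")))
print(statements[0])
```

Using MERGE rather than CREATE is what keeps the graph clean when the same fact is extracted twice across runs.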
Thanks for sharing this project!
Thanks for watching!
Hi! Really good walkthrough. One question: how did you create and deploy the UI front end for this? I was wondering if you used LangServe. As a non-developer wanting to create a quick proof of concept for potential users, I was wondering if there's a lean way to deploy the user interface locally. I noticed your GitHub only has ipynbs. Thank you so much!
Thanks for this video, it's really helpful!!
I like your videos! Keep on doing more videos 😊
Amazing improvement on your last video on Memory. Any way I can get access to the Frontend you're using in your videos?
This! Are you able to share it? Would even pay a small fee for it.
Thanks for sharing! Have you experimented with asking the models to generate prompts for you for each step? It could accelerate the workflow building :)
Thank you for your videos, they help me much more than the official LangChain videos. Could you please also make a video where you use LangChain Agents, e.g. the Tool Calling Agent?
Thanks for this straightforward explanation of agentic memory formation. I’m very curious about how you chose which layers should use Anthropic Claude Haiku vs OpenAI GPT-3.5-turbo.
Great question! Since this was just exploratory, I didn't give it too much thought. In my first iteration, I used GPT-3.5 for each step, and I didn't find it to be sufficiently critical of itself for reflection. I chose GPT-3.5 since it was one of the best inexpensive models that supported reliable JSON output and tool calling, and building and trying demos like this is effectively free to do. But now that Claude supports tool calling, I pulled in Haiku for reflection to give that a try. Its output is a bit more critical, but the prompts could use some work to improve it. If this were going to production, I'd evaluate the model choices a lot more carefully than I did for this demo.
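The per-step model assignment described in this reply can be made explicit in code. A minimal sketch, using the model names mentioned above; the dispatch function is a stub, not a real API client:

```python
# Sketch of routing each workflow step to a different model, so a more
# critical model (Claude Haiku) handles reflection while a cheap model
# (GPT-3.5) handles the rest.

STEP_MODELS = {
    "extraction": "gpt-3.5-turbo",
    "reflection": "claude-3-haiku",   # more critical as a reviewer
    "action_assignment": "gpt-3.5-turbo",
    "category_assignment": "gpt-3.5-turbo",
}

def run_step(step: str, payload: str) -> str:
    """Placeholder dispatch: a real version would call the provider's
    API with the step's prompt template."""
    model = STEP_MODELS[step]
    return f"[{model}] processed: {payload}"

print(run_step("reflection", "review these memories"))
```

Keeping the step-to-model mapping in one table makes it cheap to swap models per step when evaluating for production.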
@@deployingai Thanks for sharing your logic. I have been very impressed by the Claude 3 models’ new tool use capabilities too.
Thinking if there would be a way for it to build a knowledge graph... Somehow?
Yeah this could be a great use case for a knowledge graph. It would easily make sense for family members and foods to be entities with relationships like like/dislike/allergy, and the whole 'attributes' catchall could be expanded to encompass much richer data.
What are you using to build the Frontend? looks neat
Darude sandstorm
I used Radix UI and Tailwind, great for throwing something together quickly!
@@deployingai can you share the code for this UI
Hi can you publish your langsmith traces for this? I am trying to implement this for models without tool calling. It will be incredibly helpful
My problem is that long-term memory drastically increases prompt size, so you either need multiple long-term memory stores depending on the type of prompt, or a local AI that serves as a router deciding what the prompt needs from long-term memory.
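The router idea in this comment can be sketched as below: instead of stuffing every long-term memory into the prompt, a small routing step picks only the memory bucket relevant to the request. The keyword router here is a stand-in for the cheap local model the comment suggests, and the store contents are made up for illustration:

```python
# Sketch of a memory router: classify the query, then include only the
# matching memory bucket in the prompt to keep prompt size down.

MEMORY_STORE = {
    "food": ["Alice likes apples", "Bob is allergic to peanuts"],
    "schedule": ["Family dinner every Sunday"],
}

def route(query: str) -> str:
    """Placeholder router: a small local model could classify the
    query type instead of this keyword check."""
    if any(w in query.lower() for w in ("eat", "food", "meal", "cook")):
        return "food"
    return "schedule"

def build_prompt(query: str) -> str:
    memories = MEMORY_STORE[route(query)]
    return "Relevant memories:\n" + "\n".join(memories) + f"\n\nUser: {query}"

print(build_prompt("What should we cook tonight?"))
```

The trade-off is an extra routing call per request in exchange for a much smaller main prompt.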
🎯 Key Takeaways for quick navigation:
00:00 *🧠 Building an agentic workflow from a complex prompt*
- Demonstrated the process of dividing a single prompt into multiple agentic steps.
- Divided the prompt into memory extraction, reflective review, action assignment, and category assignment steps.
- Discussed the importance of breaking down prompts for improved accuracy and cost efficiency.
02:44 *🔄 Andrew Ng's Four Agentic Workflow Methods*
- Shared insights from Andrew Ng's work on improving agentic workflows using reflection, tool use, planning, and multi-agent collaboration.
- Explained how combining these methods can enhance the quality of results in workflows.
- Highlighted the benefits of incorporating reflection, tool use, planning, and multi-agent collaboration in workflows.
05:17 *🚀 Enhancing Accuracy with Multi-Agent Workflows*
- Demonstrated the implementation of a multi-agent workflow in processing prompts.
- Showcased the division of the prompt into memory extraction, action assignment, and category assignment steps with reflective feedback loops.
- Discussed the trade-offs between accuracy, cost efficiency, and processing speed in multi-agent workflows.
Made with HARPA AI
Groq would be fast as hell. I would be interested to see the performance of looping Groq in.
Likewise. But I currently find Groq does not output consistently for tool_calls / JSON. Any experience with improving this?
@hiranga The only thing I can think of is to not depend on Groq for the main function calling and control of the application logic flow. So I would use GPT-4 as the app controller/router and delegate work tasks to faster models.
Each model is going to have its own Unique challenges.
Good idea! You're right, if speed is important then Groq could offer a big gain. And you could probably rework the workflow to reduce its reliance on structured outputs if that proves to be a problem.
Hey Christian! Really enjoy your videos, are you on Twitter by any chance? Would love to share some stuff with you
Are you open to consulting? I just emailed you