Excellent! You're the first person I've seen who actually read the paper and laid out the typical Microsoft way of structured compute-system thinking.
Thanks, much appreciated.
I'm just recently learning about AI. I think you did a great job explaining the connections and the tasks, progress, updating, and multi-job LLMs. As a layperson, I feel I learned a lot from your presentation. Thank you for sharing your knowledge in very understandable 😊 language.
I outlined this same architecture a year ago and implemented it in my general-purpose research agent 4 months ago. Since then, I've only become more convinced that this is our current best bet to move forward.
Exact same here :)
Great overview! 👍
Can I assume that the orchestrator is only capable of linear execution? For example, if I were to say, "Provide me a report on Katherine Johnson and Johns Hopkins. Provide a comparison of their lives and contributions to the U.S. Generate a report on John F. Kennedy and compare his life with the others."
The gathering of information on Johnson, Hopkins, and Kennedy, as well as writing the reports on these individuals, can be done in parallel.
Most of the examples I have seen are not usually compound requests. Just thinking out loud here.
I think so. To accomplish the task, the process should be sequential, especially since the agent uses the CoT (chain-of-thought) prompt-engineering paradigm. One way to get parallelization is to have sub-orchestrators in the loop, as many as needed and defined by the main orchestrator when it detects redundant workflows for the different sub-objectives that make up the global objective.
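A minimal sketch of that idea, with hypothetical research_person coroutines standing in for sub-orchestrators (the names and the final join are illustrative only): independent sub-objectives run in parallel, while the comparison step stays sequential because it depends on all of them.

```python
import asyncio

# Illustrative only: each coroutine stands in for a sub-orchestrator
# working on one sub-objective (the real thing would call an LLM + tools).
async def research_person(name: str) -> str:
    await asyncio.sleep(0)  # placeholder for the actual LLM/tool calls
    return f"[report on {name}]"

async def main_orchestrator() -> str:
    # Independent sub-objectives can run in parallel...
    reports = await asyncio.gather(
        research_person("Katherine Johnson"),
        research_person("Johns Hopkins"),
        research_person("John F. Kennedy"),
    )
    # ...while the comparison stays sequential, since it needs all three reports.
    return "\n\n".join(reports) + "\n\n[comparison of the three]"

print(asyncio.run(main_orchestrator()))
```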
Challenges include the time spent on prompting to get the best output, high token usage, LLMs being bad at if/else statements, and how hard it can be to do RAG or anything else in the AI field without errors.
Money money. Not worthy
Excited to try this with different models. Very interesting to see that o1-preview didn't perform better. I'd be curious to experiment with this for my use cases
thanks for the video
Do you think it's possible to add and customize your own sub agents?
Totally, they've opened up their code that they used Autogen for this, so you could certainly add a sub-agent in there.
So it's an updated version of ReAct where you're using not one but multiple models to do a task?
Thanks Sam, interesting video. I have done a couple of PoCs with Autogen, as I was comparing it against CrewAI and LangGraph in the past, and revisited the whole area today. The Autogen framework had a major upgrade to v0.4 very recently, so this is now leveraging the new framework, I guess. Will certainly give it a test drive. Of the 3 frameworks I mentioned above, I found Autogen was the best one: a nice compromise between having some control but not having to code out the execution graphs in detail, and not having to use especially complex coding to create the router (which LangGraph needs).
Yes, totally agree that Autogen is an interesting framework. I've planned to do videos about it on the channel, but I've been sidetracked on other things. I will come back and maybe look at this for a future video
I find it very funny that they are using MacOS to show the agents on the paper, yet this is from Microsoft
Your tech tribalism is childish and unproductive.
@ Interesting that you assume my comment was aiming for productivity; I was just sharing a lighthearted observation about the irony.
@@daburritoda2255 That is not what unproductive means in that context.
Are the agents customizable? Can we add or remove some of them in the pipeline?
Yes, the code is open source.
I could not find ANYTHING that is unique to agents, as an LLM can do it all. Please let me know if there is a task agents can do but an LLM like o1 can't!
Surfing the web, search, etc. are things that can't be done by an LLM alone. LLMs just input and output text; they need to be connected to tools.
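To make that concrete, here's a minimal sketch of what "connected to tools" means, assuming the OpenAI Python SDK's function-calling format and a hypothetical web_search helper (the model name and the stubbed search are illustrative only):

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical search helper -- in practice this would call a real search API.
def web_search(query: str) -> str:
    return f"[top results for: {query}]"

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return a summary of the top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What changed in AutoGen v0.4?"}]
resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
msg = resp.choices[0].message

# If the model decided to call the tool, run it and hand the result back.
if msg.tool_calls:
    call = msg.tool_calls[0]
    result = web_search(**json.loads(call.function.arguments))
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": result}]
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(final.choices[0].message.content)
```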
@@samwitteveenai Not sure if you know, but ChatGPT can search the web??!!??
Perplexity #1
I think the inability to find a task to be done with an agent is less an issue with agents and more an issue with creativity.
Off the top of my head:
Tool use. There are things that are better with tools than without. LLMs are notoriously bad at mathematics, and while you can force them to get it (by using huge models like GPT-4, etc.), it can also be done with a 7B agentic model running locally on a Raspberry Pi with access to appropriate tools and the ability to check its answer before giving it to you.
Search. This is a special case of tool use, but you can give an LLM access to search functionality and let it search for the answer before presenting it to you, instead of needing to have that information in its weights. This is incredibly useful for keeping information up to date. I personally use this quite a bit for research myself.
Code. You can create a coding agent that compiles code before presenting it to you, so you can verify that at the very least there are no obvious glaring errors; if there are, the compiler errors get piped automatically back to the LLM to fix the issue (there's a rough sketch of this loop after this comment).
Workflows. You can create multiple agents, which might even be the same LLM but prompted in different ways, and they can pass information back and forth between one another. As an example, you could see a code setup where one interprets the user's query and sends it to the next agent, which produces a high-level overview and then passes, for instance, each bullet point to separate coding agents that are each programmed to prioritize different things (one does business logic, another does networking, etc.). You might think "oh, o1 can do this", and that's sort of true. o1 basically is an agent, just hidden behind an API, so you only see the "agent" that interacts with you. Anyway, this sounds kind of indirect and unnecessary at first, but by limiting the focus of each agent to what it absolutely needs to do, you can massively improve their success rate, as they get sidetracked easily if you give them too many details to worry about. There are some ways that agentic systems even outperform larger models, and you can actually have smaller agents prepare an issue for a larger model to solve in a more reliable manner. It's a huge game changer in the reliability of these systems, and makes it possible for them to do… well, real work.
Guess and check. There is a class of problems that often have an elegant solution that you inevitably find a year after you finish solving it, and so at the time you’re architecting a solution, you’re left with not a lot of great ways to solve it other than brute force. There are some problems that are just easiest to make an educated guess, run the simulation, and adjust your answer slightly. Humans can do this, but LLMs don’t get tired, they don’t stop. You can just keep throwing them at the issue again and again until they get it, and this can be a massive time saver, automating certain difficult optimization problems that can require just enough intervention you can’t automate it with software, but don’t require so much attention as to justify a full person standing there waiting for it to run.
SQL queries. If you have a database, it can be helpful to not have to manually craft every single query by hand, especially if they’re a one off, so having an agent who can interpret the question you have, and convert it to a query can be a huge time and headache saver.
Large dataset queries. You might have large sets of images, or video, or other data that you’d like to go through to find something important, but it can be difficult to go through hours of content on your own. LLMs can use tools, meaning they can query other models, too. For instance, they can query computer vision models, vision-language models, long-context models, etc, and they can dynamically interpret the output of those models to semantically find what you’re looking for. You might know somebody suspicious was in the video, but not where, and being able to say “find the single person in suspicious clothing” isn’t exactly something you can ctrl-F, as such, but it is something that a sufficiently geared LLM might be able to interpret for you.
Frankly, I think there are too many things you can solve with agents rather than too few, and I think the ability to build custom software around them to do crazy things is a hugely valuable skill going forward. It can also be a huge equalizer for open-source models, as agentic workflows sometimes cap out regardless of which model you use (so you may as well run locally), and sometimes let a local (free) model equal an expensive paid one (so you may as well run locally).
I suppose if you just view LLMs as a website that you go to and talk to, agents might not seem as valuable, though.
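A minimal sketch of the compile-and-retry loop from the "Code" point above, assuming a hypothetical ask_llm helper that wraps whatever model you use (the gcc call and retry limit are illustrative only):

```python
import pathlib
import subprocess
import tempfile

def ask_llm(prompt: str) -> str:
    """Placeholder: plug in whatever model/client you actually use."""
    raise NotImplementedError

def compile_c(source: str) -> str:
    """Write the source to a temp file, compile it, and return gcc's stderr ('' on success)."""
    path = pathlib.Path(tempfile.mkdtemp()) / "main.c"
    path.write_text(source)
    result = subprocess.run(
        ["gcc", str(path), "-o", str(path.with_suffix(""))],
        capture_output=True, text=True,
    )
    return result.stderr

def coding_agent(task: str, max_attempts: int = 3) -> str:
    source = ask_llm(f"Write a C program that {task}. Reply with code only.")
    for _ in range(max_attempts):
        errors = compile_c(source)
        if not errors:
            return source  # it compiles, so hand it to the user
        # Pipe the compiler errors straight back to the model and try again.
        source = ask_llm(
            f"This code failed to compile:\n{source}\n\nErrors:\n{errors}\n"
            "Fix it and reply with code only."
        )
    return source
```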
@@novantha1 did you prompt chatgpt to write this comment???
Is there a link to the code samples or anything that can be a starting point? The link in the description just seems to go to huggingface Ollama?
Sorry about that. Just updated the links in the description now so it links to both the blog and to the code in Autogen.
A very similar Langgraph version of this structure with source code is Jar3d ruclips.net/video/4akq4SKZxyk/видео.htmlsi=irzTZ76NeorjO5Lz but without code execution.
Cool, I'll check this out. I was thinking of making a version of this with Langgraph and the dual ledger system. Because I've yet to see anyone do that in something with Langgraph.
OK, interesting. I've been doing this for months, but instead of a ledger I do it recursively. Glad I'm on the right track.
Well, may I use another LLM?
You can. The key thing at the moment is they've got it set up to be compatible with the OpenAI API. So you would need models that understand that format etc. But like I mentioned in the video, you could actually just take their prompts and convert it to work with any model or framework that you want to.
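For example, a minimal sketch of pointing the standard OpenAI client at a local OpenAI-compatible server instead (the URL and model name are just examples, e.g. an Ollama or vLLM endpoint):

```python
from openai import OpenAI

# Assumption: a local server exposing the OpenAI chat-completions format,
# e.g. Ollama at this port; swap in whatever endpoint/model you actually run.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Summarize the Magentic-One orchestrator loop."}],
)
print(resp.choices[0].message.content)
```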
@@samwitteveenai Cool, go go go, thanks!
This is just async
The biggest challenge I think a lot of people have is choice. There are so many options out there. For the most part, unless you are just testing things or doing something very specific that one of these frameworks does well, you should probably just build a framework from scratch so you have total control and can get a full understanding of how these things work.
Ok, so because there are so many choices one should pick neither and do everything by themself. How does this make any sense?
@andreasmuller5630 Not what I meant at all. I meant, aside from testing/experiments, you can look at what's available, take pieces from them, and build your own for your use cases. I'm just saying there are hundreds of agent frameworks, and a lot of them waste tokens and time.
I'm also agreeing with Sam that you can just take these prompts and make your own system that doesn't rely on Autogen/the OpenAI API.
Even with the Magentic-One example from the video, you could already recreate it without agents at all. That's all I'm trying to say.
If you are having problems picking one, you can have an LLM pick for you.
heck yeah, GPT automated Symbolic AI 🦾
$ Ching Ching $