> 9:44 Why multi-agents
In addition to the mentioned benefits of using multi-agents (specialization, parallelization, and reduced cost/latency), there are several other important advantages (see the sketch after this list):
- Enhanced Reliability: By having multiple diverse agents attempt the same task or decision, we have a better chance of avoiding disastrous/erroneous outcomes.
- Improved Quality: Constructive competition among agents (if they are set up for it), where each agent critiques the work of the others, can lead to higher-quality results.
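As a rough illustration of that critique pattern (not any particular framework's API; `llm_call` here is just a simulated stand-in for a real LLM request), a draft-critique-judge loop might look like this:

```python
# Hypothetical sketch: diverse agents draft answers (reliability), each draft
# gets critiqued (quality), and a judge agent picks the strongest result.
import asyncio

async def llm_call(system: str, prompt: str) -> str:
    await asyncio.sleep(0.1)        # stand-in for a real LLM API round trip
    return f"[{system}] -> {prompt[:40]}..."

async def solve_with_critique(task: str, n_agents: int = 3) -> str:
    # Several diverse agents attempt the same task concurrently.
    drafts = await asyncio.gather(
        *(llm_call(f"solver-{i}", task) for i in range(n_agents))
    )
    # Each draft is critiqued, also concurrently.
    critiques = await asyncio.gather(
        *(llm_call("critic", f"Critique this draft: {d}") for d in drafts)
    )
    # A judge picks the best draft given the critiques.
    joined = "\n".join(f"{d}\nCritique: {c}" for d, c in zip(drafts, critiques))
    return await llm_call("judge", f"Pick the best draft:\n{joined}")

print(asyncio.run(solve_with_critique("Summarize the Q3 incident report")))
```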
Nice to see AI reinventing itself; we used to call these approaches IR and Multi-Agent Systems.
I don’t blame him. AI has come to a point where everyone and their mother wants to use it. My point being that it is now far removed from academia.
Like the idea of agents as microservices.
We went from “Gen AI will make things easier and replace developers” to having to hire more developers and the equivalent of rocket scientists 😅
That’s the thing about automation and why it’s taken so long for companies to properly invest. It takes quite a bit of time to do it right; however, once you do… unless you have another automation problem for those new developers, you might be back to the “replacement” conversation.
It's a pyramid scheme :)
So this needs a sample application to demonstrate its value. Show me something I can’t currently do with API calls to my favorite LLM and good ole fashioned code.
There are already many tools you can use to do this. Here is an example: AutoGen is a framework that enables next-gen LLM applications via multi-agent conversation. Look it up!
3 hours of playing around with it later: this is awesome!!! Can I make the agents into route endpoints with something like a reverse proxy and query them directly as I would API endpoints?
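You can, with any web framework. A minimal, hypothetical sketch using FastAPI, where `run_agent` stands in for whatever agent framework you use and a reverse proxy like nginx or Caddy routes `/agents/*` to this service:

```python
# Sketch only: wrap a hypothetical agent in an HTTP route so it can sit
# behind a reverse proxy and be queried like any other API endpoint.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    text: str

async def run_agent(text: str) -> str:
    # Placeholder for your agent call (AutoGen, llama-agents, ...).
    return f"agent answer for: {text}"

@app.post("/agents/research")   # proxy e.g. /agents/* to this service
async def research_endpoint(q: Query):
    answer = await run_agent(q.text)
    return {"answer": answer}

# Run with: uvicorn app:app --port 8001, then point the reverse proxy at it.
```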
If we see agents as microservices, why not reuse existing microservices infrastructure that has proven reliable for years now? Truly curious about the reasons.
@Jerry Liu
You absolutely should be; I’m of the opinion that’s where the biggest gains are being made. Micro agents can enhance old exception handling processes, with specialized agents redirecting requests while factoring in live system information or contextual data. In general, it allows your old microservices to handle more complex tasks or accept a wider variety of inputs. Think about all the processes with some type of minimum criteria requirement, where failed requests get passed to more expensive, often manual or human-involved workflows. A cheap micro agent can fill in missing details or approve alternative workflows. To say it’s polish for microservices is an understatement; it’s more like a powered exoskeleton with Jarvis to keep them company. 😂
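To make the "rescue failed requests" idea concrete, here is a hedged sketch; the field names are invented and `llm_complete` is a canned stand-in for a real LLM call:

```python
# Sketch: when strict validation fails, a cheap micro agent tries to infer
# the missing fields before the request escalates to manual review.
import asyncio, json

REQUIRED_FIELDS = {"customer_id", "country", "currency"}

async def llm_complete(prompt: str) -> str:
    # Stand-in for a real LLM call; returns a canned answer so the sketch runs.
    return '{"country": "US", "currency": "USD"}'

async def handle_order(payload: dict) -> dict:
    missing = REQUIRED_FIELDS - payload.keys()
    if not missing:
        return payload  # fast path: the existing microservice logic applies

    # Fallback micro agent: infer missing fields from context instead of
    # immediately handing off to the expensive human-involved workflow.
    prompt = (
        f"Given this partial order {json.dumps(payload)}, infer values for "
        f"{sorted(missing)} and reply as JSON, using null for anything unknowable."
    )
    inferred = json.loads(await llm_complete(prompt))
    if any(inferred.get(f) is None for f in missing):
        raise ValueError("escalate to manual review")  # the old expensive path
    return {**payload, **inferred}

print(asyncio.run(handle_order({"customer_id": "c-42", "amount": 99.0})))
```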
I would assume that a lot of RAG tech would ultimately be built on existing technologies, e.g. search/IR, etc.
This micro agent structure is exactly what I was thinking about yesterday; I want to sell a SaaS built around it.
Popular frameworks usually come from extracting reusable bits from a proven, working production system. I don't think it's productive to try to come up with some all-encompassing framework out of nothing. I recommend AI engineers just use their existing microservice solution, figure out what's lacking for serving LLM agents, and then derive a solution from there if actually necessary. From this presentation, it's quite unclear what problems Llama Agents solve that would be worth the migration effort.
Look into Semantic Kernel and Kernel Memory.
Damnn, that was soo insightful! Thanks man.
Went through the repo and checked the branch list to peep possible feature branches. Who tf is Logan!!?
Outside of the Python AI bubble this is so old and natural that you would never call it an invention 😂 Well, that's what happens when data scientists try to host their Jupyter Notebooks 😂
It just bugged me that they couldn’t even fix the word ‘response’ in the box in their diagram. Why did they leave it broken as ‘respons-e’? Lazyyyyyy
I really think llama agents is utterly useless. There's no point in making agents into microservices. Just make an async call instead; much lower overhead in terms of development and performance.
Well, sounds strong, but it's actually not really useful, as you'd need async services like BPM. So... I can see the worth in those agents. And it's not a coincidence Google is going in the same direction.
@yvestschischka9584 Both views have merit.
For simple applications, direct asynchronous calls can indeed reduce development and operational costs, avoiding the complexity of message queues and proxies.
However, for more complex applications that need to handle intricate tasks, message queues and proxies can offer greater flexibility and scalability.
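For the queue side of the argument, a minimal sketch using `asyncio.Queue` as a stand-in for a real broker (RabbitMQ, Redis Streams, etc.) shows the decoupling that buys that flexibility; the task names are illustrative:

```python
# Sketch: agent workers pull tasks from a queue, so producers never need to
# know which worker (or how many) will handle a given task.
import asyncio

async def agent_worker(name: str, queue: asyncio.Queue):
    while True:
        task = await queue.get()
        print(f"{name} handling: {task}")   # real work: call the LLM, etc.
        queue.task_done()

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(agent_worker(f"agent-{i}", queue))
               for i in range(3)]
    for task in ["summarize", "classify", "extract"]:
        await queue.put(task)
    await queue.join()          # wait until every queued task is processed
    for w in workers:
        w.cancel()              # workers loop forever; stop them explicitly

asyncio.run(main())
```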
The problem is that LLM API calls can't be parallelized when subsequent calls depend on the results of previous calls. The more chained agent calls you have, the longer it takes to get a completed reply back to the user.
There's so much needless abstraction when these are just API calls to an LLM service.
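That latency argument is easy to demonstrate with plain asyncio; in this sketch `llm_call` simulates a 1-second API round trip, so independent calls finish in roughly 1s total while a 3-hop dependency chain takes roughly 3s:

```python
# Sketch: independent agent calls can be gathered in parallel, but a
# dependency chain is inherently sequential, so latency grows with depth.
import asyncio, time

async def llm_call(prompt: str) -> str:
    await asyncio.sleep(1.0)          # pretend each LLM round trip takes 1s
    return f"answer({prompt})"

async def main():
    t0 = time.perf_counter()
    # Independent calls: total latency ~1s regardless of count.
    await asyncio.gather(*(llm_call(f"q{i}") for i in range(5)))
    print(f"parallel fan-out: {time.perf_counter() - t0:.1f}s")

    t0 = time.perf_counter()
    # Dependent chain: each call needs the previous result, ~1s per hop.
    result = "seed"
    for _ in range(3):
        result = await llm_call(result)
    print(f"3-hop chain: {time.perf_counter() - t0:.1f}s")

asyncio.run(main())
```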
That's not scalable. That's the whole point of microservices.
Take the basic internet-dependent search everyone always uses as an agent use case, but to refute your comment, let's not use the typical lazy examples.
You need to take the user's query and properly qualify it. This could be many micro agent calls if you're doing it right, dozens really. First, you had better be using multiple search providers per culture/language region. Focusing on the U.S., you'd have the standard Google/Bing and Brave, plus one of the intelligent ones like Tavily. That's 4 services, each with a large number of arguments to help tailor the results to better address the query.

How are you determining the "freshness" of the results? What if a date range is required? You can get away with one agent determining a start/end date range, but you'll need another to determine past week/month/year. What about the general search category, like web/news/images/etc.? You should send the query off to a micro agent to determine which result set(s) should be targeted. You can also pass the query off to another micro agent to be rewritten to enhance results if possible.

What if the question is more technical, or the query would benefit from Reddit or social media profiles? You would need to send it to a Reddit specialist agent who could determine whether it should be included and, if so, what those parameters might be, and similarly for other social media. Stack Overflow, Wikipedia, etc. would each benefit from a separate agent targeted at that site's content, helping to map out the search plan.

Once each of these micro agents has completed its query evaluation task, all run in parallel of course, you then fire off the searches, again in parallel. What do you do with 5 or 10 sets of results? You need to go through them and begin to collect the useful information, firing off scrapers if/when the user's query requires further investigation.

That's a ton of micro agents, and all we might have done is accept a query and hopefully communicated some details of each of these micro background processes taking place. Llama agents seem like a step in the right direction for deploying, organizing, and sharing/reusing micro agents.
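A hedged sketch of that two-stage fan-out, with invented agent/provider names and `llm_call`/`search` as simulated stand-ins for real API calls:

```python
# Sketch: query-qualification micro agents run concurrently, then every
# search provider fires concurrently with the merged plan.
import asyncio

QUALIFIERS = ["date_range", "freshness", "category", "rewrite", "reddit", "wikipedia"]
PROVIDERS = ["google", "bing", "brave", "tavily"]

async def llm_call(role: str, query: str) -> dict:
    await asyncio.sleep(0.1)                    # simulated LLM round trip
    return {"role": role, "plan": f"{role} plan for {query!r}"}

async def search(provider: str, plans: list) -> dict:
    await asyncio.sleep(0.1)                    # simulated search API call
    return {"provider": provider, "hits": []}

async def answer(query: str):
    # Stage 1: all qualification agents evaluate the query concurrently.
    plans = await asyncio.gather(*(llm_call(q, query) for q in QUALIFIERS))
    # Stage 2: fire every search provider concurrently with the merged plan.
    return await asyncio.gather(*(search(p, plans) for p in PROVIDERS))

print(asyncio.run(answer("best rust web framework 2024")))
```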
A side note: what's utterly useless is the anti-pydantic LangChain Expression Language (LCEL) for Python. I think their detour set back the entire AI development industry 6 months, quite possibly 12, considering how it broke everything and made samples and demos worse than useless for about a year.
Cheers
I want to use Llama 3.1 8B and a Qwiki (quality management wiki) for RAG. If possible, I would like to use a llamafile. This whole thing should run only locally, with no connection to the internet. Is there any way I could get a tutorial on this? Possibly with the advanced RAG features you showed in the presentation, because I really do not want just a "glorified search".
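Not a tutorial, but here is a hedged sketch of a fully local setup, assuming you run a Llama 3.1 8B llamafile (which serves an OpenAI-compatible API on localhost:8080) and have exported the Qwiki pages to a folder of text files. Retrieval here is naive keyword overlap just to keep the sketch dependency-free; a real setup would use local embeddings:

```python
# Sketch: local-only RAG against a llamafile's OpenAI-compatible endpoint.
# The wiki folder name and model name are placeholders.
from pathlib import Path
from openai import OpenAI  # pip install openai; no internet needed at runtime

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

def retrieve(query: str, wiki_dir: str = "qwiki_export", k: int = 3) -> list[str]:
    q_words = set(query.lower().split())
    scored = []
    for page in Path(wiki_dir).glob("*.txt"):
        text = page.read_text()
        score = len(q_words & set(text.lower().split()))
        scored.append((score, text[:2000]))     # truncate long pages
    return [t for _, t in sorted(scored, reverse=True)[:k]]

def ask(query: str) -> str:
    context = "\n---\n".join(retrieve(query))
    resp = client.chat.completions.create(
        model="local-model",                    # llamafile serves one model
        messages=[
            {"role": "system", "content": f"Answer using this wiki context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

print(ask("What is our document control procedure?"))
```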