"Make Agent 10x cheaper, faster & better?" - LLM System Evaluation 101
- Published: 28 Jun 2024
- LLM System Eval 101 - Build better agents
Get free HubSpot report of how to land a Job using AI: clickhubspot.com/fo2
🔗 Links
- Follow me on twitter: / jasonzhou1993
- Join my AI email list: www.ai-jason.com/
- My discord: / discord
- Langsmith: smith.langchain.com/
- Phoenix: phoenix.arize.com/
- Arize LLM Evaluation guide: arize.com/blog-course/llm-eva...
- Web scraping agent video: • “Wait, this Agent can ...
- Signup for universal web scraper: forms.gle/zN9w9UyhMKx59yAE6
⏱️ Timestamps
0:00 Intro
0:27 Why Eval is important
3:30 LLM as evaluator
5:54 How to build eval system
15:10 Case study - Eval & improve research agent
👋🏻 About Me
My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com
#gpt4o #aiagents #rag #llamaparse #llamaindex #gpt5 #autogen #gpt4 #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #chatgpt #largelanguagemodels #largelanguagemodel #bestaiagent #chatgpt #agentgpt #agent #babyagi #evaluation - Science
This is gold. Most people just show you how to build a toy demo, but not many actually get into the details of how to get into production. Thank you, Jason!
Couldn't agree more. This is gold.
I recently created a whole testing system for our LLM chatbots, and we did exactly this:
LLM as evaluator and code
We created it as a series of unit tests with LLM-generated cases.
Since our results were mostly conversational, we made tests pass/fail according to a scoring system.
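A minimal sketch of the score-then-threshold idea described in the comment above, not the commenter's actual system. The names `EvalCase`, `keyword_overlap_judge`, `run_case`, and the 0.7 threshold are all hypothetical. The judge here is a simple word-overlap function standing in for an LLM evaluator so the example runs without any API; in practice you would swap in a call to your own evaluator model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One test case: a user question and a reference answer (hypothetical shape)."""
    question: str
    reference: str


def keyword_overlap_judge(answer: str, reference: str) -> float:
    """Stand-in for an LLM evaluator.

    Returns the fraction of reference words that also appear in the answer,
    as a score between 0.0 and 1.0.
    """
    ref_words = set(reference.lower().split())
    if not ref_words:
        return 0.0
    ans_words = set(answer.lower().split())
    return len(ref_words & ans_words) / len(ref_words)


def run_case(
    case: EvalCase,
    chatbot: Callable[[str], str],
    judge: Callable[[str, str], float] = keyword_overlap_judge,
    threshold: float = 0.7,  # assumed cut-off; tune for your own data
) -> bool:
    """Score the chatbot's answer and turn the score into pass/fail."""
    score = judge(chatbot(case.question), case.reference)
    return score >= threshold


if __name__ == "__main__":
    case = EvalCase("What is the refund window?", "refunds within 30 days")
    good_bot = lambda q: "We offer refunds within 30 days of purchase."
    bad_bot = lambda q: "I do not know."
    print("good bot passes:", run_case(case, good_bot))
    print("bad bot passes:", run_case(case, bad_bot))
```

Keeping the judge as a plain function argument means the same test harness works whether the score comes from code or from an LLM.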
This is great. Loved the use of firecrawl (as a scrape tool) to get the website's data. Feel like it always helps improve the model output quality. Cheers!
Amazing work as always Jason!
Way excellent video that goes well beyond demo. Thank you very much for this guidance.
Been looking for more detail on eval on LLMs and been scratching around for a while. Thanks for this.
Finally you're back 🎉
Awesome! Keep up the great work!
goddamn Jason your videos just blow my mind each time. Thanks for such a thorough explanation and example.
This is so good, thanks man!
I've used promptfoo for some of my tests with a local LLM to test the AI workflow. It allows you to write assertions like you would with software.
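To illustrate what software-style assertions on model output can look like, here is a small plain-Python sketch. It is not promptfoo's configuration format or API; the helper names `assert_contains`, `assert_matches`, and `assert_valid_json`, and the sample output string, are hypothetical.

```python
import json
import re


def assert_contains(output: str, needle: str) -> None:
    """Fail unless the output mentions the expected text (case-insensitive)."""
    assert needle.lower() in output.lower(), f"expected {needle!r} in output"


def assert_matches(output: str, pattern: str) -> None:
    """Fail unless the output matches a regular expression somewhere."""
    assert re.search(pattern, output), f"output does not match {pattern!r}"


def assert_valid_json(output: str, required_keys: list[str]) -> dict:
    """Fail unless the output parses as JSON and has all required keys."""
    data = json.loads(output)  # raises an error on malformed JSON
    missing = [key for key in required_keys if key not in data]
    assert not missing, f"missing keys: {missing}"
    return data


if __name__ == "__main__":
    # Hypothetical model output for a scraping-style task.
    sample = '{"title": "LLM Eval 101", "url": "https://example.com"}'
    data = assert_valid_json(sample, ["title", "url"])
    assert_contains(data["title"], "eval")
    assert_matches(data["url"], r"^https?://")
    print("all assertions passed")
```

Deterministic checks like these are cheap to run on every change, and they can sit alongside LLM-scored checks for the parts of an answer that are harder to verify with code.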
Great Video
lesgooo!! ❤🔥❤🔥❤🔥
I found langfuse metric monitoring a little bit better.
fireeee content!
Fine-tune llama 3 (8bit) and you will get exactly the behavior you want. It's what I do.
Sick, what are the best-practice metrics for evaluating agents?
Great stuff. I'm new to this and find it very interesting. Can this be built by a novice?
jason can we get another video about comfy ui?
Who has never spent 4 hours to save 10 minutes? That's our hobby: spending time to save time.
If 25 people or more use it successfully, then you literally gave humanity more time to live and be free.
I love how my AI girl insults the competition with flame balls, then tells me she loves me. ❤🎉😊
Audio could have been better imo
I agree Jason it sounded like Jason was a little too close to the microphone, but great video otherwise!
Why not use Gemini as the LLM? It is free.
Let me share my experience with any Google AI model: it doesn't understand humans and it hallucinates way too much.
Practically, in my cases, 75% of the time what I get back is a totally useless result. You can't use it for anything. To be considered for evaluation? You must be joking.
I don't see the value of "Agents". All of this stuff is easily done with basic function calling. I think I'm going to need to see some more creative use cases before I jump on board. I just don't get it yet.
Maybe we can discuss this, I am trying to jump on in but not until I find a decent idea to apply.
When your assistant has a lot of functions, it starts hallucinating. Have you ever encountered this?
Good content but so hard to listen to his Engrish. Monotonous Pitch n sped up delivery didn’t seem to help either.