Data Centric
  • Videos: 37
  • Views: 107,640
Llama 3 70B Custom AI Agent: Better Than Perplexity AI?
This is the second in a series of videos where I will be testing various open-source models with my custom web search agent to see how well they perform. In this video, I benchmark Llama 3 70b, hosted on Runpod, against Perplexity AI to see if my web search agent measures up.
Need to develop some AI? Let's chat: www.brainqub3.com/book-online
Register your interest in the AI Engineering Take-off course: ruclips.net/video/Fml9Hh_Xxms/видео.htmlfeature=shared
Hands-on project (build a basic RAG app): www.educative.io/projects/build-an-llm-powered-wikipedia-chat-assistant-with-rag
Stay updated on AI, Data Science, and Large Language Models by following me on Medium: medium.com/@johnadeojo
Build yo...
Views: 32

Videos

Can My Ollama Local WebSearch Agent (With Llama 3 8B) Beat Perplexity AI?
3.1K views · 9 hours ago
This is the first in a series of videos where I will be testing various open-source models with my custom web search agent to see how well they perform. In this video, I benchmark Llama 3 8b, hosted locally, against Perplexity AI to see if my web search agent measures up. Need to develop some AI? Let's chat: www.brainqub3.com/book-online Register your interest in the AI Engineering Take-off cou...
Build Open Source "Perplexity" agent with Llama3 70b & Runpod - Works with Any Hugging Face LLM!
4.5K views · 12 hours ago
In this video, you'll learn how to build a custom AI agent using the powerful Llama 3 70b model deployed on Runpod using vLLM. This method is also compatible with any Hugging Face LLM, providing flexibility and scalability for your AI projects. Need to develop some AI? Let's chat: www.brainqub3.com/book-online Register your interest in the AI Engineering Take-off course: ruclips.net/video/Fml9H...
Build your own Local "Perplexity" with Ollama - Deep Dive
6K views · 14 hours ago
Local Llama 3 Custom Web Search Agent with Ollama - Deep Dive Join me for another technical deep dive as I walk you through how I built a custom web search agent that runs entirely on your local machine. Need to develop some AI? Let's chat: www.brainqub3.com/book-online Register your interest in the AI Engineering Take-off course: ruclips.net/video/Fml9Hh_Xxms/видео.htmlfeature=shared Hands-on ...
Agency Swarm: Why It’s Better Than CrewAI & AutoGen
15K views · 1 day ago
In this deep dive, I’ll explore how Agency Swarm is challenging CrewAI and AutoGen. Join me as I demonstrate how to use Agency Swarm’s framework to develop a simple web search agent. Stick around till the end for my final verdict on this innovative tool. Need to develop some AI? Let's chat: www.brainqub3.com/book-online Register your interest in the AI Engineering Take-off course: ruclips.net/v...
Forget CrewAI & AutoGen, Build CUSTOM AI Agents!
14K views · 14 days ago
If you are developing AI agents or multi-agent workflows, it is often better to create your own custom agents rather than relying on existing frameworks like CrewAI or Autogen. In this guide, I will demonstrate how I developed a simple custom web search agent in Python and explain why custom solutions are superior to the current one-size-fits-all frameworks available. Need to develop some AI? L...
Why I'm Staying Away from Crew AI: My Honest Opinion
17K views · 14 days ago
Crew AI is not suitable for production use cases. I’ll be going through why I believe this is the case and what you should do instead when building your own apps. Need to develop some AI? Let's chat: www.brainqub3.com/book-online Register your interest in the AI Engineering Take-off course: www.data-centric-solutions.com/course Hands-on project (build a basic RAG app): www.educative.io/projects...
How to get LLaMa 3 UNCENSORED with Runpod & vLLM
1.5K views · 21 days ago
How to get up and running with the uncensored Llama 3 model. You'll learn how to build an uncensored Llama 3 chatbot using vLLM and Runpod. Need to develop some AI? Let's chat: www.brainqub3.com/book-online Register your interest in the AI Engineering Take-off course: ruclips.net/video/Fml9Hh_Xxms/видео.htmlfeature=shared Hands-on project (build a basic RAG app): www.educative.io/projects/build...
Host Your Own Llama 3 Chatbot in Just 10 Minutes! with Runpod & vLLM
1.4K views · 1 month ago
Get the lowdown on how to host the Llama 3 8B model for a slick chatbot, using vLLM's inference server, Runpod GPUs, and Chainlit for a smart front end. You'll learn the ins and outs of hosting your own Llama 3 model on Runpod and piecing together a chatbot that's light on its feet, without bogging down in hefty frameworks. This video serves as Lecture 3 of the AI Engineering Takeoff course. Ne...
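To make the hosting pattern concrete: once vLLM's OpenAI-compatible server is running on the Runpod pod, any OpenAI client can talk to it. A minimal sketch (the pod URL and model name are placeholders, not values from the video):
```python
# Sketch: query a Llama 3 model served by vLLM's OpenAI-compatible server on a Runpod
# pod. The base_url and model id are placeholders; substitute your own pod and model.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-pod-id>-8000.proxy.runpod.net/v1",  # vLLM's OpenAI-style endpoint
    api_key="EMPTY",  # vLLM accepts any key unless you configure one
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # whatever model the server was launched with
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one sentence."},
    ],
)
print(response.choices[0].message.content)
```
A Chainlit front end would call the same client from its message handler; the lecture itself walks through the full setup.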
Mistral AI vs Open AI: Who REALLY Has The Best AI? Rap battle using Suno AI
512 views · 1 month ago
We put the Mistral models to the test, battling against OpenAI’s GPT models to see who comes out on top. Suno AI: sunoai.ai/ Need to develop some AI? Let's chat: www.brainqub3.com/book-online Register your interest in the AI Engineering Take-off course: ruclips.net/video/Fml9Hh_Xxms/видео.htmlfeature=shared Hands-on project (build a basic RAG app): www.educative.io/projects/build-an-llm-powered...
WHY Retrieval Augmented Generation (RAG) is OVERRATED!
2.4K views · 1 month ago
Retrieval augmented generation (RAG) is over-hyped. I'll explain why this is the case, having worked on products with RAG as their core functionality. Read the blog post that complements this RUclips content: Coming soon! Need to develop some AI? Let's chat: www.brainqub3.com/book-online Register your interest in the AI Engineering Take-off course: ruclips.net/video/Fml9Hh_Xxms/видео.htmlfeature...
Is AutoGen just HYPE? Why I would not use AUTOGEN in a REAL use case, Yet
5K views · 3 months ago
In this video, we delve into AutoGen and learn about its functioning while distinguishing between the hype and reality. Although AutoGen is an excellent resource for creating multi-agent workflows, it is still unsuitable for production. Watch this video if you want to know what AutoGen can do beyond the hype. Read the blog post that complements this RUclips content: johnadeojo.medium.com/autoge...
Your GUIDE to Hugging Face, GPUs, OpenAI, LangChain + More in the LLM Ecosystem - Lecture 2
2.9K views · 3 months ago
If you've ever felt puzzled and swamped by the numerous advancements in the Large Language Models (LLM) field, this video is designed for you. It offers insights into questions like "What is Hugging Face?", "How do LLMs work?", among others. I aim to clarify the LLM ecosystem, making it easier for you to understand the essentials for developing LLM applications. This forms the second lecture of...
Deploy Mixtral, QUICK Setup - Works with LangChain, AutoGen, Haystack & LlamaIndex
1.1K views · 3 months ago
In this video, I demonstrate how you can swiftly get started with Mixtral. Utilising Runpod and vLLM, you will learn how to deploy a Mixtral endpoint that emulates OpenAI. I'll show you how we can seamlessly integrate this endpoint into a chatbot using Langchain. This deployment pattern can help you get up and running with any LLM. Read the blog post to learn how to integrate with Llama Index, ...
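As a rough sketch of that LangChain integration, assuming the Mixtral endpoint exposes the OpenAI-compatible API described above (the base URL and model id are placeholders):
```python
# Sketch: point LangChain's ChatOpenAI wrapper at an OpenAI-compatible Mixtral endpoint
# (for example, vLLM running on Runpod). The base_url and model id are placeholders.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://<your-endpoint>/v1",
    api_key="EMPTY",
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
)

print(llm.invoke("Explain mixture-of-experts in one sentence.").content)
```
Because the endpoint mimics OpenAI, the same approach generally carries over to AutoGen, Haystack, and LlamaIndex by pointing each framework at the custom base URL.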
Building Chatbots with Hugging Face LLMs: 5 Expert Tips ft. Mistral
903 views · 4 months ago
In this video, I share five engineering tips for using open-source models to build chat-bot applications. If your experience lies predominantly with OpenAI or similar proprietary APIs, you'll find this particularly insightful. Transitioning to open-source models brings with it a host of technical nuances and tricks that are crucial to understand. Read the blog post that complements this RUclips...
CUSTOM RAG Pipelines & LLM Fine-Tuning: A GRADIENT Tutorial
802 views · 4 months ago
Zapier AI Gmail Automation: How to Automate Mundane Tasks and Save Hours
672 views · 4 months ago
LLMs & Transformers Demystified: Your Intro to AI Engineering (Lecture 1)
535 views · 5 months ago
GPT-4 vs Open Source LLMs: Epic Rap Battles Test Creativity with AutoGen!
648 views · 5 months ago
Surprising Debate Showdown: GPT-4 Turbo vs. Orca-2-13B - Programmed with AutoGen!
7K views · 5 months ago
LLM Projects - How to use Open Source LLMs with AutoGen - Deploying Llama 2 70B Tutorial
3.4K views · 5 months ago
AutoGen AI Travel Agent TUTORIAL with GitHub Repo!
8K views · 6 months ago
Live Demo: Building a Custom GPT with OpenAI's GPTs (Game Changer)
1.1K views · 6 months ago
LLM Projects - A Quick Tutorial on Multi-Agent Workflows with AutoGen
7K views · 6 months ago
Exploring DSPy, a More Robust and Systematic Approach to Prompt Engineering
1.5K views · 7 months ago
Is Enterprise Jargon Confusing your RAG System? Try These Tactics
117 views · 8 months ago
Don't Build LLM-Powered Apps...Before Knowing About RAG (Retrieval Augmented Generation)
311 views · 8 months ago
6 High Impact Large Language Model Applications (Feasible Today!)
180 views · 8 months ago
Should We Be Concerned About AGI Emerging from Large Language Models?
144 views · 8 months ago
Why you can't use Llama 2 (70B), Yet
173 views · 9 months ago

Comments

  • @madhudson1
    @madhudson1 8 minutes ago

    I haven't gone back and looked at your implementation of your agent workflow, but you were talking about restrictions in context windows with your scraping. Are you using RAG with the large documents you're scraping?

  • @harshraj67
    @harshraj67 10 hours ago

    Great video! Is there any way to use the Groq API?

  • @6lack5ushi
    @6lack5ushi 12 hours ago

    I love this so much!!! Is this a bad answer to the Napoleon question? “Napoleon occupied the city where the mother of the woman who brought Louis XVI style to the court died in **1804**. Output from operation 0002: Marie Antoinette, the Queen consort of King Louis XVI of France, is known for bringing a more extravagant and luxurious style to the French court during her husband's reign. She was born an Archduchess of Austria and was the youngest daughter of Empress Maria Theresa and Emperor Francis I. Marie Antoinette's mother, Empress Maria Theresa, died on November 29, 1780, at the Hofburg Palace in Vienna, Austria. She died of natural causes at the age of 63, having ruled the Habsburg Empire for 40 years. Her death greatly affected Marie Antoinette, who was very close to her mother despite living in France since her marriage to Louis XVI in 1770. Output from operation 0000: Marie Antoinette, the Queen consort of King Louis XVI of France, is known for bringing a more extravagant and luxurious style to the French court during her husband's reign. Born an Archduchess of Austria, she was the youngest daughter of Empress Maria Theresa and Emperor Francis I. Empress Maria Theresa, Marie Antoinette's mother, died on November 29, 1780, at the Hofburg Palace in Vienna, Austria. She died of natural causes at the age of 63, having ruled the Habsburg Empire for 40 years. Her death greatly affected Marie Antoinette, who was very close to her mother despite living in France since her marriage to Louis XVI in 1770. Napoleon occupied the city where Marie Antoinette's mother, Empress Maria Theresa, died in **1804**.” Off by a year??? But this is exactly why we built our own in-house solution and don’t rely on CrewAI or any other multi-agent framework.

  • @6lack5ushi
    @6lack5ushi 12 hours ago

    The Billy Giles answer is interesting. There is a Billy Giles who died in New York, America: “He died at Mount Sinai Hospital in New York City on Sept. 25, 2021 after an eight year struggle with progressive anti-MAG peripheral neuropathy.” But I’m guessing this is the wrong Billy Giles.

    • @6lack5ushi
      @6lack5ushi 12 hours ago

      At the same time Google will say Belfast?! So truth becomes the crux in these questions

  • @MrGluepower
    @MrGluepower 13 hours ago

    Agency Swarm might be good, but I doubt it and have no intention of checking it out. The founder's attitude, acting like he is the best thing since sliced bread, belittling other solutions instead of pointing out his strengths, just doesn't work for me.

  • @Mohamed-sq8od
    @Mohamed-sq8od 21 hours ago

    If you base your whole opinion on CrewAI on the cost of OpenAI tokens, use a local model or Ollama :p

  • @jeffreypaarhuis8169
    @jeffreypaarhuis8169 22 hours ago

    Great video again. Can't wait for you to try and run the coding models.

  • @MrGluepower
    @MrGluepower 1 day ago

    Multi-hop questions are such a niche use case, and you are presenting them as a core feature for multi-agents. Sure, that use case might be challenging, but in the real corporate world typical workflows are not of that format. Each step is simple, but there are 10,000 steps.

    • @Data-Centric
      @Data-Centric 20 hours ago

      We are building automated workflows for clients and we are not using CrewAI. For your simple steps, RPA or something like Zapier can work. The use case is niche, but it was a way to showcase what these agentic workflows can do without requiring access to corporate data.

  • @octadion3274
    @octadion3274 1 day ago

    Why can't I connect to HTTP port 8000?

  • @rehaanathiq528
    @rehaanathiq528 1 day ago

    Hey man, love your videos! Is there any way you could do one on the RAPTOR method, using knowledge graphs of the data as grounding, please?

    • @Data-Centric
      @Data-Centric 20 hours ago

      I'll look into it. Thanks for the comment!

  • @jakeparker918
    @jakeparker918 1 day ago

    I've had the best luck constructing the JSON and dictionary with classic code, based on whether or not the LLM thinks the content of a specific page answers a question.
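A minimal sketch of that pattern, where the model only returns a yes/no relevance verdict and ordinary Python builds the dictionary; the `ask_llm` helper and prompt wording are hypothetical stand-ins, not the commenter's actual code:
```python
# Sketch: the LLM only answers a yes/no relevance question per page; ordinary Python
# assembles the dictionary and serialises it to JSON. `ask_llm` is a hypothetical
# stand-in for whatever chat-completion call you use (OpenAI, Ollama, vLLM, ...).
import json

def ask_llm(prompt: str) -> str:
    """Placeholder for your LLM call."""
    raise NotImplementedError

def build_sources(question: str, pages: dict[str, str]) -> str:
    relevant = {}
    for url, content in pages.items():
        verdict = ask_llm(
            f"Does the following page content help answer the question?\n"
            f"Question: {question}\nContent: {content[:2000]}\n"
            "Answer strictly 'yes' or 'no'."
        )
        if verdict.strip().lower().startswith("yes"):
            relevant[url] = content  # classic code builds the structure, not the LLM
    return json.dumps(relevant, indent=2)
```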

  • @scotter
    @scotter 1 day ago

    Apologies if you gave the following information and I missed it. What was the price difference between using 3.5-turbo and 4o?

  • @enkor349
    @enkor349 1 day ago

    Yeah, the pro version of perplexity had no issue with the last question.

  • @scotter
    @scotter 1 day ago

    I love your way of giving tutorials! I'll dive fully into Agency Swarm if/when I can use underlying LLMs other than OpenAI. Philosophical reasons.

  • @alexemilcar6525
    @alexemilcar6525 1 day ago

    Great explanation video, but a bad use case for a multi-agent framework.

  • @carlosvasquez-xp8ei
    @carlosvasquez-xp8ei 2 days ago

    Wow. Outstanding.

  • @robsoncoutinhoti
    @robsoncoutinhoti 2 days ago

    Thank you, my brother, for sharing your knowledge. This video is very helpful for me because I'm creating a new feature and I'll use agents to accomplish my needs. This video provides me with many insights.

  • @leoncariz
    @leoncariz 2 days ago

    forgot to say, love your content and thank you for providing such great insight

  • @leoncariz
    @leoncariz 2 days ago

    I think they are different animals. I was able to get the Llama 3 8B model to tell me when it was trained; the dataset dates back to December 2020. Facebook and the like are providing stuff to us with a 3-4 year lag while they stay on the latest. Perplexity is even stating on their enterprise website that "Your data stays yours, period. We never train our LLMs on our enterprise customers’ data.", so they are using our searches and Google-crawling-type technology to train their models constantly. They might even scrape this comment and your video transcript next and use them to train their data. This is what the model reported:
    Latest date in training dataset: December 31, 2020 23:59:59 UTC
    Training data collection period: January 1, 2016 00:00:00 UTC to December 31, 2020 23:59:59 UTC
    Training data size: 45,678,901,234 bytes (approximately 45.7 GB)
    Training data sources: Common Crawl dataset, which includes web pages from the internet
    Training data format: JSON files containing text data, with a total of 12,345,678 files
    Training data processing: Preprocessed using NLTK and spaCy libraries for tokenization, stemming, and lemmatization
    Model architecture: Transformer-based architecture with 12 layers, 16 attention heads, and a hidden size of 128
    Model training: Trained using the Adam optimizer with a learning rate of 0.001 and a batch size of 32
    Model evaluation: Evaluated using the perplexity metric, with a minimum perplexity of 10
    Model updates: Updated regularly to incorporate new data and improve performance

  • @felipeekeziarosa4270
    @felipeekeziarosa4270 2 days ago

    Why use an OpenAI key?

    • @Data-Centric
      @Data-Centric 18 hours ago

      The script works with Ollama, OpenAI, and Hugging Face models that you host yourself.

  • @keithmifsud
    @keithmifsud 2 days ago

    Awesome video. Very informative and you sounded very charismatic. Thank you for adding your opinion at the end of the video. I agree, we should wait for the Assistants to be out of Beta and also have a clearer fee structure.

  • @andynguyen8847
    @andynguyen8847 2 days ago

    Thanks for the video. I had been playing around with similar stuff, primarily with open-source models, before I came across your videos. Since then I've used a few of your prompts :) For open-source models, in my case I have gotten Llama 3 8B and some other 7B models to answer a few questions right (with a lot of prompt tweaking), but they struggle a lot with >1-hop questions. Llama 3 70B got the 4-hop question right but took like 5+ iterations. I do use RAG with my custom QA agent. I also use Playwright + bs, as I find that regular requests don't work with a lot of websites because they track and block your connection; plus, with Playwright you can filter out a lot of junk on webpages to save tokens. I used Llama 3 8B Q6_K locally with an 8 GB graphics card and Llama 3 70B through the free Groq API. Not sure which quantized version you use with Ollama, but if you just pull it normally it's probably the Q4 version, and I found there's definitely a noticeable difference vs Q6 in terms of quality.
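A rough sketch of the scraping setup described in this comment, assuming "bs" refers to BeautifulSoup; the tag list and timeout are illustrative:
```python
# Sketch: render the page with Playwright (sites that block plain requests often load
# fine this way), then strip boilerplate tags with BeautifulSoup so fewer tokens are
# spent on junk. The tag list and timeout are illustrative.
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def fetch_clean_text(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, timeout=30_000)  # milliseconds
        html = page.content()
        browser.close()
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "nav", "footer", "header", "aside"]):
        tag.decompose()  # drop markup that only burns tokens
    return soup.get_text(separator=" ", strip=True)
```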

  • @joels7605
    @joels7605 2 days ago

    This is excellent. And even posting a Github link like a boss.

  • @vaioslaschos
    @vaioslaschos 3 days ago

    Watching your videos acts as a reality check for me. I have the same view on most of the issues. Do you have a Discord server for your audience?

    • @Data-Centric
      @Data-Centric 20 hours ago

      Thanks for the feedback. I don't yet, might consider it down the line.

  • @Techtantra-ai
    @Techtantra-ai 3 days ago

    Bro, you're the first creator who works with more than just GPT in the name of AI.

  • @vaioslaschos
    @vaioslaschos 3 days ago

    Is this really true for LangChain as well? I thought it's totally customizable.

  • @supercurioTube
    @supercurioTube 3 days ago

    Nice result here with Llama 3 70B fp16. The whole time, though, I was thinking "what about Groq?", since inference for the same model appears to be free.

    • @alchemication
      @alchemication 1 hour ago

      Groq has very low rate limits atm. But yeah speed is amazing

  • @supercurioTube
    @supercurioTube 3 days ago

    I'm still watching, enjoying your presentation and that you're showing a project simple enough to follow easily. And ouch, that first result is painful hahah, but it's great to show that it's not exactly trivial to get agents to do what we actually want. `ollama list` returns the list of installed models, by the way.

    • @Data-Centric
      @Data-Centric 3 days ago

      Ha yes it was painful! I did a full test with llama3 8b and will release a test with the 70b model. Thanks for the tip on Ollama btw!

    • @supercurioTube
      @supercurioTube 3 days ago

      @@Data-Centric Yes, thanks, just watched the 2 following videos as well 🙂 I noticed that the tool is using the completion endpoint instead of the chat one, and also setting the instructions as plain text instead of following the chat template. Have you tried using Ollama's chat endpoint and setting the instructions as a system prompt? That's what I started to do in my own project, but I haven't compared the different approaches yet.
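For reference, a minimal sketch of the suggested alternative, calling Ollama's /api/chat endpoint with the agent instructions as a system message (model name and prompts are illustrative):
```python
# Sketch: call Ollama's chat endpoint and pass the agent instructions as a system
# message instead of a plain-text completion prompt. Model name and prompts are
# illustrative; Ollama listens on localhost:11434 by default.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [
            {"role": "system", "content": "You are a web search planner. Reply in JSON."},
            {"role": "user", "content": "Who succeeded the first president of Namibia?"},
        ],
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```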

  • @SeattleShelby
    @SeattleShelby 3 days ago

    CodeQwen is a 7B model. It might be useful as a coding agent. That’s kind of the point of agents, though - have one really good “manager” model (GPT-4) and a bunch of smaller, highly refined 7B or 8B models that use RAG to prevent garbage and hallucinations. Also, the manager model should be smart enough to detect nonsense from its subordinates.

    • @Data-Centric
      @Data-Centric 3 days ago

      I'll check out CodeQwen! I find that the smaller 7B/8B models are more prone to hallucination than the larger models even with RAG. What's your experience with them been?

  • @jonathanholmes9219
    @jonathanholmes9219 3 days ago

    Chunking and overlap is the main issue. I found that creating structured data from unstructured data using the LLM was the answer. SQL queries through LLM works very well. Thanks for the video and sharing your experiences.
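A small sketch of that idea, assuming the LLM is used once to extract rows and plain SQL handles the querying; the schema and the `extract_rows` stand-in are hypothetical:
```python
# Sketch: the LLM turns unstructured text into rows once, the rows go into SQLite,
# and questions are answered with plain SQL instead of retrieving raw chunks.
# extract_rows() is a hypothetical stand-in for the LLM extraction step.
import sqlite3

def extract_rows(document: str) -> list[tuple[str, str, int]]:
    """Stand-in for an LLM call that parses text into (name, country, founded_year) rows."""
    # In practice you would prompt the model to emit the rows as JSON and parse them here.
    return [("Acme Ltd", "UK", 1999), ("Globex", "US", 2004)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, country TEXT, founded_year INTEGER)")
conn.executemany("INSERT INTO companies VALUES (?, ?, ?)", extract_rows("...raw document text..."))
for (name,) in conn.execute("SELECT name FROM companies WHERE country = 'UK' ORDER BY founded_year"):
    print(name)
```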

  • @deathdefier45
    @deathdefier45 3 days ago

    You sir are a pillar of this community, thank you <3

  • @darwinprod
    @darwinprod 3 days ago

    You should try nous-hermes2-mixtral, or maybe the brand new Mistral 7B v0.3.

  • @user-wy6hg9ej6f
    @user-wy6hg9ej6f 3 days ago

    ❤❤❤❤❤❤❤

  • @amusendame
    @amusendame 3 days ago

    You provided great insights here. I think this is where phidata comes in also. It allows you to define prompts for each agent you create. However, custom builds are always the best.

  • @OwlSpinning
    @OwlSpinning 4 days ago

    Amazing video! I'm just about to do something similar, but at a much smaller scale. I wish Mistral v0.3 was in the mix also because that model is pretty amazing in my experience.

  • @ZacMagee
    @ZacMagee 4 days ago

    Love your content boss. Keep them coming

  • @AI-Wire
    @AI-Wire 4 days ago

    There is a problem with your logic in the first question. One does not need to first know who the first president of Namibia was in order to know who succeeded him. One can simply learn who was the second president.

  • @RayWrightRayrite
    @RayWrightRayrite 4 days ago

    That's dope! Keep up the good work!

  • @gtarptv_
    @gtarptv_ 4 days ago

    I'm glad the RUclips algorithm listed you in my results. I've been looking hard for info on LLMs and agents, as I am a noob coder learning about LLMs and running them locally.

  • @EddiePick
    @EddiePick 4 days ago

    Isn't the largest city north of London Manchester?

    • @Data-Centric
      @Data-Centric 3 days ago

      Birmingham is larger than Manchester by population.

    • @truehighs7845
      @truehighs7845 3 days ago

      North of London we have some barbarians I heard, those are 2 encampments really and nobody understands a word they say anyway🤣.

  • @caseyhoward8261
    @caseyhoward8261 4 days ago

    RAG with a knowledge graph

  • @mikew2883
    @mikew2883 4 days ago

    Great overview! Do you think it could get better results if it used web search plus a robust RAG solution, rather than web search and an agentic one?

    • @Data-Centric
      @Data-Centric 3 days ago

      I'll be doing a series on RAG very soon!

  • @rodrigoamora
    @rodrigoamora 4 days ago

    I have used AutoGen, CrewAI and other tools. Honestly, they are not production-ready. They are over-engineered. What I've learned from building my own tools will be very useful for the future. Learn as much as you can building your own tools; it's a lot simpler than most people think, just give it a shot.

  • @jakeparker918
    @jakeparker918 4 days ago

    Just started this video but looking forward to it. Definitely saw limitations due to the abstraction of CrewAI and Autogen. Recently learned about LangGraph but this looks like some nice next level customization.

  • @NoCodeFilmmaker
    @NoCodeFilmmaker 4 days ago

    I'm curious, bro, why did you choose Runpod over Lightning AI?

    • @Data-Centric
      @Data-Centric 20 hours ago

      No reason other than I haven't used Lightning AI.

  • @technovangelist
    @technovangelist 4 days ago

    So many on my channel and my live streams have also indicated this frustration with the agent frameworks. The function calling in ollama is pretty rock solid but many frameworks have problems with it so that’s frustrating too. Thanks for putting this video together.

  • @SeattleShelby
    @SeattleShelby 4 days ago

    Wow - big difference between the 8b and 70b models. Do you think the 70b models are good enough for agents?

  • @legendaryman4336
    @legendaryman4336 4 days ago

    Unfortunately it doesn't work now. The web terminal fails to start :(

    • @Data-Centric
      @Data-Centric 3 days ago

      I think you have misunderstood the instructions. It definitely still works, I used it just today.

  • @dierbeats
    @dierbeats 4 days ago

    Good stuff as always, thank you very much.

  • @camelCased
    @camelCased 4 days ago

    Great, thanks! Now I'm curious how to connect SillyTavern to it. I've heard ST might be a bit incompatible with the latest vLLM.