Deploy Mixtral, QUICK Setup - Works with LangChain, AutoGen, Haystack & LlamaIndex
- Published: 28 May 2024
- In this video, I demonstrate how you can swiftly get started with Mixtral. Using Runpod and vLLM, you will learn how to deploy a Mixtral endpoint that emulates the OpenAI API. I'll show you how we can seamlessly integrate this endpoint into a chatbot using LangChain. This deployment pattern can help you get up and running with any LLM.
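Because vLLM serves an OpenAI-compatible API, any client that speaks the OpenAI wire format can talk to the pod. A minimal stdlib-only sketch of building a chat-completions request (the Runpod pod ID in the base URL and the exact model name are placeholders, not from the video):

```python
import json
from urllib import request

# Hypothetical Runpod proxy URL; replace <pod-id> with your own pod's ID.
BASE_URL = "https://<pod-id>-8000.proxy.runpod.net/v1"

def build_chat_request(model: str, user_message: str) -> request.Request:
    """Build an OpenAI-style /chat/completions request for a vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.7,
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # vLLM does not check the key unless you configure one.
            "Authorization": "Bearer EMPTY",
        },
    )

req = build_chat_request("mistralai/Mixtral-8x7B-Instruct-v0.1", "Hello!")
```

The same endpoint can be dropped into LangChain's OpenAI chat client by overriding its base URL, which is the integration pattern shown in the video.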
Read the blog post to learn how to integrate with Llama Index, Haystack, and AutoGen: / deploy-mixtral-quickly...
Need to develop some AI? Let's chat: www.brainqub3.com/book-online
Want to transition into a career in AI-Engineering? Sign up for our free course and start learning today: www.data-centric-solutions.co...
Stay updated on AI, Data Science, and Large Language Models by following me on Medium: / johnadeojo
Runpod: runpod.io?ref=x5fziojy
This is an affiliate link, I get some credits on Runpod if you sign up.
Mixtral AWQ: huggingface.co/JAdeojo/casper...
"Can you run it?": huggingface.co/spaces/Vokturz...
Chapters
Intro to Mixtral: 00:00
Memory Requirements: 01:49
Runpod & vLLM Intro: 05:18
Create Template: 06:56
Deploy the Container: 12:43
Connecting to the Endpoint: 16:20
Integrating Endpoint in LangChain: 17:12
If you're getting errors deploying the model on the GPU, add the --enforce-eager flag to the Docker command. Good luck!
amazing yet again. leading innovation. trendsetting!
Very nice and comprehensive tutorial. Will give it a try. Thank you, John! Btw, I love the Alis picture behind you 😍
Thanks, and you’re welcome; let us know how it goes!
Nicely put together. I've used vLLM with serverless, but it's quite a bit harder with all the parameters such as concurrency and GPUs. I'll give this method a try and see what gives.
Thanks, I might do one on serverless
You can make a call to /v1/models and just dynamically pull the model name.
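That's a neat trick: vLLM's OpenAI-compatible server lists its served model under GET /v1/models, so you don't have to hard-code the name. A small sketch of parsing that response (the sample JSON below mirrors the OpenAI list shape; the model id is illustrative):

```python
import json

def first_model_id(models_response: str) -> str:
    """Extract the first model id from an OpenAI-style GET /v1/models response."""
    return json.loads(models_response)["data"][0]["id"]

# Example of the response shape an OpenAI-compatible server returns:
sample = (
    '{"object": "list", "data": '
    '[{"id": "mistralai/Mixtral-8x7B-Instruct-v0.1", "object": "model"}]}'
)
print(first_model_id(sample))  # → mistralai/Mixtral-8x7B-Instruct-v0.1
```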
Love it!!!!!!!!!!