API For Open-Source Models 🔥 Easily Build With ANY Open-Source LLM

  • Published: Oct 16, 2024
  • In this video, we review OpenLLM, and I show you how to install and use it. OpenLLM makes building on top of open-source models (llama, vicuna, falcon, opt, etc.) as easy as building on top of ChatGPT's API. This allows developers to create incredible apps on top of open-source LLMs, with first-class support for tooling (LangChain, Hugging Face Agents, BentoML) and one-click deployment; a minimal usage sketch follows the links below. Also, fine-tuning is coming soon!
    Enjoy :)
    Join My Newsletter for Regular AI Updates 👇🏼
    forwardfuture.ai/
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    Media/Sponsorship Inquiries 📈
    bit.ly/44TC45V
    Links:
    OpenLLM Github - github.com/ben...
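    A minimal sketch of the workflow from the video, assuming the "opt" model shown on screen; the server address (localhost:3000) matches the video, while the /v1/generate path and payload shape are assumptions that may vary across OpenLLM versions:

      # First, in a shell:  pip install openllm  &&  openllm start opt
      import requests

      # Query the locally running OpenLLM server.
      resp = requests.post(
          "http://localhost:3000/v1/generate",
          json={"prompt": "Explain what OpenLLM does in one sentence."},
      )
      print(resp.json())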

Comments • 190

  • @shotelco
    @shotelco 1 year ago +72

    Yet another piece to the democratization of AI! Very valuable.

    • @matthew_berman
      @matthew_berman 1 year ago +4

      Agreed!

    • @marilynlucas5128
      @marilynlucas5128 1 year ago +4

      Yes indeed!

    • @MrGaborKukucska
      @MrGaborKukucska 1 year ago +1

      The future is now 🙌🏼

    • @applyingpressureeveryday
      @applyingpressureeveryday 1 year ago

      Democracy means those in power rule. We live in a democracy that’s clearly 1000% centralized. I got the message tho. 👍🏿

    • @josephsagotti8786
      @josephsagotti8786 1 year ago

      @@applyingpressureeveryday Democratization of technology means the de-centralization of technology.

  • @daithi007
    @daithi007 1 year ago +12

    Incredible content, and he doesn't waffle either!!! Just to the point: good pace, great voice, great cadence, and perfect audio levels. This channel is gonna be big.

    • @matthew_berman
      @matthew_berman 1 year ago

      Thank you :)

    • @marilynlucas5128
      @marilynlucas5128 1 year ago

      @@matthew_berman With OpenLLM, you don't get an OpenAI-like API token, right?

    • @marilynlucas5128
      @marilynlucas5128 1 year ago

      @@matthew_berman How can a project like Aider utilize OpenLLM?

  • @VastIllumination
    @VastIllumination 1 year ago +19

    You are becoming my favorite AI channel! This is literally exactly what I've needed. I've been looking for an open LLM alternative to the OpenAI API for querying PDFs with LangChain. I haven't been able to test the largest LLMs using Langflow because it always times out from Hugging Face.

    • @matthew_berman
      @matthew_berman 1 year ago

      Glad I could help 🎉

    • @sjimosui8279
      @sjimosui8279 1 year ago

      Matthew, are you pushing it to GitHub? I'm also working on the same thing and looking for ideas, but I'm a beginner looking for help.

  • @paulbishop7399
    @paulbishop7399 1 year ago +2

    Stop it, I can't keep up anymore :) Every day I'm pivoting around your content, gimme a break already! What an exciting time to be alive!

    • @matthew_berman
      @matthew_berman 1 year ago +1

      Haha nice :) wait until you see the next video!

  • @maxamad13
    @maxamad13 1 year ago +2

    First time, man. To the point and straightforward. Thank youuuuu!!!!!!!!!!!!

  • @pancakeflux
    @pancakeflux 1 year ago +2

    This is exactly what I’ve recently been looking for! Thanks for showing it off :)

  • @Tenly2009
    @Tenly2009 1 year ago +11

    It would be a lot easier for us to follow along and be successful if you did these demos starting with a brand new machine with just python and conda pre-installed. That way our experience would be more likely to match the one in your video *exactly* and we wouldn’t struggle at the points where you say “the first time I tried this, I got an error” or “I already have this installed”. Just a suggestion.

  • @tiredlocke
    @tiredlocke 1 year ago +11

    This is awesome. I've played with some different open-source models in RunPod (which is great, btw). And I looked into installing the Text Generation WebUI locally... but I don't have a suitable GPU yet. Ultimately, I want a self-hosted (preferably in a container) API that can run various models and be hit from a web browser, a console app, or a game. This looks like exactly what I want.
    Now I just need to find a GPU to toss into my server...

    • @Trahloc
      @Trahloc 1 year ago +1

      Oobabooga's text-generation-webui is compatible with GGML models, which are CPU-only but can use a GPU for speedup, although the latest versions don't use my GPU for some reason.

    • @tiredlocke
      @tiredlocke 1 year ago +1

      @@Trahloc Good to know. I previously tried some stuff that wouldn't run without the Nvidia GPU. I'll have to give this a try and see how it works.

  • @ajith_e
    @ajith_e 1 year ago +1

    Hadn't heard of OpenLLM before, but now I can't hold back my excitement to test it out. A well-paced, well-executed tutorial that touches on the important aspects of deployment. Please follow this space closely, because we'll be following you!!
    Thank you for this great tutorial.

  • @MeinDeutschkurs
    @MeinDeutschkurs 1 year ago +1

    I'm excited! Yeah! I'm interested in custom/unlisted models, also NLLB-200… And what about Mac? There is no xformers available.

  • @antonioveloy9107
    @antonioveloy9107 1 year ago +3

    I prefer the Oobabooga Web UI, which basically runs an API locally and has a nice button to "import" any Hugging Face model. But this is interesting too.

  • @wendellvonemet7443
    @wendellvonemet7443 5 months ago +1

    When you cut out all the dead space, your sentences run together without the natural pause that would allow beginners to digest each new concept before being bombarded with the next five new concepts that are rattled off at the speed of light. Tutorials work best when the newbies have time to let new concepts sink in. I'll be stuck trying to wrap my head around what you just said, and I continually have to pause and rewind to catch what you said while I was still chewing on the first bite. You also run your words together, within the sentences, so I have to continually rewind to make sure that I heard you correctly. Many of us are complete newbs to all of this. The info you provide is great. I watch a ton of your videos. I just wish you'd go a hair slower and dumb it down for those of us who are brand new and have to look up the definition of each piece of new tech jargon used (had to ask AI what the hell a bento was, and it thought I was interested in Japanese cuisine).

  • @RyckmanApps
    @RyckmanApps 1 year ago +5

    Awesome quick video!

  • @8eck
    @8eck 1 year ago +2

    Will be waiting for their fine-tuning feature. Should be interesting.

  • @khandakerrahin1003
    @khandakerrahin1003 1 year ago +15

    Are these models running locally? If yes, what are the hardware requirements?

    • @matthew_berman
      @matthew_berman 1 year ago +4

      Yes. It depends on the model. Smaller models have very modest requirements.

    • @khandakerrahin1003
      @khandakerrahin1003 1 year ago +2

      @@matthew_berman thank you so much Sir.

    • @matthew_berman
      @matthew_berman 1 year ago +2

      @@khandakerrahin1003 you got it!

    • @henkhbit5748
      @henkhbit5748 1 year ago

      Wow, that is really simple. Thanks for showing this api tool for LLM 👍

    • @matthew_berman
      @matthew_berman 1 year ago

      @@JohnSmith-jc7yi no way. You can run local models on much smaller machines

  • @williammixson2541
    @williammixson2541 1 year ago

    My last computer was a gaming rig. My newest build this week will be specifically for ML and I cannot wait!!! Easy sub.

  • @nikdog419
    @nikdog419 1 year ago +9

    I'm gonna need a cardboard box server again. Time to start a 24/7 AI stream. 😂

  • @vinylrebellion
    @vinylrebellion 1 year ago +3

    Looking forward to the Mosaic 33B. Loving the videos.

  • @build.aiagents
    @build.aiagents 8 months ago +1

    Still not sure how building on these models works in the examples; I see you using the models, but how do we build on top of them? Sorry if I missed it.

  • @NguyenHoangHuy
    @NguyenHoangHuy 1 year ago +2

    Are you using WSL? Would you recommend using Windows over Linux? I've had problems trying to install all the NVIDIA GPU drivers, CUDA, and PyTorch modules using Ubuntu, to the point where I had to reinstall Ubuntu.

  • @brianv2871
    @brianv2871 1 year ago +2

    Thanks for the video. This is getting close to something I'm looking for, but it still requires a permanent system set up with some decent hardware. It would be interesting to see this combined into a single Google Colab that could be run as needed, for those of us looking to use it on an occasional basis.

    • @doords
      @doords 1 year ago

      Colab would be very useful. I wish we could keep a Colab running forever.

  • @jmanhype1
    @jmanhype1 1 year ago +8

    Please explain if this is hosted locally as a server or if we need RunPod or Chainlit.

    • @matthew_berman
      @matthew_berman 1 year ago +6

      You can run this locally AND/OR deploy it to the cloud when you're ready for production (a rough sketch follows this thread).

    • @jmanhype1
      @jmanhype1 1 year ago +3

      @@matthew_berman Please go over the steps to host for production.

    • @scitechtalktv9742
      @scitechtalktv9742 1 year ago

      I would also like to know how to deploy this to the cloud, and what the alternatives are for doing that. Does Hugging Face have a (free) cloud solution?
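      A rough sketch of that local-to-cloud path, based on OpenLLM's BentoML integration (command names are version-dependent, so treat this as an outline and check the repo):

        # develop against a local server
        openllm start opt

        # package the server as a Bento, then containerize it for production;
        # "openllm build" prints the exact bento tag to pass to bentoml
        openllm build opt
        bentoml containerize <bento_tag>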

  • @michaelberg7201
    @michaelberg7201 1 year ago +3

    Super interesting and exciting project. I didn't quite get, though, whether the models are running locally? I thought this required a lot of GPU power.

    • @BlackDragonBE
      @BlackDragonBE 1 year ago +3

      You can run LLMs locally on the CPU, the GPU, or shared between the CPU and GPU. CPU-only is quite slow, though.

    • @faff
      @faff 1 year ago +2

      He's running a $2000 video card.

  • @ehsanrt
    @ehsanrt 1 year ago

    I was looking for something like this for 2 weeks, thank you for your video... it made my learning much easier... please make a LangChain video too.

  • @GamingDaveUK
    @GamingDaveUK 1 year ago +3

    What is the advantage of this over textgen webui? And does it handle custom models as well as textgen webui does (4-bit GPTQ models, etc.)?

  • @joseberlines-l4f
    @joseberlines-l4f 1 year ago +1

    Really dreaming about the moment this can be used to query my own set of documents, like in your previous videos about GPT4All.

  • @8eck
    @8eck 1 year ago +3

    It's like personal computers in the era of Steve Jobs, when they were still not available to everyone. I guess this will soon become even more open, with projects like this one.

  • @patrickwalchshofer4004
    @patrickwalchshofer4004 1 year ago +2

    Hey Matthew, this is really great! So with this I can replace the OpenAI API and run all the apps that are built to use OpenAI?

  • @thelalomorales
    @thelalomorales 9 months ago

    dude, totally just did it! YOU RULE

  • @soubinan
    @soubinan 1 year ago +1

    Very similar to LocalAI. The difference seems to be that LocalAI is compatible with the OpenAI API.

  • @Boboche
    @Boboche 1 year ago +3

    Love this channel

  • @GlobalOffense
    @GlobalOffense 3 months ago

    What is the beginning transition? That is epic looking.

  • @sevenkashtan
    @sevenkashtan 1 year ago +1

    Just adding a positive comment for the algorithm! Great video

  • @bowenchen4908
    @bowenchen4908 1 year ago +2

    How fast is this if it's running locally? Is speed going to be an issue?

  • @nannan3347
    @nannan3347 1 year ago +1

    A feature on my wish list is being able to GET and POST the context so it can be edited on the fly

  • @Heynmffc
    @Heynmffc 1 year ago

    "I don't know what any of that means, but it doesn't seem to be causing any problems" amen lol

  • @grimtagnbag
    @grimtagnbag 1 year ago

    Ty for all these videos, I'm getting tons of ideas.

  • @clray123
    @clray123 1 year ago +1

    Uhh so a wrapper over a wrapper (HuggingFace/LangChain)??? What does this new API add exactly (except for new bugs)?

  • @musikdoktor
    @musikdoktor 1 year ago

    Great YouTuber. Regards from Uruguay!

  • @thehkmalhotra9714
    @thehkmalhotra9714 9 months ago

    I loved your content, mate ❤️ Thanks for your video. Just a quick question: can we point localhost:3000 to a domain? The localhost URL can be used as an API while it's running on my PC, but what if I want to point it to a domain name that is easily accessible to everyone?
    Will be waiting for your answer 😥
    Keep up the great work, dude ❤

  • @heliosobsidian
    @heliosobsidian 9 months ago

    Wonderful content!! Will this make it easier to work with AutoGen? 🤔

  • @marcoamato8461
    @marcoamato8461 1 year ago +2

    Maybe a silly question, but what are the minimum hardware requirements?

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    Is a conda installation more stable than pip? Just wondering which one to use. Mostly, I have used pip previously.

  • @SillyProphecies
    @SillyProphecies 1 year ago +1

    Awesome! Great stuff, and thank you very much! Do you have an idea how to implement a QLoRA fine-tuned Falcon model?

  • @8eck
    @8eck 1 year ago +1

    Interesting, it would be cool to have a response streaming feature.

    • @s0ckpupp3t
      @s0ckpupp3t 1 year ago +1

      Probably can, through the gRPC interface.

    • @8eck
      @8eck 1 year ago +1

      @@s0ckpupp3t yes, but they depend on another project, which doesn't support it 😕

    • @8eck
      @8eck 1 year ago +1

      @@s0ckpupp3t At least not yet.

  • @khorLDW
    @khorLDW 3 months ago

    Just wanted to point out for anyone trying: if you do this on Windows and want to install directly without conda, you'll get an error from the vLLM library saying it can only be used on Linux.

  • @hrishabhg
    @hrishabhg 1 year ago

    This is superb knowledge. As a follow-up, can you create a video to help users choose a GPU & CPU configuration for serving?

  • @aliakbari8900
    @aliakbari8900 12 days ago

    I want to create a custom chatbot that utilizes multiple Gemini and GPT APIs.
    Does an API remember the history of messages in a chat?
    This is crucial for maintaining context within the conversation.

  • @gavinray241
    @gavinray241 6 months ago

    Why did you go through the process of creating a Conda env when you then install with Pip?

  • @erick2will
    @erick2will 1 year ago

    Awesome! Thanks for sharing! 😀

  • @xavierf2229
    @xavierf2229 1 year ago

    I think you should show what these LLMs are really capable of; the examples you are showing are pretty simple.

  • @gnosisdg8497
    @gnosisdg8497 1 year ago +1

    Well, if and when they make the training section available, along with LangChain, it will be a really cool project to have!!!

  • @hermysstory8333
    @hermysstory8333 1 year ago +3

    Many thanks!!!

  • @RixtronixLAB
    @RixtronixLAB 1 year ago

    Nice video, well done, thanks :)

  • @Rundik
    @Rundik 1 year ago +3

    They don't say if the API is compatible with OpenAI's counterpart. It would be nice to be able to use the tools built for OpenAI's API.

  • @revenger211
    @revenger211 1 year ago

    I am facing issues with "openllm start opt"; I get a "KeyError: 'OPENLLM_OPT_MODEL_ID'" error. Why is that? I searched online and still can't find a solution.

  • @RodrigoRecio
    @RodrigoRecio 9 months ago

    Great content. Thank you!

  • @akshatkant1423
    @akshatkant1423 6 months ago

    Will there be input/output token limits when building on custom LLM models using OpenLLM, like we see in other monetized LLM API models?

  • @fabianaltendorfer11
    @fabianaltendorfer11 1 year ago

    Awesome, thank you for this vid

  • @forexhunter2040
    @forexhunter2040 9 months ago

    Does using the Falcon model improve accuracy more than the OPT one?

  • @justin9494
    @justin9494 1 year ago

    Please help. I have CUDA and torch all working, but when running the model, it says CUDA not found or something. Any ideas?

  • @victordanneygarciaplaza2374
    @victordanneygarciaplaza2374 1 year ago

    Hi Matthew, thanks for this video! I have a question about how to use OpenLLM with documents as a knowledge base.

  • @PeacefulislamicLofi
    @PeacefulislamicLofi 5 months ago

    When I install openllm, the installation process starts but never completes. I tried it 3 times and get the same result. I don't know what mistake I'm making; can you help me?

  • @BlayneOliver
    @BlayneOliver 1 year ago +1

    Sorry for my noob question, but could someone explain why we’d need more than ChatGPT 4?

  • @chrisBruner
    @chrisBruner 1 year ago +1

    If you've got models downloaded, can they be used?

  • @javiergimenezmoya
    @javiergimenezmoya 1 year ago +1

    Is it possible to link your own fine-tuned LLM stored on your local machine?

  • @pipoviola
    @pipoviola 1 year ago

    You are tooooo awesome!!!

  • @MarkDemarest
    @MarkDemarest 1 year ago

    AMAZING!!! 💪🔥🙌

  • @yacahumax1431
    @yacahumax1431 17 days ago

    Very nice

  • @clear_lake
    @clear_lake 1 year ago

    Which server configuration do you recommend if I want to run Falcon?

  • @LUDOVICOPAPALIA
    @LUDOVICOPAPALIA 9 months ago

    I want to run the model on RunPod and create an API to run a service (Python) from my personal computer. Any idea how to do that?

  • @jcfnetwork6768
    @jcfnetwork6768 1 year ago +2

    Finally!!

  • @ThobaniMngoma
    @ThobaniMngoma 1 year ago

    Does this API also work when running LLMs using CPU resources?

  • @averaguilar
    @averaguilar 1 year ago

    Awesome! I just want to know which model is good for the Spanish language. I have tried some and they are just awful.

  • @ronaldkodras4527
    @ronaldkodras4527 1 year ago

    It says I have no GPU available to run the Falcon model. I have NVIDIA drivers downloaded, but still no luck. What can I do? How about a GPU from RunPod?

  • @uuuuu4858
    @uuuuu4858 5 months ago

    Hey, when I try to import openllm in Python it says the module doesn't exist. Any suggestions?

  • @DeepKarmakar-i7v
    @DeepKarmakar-i7v 3 months ago

    Can I use the same in a JavaScript application?

  • @cavlab
    @cavlab 2 months ago

    What is the minimum GPU requirement to use this?

  • @godned74
    @godned74 1 year ago

    You are a freakin genius.😎

  • @ALFTHADRADDAD
    @ALFTHADRADDAD 1 year ago

    Absolutely crazy

  • @zepto5945
    @zepto5945 11 days ago

    4:04 It started rambling like a mad man 😭

  • @MuhammadHadiHadi-w1r
    @MuhammadHadiHadi-w1r 9 months ago

    Does it support AutoGen or CrewAI?

  • @mijanurrahaman3778
    @mijanurrahaman3778 1 year ago

    Can we provide a customized knowledge base to the system?

  • @ZakkFromSource
    @ZakkFromSource 1 year ago

    Do you know of any current services where you could host something like this in the cloud for free, to test out creating something like a chatbot that you could call and add extra functionality to via Python code running locally on your machine?

  • @originalsuperheroguruji
    @originalsuperheroguruji 1 year ago

    Any idea what server configuration is needed to run these AI models on custom AWS or Linode servers???

  • @ErnestGWilsonII
    @ErnestGWilsonII 8 months ago

    How can we run an LLM at home and have the same API that OpenAI uses?

  • @eyemazed
    @eyemazed 1 year ago

    Does the API support embedding functionality?

    • @cheifei
      @cheifei 1 year ago

      Embeddings are just custom text that is passed to the LLM to use as a reference. To get the embeddings, you would need to run a model that can specifically convert text to vectors. Then send your embedded docs to that embeddings model via the API, take the vector response, and store it in a vector store. When you make a query, convert your query to a vector via your local model, then perform a similarity search on your vector store. That will return some docs, and you pass the text of those docs to the LLM. (A code sketch of this flow follows this thread.)

    • @eyemazed
      @eyemazed 1 year ago

      @@cheifei Are you implying it's absolutely irrelevant how you create the embeddings? Don't different models use different embedding algorithms? That's why they have different vector dimensionalities, among other things.

    • @cheifei
      @cheifei 1 year ago

      @@eyemazed No, I am not implying that. I agree with you that you have to use the same embedding model for consistency. I think the missing piece is that you pass the text of the query and the text (not vectors) of the embedded docs to the LLM.

    • @eyemazed
      @eyemazed 1 year ago +1

      @@cheifei I see, I thought you needed to use the same embedding API for vectorizing the context that you pass along with the prompt as the LLM uses to vectorize your prompt. So if I understand correctly, you're free to choose any embedding API/vector store you want, because it's separate from the LLM and is only used to retrieve the context that you send along with your prompt to the LLM.

    • @cheifei
      @cheifei 1 year ago +1

      @@eyemazed That is correct.
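      A minimal sketch of the flow described in this thread, with sentence-transformers standing in as one possible local embedding model (the OpenLLM endpoint path and payload here are assumptions, not documented API):

        import requests
        from sentence_transformers import SentenceTransformer

        # 1. Embed the docs once, with a local embedding model.
        embedder = SentenceTransformer("all-MiniLM-L6-v2")
        docs = ["OpenLLM serves open-source models.", "BentoML handles deployment."]
        doc_vecs = embedder.encode(docs, normalize_embeddings=True)

        # 2. Embed the query with the SAME model, then run a similarity search.
        query = "How do I deploy a model?"
        q_vec = embedder.encode([query], normalize_embeddings=True)[0]
        context = docs[int((doc_vecs @ q_vec).argmax())]

        # 3. Pass the retrieved TEXT (not the vectors) to the LLM with the query.
        resp = requests.post(
            "http://localhost:3000/v1/generate",
            json={"prompt": f"Context: {context}\n\nQuestion: {query}"},
        )
        print(resp.json())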

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago

    Is this a replacement for TextGen WebUI? Do they perform the same function?

    • @doords
      @doords 1 year ago

      Yeah, same function, but all of the textgen web UIs cost money. If you build an app for many people, you will have to pay a lot every time your users send a query.

  • @Kulbaru
    @Kulbaru 3 months ago

    I am getting this error and can't find any solutions to fix the dependency error:
    "Failed to build ninja
    ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (ninja)"

  • @DikkeKoelie
    @DikkeKoelie 1 year ago +1

    Nice, now I just need a 24GB GPU card.

  • @ganeshkgp
    @ganeshkgp 1 year ago

    Is there any free hosting so I can host and test it? And also, how do I use a domain instead of localhost?

    • @ganeshkgp
      @ganeshkgp 1 year ago

      Please don't get me wrong, I am a software developer, but I have no idea how to use LLMs.

  • @s0ckpupp3t
    @s0ckpupp3t 1 year ago

    Does it have a streaming API endpoint?

  • @tahahuraibb5833
    @tahahuraibb5833 1 year ago

    Does this run on CPU or GPU?

  • @Azcraz
    @Azcraz 1 year ago

    Has anyone been able to get this working recently? I follow the docs to a 'T' and the opt model is unable to start up. I ran "openllm models --show-available" and it looks like it's not properly downloading the model locally after running 'pip install "openllm[opt]"', as it says "No local models available". Do we need to download the models with "openllm import ..."? I've tried that as well, with "openllm import opt facebook/opt-1.3b", to no avail. Surely I must be doing something silly!? Any help appreciated!

    • @Azcraz
      @Azcraz 1 year ago

      Got it working! It turns out you need to manually import the models; running pip install openllm[] does not download the model. You must use the import command and specify the model. In my case I also needed to pass the --serialisation legacy flag, because the model did not support safetensors!
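      In command form, that fix looks roughly like this (model name as used in the video; flag spelling may vary by OpenLLM version):

        pip install "openllm[opt]"
        openllm import opt facebook/opt-1.3b --serialisation legacy
        openllm start opt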

  • @eddymison3527
    @eddymison3527 1 year ago

    I think it's great.

  • @avi7278
    @avi7278 1 year ago +2

    Now all you need is a $10,000 computer! No, but seriously, the last piece of the puzzle here is a service like RunPod where you can install this and it charges you for the exact inference time of each request. Does anybody know of anything like it?

    • @elchippe
      @elchippe 1 year ago

      I think the 3B and 7B parameter versions of these models can run locally on a CPU, or even a 12GB RTX 3060.

    • @clray123
      @clray123 1 year ago

      No, the last piece of the puzzle is open source models that aren't crap.

    • @elchippe
      @elchippe 1 year ago

      @@clray123 The easy fine-tuning of these models for specific tasks, and the algorithmic optimization to run them more efficiently across a spectrum of hardware from low end to high end, is what is going to make the difference against proprietary models.

    • @clray123
      @clray123 1 year ago

      @@elchippe LoRA fine-tuning is like a 200-line Python script. You clone the script from Git and run it. The difficulty of fine-tuning is not that you lack some silly API, but rather the choice of parameters and, foremost, of the input data. And you will not be able to fine-tune any serious models on "low end" hardware, even with (Q)LoRA and whatnot. (A minimal example follows this thread.)

    • @avi7278
      @avi7278 1 year ago

      @@clray123 Yeah, well, I meant this particular puzzle of being able to host your own personal API for an open-source model. Model quality is beside the point.
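      For scale, a LoRA setup along the lines clray123 describes really is short. A minimal sketch with Hugging Face's peft library (model choice and hyperparameters are illustrative only):

        from transformers import AutoModelForCausalLM, AutoTokenizer
        from peft import LoraConfig, get_peft_model

        base = "facebook/opt-1.3b"  # illustrative; any causal LM works
        model = AutoModelForCausalLM.from_pretrained(base)
        tokenizer = AutoTokenizer.from_pretrained(base)

        # LoRA wraps a few attention matrices with low-rank adapters, so only
        # a tiny fraction of the weights is trained.
        config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                            target_modules=["q_proj", "v_proj"])
        model = get_peft_model(model, config)
        model.print_trainable_parameters()  # typically well under 1% trainable
        # ...from here, a standard transformers Trainer loop over your dataset.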

  • @Star-rd9eg
    @Star-rd9eg 1 year ago

    How would I use this in RunPod? :)

  • @any-shorts
    @any-shorts 6 months ago

    But how to deploy it on db?

  • @ganeshkgp
    @ganeshkgp 1 year ago +1

    Can I deploy this to Colab?

  • @chrisl4211
    @chrisl4211 11 months ago

    I think LangChain can create an API, no?