ChatGPT Killer On Your Computer? Let's Install FastChat Alpaca Vicuna!

  • Published: 2 Dec 2024

Comments • 29

  • @dhruvpathak1850 7 months ago

    Thanks for the walkthrough!
    So I'm running the utility on an i7 laptop with 16 GB of RAM (no GPU), but each response takes close to 10 minutes to fully generate. How did you manage to speed yours up? I'm also using Vicuna 7B.

  • @chrisanderson687 1 year ago +1

    Thanks for the help! This is amazing. Sadly I had to run in CPU mode (I only have 8 GB of VRAM, and that was not enough even with the 8-bit option turned on). I'm curious: for anyone running this on GPU, what card setup do you have?

    • @AemonAlgiz 1 year ago +1

      I’m running a 4080 and it runs without a hitch, though you can also use GGML models which are optimized for CPU!
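
      For anyone who wants to try the GGML route, here's a minimal sketch using the llama-cpp-python bindings; the model filename is a placeholder for whichever GGML Vicuna file you download:

        from llama_cpp import Llama  # pip install llama-cpp-python

        # Load a CPU-optimized Vicuna model; the path is a placeholder
        # for the GGML file you downloaded.
        llm = Llama(model_path="./vicuna-7b.ggmlv3.q4_0.bin", n_ctx=2048)

        # Run a completion entirely on the CPU.
        out = llm("Q: What is quantization? A:", max_tokens=128)
        print(out["choices"][0]["text"])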

  • @AemonAlgiz 1 year ago +3

    Update: I have figured out how to fix the Web UI on Windows. I created a Pull Request to the main repo to fix the bug. I will be live streaming tomorrow demonstrating this!

    • @anonymousmuskox1893 1 year ago +1

      Nice, are you talking about oobabooga?

    • @AemonAlgiz 1 year ago

      It was part of their utils package. On Windows there was a bug in its logging system, which expected UTF-8 but was getting Windows-1252. Setting the PYTHONUTF8 environment variable to 1 (Python 3.7+) solves the issue. This lets the model_worker and gradio server boot properly.
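
      A minimal sketch of the workaround, assuming FastChat's model_worker entry point (the model path is only an example):

        import os
        import subprocess

        # Enable Python's UTF-8 mode (Python 3.7+) for the child process,
        # so logging reads and writes UTF-8 instead of Windows-1252.
        env = dict(os.environ, PYTHONUTF8="1")

        # Launch the FastChat model worker with the patched environment.
        subprocess.run(
            ["python", "-m", "fastchat.serve.model_worker",
             "--model-path", "lmsys/vicuna-7b-v1.5"],  # example path
            env=env,
            check=True,
        )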

    • @AemonAlgiz 1 year ago

      Though there does appear to be another issue, at least when running with CUDA: a slow but steady memory leak, which will eventually cause the model to crash.

    • @angelochu3156 1 year ago

      @AemonAlgiz Right. I also noticed that VRAM usage only adds up and never fully clears, so I'm sticking to GGML models for now.
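
      If you want to verify the leak yourself, here's a small PyTorch sketch for watching VRAM between generations (assuming a CUDA build of PyTorch):

        import torch

        def report_vram(tag: str) -> None:
            # Bytes held by live tensors vs. reserved by the caching
            # allocator; a steady climb across generations is the leak.
            alloc = torch.cuda.memory_allocated() / 2**30
            reserved = torch.cuda.memory_reserved() / 2**30
            print(f"{tag}: allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB")

        report_vram("before generation")
        # ... run one generation here ...
        torch.cuda.empty_cache()  # frees cached blocks, not leaked references
        report_vram("after generation")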

  • @johnhaas3191 1 year ago

    I'm having all kinds of issues setting up the development environment. Tedious issues like setting up the proper paths, the correct versions of things, hardware compatibility, etc. I've spent multiple days on this. Is there a place online I should go for help with this? Which subreddits or forums?

    • @AemonAlgiz 1 year ago

      You could hop on TheBloke's AI Discord; we can help you there!

  • @careyatou 1 year ago +1

    Hi Aemon! Thanks for the video. I want to build a PC to run a model like you are doing. Would you mind sharing your specs and any advice you have on putting together a PC that can run this model?

    • @AemonAlgiz 1 year ago +3

      Hey Michael!
      Vicuna can run on a variety of hardware! How you want to run the model will determine which hardware you'll want to roll with. We have a video on it, if you'd like to check that out as well!
      There are two models: the 7-billion-parameter model and the 13-billion-parameter model. The 7B requires 30 gigs of system RAM to convert the LLaMA weights to Vicuna, and the 13B requires 60 gigs. The 13B is a decent step up in overall performance, though the 7B is pretty impressive as well!
      To run either base model on a GPU, you will need 14 gigs of VRAM for the 7B and 28 gigs for the 13B. For CPU mode with the base model, again 30 and 60 gigs respectively. These are pretty harsh requirements, though fortunately we can make this a bit easier on your system with quantization!
      Vicuna can run with 8-bit quantization, which turns the FP16 weights into 8-bit integers, so the 7B requires only 8.5 gigs of VRAM and the 13B 17 gigs. In CPU mode, that's roughly 16 gigs of RAM for the 7B and 32 gigs for the 13B. The only caveat with quantization right now is that it's slightly less performant, but that should be fixed soon.
      If you want an absolute monster, I would recommend the following specs:
      RAM: 64 gigs of DDR5 @ 5200MHz
      CPU: 7900/7950X
      Motherboard: Anything that supports this CPU will be fine
      GPU: RTX 3080/3090/4080/4090 with a heavy preference for the 3090 or 4090, for their VRAM.
      Thanks for watching and I hope this helps!
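
      As a sanity check on those numbers, here's a back-of-the-envelope estimate (parameter count times bytes per weight, ignoring activation and KV-cache overhead):

        # Rough memory needed just for the weights.
        BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

        def weight_gigs(params_billions: float, dtype: str) -> float:
            return params_billions * 1e9 * BYTES_PER_WEIGHT[dtype] / 2**30

        for params in (7, 13):
            for dtype in ("fp16", "int8", "int4"):
                print(f"{params}B @ {dtype}: ~{weight_gigs(params, dtype):.1f} GiB")

        # 7B @ fp16 is ~13 GiB and 13B @ fp16 is ~24 GiB, in line with
        # the 14/28 gig figures above once runtime overhead is added.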

    • @careyatou 1 year ago

      @AemonAlgiz Super helpful. Thank you! I'll let you know how it turns out 😀

  • @johnhaas3191 1 year ago +1

    Can someone please share a link for the form at 1:44?

    • @AemonAlgiz 1 year ago +1

      You no longer need this; models are freely available on Huggingface now. I will update this video.

    • @johnhaas3191 1 year ago

      @AemonAlgiz Thank you! You are doing a lot to help me pivot my career. I'm learning a lot and really enjoying it. If you are ever in Austin, dinner and drinks on me!

  • @horikatanifuji5038 1 year ago +1

    I have an RTX 3070, but I don't know how many gigs of VRAM it has. Will it be able to run the 13B Vicuna model?

    • @AemonAlgiz 1 year ago

      It can if you use the 4-bit quantized model. I have a video on the channel showing how to get the models, though the installation has changed since I made it. I'm creating an updated version, which will be out tomorrow. If you feel comfortable walking through the installation yourself, though, the description shows where to find the models!

    • @horikatanifuji5038 1 year ago

      @AemonAlgiz Thanks, I will wait for your new video, since I've been failing to install it.

  • @rageshantony2182 1 year ago

    Can we get converted weights directly for Vicuna?

    • @AemonAlgiz 1 year ago

      You can! They’re all available on Huggingface now
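
      A minimal sketch of pulling them with the huggingface_hub client; the repo id shown is one of the lmsys releases, so pick whichever size and version you want:

        from huggingface_hub import snapshot_download  # pip install huggingface_hub

        # Download already-converted Vicuna weights; no LLaMA delta
        # application is needed anymore.
        local_dir = snapshot_download(repo_id="lmsys/vicuna-7b-v1.5")
        print("Weights downloaded to:", local_dir)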

  • @user-wr4yl7tx3w 1 year ago +1

    But if we get it from elsewhere, how do we know it is safe?

    • @AemonAlgiz 1 year ago +1

      We’re dropping a new video in about an hour with the now publicly available versions, so this is no longer accurate information! I’ll be updating this video to point to the new one :)

    • @AemonAlgiz 1 year ago +1

      Hey Y, here ya go! Vicuna 13B V1.1! With 4-Bit Quantization What Can't it Run On? OogaBooga One Click Installer.
      ruclips.net/video/Z3HIPGzZRnc/видео.html
      This video takes you right to where to get the model.

  • @TomHimanen 1 year ago +1

    Thanks mister! 👊

    • @AemonAlgiz 1 year ago +1

      Thank you for watching! I hope it was helpful

  • @kaymcneely7635 1 year ago +1

    This sounds like a lot of work, to say the least.

    • @AemonAlgiz 1 year ago

      I figured out how to fix this on Windows! Just made a pull request to fix the repo.

  • @williamnunes8732 6 months ago

    Thanks!