QWEN 2.5 Coder (32B) LOCALLY with Ollama, Open WebUI and Continue

  • Published: 23 Dec 2024

Comments • 49

  • @blub5760
    @blub5760 1 month ago +1

    Thank you so much for your video!
    Really informative.
    Didn't expect it to work that well.

  • @TestMyHomeChannel
    @TestMyHomeChannel 1 month ago +3

    Great video. I loved the way you covered all these technically challenging areas for me so quickly and so comprehensively! Best wishes!

  • @ComputerworxVideo
    @ComputerworxVideo 23 days ago +3

    The most important info is missing! What is the memory usage when the 32B Qwen 2.5 is running? Please provide that info.

  • @hannespi2886
    @hannespi2886 1 month ago +1

    Thanks for the conclusion at the end

  • @TimothyHuey
    @TimothyHuey 1 month ago +1

    Hey, Chris! That was a great video. So easy to understand, and I set everything up and followed along. You are very easy to understand and do a great job of explaining the concepts you are discussing. I just subscribed and I hope you continue with this. I agree, this is all about what's coming and not necessarily "is this the end-all, be-all LLM for coding." If you take the time and follow along with the evolving AI, it will be much easier to adapt to the next thing coming. Thanks for keeping me informed.

    • @chrishayuk
      @chrishayuk  1 month ago +1

      Glad it's useful. Honestly, I don't think there is a more exciting time to be a developer.

  • @hmmyaa7867
    @hmmyaa7867 1 month ago +2

    What a great video, thanks man

  • @cwxuser5557
    @cwxuser5557 22 days ago +5

    What's the memory usage while using the 32B?

    • @archjz
      @archjz 1 day ago

      In this video it was on a Mac Studio with 128GB of memory

    • @ComputerworxVideo
      @ComputerworxVideo 13 hours ago

      @@archjz How much memory was used when generating responses?
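
For anyone who wants to measure this themselves rather than wait for an answer: Ollama can report a loaded model's memory footprint. A quick sketch of the commands (the 32b tag is the one used in the video; exact output columns vary by Ollama version):

    # in one terminal, load the model and ask it something
    ollama run qwen2.5-coder:32b

    # in a second terminal, while the model is still loaded,
    # list running models with their size and CPU/GPU split
    ollama ps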

  • @ziobuddalabs
    @ziobuddalabs 27 days ago +2

    Hi, is your computer a Mac? If yes, which one?

    • @chrishayuk
      @chrishayuk  27 days ago +2

      Mac M3 Max with 128GB of unified memory

  • @renerens
    @renerens 1 month ago +3

    I use the 14B model and Continue on my 4090, and it is fast and works great!

    • @andrepaes3908
      @andrepaes3908 1 month ago

      @@renerens what quantization are you using? And context length?

    • @asifudayan
      @asifudayan 1 month ago

      Which model/LLM are you using?

    • @renerens
      @renerens 1 month ago

      @@andrepaes3908 I used the default model, not a quantized one, with a 32768 context length.

    • @renerens
      @renerens 1 month ago

      @@asifudayan qwen2.5-coder:14b
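
For anyone wiring up the Continue extension against one of these local Ollama models, as @renerens describes, the relevant part of Continue's config.json looks roughly like this (field names follow Continue's documented schema at the time; the titles are just labels, so check the current docs):

    {
      "models": [
        {
          "title": "Qwen 2.5 Coder 14B (local)",
          "provider": "ollama",
          "model": "qwen2.5-coder:14b"
        }
      ],
      "tabAutocompleteModel": {
        "title": "Qwen 2.5 Coder 1.5B (autocomplete)",
        "provider": "ollama",
        "model": "qwen2.5-coder:1.5b"
      }
    }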

  • @TheCopernicus1
    @TheCopernicus1 1 month ago +3

    Great video. With Open WebUI you need to up num_ctx, as it defaults to 2048; perhaps 32768 might help with getting the full response.

    • @andrepaes3908
      @andrepaes3908 1 month ago +1

      @@TheCopernicus1 how do you increase a model's context length within Open WebUI?

    • @TheCopernicus1
      @TheCopernicus1 1 month ago

      @@andrepaes3908 In the top right-hand corner, click on the controls icon (it looks like sliders), then go down to "Context Length" and, for starters, try 16384 if your device supports it, as 32768 might be really slow. Good luck!

    • @TheCopernicus1
      @TheCopernicus1 1 month ago

      @@andrepaes3908 In Open WebUI, click on the controls icon in the top right-hand corner, then set the context length to 16384 to start with and go up depending on your system resources! Good luck.

    • @kepenge
      @kepenge 1 month ago +1

      @@andrepaes3908 You can change it easily in the Open WebUI configuration, or just create a new Modelfile and use Ollama to create a new version of the model with a longer context.
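
To make the Modelfile route above concrete, a minimal sketch would be a two-line Modelfile (the qwen-coder-32k name is just illustrative):

    FROM qwen2.5-coder:32b
    PARAMETER num_ctx 32768

followed by building and running the new variant with Ollama:

    ollama create qwen-coder-32k -f Modelfile
    ollama run qwen-coder-32k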

  • @andrepaes3908
    @andrepaes3908 1 month ago +1

    Great video, amazing content! I see you used the 4-bit quantized version to run all tests. Since you've got 128GB of RAM, could you run the same tests for the 32B model with 8-bit and FP16 quants to check if it improves responses? If so, please make another video to share the great news!

    • @chrishayuk
      @chrishayuk  1 month ago +2

      that’s a really good shout, I’ll do that

    • @LoveMapV4
      @LoveMapV4 20 days ago +1

      Yeah, I never use 4-bit quantization anymore because it often gives very poor output results. Q8 is okay and almost as good as FP16. Also, Q5_K_M should be the minimum, since it still gives very good results. In fact, I don't notice any quality loss with Q5_K_M models. I've tested it on the Gemma 2 27B model and the Llama 3.1 8B and 70B models. However, if you have extra RAM, I highly recommend always using Q8 for the best results.
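
For anyone who wants to run that comparison locally, Ollama publishes the different quantizations as separate tags, roughly along these lines (tag names should be verified against the Ollama library page; the q8_0 and fp16 variants need on the order of 35GB and 65GB of memory just for the weights):

    ollama pull qwen2.5-coder:32b                 # default tag, 4-bit quant
    ollama pull qwen2.5-coder:32b-instruct-q8_0   # 8-bit quant
    ollama pull qwen2.5-coder:32b-instruct-fp16   # full 16-bit weights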

  • @themax2go
    @themax2go 13 days ago

    Why use that if you could use ottodev?

  • @godned74
    @godned74 1 month ago +1

    For speed, Cerebras AI is nuts: over 2000 tokens per second using Meta Llama 7B, and over 1800 using the 70B.

  • @faiz697
    @faiz697 22 days ago

    What is the minimum spec required to run this?

    • @LoveMapV4
      @LoveMapV4 20 days ago

      All PCs will be able to run it. The question is, how big is your RAM? The bigger your RAM, the larger the model you can run. Usually, an 8B model requires 8GB of RAM, a 27B model requires 32GB of RAM, and so on (this is if you are only using the Q4 quantization). The speed of your CPU only affects the speed of generation; you can still run it, it will just take longer if you have a slow CPU.
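
The back-of-envelope arithmetic behind those numbers, as a rough sketch that counts weights only (the KV cache and runtime overhead add a few extra GB on top, more at long context lengths):

    # approximate weight memory for a quantized model
    def approx_ram_gb(params_billions: float, bits_per_weight: int) -> float:
        # 1B parameters at 8 bits per weight is roughly 1 GB
        return params_billions * bits_per_weight / 8

    print(approx_ram_gb(32, 4))   # Qwen 2.5 Coder 32B at Q4  -> ~16 GB
    print(approx_ram_gb(32, 8))   # at Q8                     -> ~32 GB
    print(approx_ram_gb(32, 16))  # at FP16                   -> ~64 GB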

  • @NScherdin
    @NScherdin 1 month ago +2

    ChatGPT's donkey is like how the spherical cow meme in physics is still a cow. :)

  • @kdietz65
    @kdietz65 1 month ago +1

    Good demo. You just went way, way too fast past the install/setup/config. I'm still trying to work my way past all the errors and figure out why I'm not getting any models showing up in my WebUI.

    • @chrishayuk
      @chrishayuk  1 month ago

      Apologies, I did that because I have a video where I walk through Ollama (much slower and in detail). This is the link:
      ruclips.net/video/uAxvr-DrILY/видео.html Hope it helps.

    • @kdietz65
      @kdietz65 1 month ago

      @@chrishayuk Okay thanks. I'll try that.

  • @spaul-vin
    @spaul-vin 13 days ago

    32b-instruct-q4_K_M - 23GB of VRAM

  • @kdietz65
    @kdietz65 1 month ago +1

    No models. Why?

    • @chrishayuk
      @chrishayuk  1 month ago

      This should help: Getting Started with OLLAMA - the docker of ai!!!
      ruclips.net/video/uAxvr-DrILY/видео.html

    • @LoveMapV4
      @LoveMapV4 20 days ago

      You should be connected to Wi-Fi for the models to appear. It’s strange since it doesn’t use the internet but requires Wi-Fi. The truth is, you can access Open WebUI on any device as long as it is connected to the same network as the server.

    • @kdietz65
      @kdietz65 20 days ago

      @@LoveMapV4 Well I figured it out. You just gotta take your time getting the configuration right. I installed ollama as a local service. But I had to install open-webui using Docker because the Python PIP install didn't work. PIP didn't work because you need exactly version 3.11. It has to be exactly that version. 3.10 won't work. 3.12 won't work. It must be 3.11. Well, I had 3.10. I didn't spend enough time figuring out how to get exactly 3.11 installed. If I just blindly upgraded, that gave me 3.12, not 3.11. Arghhh!!! So I gave up on that path because I was impatient and just used Docker. But then I had to do something to configure open-webui in Docker to talk to ollama running locally, i.e., not in Docker. I followed the instructions on the web site and just took my time and finally got it to work. The installation and the documentation could both be better, but what the heck, that's what we get paid the big bucks for, right?
      I watched a few more of Chris's videos and they are really good. It's a good resource for doing this kind of work. Thank you.
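
For reference, the Docker route described above boils down to roughly the command from Open WebUI's install docs for the case where Ollama runs directly on the host (port and volume name can be changed to taste):

    docker run -d -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -v open-webui:/app/backend/data \
      --name open-webui --restart always \
      ghcr.io/open-webui/open-webui:main
    # then browse to http://localhost:3000 and it should pick up the local Ollama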

  • @jargolauda2584
    @jargolauda2584 27 days ago +1

    Claude sucks big time with Vue, Vuetify, Bootstrap, bootstrap-vue, Laravel, etc. Qwen is absolutely amazing! It makes such good Vue components, it knows Vite, it knows Laravel, it does not confuse Vue 2 with Vue 3, and it differentiates versions. GPT-4 Turbo does not understand different versions and just produces garbage.