Run the newest LLMs locally! No GPU needed, no configuration, fast and stable LLMs!

  • Published: 19 Nov 2024

Comments • 31

  • @AwkwardTruths
    @AwkwardTruths 7 months ago +3

    I ran this on a laptop with 8 GB of RAM, CPU only, Windows 10, as a test. Ollama (I used the Ollama Windows installer), when running, was eating up 42% of memory before the LLM code was ever engaged. When I did an LLM query, memory hit 100% and the answer took nearly 12 minutes. Result: Ollama on Windows used so much memory that the LLM could not run efficiently.

    • @FE-Engineer
      @FE-Engineer  7 months ago +2

      This is not a huge surprise. It has to load the model, which is gonna eat memory. But that is a good point: this is likely not a good solution for those with lower amounts of overall RAM.
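      An 8 GB machine is right at the edge for a typical 7B model, so it can help to compare free RAM against the rough size of the model file before loading anything. A minimal sketch, assuming Python with the psutil package; the 5 GB figure is a hypothetical placeholder for something like a 4-bit quantized 7B model, not an exact requirement:

        # Compare available RAM against the approximate on-disk size of the model
        # before asking Ollama to load it (sketch; the size below is a placeholder).
        import psutil

        MODEL_SIZE_GB = 5  # hypothetical size of the model you plan to run

        available_gb = psutil.virtual_memory().available / 1024**3
        if available_gb < MODEL_SIZE_GB * 1.2:  # leave ~20% headroom for the runtime
            print(f"Only {available_gb:.1f} GB free; expect heavy swapping.")
        else:
            print(f"{available_gb:.1f} GB free; the model should fit in RAM.")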

  • @humansvd3269
    @humansvd3269 11 months ago +5

    Dude, you consistently keep dropping great videos. I hope your channel grows. Mine is a Ryzen 7900X; I'm certain it should be enough.

    • @FE-Engineer
      @FE-Engineer  11 months ago

      I think any CPU will work, though you may notice a big difference between the latest high-end CPUs and low- or mid-range CPUs from four generations ago. Still, running CPU-only, entirely in RAM, I was amazed at how well this performs. I’m gonna start running it on my server and see how it does: server CPU, old, but lots of cores. Going to test out the performance and see if it’s still reasonable.

    • @FE-Engineer
      @FE-Engineer  11 months ago +2

      Thank you so much for the kind words as well. I hope so too. Maybe one day I can quit my job and do this full time. I could make a lot more videos that way. 😂

    • @markistheone947
      @markistheone947 11 months ago

      I want to see that for sure; I have a couple of Dell R720x being used as a shelf in the rack.

  • @nextgodlevel
    @nextgodlevel 11 months ago

    Yesterday I found your channel, and I have watched at least 20 of your videos. Great content, bro. I bought a 6750 XT, which is cheaper and has equal performance to a 3080. I thought I would never do AI on this card, but after watching your videos I started doing AI again after a one-year gap :)

    • @FE-Engineer
      @FE-Engineer  11 months ago +2

      Well, thank you! I appreciate it, and I hope you are having fun. AMD is very focused on AI, and while they are still behind Nvidia overall in some ways, I think it is getting to the point where AMD owners will be able to do all the same things as Nvidia owners, and things will stop being Nvidia-only soon.

  • @AjiPratamax
    @AjiPratamax 4 months ago

    How many cores does your processor have, and how much RAM do you use?
    On my laptop, Ollama finishes a response in about 1 to 5 minutes.

  • @brisson4993
    @brisson4993 11 months ago

    I'd love to see more about Ollama and running it on a server (:

    • @FE-Engineer
      @FE-Engineer  11 months ago +2

      So yesterday and today I recorded 2-3 videos that will be edited and coming out in the next week.
      Running Ollama on an actual server is something I’m very interested in, so I promise that is coming, probably in the next week. I have really been slacking on the homelab side of my channel, so I really needed to do some things for that side.
      Having a server and multiple VMs running multiple LLMs will be part of building a big AI task bot that will research and do things for you. It’s what I’m aiming to build. :)
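      Once Ollama is running on a server or VM, other machines can reach it over its HTTP API on port 11434 (on the server you generally need OLLAMA_HOST=0.0.0.0 so it accepts non-local connections). A minimal client sketch, assuming the official ollama Python package and a made-up server address; depending on the library version the response may be a plain dict or an object that still supports the access shown:

        # Query an Ollama instance running on another machine (sketch).
        # Assumes `pip install ollama` and a server started with OLLAMA_HOST=0.0.0.0;
        # 192.168.1.50 is a placeholder address for your server or VM.
        from ollama import Client

        client = Client(host="http://192.168.1.50:11434")
        response = client.chat(
            model="mistral",
            messages=[{"role": "user", "content": "Summarize what a homelab is."}],
        )
        print(response["message"]["content"])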

  • @technicolourmyles
    @technicolourmyles 11 months ago +1

    I tried this out on my MacBook Pro M1 Max; it seems to be running on the GPU, and it's wicked fast! Either that, or the default settings are different and I'm using some other configuration.

    • @FE-Engineer
      @FE-Engineer  11 months ago +2

      Interesting. M1 Max, eh? I have a laptop with an M1 Pro, I think, but not the Max. I might have to try it out to see if it’s different.
      Anytime I’ve seen things running on the CPU before, like oobabooga, CPU-only was just unbearably slow. This one, even CPU-only, was entirely OK; it wasn’t crazy fast, but it was absolutely fine in my opinion. Apple chips running ARM might process it very differently, so there might be a big difference in speed.
      Thanks for letting me know, I’ll have to take a look!
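      For what it is worth, Ollama on Apple Silicon uses the Metal GPU by default, which would explain the speed. One way to see where a loaded model actually ended up is the /api/ps endpoint; a sketch assuming a recent Ollama build that exposes it, with size_vram being how it reports GPU memory use:

        # Ask a running Ollama server which models are loaded and whether they sit
        # in GPU memory or plain system RAM (sketch; requires a build with /api/ps).
        import json
        import urllib.request

        with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
            status = json.load(resp)

        for model in status.get("models", []):
            where = "GPU" if model.get("size_vram", 0) > 0 else "CPU"
            print(f"{model['name']}: {model['size'] / 1024**3:.1f} GB loaded on {where}")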

  • @alvarocoronel67
    @alvarocoronel67 8 months ago

    Great video! I’ll probably try this on my potato laptop and see how it goes.
    Right now I’m looking for information about local LLMs with controlled access to files and databases.

    • @FE-Engineer
      @FE-Engineer  8 months ago

      It works surprisingly well. I tried it on a server with fairly old CPUs. It’s a mixture of core count plus overall CPU speed that determines the overall usefulness. I was able to get it generating quickly on my server, but I had to feed it 32 cores since the cores were slow (2 GHz). With faster cores like you find in PCs, running at 3-5+ GHz, it will run better with fewer cores.

    • @FE-Engineer
      @FE-Engineer  8 months ago

      I think LangChain is what you want for having it work with your files and documents.
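      A rough starting point for the files use case, assuming the langchain-community package, a locally pulled mistral model, and a hypothetical notes.txt; LangChain's APIs shift between releases, so treat this as a sketch rather than a recipe:

        # Point a local Ollama model at a text file via LangChain (sketch).
        # Assumes `pip install langchain-community` and `ollama pull mistral`.
        from langchain_community.document_loaders import TextLoader
        from langchain_community.llms import Ollama

        llm = Ollama(model="mistral")          # talks to the local Ollama server
        docs = TextLoader("notes.txt").load()  # hypothetical local file

        # Stuff the file contents straight into the prompt; for larger document
        # sets you would add a text splitter, embeddings, and a vector store.
        context = "\n\n".join(d.page_content for d in docs)
        print(llm.invoke(f"{context}\n\nSummarize the key points in this document."))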

  • @ammartaj
    @ammartaj 7 months ago

    I love it!

  • @Fastcash_
    @Fastcash_ 11 months ago

    Great tutorial! I'm now running Mistral with Ollama on a Ryzen 5800X and the result is quite impressive. I wonder what would improve its performance: RAM or CPU speed?

    • @FE-Engineer
      @FE-Engineer  11 months ago +1

      Since these are using CPU only: if you have an Nvidia GPU, it can use Nvidia GPUs, but installing that requires a few steps, and I no longer have an Nvidia GPU so I can not really show that.
      Otherwise, if the model loads, your RAM should be fine. Faster RAM might make a small difference, but honestly I think CPU speed or the number of CPU cores is what will make the biggest difference here.
      I’m currently making a video about running this on a server and seeing how the performance is with fewer and more cores and threads, so I should have something soon that provides more details around this specifically. (A rough benchmarking sketch follows at the end of this thread.)

    • @FE-Engineer
      @FE-Engineer  11 months ago

      Thank you for the kind words as well!

    • @Fastcash_
      @Fastcash_ 11 months ago

      @FE-Engineer I'm really curious to see your results on a server. I wonder if it could even run with decent speed on a little Synology NAS with enough RAM, like mine.
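      Since generation here is CPU-bound, one knob worth experimenting with is the num_thread option, combined with the token counts and timings Ollama reports back. A rough benchmarking sketch, assuming a local server on the default port and a pulled mistral model; useful thread counts depend on your physical core count:

        # Try different thread counts and compare generation speed (sketch).
        # num_thread is passed through the request "options"; eval_count and
        # eval_duration (nanoseconds) describe the generation phase.
        import json
        import urllib.request

        def generate(prompt: str, num_thread: int) -> dict:
            req = urllib.request.Request(
                "http://localhost:11434/api/generate",
                data=json.dumps({
                    "model": "mistral",
                    "prompt": prompt,
                    "stream": False,
                    "options": {"num_thread": num_thread},
                }).encode(),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)

        for threads in (4, 8):  # adjust to your CPU
            result = generate("List three uses for a Raspberry Pi.", threads)
            rate = result["eval_count"] / (result["eval_duration"] / 1e9)
            print(f"{threads} threads: {rate:.1f} tokens/sec")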

  • @anirudh7150
    @anirudh7150 2 months ago

    Can I run this in Google Colab?
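    Ollama can be run inside a Colab notebook by installing it with the official Linux install script and starting the server in the background. A rough sketch of that pattern, assuming a standard Colab runtime; on a CPU-only runtime generation will be slow, and everything disappears when the runtime is recycled:

      # Install and run Ollama inside a Colab notebook (sketch).
      # Uses the official install script at https://ollama.com/install.sh.
      import subprocess
      import time

      subprocess.run("curl -fsSL https://ollama.com/install.sh | sh", shell=True, check=True)
      server = subprocess.Popen(["ollama", "serve"])  # keep the server running in the background
      time.sleep(5)                                   # give it a moment to start
      subprocess.run(["ollama", "pull", "mistral"], check=True)
      subprocess.run(["ollama", "run", "mistral", "Say hello in one sentence."], check=True)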

  • @guesswho2778
    @guesswho2778 11 months ago

    If this is what I think it is, I've been using the one that's under Microsoft's 4 GB .exe file size limit.
    It's been working fairly quickly, considering my laptop has an i5-8250U in it.
    I've just chucked WSL on said laptop and am attempting to install the proper version through that to see if it "just works".

    • @FE-Engineer
      @FE-Engineer  11 months ago

      It should. It uses CPU only, so it’s extremely forgiving regarding hardware.

  • @merselfares8965
    @merselfares8965 8 months ago

    Would an 11th-gen i3 with 8 GB of RAM and UHD 630 integrated graphics be enough?

    • @FE-Engineer
      @FE-Engineer  8 months ago

      So the integrated graphics is unlikely to be usable at all.
      An i3 will run it, although it will be a bit sluggish. It should be OK though.
      I have run it on my Ryzen 7950X and it’s surprisingly fast.
      I have also run it on an actual server, but I had to dedicate almost 32 threads to get it into a reasonably usable state, and it was still a bit slow. For running it on CPU, the clock speeds really make the biggest difference, so even an i3 with pretty fast clocks should do a reasonable job. :)
      I hope that helps.
      The simple answer to "can you run it?" is yes.

  • @gregoryvanny-is2of
    @gregoryvanny-is2of 7 months ago

    Didn't work for me when I type in ollama serve to start it, since it says the Ollama server is not connected:
    2024/04/01 15:01:06 routes.go:682: Warning: GPU support may not enabled, check you have installed install GPU drivers: nvidia-smi command failed
    It doesn't realise I want CPU only ):
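    That nvidia-smi warning by itself is only informational; without an Nvidia GPU, Ollama falls back to running on the CPU. If the client still says the server is not connected, it is worth confirming the server is actually reachable on the default port, for example with a quick check like this sketch (assuming the default localhost:11434 address):

      # Check that the Ollama server is up and reachable on its default port.
      import urllib.request

      try:
          with urllib.request.urlopen("http://localhost:11434/", timeout=5) as resp:
              print(resp.read().decode())  # a running server replies "Ollama is running"
      except OSError as exc:
          print(f"Could not reach the server on port 11434: {exc}")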

  • @bfdhtfyjhjj
    @bfdhtfyjhjj 11 months ago

    Dude, this fairy tale is bullshit. Just read it :)

    • @FE-Engineer
      @FE-Engineer  11 months ago +2

      It was a bit odd. I did read most of it. 😂
      In fairness, the model I was running is not really meant for creative writing; it is more meant for reasoning. There are models that are significantly better suited to writing stories.
      I tend to use prompts like "write me a story" to see if it only writes one paragraph and then freezes, gets caught and stops, hits a loop, etc. They tend to produce enough text and tokens that I can see if it runs into a problem or is limited to very few tokens.