Testing LLMs on the NEW 16gb Raspberry Pi5 (Llama 11B & Qwen 14B)

  • Published: 26 Jan 2025

Comments • 78

  • @TechySpeaking
    @TechySpeaking 12 days ago +4

    16GB on a Pi was unheard of just a few years ago

    • @OminousIndustries
      @OminousIndustries  11 days ago

      Exactly, I still have a bunch of 2GB Pi 4B models that I am using for random things. I was very excited when I saw these were in stock and ran out to grab one haha

  • @martinparker9044
    @martinparker9044 12 days ago +3

    You mentioned that the Raspberry Pi got hot; if you didn't have any cooling on it then it may have been getting throttled, which is why you had a low initial test result.

    • @OminousIndustries
      @OminousIndustries  11 days ago +1

      Yes, it was quite hot! I am going to get an active cooler for it and will test again to see if there is a difference. I was also wondering about throttling.

  • @MrDvaz
    @MrDvaz 11 days ago

    Great work !!!!

  • @demetriusbazos5557
    @demetriusbazos5557 13 days ago +6

    Nice test. I decided to add a second M.2 SSD to my Jetson Orin Nano Super to give 20GB of virtual RAM. Given the better architecture, it should run the 11B and 13B models at a reasonable pace. Exciting times for mobile platforms and local LLMs.

    • @ianfoster99
      @ianfoster99 12 days ago

      I have 2 Orins ordered in the UK. Do you recommend any particular M.2 SSD? Is it just set up as swap? What command did you use? Thanks in advance

    • @demetriusbazos5557
      @demetriusbazos5557 12 days ago

      @@ianfoster99 Any fast SSD should be fine. I used a 256GB drive but only set a swap of 20GB. You don't want to over-allocate. Plus wear rates will be higher, so I left lots of unallocated space for when regions go bad. You can set it up with the Disks utility in Ubuntu, or with Python code. ChatGPT can help with that. Good luck.
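
      A minimal sketch of that kind of swap-file setup from the command line, assuming the NVMe drive is mounted at /mnt/nvme (the path and size here are illustrative only):

        # create and enable a 20GB swap file on the NVMe drive
        sudo fallocate -l 20G /mnt/nvme/swapfile
        sudo chmod 600 /mnt/nvme/swapfile
        sudo mkswap /mnt/nvme/swapfile
        sudo swapon /mnt/nvme/swapfile
        # add an fstab entry so the swap survives reboots
        echo '/mnt/nvme/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab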

    • @OminousIndustries
      @OminousIndustries  11 days ago

      That's a very cool setup with the increased swap. I am interested to know your results when you test the larger LLMs. Definitely exciting times for local LLMs and SBCs.

    • @ElvisMorales
      @ElvisMorales 11 days ago

      Interesting! Could you or anyone on this thread share a link to a video explaining how to add this additional virtual RAM with a second M.2 on the Jetson Orin Nano Super? Thanks!

  • @marcomerola4271
    @marcomerola4271 11 days ago

    Cool, thanks

  • @Rockhownd87
    @Rockhownd87 13 days ago +2

    Great work, thanks a lot!

  • @Unineil
    @Unineil 7 days ago

    Great video! I just wonder if this has the exact same dimensions as the regular Pi 5 for cluster cases; if so, I'll finish my cluster with this to replace the 8GB Home Assistant server. I think these Pis are amazing for what they are.

    • @OminousIndustries
      @OminousIndustries  7 days ago +1

      Thanks very much! It is the exact same footprint as the regular Pi 5.

  • @youroldmangaming8150
    @youroldmangaming8150 13 days ago +1

    Mate, this is perfect, I am about to buy a couple of these!!!

    • @OminousIndustries
      @OminousIndustries  11 days ago

      Glad to hear, they will definitely fit well into some of your projects!

  • @mooninthewater3705
    @mooninthewater3705 12 days ago +2

    How about using it with the AI hat?

    • @OminousIndustries
      @OminousIndustries  11 days ago +2

      Unfortunately the current Hailo AI HAT does not work for LLMs (based on what they themselves have said); I have not personally tried it.

    • @mooninthewater3705
      @mooninthewater3705 11 days ago +1

      @@OminousIndustries thanks for taking the time to comment. Appreciate you and your efforts, and this video.

    • @OminousIndustries
      @OminousIndustries  11 days ago +1

      @@mooninthewater3705 No problem at all, thanks for the kind words!

  • @Qamar92
    @Qamar92 13 days ago

    Nice videos. I can't afford any LLM AI hardware myself, but your videos satisfy my curiosity.
    Great work!

    • @OminousIndustries
      @OminousIndustries  11 days ago

      Thanks very much! Soon the hardware will be more and more accessible :)

  • @newDell-q4w
    @newDell-q4w 13 days ago +2

    How hot is that getting without a heatsink? Might be thermal throttling. Edit: I see you wondered that as well. Would be interesting to test with active cooling.

    • @OminousIndustries
      @OminousIndustries  11 days ago +2

      I did touch the CPU and it was extremely hot haha. Active cooling is on the agenda ASAP.

  • @lovebutnosoilder
    @lovebutnosoilder 11 days ago

    Please do the Orange Pi vs

    • @OminousIndustries
      @OminousIndustries  10 days ago

      I have a video on the 8GB Pi 5 vs the Orange Pi here: ruclips.net/video/OXSsrWpIm8o/видео.html

  • @nithinbhandari3075
    @nithinbhandari3075 13 days ago +5

    Nice video.
    Please test a 7B or 8B model on the Orange Pi NPU.
    Do multiple tests, like input token 0, input token 500, input token 1k, input token 4k, and input token 7k.
    I am creating an AI notes app, so I am currently using Groq, but I want to see how many tokens per second I can generate at most with a 7B or 8B model on hardware like Nvidia (testing in the future) and self-hosted setups.
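
    A rough way to run that kind of sweep against a local Ollama install (the model name and prompt size below are placeholders, and this measures the same CPU path used in the video rather than an NPU):

      # print prompt-eval and generation token rates for a longer prompt
      ollama run llama3.1:8b --verbose "Summarize this: $(python3 -c "print('lorem ipsum ' * 250)")"

      # or hit the HTTP API and compute generation speed from the returned timings (needs jq)
      curl -s http://localhost:11434/api/generate \
        -d '{"model": "llama3.1:8b", "prompt": "Hello", "stream": false}' \
        | jq '.eval_count / (.eval_duration / 1e9)'   # tokens per second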

    • @OminousIndustries
      @OminousIndustries  11 days ago

      That's a great idea, I will try to squeeze that in!

  • @jeffwads
    @jeffwads 12 days ago

    It is interesting to see what the Llama 3.2 vision model is trained on. It is very impressive.

    • @OminousIndustries
      @OminousIndustries  11 days ago

      Yes it is, I was impressed it got that. I would love to have the horsepower to run the 90B one at a decent quant, but I need more than 48GB of VRAM.

  • @gu9838
    @gu9838 5 days ago

    Not bad, but clearly not really fast enough for regular use. Interesting, though.

    • @OminousIndustries
      @OminousIndustries  5 days ago

      Agreed, it's only acceptable for smaller 1-3B models; anything above that gets very slow.

  • @FrankHouston-v5e
    @FrankHouston-v5e 13 days ago

    Very useful video, especially when you compare against similarly priced SBCs. The Pi Foundation really needs to add an NPU!

    • @OminousIndustries
      @OminousIndustries  11 days ago

      Thanks very much! Yes, I would like to see the introduction of a more AI focused Pi.

  • @ToTheMaxWorld
    @ToTheMaxWorld 10 days ago

    I wish I had smart friends

  • @nikobellic570
    @nikobellic570 13 days ago

    Does NVMe SSD storage make a difference? Jeff Geerling got interesting LLM results

    • @estusflask982
      @estusflask982 12 days ago

      No, after the initial load the model is stored in RAM

    • @OminousIndustries
      @OminousIndustries  11 days ago

      I don't believe it would make a difference in this scenario, no.

  • @Danoman812
    @Danoman812 12 days ago

    Are there smaller models that would make it a little quicker in its replies? Or have I missed the whole idea on the models? lol
    I was thinking of that TARS AI thing they are building. I'm really not interested in it moving or even the vision side of it, but I would like to have its brain. hahaha
    (as strange as that sounds... almost Frankenstein type talk)

    • @OminousIndustries
      @OminousIndustries  11 days ago

      Yes, there are small models like Llama 1B and 3B that would be much, much quicker. I wanted to test the larger models for this video, as the new 16GB RAM variant was able to run them, something that the previous 8GB-max Pi could not!

  • @PerFeldvoss
    @PerFeldvoss 13 days ago

    Just wondering how it performs with small models: will the extra RAM give a boost, or is it the same? I would like to have a helper LLM for VS Code, but not if it's so slow.

    • @OminousIndustries
      @OminousIndustries  11 days ago

      If you were going to use a small Llama 1B or 3B, I don't believe there would be a speed difference between the 8GB and 16GB Pi.

  • @ianfoster99
    @ianfoster99 12 days ago

    What is the best SLM to use for basic chat? I'm using RAG extensively (using C# code to hit the database) and am looking for an SLM which supports function calling

    • @OminousIndustries
      @OminousIndustries  11 days ago

      I can't definitively answer this, but I have had good luck with some of the Qwen models and their function calling abilities.

  • @Seriouslydave
    @Seriouslydave 13 days ago

    There's going to be a flood of mini PCs with some Linux distro targeting local LLMs next year. But what do we want them for?

    • @OminousIndustries
      @OminousIndustries  11 days ago

      I'm okay with having more options! haha. Use cases are a different story.

  • @estusflask982
    @estusflask982 12 days ago

    why

  • @dr_harrington
    @dr_harrington 12 days ago

    Would have been nice if you had another terminal window open to monitor temp while running ollama:
    watch -n 1 'vcgencmd measure_temp'
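
    The Pi firmware can also report whether the board actually throttled at any point during the run (a nonzero value indicates under-voltage or thermal throttling was detected):
    vcgencmd get_throttled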

    • @OminousIndustries
      @OminousIndustries  11 days ago

      You're absolutely right. I will make sure to better show temps/etc in the future!

  • @nikobellic570
    @nikobellic570 13 days ago

    Overclock it for marginally quicker results?

  • @ianfoster99
    @ianfoster99 12 days ago

    Had the same progress reset issue on Windows 64-bit for a large LLM

  • @MrKim-pt2vm
    @MrKim-pt2vm 13 days ago

    Try running phi4

    • @OminousIndustries
      @OminousIndustries  11 days ago

      I've been meaning to try phi4 at some point.

    • @MrKim-pt2vm
      @MrKim-pt2vm 11 days ago

      @@OminousIndustries step-by-step tutorial without Ollama. Only Python, only hardcore

    • @OminousIndustries
      @OminousIndustries  11 days ago

      @@MrKim-pt2vm LOL hackathon vibes

  • @GeyzsonKristoffer
    @GeyzsonKristoffer 13 days ago +1

    First!

  • @Lemure_Noah
    @Lemure_Noah 13 days ago

    I guess it works better with Llama 3.2 1B and 3B, as well as Phi 3 Mini.

    • @OminousIndustries
      @OminousIndustries  11 days ago

      Yes, these smaller models are much better suited for lower-powered hardware.

  • @brianbai9385
    @brianbai9385 13 days ago +1

    Second!

  • @patrickng1287
    @patrickng1287 13 days ago

    Impossible is possible

  • @ESGamingCentral
    @ESGamingCentral 12 days ago

    I was expecting an eGPU setup, but he is not like Jeff

    • @OminousIndustries
      @OminousIndustries  11 days ago

      I do actually have the components to do an eGPU on the Pi; at some point I would like to try it, as his videos showcasing that were very, very cool!

    • @ESGamingCentral
      @ESGamingCentral 11 days ago

      @ Yeah, with the 16GB that would be a first, and with power consumption numbers too. I have the equipment as well, just busy.

    • @OminousIndustries
      @OminousIndustries  11 days ago

      @@ESGamingCentral I may try it sooner rather than later, but I only have a 3060 12GB to do it with.

    • @ESGamingCentral
      @ESGamingCentral 11 days ago

      @ As far as I'm aware you need an AMD card; I have a 6600 XT. I don't believe there are drivers for Nvidia cards on ARM RPi.

    • @OminousIndustries
      @OminousIndustries  11 days ago

      @@ESGamingCentral That's very interesting and something I wasn't aware of. I don't actually have any modern AMD cards, so damn haha

  • @rascalwind
    @rascalwind 13 days ago

    When you're doing a matrix multiply across a 16GB vector space it will definitely be slower than across an 8GB vector space. Double the time.

    • @OminousIndustries
      @OminousIndustries  11 days ago

      Yes, but if I am not mistaken, if the model were a smaller-parameter model like a 1B or 3B, the total system RAM wouldn't make a difference in speed, as memory would only be allocated for what was needed and the extra RAM wouldn't come into play.