16GB on a Pi was unheard of just a few years ago
Exactly, I still have a bunch of 2GB Pi 4B models that I am using for random things. I was very excited when I saw these were in stock and ran out to grab one haha
You mentioned that the Raspberry Pi got hot; if you didn't have any cooling on it, then it may have been getting throttled, which is why you had a low initial result.
Yes, it was quite hot! I am going to get an active cooler for it and will test again to see if there is a difference. I was also wondering about throttling.
Great work!!!!
Thanks very much!
Nice test. I decided to add a second M.2 to my Jetson Orin Nano Super to give 20GB of virtual RAM. Given the better architecture, it should run the 11B and 13B models at a reasonable pace. Exciting times for mobile platforms and local LLMs.
I have 2 Orins ordered in the UK. Do you recommend any particular M.2 SSD? Is it just set up as swap? What command did you use? Thanks in advance.
@@ianfoster99 Any fast SSD should be fine. I used a 256GB drive but only set up 20GB of swap. You don't want to over-allocate. Plus, wear rates will be higher, so I have left lots of unallocated space for when regions go bad. You can set it up with the Disks utility in Ubuntu, or with Python code. ChatGPT can help with that. Good luck.
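Roughly, the command-line version looks something like the sketch below; it assumes the second drive is formatted ext4 and mounted at /mnt/nvme (that path and the device names are just examples, check yours with lsblk before touching anything):

# check which device the second NVMe shows up as
lsblk
# create a 20GB swap file on the new drive
sudo fallocate -l 20G /mnt/nvme/swapfile
sudo chmod 600 /mnt/nvme/swapfile
sudo mkswap /mnt/nvme/swapfile
sudo swapon /mnt/nvme/swapfile
# make it persistent across reboots
echo '/mnt/nvme/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
# confirm the new swap is active
swapon --show

This uses a swap file rather than the dedicated swap partition the Disks utility would create; either approach works, the file is just easier to resize or remove later.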
That's a very cool setup with the increased swap. I am interested to know your results when you test the larger LLMs. Definitely an exciting time for local LLMs and SBCs.
Interesting! Could you or anyone on this thread share a link to a video explaining how to add this additional virtual RAM with a second M.2 on the Jetson Orin Nano Super? Thanks!
Cool, thanks
Thanks for watching!
Great work, thanks a lot!
Thanks very much!
Great video! I just wonder if this has the same exact dimensions as the regular Pi 5, for cluster cases. If so, I'll finish my cluster with this to replace the 8GB Home Assistant server. I think these Pis are amazing for what they are.
Thanks very much! It is the exact same footprint as the regular Pi 5.
Mate, this is perfect, I am about to buy a couple of these!!!
Glad to hear, they will definitely fit well into some of your projects!
How about using it with the AI HAT?
Unfortunately the current Hailo AI HAT does not work for LLMs (based on what they themselves have said); I have not personally tried it.
@@OminousIndustries thanks for taking the time to comment. Appreciate you and your efforts, and this video.
@@mooninthewater3705 No problem at all, thanks for the kind words!
Nice videos. I can't afford any LLM AI hardware myself, but your videos satisfy my curiosity.
Great work!
Thanks very much! Soon the hardware will be more and more accessible :)
How hot is that getting without a heatsink? Might be thermal throttling. Edit: I see you wondered that as well. Would be interesting to test with active cooling
I did touch the CPU and it was extremely hot haha. Active cooling is on the agenda ASAP.
Please do the Orange Pi vs. this one.
I have a video on the 8GB Pi 5 vs the OPi here: ruclips.net/video/OXSsrWpIm8o/видео.html
Nice video.
Please test a 7B or 8B model on the Orange Pi NPU.
Do multiple tests, like 0 input tokens, 500 input tokens, 1k, 4k, and 7k input tokens.
I am creating an AI notes app, so I am currently using Groq, but I want to run it on hardware like Nvidia (to test in the future) and self-host, to see the maximum number of tokens I can generate with a 7B or 8B model.
That's a great idea, I will try to squeeze that in!
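In the meantime, for anyone who wants to run that kind of sweep themselves, llama.cpp's llama-bench tool can test several prompt lengths in one run. A rough sketch (the model filename is just a placeholder, and note this exercises the CPU/GPU path, not the Orange Pi's NPU, which needs the vendor's toolkit):

# prompt-processing speed at several input sizes, plus a 128-token generation test
./llama-bench -m ./models/llama-3.1-8b-instruct-q4_k_m.gguf -p 512,1024,4096,7000 -n 128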
It is interesting to see what the Llama 3.2 Vision model is trained on. It is very impressive.
Yes it is, I was impressed it got that. I would love to have the horsepower to run the 90B one at a decent quant, but I need more than 48GB of VRAM.
Not bad, but clearly not really fast enough for regular use. Interesting, though.
Agreed, it is only acceptable for smaller 1-3B models; anything above that gets very slow.
Very useful video, especially when you compare against similarly priced SBCs. The Pi Foundation really needs to add an NPU!
Thanks very much! Yes, I would like to see the introduction of a more AI-focused Pi.
i wish i had smart friends
They exist, keep looking!
Does NVMe SSD storage make a difference? Jeff Geerling got interesting LLM results.
No, after the initial load it's stored in the RAM.
I don't believe it would make a difference in this scenario, no.
Are there smaller models that would make it a little quicker in its replies? Or have I missed the whole idea of the models? lol
I was thinking of that TARS AI thing they are building. I'm really not interested in it moving or even the vision side of it, but I would like to have its brain. hahaha
(as strange as that sounds... almost Frankenstein-type talk)
Yes, there are small models like Llama 1B and 3B that would be much, much quicker. I wanted to test the larger models for this video, as the new 16GB RAM variant was able to run them, something that the previous 8GB max RAM Pi could not!
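If anyone wants to see the difference for themselves, ollama will print tokens-per-second stats when you pass --verbose; for example, with the small tags from the Ollama library:

# small models that fit easily in 8GB; speed stats are printed after each reply
ollama run llama3.2:1b --verbose
ollama run llama3.2:3b --verbose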
Just wondering how it performs with small models: will the extra RAM give a boost, or is it the same? I would like to have a helper LLM for the VS Code assistant, but not if it's so slow.
If you were going to use a small Llama 1B or 3B, I don't believe there would be a speed difference between the 8GB and 16GB Pi.
What is the best SLM to use for basic chat? I'm using RAG extensively (using C# code to hit a database) and am looking for an SLM which supports function calling.
I can't definitively answer this, but I have had good luck with some of the Qwen models and their function-calling abilities.
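Not a definitive answer either, but as a sketch of what function calling looks like against a local model: Ollama's chat endpoint accepts a tools list, and the Qwen 2.5 tags advertise tool support. Something along these lines, where get_customer is just a made-up placeholder for whatever your C# layer actually exposes:

curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:7b",
  "stream": false,
  "messages": [{"role": "user", "content": "Look up the account for customer 42"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_customer",
      "description": "Fetch a customer record from the database by id",
      "parameters": {
        "type": "object",
        "properties": {"id": {"type": "integer", "description": "Customer id"}},
        "required": ["id"]
      }
    }
  }]
}'

If the model decides to use the tool, the response contains a message.tool_calls entry with the arguments; your code runs the database query and feeds the result back as a tool-role message for the final answer.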
There's going to be a flood of mini PCs with some Linux distro targeting local LLMs next year. But what do we want them for?
I'm okay with having more options! haha. Use cases are a different story
why
For science!
Would have been nice if you had another terminal window open to monitor temp while running ollama
watch -n 1 'vcgencmd measure_temp'
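On top of the temperature, the firmware will also report the current ARM clock and whether it has throttled (get_throttled returns a bitmask, with 0x0 meaning no throttling), so a combined view could look something like:

# temp, current ARM clock, and throttle flags, refreshed every second
watch -n 1 'vcgencmd measure_temp; vcgencmd measure_clock arm; vcgencmd get_throttled'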
You're absolutely right. I will make sure to better show temps/etc in the future!
Overclock it for marginally quicker results?
Was scared to without any form of cooler haha
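For reference (not something tried in the video), the Pi 5 overclock settings live in /boot/firmware/config.txt; a commonly cited mild bump looks like the lines below, but treat the numbers as an example to verify against current Pi 5 guidance, and only attempt it with active cooling:

# /boot/firmware/config.txt -- example values only, at your own risk
arm_freq=2800
over_voltage_delta=50000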
Had the same progress reset issue on Windows 64-bit for a large LLM.
Good to know I wasn't going nuts haha
Try running phi4.
I've been meaning to try phi4 at some point.
@@OminousIndustries A step-by-step tutorial without Ollama. Only Python, only hardcore.
@@MrKim-pt2vm LOL hackathon vibes
First!
Cheers!
I guess it works better with Llama 3.2 1B and 3B, as well as Phi-3 Mini.
Yes, these smaller models are much better suited for lower-powered hardware.
Second!
Cheers!
Impossible is possible
Absolutely!
I was expecting an eGPU setup, but he is not like Jeff.
I do actually have the components to do an e-gpu on the pi, at some point I would like to try it as his videos showcasing that were very very cool!
@ Yeah, with the 16GB that would be a first, along with power consumption. I have the equipment also, just busy.
@@ESGamingCentral I may try it sooner rather than later, but I only have a 3060 12GB to do it with.
@ As far as I'm aware you need an AMD card; I have a 6600 XT. I don't believe there are drivers for Nvidia cards on ARM RPi.
@@ESGamingCentral That's very interesting and something I wasn't aware of. I don't actually have any modern AMD cards so damn haha
When you're doing a matrix multiply across a 16GB vector space it will definitely be slower than an 8GB vector space. Double the time.
Yes, but if I am not mistaken, if the model were a smaller-parameter model like a 1B or 3B, the total system RAM wouldn't make a difference in the speed, as only what was needed would be allocated and the extra RAM wouldn't come into play.