Fun fact, Vulkan is *not* a Graphics API. It's a GPU programming API. "Presenting Graphics on a Screen" is a Vulkan extension the programmer will need to enable manually (VK_KHR_surface, if my memory does not let me down).
Vulkan ultimately is meant to replace OpenGL *and* OpenCL
So why is it slower than CUDA?
@@blisphul8084 I think the reason is flexibility. Vulkan supports a ton of different GPUs, so it is really hard to optimize Vulkan for any one piece of hardware. CUDA on the other hand runs on comparatively few GPUs, so Nvidia can pour tons of resources into optimizing it to be as fast as possible on their own hardware alone. It's similar to why Apple's ARM chips are so good: they are vertically integrated, and by controlling the entire ecosystem they get a lot of opportunities that a more general vendor just does not have.
😮
@@blisphul8084 as is OpenCL... CUDA is a proprietary Nvidia solution that is 100% vertically integrated. When you run CUDA, you run an Nvidia programming language, processed by an Nvidia compiler (nvcc), running on Nvidia hardware.
I guess they are able to optimize things further. I have no idea, I am not a specialist in GPGPU programming…
@@lbgstzockt8493 So..cute! Closing the environment using elemental math on a closed system! More useful than competition.
I love videos on how to set up local AI; the companies always charge exorbitant amounts for what you can do for free on a gaming GPU, without being spied on.
LOL... There's something of a huge difference between the LLMs that we mortals can run and the AIs such as Claude, ChatGPT and the like...
I could buy four 4090s and have a world of fun building my local AI server... but then what? Spend $10,000 only for that hardware to be obsolete in 1-2 years.
Not to mention the electricity cost of running this 24/7 and the degradation of the hardware...
Paying $20 a month is not that bad. Not that I don't love the idea of hosting our own LLM; it's just not even comparable.
@@RocketLR newer Llama is surprisingly close
If you use their API instead of the public consumer product, it's much cheaper and they can't train on your API calls.
It's also a good exercise for demystifying AI and realizing just how overblown the hype is. Especially with image generation-I'm no graphic designer, but I have consistently found it easier to composite an image in GIMP than get Stable Diffusion to follow a basic proompt.
I strongly disagree. We can sort of run a trained model to get an output. We cannot even come close to the computing power required to train these algorithms into a usable state.
Alexa, the gift that keeps on taking. You are absolutely right that subscriptions are not needed to do this
The problem I see with this is model storage. I currently have a 1TB NVMe drive connected to my Pi 5, which is pretty much a necessity for storing and playing with these models. This of course uses the only PCIe port on the Pi, so there's no way to connect a graphics card. I know there are some adapters, but if you look into them you'll soon discover they have issues. If anyone has tested one that'll allow an NVMe disk and one of these graphics cards, I'm all ears. Anyway, thanks Jeff for finally testing and demonstrating this, and for all the work you've put into helping make the Pi better for everyone!
For my testing, I'm using a 1TB USB SSD. A little slower than NVMe, but only slightly-plenty fast for loading models in and out of RAM though!
@@JeffGeerling Here's hoping that the Raspberry Pi 6 will have more and faster PCIe lanes, and maybe even an ITX version with a real PCIe slot.
If you keep the model loaded all the time, it's not an issue. The lag would get annoying if you swapped between different models like for a home assistant vs a coding assistant.
The other advantage of using an NVMe is being able to have a large swap partition. It's not practical to do this with a micro SD or USB disk because of bandwidth and wear, but having a large swap on NVMe will give you a little more flexibility to run slightly larger models.
How about using a PCIe switch? Wouldn't that let you keep the NVMe to boot from? I'm planning to buy one, so I've only thought about it so far.
For me, transcribing, text extraction/reformatting (PDFs and other weird formats to JSON), and general OCR are three areas where I'm very happy to have local AI applications. llama3.2-vision:11b on the latest Ollama, and hopefully eventually Pixtral:12b, with their "relatively open" licenses, may have quite a lot of interesting small-business use cases all around.
Having a low-power setup like this in a rack (even if it's janky), versus an Ubuntu server that guzzles 2-5x the power at idle, seems like a big win for the home-labbers & early adopters.
I found that used thin clients offer way better performance per watt than any Pi setup. Currently my server on an 8500T HP thin client runs at 3.4 watts idle. Yes, it pulls 58W at full tilt, but it outperforms the RPi by leaps and bounds, so it takes less power for the same task, in a way shorter time.
The RPi 4, on the other hand, runs at 4.4W idle.
So I'm still not convinced that the Pi is in any way a good option for a server application if you can get a thin client with more RAM, better performance and lower effective power consumption for cheaper than an 8GB RPi 5.
@@chocolatezt if you can get a cheap used thin client. The Pi is more easily available. I'd get the Pi and make a custom NUC casing for it and the piggybacked GPU.
@fallinginthed33p they are abundant on eBay, at least here in Europe. Yes, you have to check the options and know what is good and what is bad, check power efficiency, etc., but if you wanna build a server I think that is something everyone can do. Also, an RPi won't ever get close to the performance you could get from a GPU on even just a decent PCIe 4.0 x4 interface on a thin client, compared to the PCIe 2.0 x1 you get on the Pi, and then you lose all the other fast I/O options.
I think it's a cool project for tinkering, but if you could get better performance and energy efficiency for the same price, on a better-supported x86/64 platform, why go with the overpriced Pi?
Maybe in the future, if they decide to use a better SoC, give us some actual PCIe lanes, and don't price it unreasonably high just because they can, then I might consider it an option. But if such a future comes, it's more likely that used thin clients at that point will still be better value...
You don't have to worry about power usage if you have solar, wind, hydrothermal and/or hydroelectric power. I would suggest nuclear, but as you're a civilian that's not generally feasible.
If I have any shot at nuclear power for personal use, I'm taking it! :D
@@JeffGeerling I'd settle for an RTG. I have this fantasy that the DOE solves the Yucca Mountain bottleneck by paying citizens to bury spent fuel under their properties for a monthly stipend.
I'd sign up in a heartbeat, especially if I could siphon some of the heat!
@@GSBarlev Good luck farming on it, or suffering from radiation every once in a while.
This needs to be done on a VERY small scale first, and IF it works, it should be implemented in a VERY safe manner.
@@fujinshu Oh naturally-it's pure fantasy. But what I will say is that you'd be surprised how much properly shielded material you'd need to bury-and how close it would need to be to the surface-in order for the dose to even come close to the background levels of radon already found in most basements.
A few people in the world have run their own nuclear installation, but often the government wants to take it away for safety reasons.
YES! I was waiting for this video!
2:19 "This is my rifle. There are many like it, but this one is mine."
Can't wait to see how you managed the local voice control. When Home Assistant did their voice project it was exciting, but the results were at best painful to get working. Hopefully newer versions have fixed a lot of that.
AI on a PI, rolls off the tongue. We oughtta get Weird Al to write a spoof song to lift the Pi from the obscurity of the general public not knowing ANYTHING about Raspberry Pi!
Heh, sans-serif does NOT like making it easy to distinguish between an I and an l!
Lol we need an update, it's not All About The Pentiums anymore
Weird Al is heroic
@@JeffGeerling blame Bob Marley, as he famously _shot the serif_ ... 🥁
Super sweet Jeff! Wow, its been a journey but here we are, thanks for all the hard work!
Would you be honoured or offended if we call our voice assistants "Jeff"?:D
have the AI say “as always, i’m jeff” 🤣
I just love your videos! Can't wait to see a local voice interface for Homeassistant !
Finally, someone else that isn't brain-damaged by the "AI" hype. If you understand how an LLM works, you will never see it as "AI" again.
Seriously. There is no "intelligence," in terms of reasoning. That's literally the point of the Transformer architecture-no recurrent layers, "Attention is All You Need."
a better name for the hype is artificial stupidity.
Well that's only cause then you understand that true generative AI will never be accomplished on this platform but LLM will most likely be a small implementation of what that kind of model would look like. Or whatever.
@@Yuriel1981 generative AI I think is possible but it’s a long shot to have it before 2030.
All the development gains in hardware and software simply are better methods of brute forcing data.
narrow AI is just applied statistics
Thank you again *THIS* was what I was looking for to complete a specialist project idea I have been working on for years now
Thanks for the shout out!
Great idea with llama.cpp. Up until now I’ve been doing CPU inference on an old Mac Pro 5,1 (+ lots of cheap RAM 😂).
Will compare it to the Pi + 6600 XT tonight!
Keep it up Norm, this is GOLDEN!
Of all the youtubers I watch, I am confident that Jeff is the one who spends the most time watching youtube videos
2:20 missed opportunity! there are other boards like it, but this one is mine!
Thank you So much! Jeff for trying this out !!
Great video, Jeff. I am closely following your LLM/Pi experiments. For the moment, I am still sticking to PCs with Nvidia cards. As we know, the Pi CPU can't handle LLMs well enough to be useful. But I will jump back to Pi experiments when things get a little better sorted out (either Nvidia support, or simpler AMD installation). Thanks for keeping local LLMs in your focus, despite your hesitation on the AI front.
What a fantastic initiative. I’m going to try this in the holidays
You finally did it 🎉🎉🎉
I have a similar setup, but used a LattePanda instead of a Pi, and was able to get a used Nvidia Tesla P40 running on it. I was able to pick the P40 up for ~$100 off of eBay. Going for the older server compute cards as the basis for a local LLM server is much better IMO for the cost-to-performance ratio. The extra VRAM in the P40 compute card is hard to get close to in any consumer GPU.
WOW, Jeff - keep up the great work, just got a RP4, hope this works.
Unfortunately Pi 4 is limited to slower and smaller Ollama use.
Hey ... a comparison of the GPU and the Pi AI HAT would be interesting 😊 ... which is fastest on tokens ... and so on 😮
This is an interesting topic-the ones available right now have the Hailo-8 and 8L, which aren't suitable for LLMs (unless extremely tiny), because they don't have much RAM. The Hailo-10 supposedly does, but I haven't seen one in the wild yet.
AMD's open Linux drivers were the best decision ever. Nvidia drivers are Windows-exclusive. I won't be surprised if Nvidia buys out Microsoft for OpenAI or their cloud ideas.
Nvidia has their 'open drivers' now, which are a lot better than the status quo a few years ago, but they're miles behind AMD on the Linux front (IMO), at least from a community perspective. Unfortunately, with how prevalent CUDA support is, they still lead based on hardware + dev tools.
That's very cool. Having your very own ChatGPT at home.
One that doesn't suck up all your private data, too!
Thanks Jeff. Too often we give up privacy for convenience, but the cloud is just someone else's computer. A little effort and video watching can get the average computer guy up and running with a lot of creature comforts, without giving up control of his data.
Thanks. Didn't know about the option of running on Vulkan; now my 5700 XT works for some models.
Given the cost of the setup, I'm curious how the base M4 Mac mini might stack up against it. The storage on the base model is the obvious drawback. After seeing your storage hooked up through USB, perhaps a PCIe Gen 3 or 4 M.2 drive through Thunderbolt would be more than sufficient.
I've been testing, will maybe post something on Level2Jeff soon!
@@JeffGeerling keep us posted
Ok but the base model Mac Mini is smaller and cheaper than this setup, even more power efficient. Could you do a direct comparison?
A worthy idea! I haven't had a chance to get all my testing done on my M4 mini yet.
@@JeffGeerling Can you see if you can get multiple Mac Minis in a cluster to run larger models too? The upgrade cost to add more RAM is the same price as the whole system itself, so an M4 cluster is very appealing here.
I also wonder if you can scale to larger models with the number of Mac Minis. So if you have 4 or 8 Minis, can you run a 64 or 128GB model on the cluster with really good performance? If so, please test Mistral Large 123b on it.
Oh man, you make Raspberry Pi so interesting.
Caring about a couple of watts probably isn't worth the effort of using a different CPU. Running a GPU on the Pi is a cool concept regardless.
I'm not sure about the 80+ specification on a regular PC PSU, but wouldn't it make more sense to measure at the DC bus instead of at the wall? If you can get a quality PSU, you might be able to power the riser and GPU with some homemade adapters off a 12V-only power supply, e.g. a Mean Well unit for security cameras. That would cut out the other ATX bits and voltages from a regular PSU and possibly increase efficiency. They are trying to go that way with ATX12VO for efficiency in PCs, but then your drives need power off the motherboard or another step-down for 5V and 3.3V.
Or go the other way and run the Pi off the PSU's 5V rail to eliminate the need for another charger. No idea if those chargers are efficient, but they sure aren't free if you're considering them in the setup price, and PC PSUs should have plenty of amps left on the 5V rail to run multiple Pis.
oh man… this makes me really really really wish there was a PCI dock that was built on a CM# carrier board. It would be soooooo slick to pop a CM4+RX580 or a CM5+6700 together depending on your use case. Thanks for blazing this trail!
These are super fun videos. Not practical for most of us, but cool to see.
Question: Jeff, which donation platform gives the largest percentage directly to you?
We support you and all benefit from your open source documentation. Happy to make another donation.
- As someone who has set up a quad AMD 7900XT rig for LLM use, I don't envy the amount of debugging you did!
GitHub Sponsors is the best, with Patreon just slightly behind. YouTube Memberships takes the biggest cut, but it's more convenient for some people!
AI. the 'cloud' buzzword of 2024.
Need a new version of the Cloud-to-Butt plugin!
@alch3myau You GOT it right. And that's what he is doing: capitalizing on half-baked tech for the clicks.
Neat! There's a nice Windows client on the MS store for ollama so I grabbed that to play around with local models on my laptop. Thanks for opening my eyes to that option.
Thank you for making this
Interesting. There's a lot of information available around CUDA/Tensor performance for "AI" (for me that goal is an LLM for Home Assistant and inference for Frigate), but I'm unable to understand how AMD fits into this space.
Notionally, there's better value to be had from AMD GPUs, but how does that stack up in a tokens/sec comparison against Nvidia?
I'm starting out with an old Intel platform, so I'm not limited by the ARM compatibility/drivers or concerned about power (it'll be an OMV server too).
Just amazing, Jeff. And I think this will encourage manufacturers to create something like this to compete with Microsoft and Sony.
I'd love to learn about RPis with batteries. Battery packs? Power banks? UPSes?
I looked into this a while back. There are a few UPS HATs. With that, given how little power Pis require, you can use any off-the-shelf power bank.
@@GSBarlev I'm building my own cinema camera, will probably resort to an off-the-shelf power bank, yeah. It won't need to be powered by a typical power adapter anyway.
I love this idea, but how practical is it from a daily-use perspective?
Assume I want to run a home lab with a NAS, collab apps, Home Assistant, video streaming, and this personal assistant.
I would think that a setup like this would require multiple RPis, versus running it all on one desktop PC? Would we see real power savings in this case?
So when's the first full-sized GPU HAT+ coming I wonder?
CM5 with CM5I/O board with PCIe slot?
Hi, Jeff. Nice video.
But in my opinion, it's a bit of an incorrect comparison, "RPi vs RPi + GPU". You could take a RockChip NPU or a Jetson Orin NX for the models you are using. It would be cheaper, and it would solve the home assistant case (maybe not 40 t/s speed, but much better than the RPi alone).
I hope that Hailo-10 will also be released soon. A few less popular platforms are working with LLMs as well.
Hailo-10 will be an interesting bit of hardware to test. Hope they are more available at some point.
The main reason I subscribed to this channel is to run LLMs on a Pi. I'm a big fan of these SBCs; hope the next one works with an Nvidia graphics card and CUDA.
To get the total joules per run, just calculate the area under your power consumption graph (watts * seconds = joules).
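For anyone who wants to do that from smart plug readings, here's a minimal Python sketch (the sample numbers are made-up placeholders) that integrates (seconds, watts) samples into joules with the trapezoidal rule:

```python
# Integrate (time in seconds, power in watts) samples into energy.
# The readings below are made-up placeholders; substitute your smart plug's log.
samples = [
    (0.0, 8.5),    # idle before the run
    (2.0, 95.0),   # GPU ramps up during inference
    (12.0, 92.0),  # steady state while generating tokens
    (14.0, 9.0),   # back to idle
]

joules = 0.0
for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
    joules += (p0 + p1) / 2 * (t1 - t0)  # trapezoid: average watts * elapsed seconds

print(f"{joules:.1f} J ({joules / 3600:.4f} Wh)")
```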
Could you go into more detail about the Raspberry's PCI-E lanes and which cards can cope with fewer lanes and how this actually works?
Thank you 😊
There is an open PR to add Vulkan support to Ollama which would support Intel and AMD GPUs better and make it easier to integrate in home assistant and vscode, but ollama seems uninterested for some reason.
We'll see... I'm going to try running that on my Pi setup!
Please make videos of you working on the kernel and drivers (low-level stuff); we'd love that.
How about dual GPUs? And I'd automate it to make it more power-efficient, with two smart sockets and an automation triggered by entering zones in Home Assistant. Like: if you're entering the zone 'home', switch on the power plug named GPU, wait 10 seconds, and then turn on the power plug for the Pi. This makes it even more affordable, since it doesn't have to run 24/7 while you're not at home. By my calculations you'd get two hours at full power when in use. You can even turn off the plugs at midnight if you want, by making an automation.
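If you'd rather script that sequencing than click it together in the automation editor, something along these lines should work against Home Assistant's REST API - a rough sketch only, assuming you've created a long-lived access token; the host and entity IDs (switch.gpu_plug, switch.pi_plug) are placeholders for your own setup:

```python
import time
import requests

HA_URL = "http://homeassistant.local:8123"  # placeholder: your Home Assistant host
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"      # created under Profile -> Security
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

def turn_on(entity_id: str) -> None:
    """Call the switch.turn_on service for one smart plug."""
    resp = requests.post(
        f"{HA_URL}/api/services/switch/turn_on",
        headers=HEADERS,
        json={"entity_id": entity_id},
        timeout=10,
    )
    resp.raise_for_status()

# Power the GPU's PSU first so it is stable, then bring up the Pi a bit later.
turn_on("switch.gpu_plug")
time.sleep(10)
turn_on("switch.pi_plug")
```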
I wonder if Stable Diffusion/OnnxStream will recognize the GPU through Vulkan? Local AI images, videos and LLMs. I priced a PC box needed to do this and it is outside my budget. Plus I love pushing my Pis to do useful things.
Waiting for your local voice assistants video. Great stuff!
Cool stuff!
Let's say I built this rig exclusively for hosting an LLM on it. And I have another laptop, with Joplin and my notes on that laptop. Everything is on the same network. Is it possible to use this LLM remotely for such a purpose? So, to query the LLM on the rig about my notes, which are stored separately.
Or any other files.
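It should be - Ollama exposes an HTTP API on port 11434, so once the rig listens on the network (e.g. by setting OLLAMA_HOST=0.0.0.0 for the server), any machine on the LAN can send it a prompt. A rough sketch only; the hostname, model name, and exported-notes path are placeholders:

```python
from pathlib import Path
import requests

PI_HOST = "http://raspberrypi.local:11434"  # placeholder: the rig's LAN address
notes = Path("~/notes/exported_note.md").expanduser().read_text()  # placeholder path

resp = requests.post(
    f"{PI_HOST}/api/generate",
    json={
        "model": "llama3.2",                          # whatever model the rig has pulled
        "prompt": f"Summarize these notes:\n\n{notes}",
        "stream": False,                              # one JSON blob instead of a stream
        "keep_alive": -1,                             # keep the model loaded between requests
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```

Joplin keeps notes in its own database, so you'd export or sync them to plain files first; for a larger collection you'd probably want a proper RAG setup rather than stuffing everything into the prompt.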
I'd love a Pi 5 in Micro-ATX form factor with one or more PCIe slots so you would not need all those cables and wires. Also, it would look snug if put inside a nice case together with the GPU.
Is there a Pi 5 Compute Module to Micro-ATX solution out yet?
Love the setup with the Pi 5. But I think a used NUC 9 Extreme + 3060 with 12GB VRAM (around $600 in total) could be more cost-effective. I would weigh the difference in price and idle power consumption against the difficulty of the setup process.
Definitely a viable option.
Or just buy an older second-hand PC and a second-hand RTX 3090 with 24GB of memory.
3090s are now really cheap because for gaming they are slower than a 4070 Super, but they still have twice the VRAM.
@@FuZZbaLLbee a used RTX 3090 alone could be around $700 already....
@@Merlinvn82 got mine for €600. Also bought a CPU, RAM and motherboard for €120.
Did need to buy a new PSU though. So yeah it will be more expensive, but will have 2x the VRAM and more CUDA cores
I'm a bit of a newbie with these AMD cards. How do they compare to the new M4 Mac mini? Is it cheaper (hardware- or power-wise) to go with the base M4 Mac mini, or one with a Pi + AMD graphics card?
Good question; my M4 Mini (non-Pro) is pretty fast, and can handle models about as fast as the 6700 XT. It's not a bad option at all, and great for general purpose use too.
Finally someone calling it right: an LLM is not AI. All of this marketing junk needs to be cleared out of the tech space.
LLMs are a small subset of AI tech. So are search algorithms like what power google maps, and so are the algorithms that curate your social media feeds. So LLMs are AI, but AI is much more than just LLMs.
Lmao LLM is an AI model
What about the Instinct cards with boatloads of memory on eBay for $100 to $300? Directly used for accelerating AI tasks?
I'm always nervous when I hear 'Compile your own linux kernel'. Is there a HOWTO I can read on this?
Raspberry Pi has a pretty good one in their documentation! If you want to see how it goes, try it sometime, it's not *too* intimidating, just do it on a fresh install of Pi OS and one that you don't need running (in case you mess things up-which is totally normal the first few times).
@@JeffGeerling That's my real challenge. I rarely use Pi OS. I'm just really comfortable with Ubuntu/KDE Plasma. The upside is 'Hey, they're both Debian-based', and of course, the nice thing about the Pi is that I don't feel nearly as bad if I have to unplug the NVMe and put in an SSD to try some stuff out. I think that is the more liberating thing I feel over my Windows 10 desktop.
Jeff, do you have an affiliate link to the smart power outlet things that you use and connect to your home assistant? Gonna add to my holiday gift wishlist..
Are a lot of AI/ML projects still tied to cuda or is this no longer the case?
Always has been. AI is a research field, and researchers usually don't see a point in moving away from Nvidia if their cards work fine (and the same cards can be used in production). Many projects also use custom CUDA kernel code, which complicates things more.
tyvm for great vid
I just got inspired by this and installed my first local language model using an old GTX 1080 on a Ryzen system. With Llama 3.2 I'm getting about 70 tokens per second with this venerable old GPU!
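For anyone who wants to reproduce a number like that: Ollama's generate endpoint reports eval_count and eval_duration (in nanoseconds) in its response, so a quick sketch like this (model name and prompt are just placeholders) computes tokens per second without extra tooling:

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Write a limerick about GPUs.", "stream": False},
    timeout=300,
).json()

# eval_count = generated tokens, eval_duration = generation time in nanoseconds
tps = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{tps:.1f} tokens/sec")
```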
I wonder how this setup compares with the new base-model Mac Mini for running LLMs
Testing that :)
This man has no limits. He will make a pi do ANYTHING.
I know this would be even more of a bottleneck, but have you tried a GPU with the CM4? I'm not hopeful that the CM5 will have any more PCIe lanes, since it looks like the RP1 is in there and it'll need some bandwidth to do its thing. Also, are you running these GPUs at PCIe 2.0 or 3.0? You might have mentioned it somewhere but it just hasn't clicked for me.
The 4-kilobyte kernel page size issue was the same thing the Asahi Linux devs ran into with the M-series chips when trying to run x86 games. I wonder how well their work on it with the micro-VMs might translate over to the Raspberry Pi stuff.
I had good luck and performance with an Intel N100 CPU running Ollama and llama3.2:3b. It would put out good T/s for shorter answers while boosting, and you were not hitting it constantly with question after question. Let it cool down a bit (~30 secs) and ask another question and it would do well. It isn't as good as my 3060 12GB, but usable.
I'm really curious about local LLMs, but it's sad that the compatibility is limited.
Thank you, God bless you.
I was thinking the exact same thing when you did the other video with the GPU for gaming.
thanks for everything
Great work
Hey, I was wondering if the Rusticl OpenCL driver that's part of Mesa3D could compete with the Vulkan backend; have you tried it? It should deliver greater performance than ROCm, but I'm not sure about plain Vulkan. Also, if you run this through Vulkan, would it go through RADV on Linux?
How does it compare to a Jetson AGX, btw?
Interesting video, thank you for sharing this! Out of curiosity: how long does it take to compile llama.cpp? You are compiling using the Raspberry Pi’s SOC, right?
Yeah, it was maybe 10 minutes or so? I can't recall exactly.
@@JeffGeerling ah interesting. I thought it would have taken much longer, more like a couple hours or so. Thanks for your reply! :)
Great shirt! 😅
Haha was wondering when you'd spot it!
Hi Jeff do you think M2 riser adapter with power connectors could be working with RPI5? I found it on Aliexpress. "Laptop PC R43SG M.2 key M for NVMe External Graphics Card Stand Bracket PCIe3.0 x4 Riser Cable for ITX STX NUC VEGA64 GTX1080ti" Please check it thanks!
Looking forward to plug in my rx 6400 into the pi, to make a tiny form factor arm pc
Smooth move on the BlueSky reference by asking about... the sky.
So I really want to create an ARM-based gaming computer which I could also program on, and I think it would be cool if this worked with the Orange Pi series, since their CPUs tend to be faster, the RAM is faster, and they have more cores.
I know the wizard trick here is using the Raspberry Pi in unorthodox ways, but it would be so nice to see how it compares when pairing the graphics cards with an N100 board. That's in the same ballpark in terms of cost and power usage as the Pi 5, but it would enable much more, since it's just bog-standard x86.
We're finally here! GPUs on Pis!
Jeff just killing it with the coding!
Considering Microsoft has released the ISO for Windows 11 ARM, I'm curious to see if you could get one of these GPUs running in Windows like a normal system (if the drivers don't shit themselves).
So far Nvidia and AMD haven't released ARM64 Windows drivers for their cards yet, though I'm told they might exist.
I have to admit, I originally thought what you were doing (supporting graphics cards on a Pi) was just for bragging rights, but I have changed my mind. Combining something like this with a project like EXO would allow clusters of old (e-waste) GPUs to be turned into quite potent LLM machines. I watch with interest!
Exp runs on rpi?
@@ESGamingCentral I assume you are referring to exo? It is being worked on. Someone has managed to get it to run on a single Pi but is having issues with multiple Pis. I have not tried it, but I did try to get it to run on a Jetson and had issues with the clang compiler (at that point I gave up).
Thanks for another great video! Maybe you've been asked this before, but where did you get the sticker board that's in the background of your videos? I can't find one in that size!!
It's custom from a 2x4 sheet of marker board from Lowe's!
@ thank you!! (:
A couple of questions. What are your predictions about Intel: is it, or will it be, supported on Linux, and in that case would there be a chance of running it with a Raspberry Pi?
I'm getting more need for a more powerful computer, but I don't know what can and cannot run without Nvidia. Games would be a bonus, but more importantly I want to program on CUDA (or whatever the alternative is on AMD/ATI cards and Intel) and experiment with LLMs and machine learning.
Nvidia, if I understand correctly, is a no-no on Linux.
Do LLMs require a specific technology or framework, like CUDA, to run? A specific manufacturer? Or can you use any powerful enough graphics or physics card with enough VRAM (there were cards for data centers without actual video outputs; second-hand, some of them are very compelling price-wise for everything besides gaming)?
In my defense, I never owned a computer with a dedicated graphics card.
What's the simplest way to get hardware encoding working on the Pi 5? I have my media server running on my Pi 5, but transcoding apparently doesn't work at all on the Pi.
Is there specific Pi5 code you deployed or would it run also on e.g. an OrangePi 5 Plus with its PCIe 3.0 x4 and 32GB RAM? It might be better suited?
The RK3588 has some PCIe bus quirks, so far nobody's been able to work around its quirks to get a GPU going.
@JeffGeerling that's a shame. Thank you for these insights!
I wonder if it would eventually be able to run on the RPi's own GPU as well, given that Mesa devs are adding more and more Vulkan support to the v3d driver. The V3DV driver recently got Vulkan 1.3 support (though it doesn't actually run most VK 1.3 stuff since it still lacks a lot of other extensions).
The 6.13 kernel has better support for Pis; early next year release?
@@FlintStone-c3s I think it mainly comes down to improvements in Mesa at this stage, but maybe it also needs some stuff in the kernel part of the driver for some of the missing Vulkan extensions.
It's too bad the Pi is based on a Broadcom SoC though. Qualcomm's GPUs have by far the best and most feature-complete FOSS Linux GPU drivers when it comes to ARM systems-on-a-chip (though the new drivers for the early Apple ARM chips are starting to rival them), but idk if there are any single-board computers using them.
Great job! Awesome video. Did you have to do anything to share the ground between graphics card and raspberry pi?
So far it hasn't been an issue-I have both the GPU PSU and the Pi PSU plugged into the same power strip, and I also don't hold both at the same time that much, so can't say if it's going to be a perfect solution or just works for me. Ideally there'd be a riser that powers the Pi and the card at once.
Use more powerful ARM SBCs like the RK3588, or the Radxa NIO 12L with MediaTek, for better performance.
No - as reported by Jeff, the kernel support for that chip is a mess: devs abandoning it, poor drivers, everything barely working.
On the Pi, as you see, support is unmatched, and this is why no one talks about other SBCs: they lack such great support.
Quick question: Why can't I use CoralTPU devices for LLMs?
RAM - they don't have the direct memory access required to accelerate anything faster than the CPU on the Pi. They'd only be able to operate on very tiny models.
When I watch these videos, the first question that comes to my mind is: WHY? I know it's useless to do things like this. BUT: it's still very amusing to watch experiments like this. 😆
How does this compare to the M4 Mac mini?
Testing this now!
I wonder how that compares to a base Mac mini m4
Testing now!
Can you test the RPi 5 + AI HAT with a Llama model for use in Home Assistant?