Private AI Revolution: Setting Up Ollama with WebUI on Raspberry Pi 5!

  • Published: 25 Jan 2025

Comments • 77

  • @Clipzz.z
    @Clipzz.z 1 month ago +3

    installation starts at 19:20

    • @kevinmcaleer28
      @kevinmcaleer28  1 month ago +1

      @Clipzz.z I added chapters!

    • @Clipzz.z
      @Clipzz.z 1 month ago +1

      Thanks for liking my comment❤️❤️❤️

    • @Clipzz.z
      @Clipzz.z 1 month ago

      @ that’s very helpful. Thank you for that

  • @jajabinx35
    @jajabinx35 11 months ago +10

    Is there a way to give it a voice, and speech recognition?

  • @aravjain
    @aravjain 1 year ago +7

    Great video, Kevin!

  • @cartermclaughlin2908
    @cartermclaughlin2908 8 months ago +2

    Is there anything like this that can do image recognition? I do wound care and have always wondered when my computer could help with wound assessment - provide measurements, describe wound bed, maybe even temperature with the right camera.
    I'm only at 15:30, but I get the impression you are not using a TPU? I am just getting into Raspberry Pi and I'm looking for an excuse to get one. There only seem to be a couple of TPU options and I'm leaning toward the USB 3 version, but I have no idea how to judge them.

  • @chrisumali9841
    @chrisumali9841 10 months ago +2

    Thanks for the demo and info, have a great day

  • @xpavpushka
    @xpavpushka 1 year ago +7

    THX OLLAMA!

  • @galdakaMusic
    @galdakaMusic 7 months ago +2

    Is it possible to remake this video with the new AI Hat (Hailo 8L)?

  • @lorisrobots
    @lorisrobots 1 year ago +3

    Great show!

  • @whitneydesignlabs8738
    @whitneydesignlabs8738 1 year ago +5

    Thanks, Kevin! I have been running Ollama on a Pi5 for a few weeks now, but I look forward to following your Docker example and trying out the GUI. My use case is robotics, and I'm still trying to solve how to run a Pi5 at full power on a DC battery (harder than it seems).

    • @evanhiatt9755
      @evanhiatt9755 1 year ago

      I am waiting for my Pi5, and I plan to also use it for robotics and potentially run a local LLM. I would be curious to hear how you solve the power issue; there's not much info online about it. It seems like the Pi5 is picky? Not having it in person yet, I can't test. I was hoping to just use a buck converter and power the Pi5 in some way through that, but I haven't seen any definitive videos or articles on this.

    • @whitneydesignlabs8738
      @whitneydesignlabs8738 1 year ago

      True. There is almost nothing out on the Internet right now about running a Pi5 from a DC battery source (as of Feb 2024). My current solution, to be able to run all CPU cores pedal to the metal, is a small automotive inverter powered by a 36V Li-ion battery via a DC-DC buck converter to make 12VDC, then plugging the OEM Pi wall wart into the inverter. It is the only way I have discovered to get max power. Even $150 200W power banks claiming to be USB PD will not deliver more than about 10-12W max to the Pi5. Very frustrating, as obviously a car inverter on a robot is a highly wonkified setup, not to mention that going around three sides of a square in voltage conversion takes an efficiency toll. @@evanhiatt9755

    • @dariovicenzo8139
      @dariovicenzo8139 11 months ago

      Hi, what are your use cases in robotics, if you can share? Thank you.

    • @whitneydesignlabs8738
      @whitneydesignlabs8738 11 months ago +1

      Sure. Happy to share. I am running Ollama on the Pi5 using a local large language model. Running a local LLM really taxes the Pi5 to max CPU while it is processing, so any amount of throttling due to low power slows down the processing time. The processing time of a 3B LLM is acceptable, but the outputs are dodgy. So I'm trying to run 7B models... their outputs are acceptable, but processing time is noticeably slower. Every second counts when trying to have a conversation with a robot and have any hope of it feeling natural. Side note: a Pi4 handles animatronic eyes, speech to text and text to speech. I am working on passing this data via MQTT to the Pi5, whose sole job is to run the LLM. So I am already dividing labor as best I can. I might have to just give up on the Pi5 for the LLM and use a NUC. Still trying to resolve the power issue. @@dariovicenzo8139

  • @saokids
    @saokids 11 months ago +1

    Nice tutorial! Is it possible to deploy Ollama as a stack in a Docker Swarm cluster to improve performance?

  • @Yabbo06
    @Yabbo06 11 months ago +2

    nice video !

  • @NicolasSilvaVasault
    @NicolasSilvaVasault 8 months ago

    Pretty good stuff, I'm planning to install Ollama on a NAS xD, hope your channel grows.

  • @hkiswatch
    @hkiswatch 8 months ago +1

    Fantastic video. Thank you very much. I have a Coral TPU. Would it be possible to use it with Ollama on the Pi 5?

    • @kevinmcaleer28
      @kevinmcaleer28  7 months ago

      Large language models need a lot of RAM to run, so things like the Coral are not really suited to them.

  • @Bakobiibizo
    @Bakobiibizo 11 months ago +1

    Do you have a seed on your prompt? Because if not, you will generate the same response each time; without some form of randomization a transformer model is more or less deterministic with most modern schedulers. If you do have a seed, you might have it cached.

  • @bx1803
    @bx1803 11 months ago +1

    It would be awesome if you could tell it to SSH to other devices and run commands.

  • @jonathanmellette8541
    @jonathanmellette8541 8 months ago +1

    The first time you send a prompt, the WebUI loads the model into memory, so the first response always takes the longest.

  • @Systematic_Sean
    @Systematic_Sean 3 months ago

    If I already have a service running on localhost:8080, is there a way to have Ollama run on a different port?
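
    (A minimal sketch, not from the video - assuming the WebUI is the service published on 8080 through a Docker port mapping in the compose file; the service name and the host port 3000 here are just illustrative. Only the host-side number needs to change:)

        services:
          ollama-webui:
            ports:
              - "3000:8080"   # host port 3000 -> container port 8080; pick any free host port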

  • @BillYovino
    @BillYovino 1 year ago +4

    Thanks for this. I'm having trouble getting started. I have a fresh install of bookworm on a RPi5. I installed Docker and created the compose.yaml file as shown in your blog. I'm getting an error when I try to run "docker-compose up -d". "yaml line 1:did not find expected key". Do you know what I'm missing?

    • @kevinmcaleer28
      @kevinmcaleer28  1 year ago

      Can you share your compose file? It's YAML, so it's very particular about spacing and formatting.
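
      (For anyone comparing against their own file, a minimal sketch of this kind of compose setup - illustrative only, not the exact file from Kevin's blog, and using published images rather than a local build; note that YAML indentation must be spaces, not tabs:)

          services:
            ollama:
              image: ollama/ollama            # Ollama API listens on port 11434
              ports:
                - "11434:11434"
              volumes:
                - ollama:/root/.ollama        # keeps downloaded models between restarts
            ollama-webui:
              image: ghcr.io/open-webui/open-webui:main
              ports:
                - "8080:8080"                 # browse to http://<pi-ip>:8080
              environment:
                - OLLAMA_BASE_URL=http://ollama:11434
              depends_on:
                - ollama
          volumes:
            ollama: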

    • @BillYovino
      @BillYovino 1 year ago

      @@kevinmcaleer28 I keep replying with the text from the file but I guess YouTube is blocking it. I'm using the file you posted on your blog and made sure that the file contains Linux-style line feeds instead of Windows ones.

    • @kevinmcaleer28
      @kevinmcaleer28  11 months ago +1

      @@BillYovino have you joined our Discord server (it's completely free)? It's easier to troubleshoot there as you can share files and screenshots etc. It's hard for me to troubleshoot without looking at the file. There are YAML validator sites you can run the file through; they will also tell you what is wrong.

    • @BillYovino
      @BillYovino 11 months ago

      @@kevinmcaleer28 I'm not getting the confirmation email from Discord after signing up from your link. I've checked all of my email boxes including spam and trash.

  • @navi-technology
    @navi-technology 11 months ago

    At minute 5:03, how can you display the Linux activity % like that, sir?

    • @kevinmcaleer28
      @kevinmcaleer28  11 months ago

      Hi, I used the 'htop' command; it will show all the processes, plus CPU, memory and swapfile usage.
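
      (For anyone following along, htop is in the standard Raspberry Pi OS repositories - a quick sketch:)

          sudo apt update && sudo apt install -y htop   # install it if it isn't already there
          htop                                          # live view of per-core CPU, memory and swap usage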

  • @marioa6942
    @marioa6942 9 months ago +1

    Is there any way to reliably increase the performance of a Raspberry Pi 5 to support the larger models in Ollama?

    • @kevinmcaleer28
      @kevinmcaleer28  8 months ago

      Not that I'm aware of - the smaller models will run a lot faster; the trade-off is in the accuracy/depth of knowledge contained in the model.
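
      (Illustrative only - the model names below are examples from the Ollama library, not a recommendation from the video; smaller models respond much faster on a Pi 5:)

          ollama run phi          # ~2.7B parameters - quick, but shallower answers
          ollama run llama2:7b    # 7B parameters - better answers, noticeably slower on a Pi 5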

  • @sidneyking11
    @sidneyking11 1 year ago +2

    @kevinmcaleer, how would you run this on a Proxmox server? Would you use an LXC or a full VM running Ubuntu? Thank you

    • @kevinmcaleer28
      @kevinmcaleer28  1 year ago +1

      I've not got a lot of experience with LXC (Linux Containers, for anyone else who's reading); it's supposed to be faster than Docker, but I wouldn't know how to create an LXC container to run Ollama in. Ollama needs quite a few Python libraries, so any bare-bones Linux with Python installed should be OK. Ollama is VERY resource intensive, so it may steal all the compute cycles from anything else running on Proxmox. Worth trying though.

    • @zacharymontgomery5597
      @zacharymontgomery5597 11 months ago +1

      I was just wondering the same thing since I just installed proxmox on my home server

    • @davidbayliss3789
      @davidbayliss3789 11 months ago +1

      I've installed it in an LXC container on Proxmox. I'm using Proxmox on a 13th-gen i7 machine that has hybrid cores. In Proxmox it's easy to dedicate certain cores to VMs, but it's a bit more faffing to say which E cores or P cores to assign to LXC containers. (I have a Windows VM on Proxmox and pass through the GPU to that to use as my daily driver, so it's not normally available to use with my containers; normally I use LM Studio in Windows and its server for when I want GPU-powered LLM models.)
      All I've done so far is clone a privileged Debian 12 LXC container (not very secure lol - should there be supply-chain attacks etc. in the Python libraries, for example), and I did the one-liner Ollama install in that and then tested with Postman.
      You can limit in the Proxmox GUI how much memory or how many CPU cores to give to the LXC container. (It only gives the container what it asks for, up to the limit, and of course the container doesn't need memory for its kernel as that's shared with the host.) I gave this container a limit of 8 cores (leaving Proxmox to decide) and I've found that the container only maxes out 4 of them when I send Ollama a request from Postman.
      --
      My plan is to set up an unprivileged container for Ollama. I still want to install it directly in the LXC rather than in a Docker container. I plan to use Podman for the WebUI - hosting Podman in the same container and using Portainer with Podman. I have another unprivileged LXC container using Cockpit stuff to run as an SMB NAS, and using "pct set" in Proxmox I can make my underlying ZFS dataset from that NAS container available in my Ollama container. That's useful to store models in, so I only need to store them once (on fast NVMe) and have them available to the various utilities that might consume them.
      That's the plan anyway. Always limited by time. :)
      I try to script everything nowadays because I can't remember a thing lol. I'm hoping LLMs with RAG and memory and agency will help me out there!
      Videos / channels like these really help with getting going quickly and can save a huge amount of time (thanks Kevin!). :)
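
      (On the "pct set" bind-mount mentioned above, a rough sketch - the container ID and paths here are made up for illustration:)

          # on the Proxmox host: expose a host/ZFS path inside LXC container 101
          pct set 101 -mp0 /tank/models,mp=/mnt/models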

    • @davidbayliss3789
      @davidbayliss3789 11 months ago

      nb: I hope I can run Podman in an unprivileged LXC container (with nesting)... I can't remember now off the top of my head what problems might get in the way that might affect my planned implementation - it was so long ago when I last looked.
      Just for experiments I'd use a VM for kernel separation and a privileged LXC container in that - efficient and easy to SSH into using Visual Studio Code remote explorer etc. so I can mess about with Ollama and running stuff besides it within the same "scope" etc. I'd rather have some sort of container rather than Python venv or anaconda etc. ... just less to think about in terms of isolation.
      As a "commodity" sort of thing, just running / listening on my hardware, 24/7, I like the idea of using an LXC container for it rather than in docker just to strip away some of the management complexity and use Proxmox, and while idle I want it to use very little resource. It's for that scenario that reducing the security footprint if possible makes sense.

  • @royotech
    @royotech 11 months ago

    Ollama on an Orange Pi 5 Plus using the NPU chip - is it possible?

  • @zerotsu9947
    @zerotsu9947 8 months ago +1

    Can I run ollama on a powerful system and serve the webui on another system?

    • @kevinmcaleer28
      @kevinmcaleer28  8 months ago

      Yep - you sure can
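
      (A minimal sketch of one way to split them - the example IP is a placeholder; OLLAMA_HOST makes the Ollama server listen on all interfaces, and the WebUI container is pointed at it with OLLAMA_BASE_URL:)

          # on the powerful machine running Ollama
          OLLAMA_HOST=0.0.0.0 ollama serve

          # on the machine serving the WebUI
          docker run -d -p 8080:8080 \
            -e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
            ghcr.io/open-webui/open-webui:main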

    • @zerotsu9947
      @zerotsu9947 8 months ago

      @@kevinmcaleer28 Can you help me with it?
      How about a video on this topic?

  • @peterhickman386
    @peterhickman386 11 months ago +8

    This looked interesting, so I went to the GitHub page and followed the instructions to run the Docker version, and the first thing I saw was a sign-in screen with the need to create an account! Not really sure this counts as private at this point.

    • @kevinmcaleer28
      @kevinmcaleer28  11 months ago +8

      That account is only stored on your machine (it’s not a cloud account). This is so you can provide this to many users, such as your family, school or business

    • @peterhickman386
      @peterhickman386 11 months ago +3

      @@kevinmcaleer28 Thanks, it wasn't clear. Quite a few of the "run your own server" apps seem to need you to log into them with no real necessity, so I noped out. But let's try that again 👍

  • @haydenc2742
    @haydenc2742 1 year ago +1

    Can it run in a cluster on multiple RPi's?
    ooh looks like it can on a kubernetes cluster!!!

  • @onekycarscanners6002
    @onekycarscanners6002 11 months ago +1

    Q: How often will you need updates?

    • @kevinmcaleer28
      @kevinmcaleer28  11 months ago

      You don't need to update it ever; if you want newer and better versions of the model, you can download them whenever you want (when they are released).

  • @GeistschroffReikoku
    @GeistschroffReikoku 11 months ago

    Is there a way to update the Ollama it builds? I followed the video and it is working fine, but Gemma just came out and it works only with the latest Ollama 1.26... and when I followed the video it installed 1.23... What do I have to erase, or what do I do?

    • @kevinmcaleer28
      @kevinmcaleer28  11 months ago

      Simply rebuild it using docker-compose build --no-cache and then docker-compose up -d
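
      (Spelled out for anyone following along - assuming the project folder was originally cloned with git, as in the blog post; run these from the folder containing the compose file:)

          git pull                          # fetch the latest Dockerfile/compose changes
          docker-compose build --no-cache   # rebuild the images from scratch
          docker-compose up -d              # recreate and start the containers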

  • @vivekverma5178
    @vivekverma5178 9 months ago

    How are you accessing the Raspberry Pi from another computer?

    • @vivekverma5178
      @vivekverma5178 9 months ago

      I was able to resolve this issue, thanks

  • @babbagebrassworks4278
    @babbagebrassworks4278 1 year ago +2

    Got another reason to get a few Pi5's.

  • @Gabitrompeta
    @Gabitrompeta 9 months ago

    It's probably version v0.1.119, but it is not working anymore. It doesn't have the option to download and returns a server 500 error.

    • @kevinmcaleer28
      @kevinmcaleer28  9 months ago

      Are you talking about ollama.com? I've just checked and the website is up and running

    • @Gabitrompeta
      @Gabitrompeta 9 months ago

      @@kevinmcaleer28 The WebUI doesn't have the option to download the model anymore.
      It sends an error when you write dolphin-phi, for example.
      The /ollama/api sends status 500 and I was not able to use a local language model...
      If you can clarify: maybe the new version of the WebUI doesn't allow downloading anymore, or the docker-compose file needs to be updated.

  • @build.aiagents
    @build.aiagents 11 months ago

    Phenomenal

  • @bryanteger
    @bryanteger 10 months ago

    Been there, done that - 15-minute response times are just not good. Building myself a small LLM box, which goes against my credo of low-power, high-efficiency servers, but it's the only way to acceptably run LLMs and other inference models.

  • @dnzrobotics
    @dnzrobotics 6 months ago

    👍👍👍👍

  • @ADOTT261
    @ADOTT261 9 months ago +1

    Gonna nitpick here: Ollama is not an LLM - from LangChain: "Ollama allows you to run open-source large language models, such as Llama 2, locally."

  • @JNET_Reloaded
    @JNET_Reloaded 11 months ago

    Where's the link to the blog post with the instructions plz???

  • @zyroxiot9417
    @zyroxiot9417 11 months ago

    Thanks,

  • @fredpourlesintimes
    @fredpourlesintimes 9 months ago

    Too slow

    • @kevinmcaleer28
      @kevinmcaleer28  8 months ago +1

      Which model are you using? The larger the model, the slower it is.

  • @evoelias6035
    @evoelias6035 10 months ago

    WTF!? Container? Docker? What happened to the simple exe file that runs with a mouse click? Most people are using Windows, not Linux. There is Ollama for Windows now. Is there any chance to install the WebUI without any additional program on Windows? I don't want to have all that Docker crap and dependencies on my machine.

    • @kevinmcaleer28
      @kevinmcaleer28  10 months ago +4

      Three things - 1) this video is about running Ollama on a Raspberry Pi, which doesn't run Windows desktop or server - most people who use a Raspberry Pi run Linux on it, 2) Docker makes it easier to add and remove stuff on your computer without leaving a trace, 3) the WebUI is just the front end; it needs Ollama running in the background - I'm sure you'll find a Windows package 📦 for this. Hope this helps

    • @bryanteger
      @bryanteger 10 months ago +2

      People serious about self-hosting don't use Windows Server. And docker works on Windows too. Containerized environments are awesome and you obviously don't want to learn. Otherwise, stop complaining and do the research yourself.

    • @MrChinkman37
      @MrChinkman37 9 months ago

      Keep in mind, when you install an exe, all it does is take the same steps you would take going the long route and turn them into a single GUI-type install, to make you feel more comfortable and make things a little more streamlined.

  • @ypkoba84
    @ypkoba84 9 months ago

    I get this error
    step 5/31 : FROM --platform=$BUILDPLATFORM node:21-alpine3.19 as build
    failed to parse platform : "" is an invalid component of "": platform specifier component must match "^[A-Za-z0-9_-]+$": invalid argument
    ERROR: Service 'ollama-webiu' failed to build : Build failed
    What am I doing wrong?

    • @kevinmcaleer28
      @kevinmcaleer28  9 months ago

      you need to download the ollama webui first - step two: www.kevsrobots.com/blog/ollama.html

    • @ypkoba84
      @ypkoba84 9 months ago

      @@kevinmcaleer28 Yeah, I followed the instructions but still get the error

    • @kevinmcaleer28
      @kevinmcaleer28  9 months ago

      Hi - this is due to a change in the WebUI code (for some reason they have switched to using arguments in the Dockerfile, which means it doesn't run automatically). I've updated the script that installs the WebUI (now called open-webui), so that should fix things. Let me know how you get on (you'll need to do a git pull in the folder where you run this to get the new changes).
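
      (For the git part, a quick sketch - the folder name is just a placeholder for wherever you originally ran git clone:)

          cd <folder-you-cloned>        # the directory containing the compose file
          git pull                      # with no arguments this pulls the latest changes from the original remote
          docker-compose up -d --build  # rebuild and restart with the updated files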

    • @ypkoba84
      @ypkoba84 9 months ago

      @@kevinmcaleer28 I am sorry, I am quite new to GitHub and I have only used the clone function. I tried the pull function but I am not sure I used the right arguments. From the documentation I understood you need to write it as "git pull webui webui", for example, but I am not sure.