Ollama with GPU on Kubernetes: 70 Tokens/sec!

  • Published: Dec 17, 2024

Comments • 15

  • @JoeIrizarry88 12 days ago

    Nice. The Time Slicing piece was the cherry on top of the sundae.

  • @farzadmf 28 days ago +1

    Great video, thank you!

    • @mathisve 28 days ago

      Thank you for the feedback! Any topics you'd like me to cover?

    • @farzadmf 28 days ago

      @@mathisve Nothing comes to mind; just enjoying your videos 🙂

  • @satyarsh665 27 days ago +1

    Very helpful, thank you very much.

    • @mathisve 26 days ago

      Thanks for the feedback! Are there any other topics you would like to see me cover?

    • @satyarsh665 19 days ago

      @@mathisve Well, I'm very new to the whole k8s thing and got this video recommended :) Maybe more DevOps stuff?

  • @MsSteganos 17 days ago +1

    Great video, original content you don't often see on YouTube. Congratulations!
    Is it possible to run Ollama across multiple nodes and multiple GPUs in a Kubernetes cluster? Is performance better running Ollama with 2 GPUs on 1 node, or on 2 nodes with 1 GPU per node?

  • @jorgearagon8053 26 days ago +2

    Hi, this video and the series in general are great. As someone interested in using AWS and other cloud providers, I would like to know how much this exercise, as well as the others, actually cost. Could you please indicate the total cost of the activity so I can try to replicate it without any unexpected hidden costs? Thanks in advance.

    • @mathisve 26 days ago +1

      Thanks! The instance I used (g4dn.2xlarge) costs about $0.75 an hour, meaning I probably spent around $2-3 making this video. For reference, running this instance for an entire month would cost about $550.
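For anyone budgeting a replication, the reply's numbers check out under its stated rate. A quick sanity check, assuming the ~$0.75/hour on-demand price quoted above (actual g4dn.2xlarge pricing varies by region):

```python
# Sanity-check the cost estimate from the reply above.
HOURLY_RATE = 0.75  # assumed on-demand $/hour for g4dn.2xlarge

hours_for_video = 3               # rough upper bound on recording/testing time
video_cost = HOURLY_RATE * hours_for_video
monthly_cost = HOURLY_RATE * 730  # ~730 hours in an average month

print(f"video:   ~${video_cost:.2f}")    # a few dollars
print(f"monthly: ~${monthly_cost:.2f}")  # roughly $550
```

The takeaway is the usual one for GPU instances: cheap to experiment with for an hour or two, expensive to leave running.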

  • @Earthvssuna 17 days ago

    What are the benefits of running Ollama with k8s in the cloud, versus running an Ollama container in the cloud without k8s? Thanks very much for this video.

    • @mathisve 16 days ago +1

      If you are running a single instance of Ollama, there aren't many benefits. If you need lots of Ollama instances (for a public API, text generation, etc.), using Kubernetes will help you simplify operations.

    • @Earthvssuna 8 days ago

      @ We are about 1000 people in the company… How would you approach this for using Ollama? I want some way of tracking which user is using it and how many tokens. Not the content, just how much each department is using.
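One possible approach to the question above, sketched here rather than taken from the video: put a thin proxy in front of Ollama that tags each request with a department and accumulates token counts. This assumes non-streaming Ollama `/api/generate` responses, which report `prompt_eval_count` (input tokens) and `eval_count` (output tokens) in their JSON metadata; the proxy itself and the department names are hypothetical.

```python
from collections import defaultdict

# Per-department token usage, aggregated from Ollama response metadata.
# Only counts are stored; prompt and response content are never kept.
usage = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})

def record_usage(department: str, ollama_response: dict) -> None:
    """Accumulate token counts for a department from one response body."""
    usage[department]["prompt_tokens"] += ollama_response.get("prompt_eval_count", 0)
    usage[department]["completion_tokens"] += ollama_response.get("eval_count", 0)

# Example with hypothetical response bodies from two departments.
record_usage("engineering", {"prompt_eval_count": 42, "eval_count": 180})
record_usage("marketing",   {"prompt_eval_count": 10, "eval_count": 55})
record_usage("engineering", {"prompt_eval_count": 8,  "eval_count": 97})

for dept, counts in usage.items():
    print(dept, counts)
```

At 1000 users you would likely hang this off an authenticating reverse proxy or API gateway that maps credentials to departments, and persist the counters somewhere durable instead of an in-memory dict.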

  • @Earthvssuna 11 days ago

    Hey, another question I had: I found a video showing that Ollama, even in its simplest installation, can already be opened in several terminals and respond. Concurrency wasn't possible until a few months ago, but now it is. So I wonder, why go through all this hassle of making virtual GPUs for better availability? I'm sure you know why; it's just what I'm struggling to understand right now. Hope you find time to answer again. Thanks very much 😊

    • @Earthvssuna 11 days ago

      ruclips.net/video/8r_8CZqt5yk/видео.htmlsi=PzyzG4KSBiM371e8