vLLM on Kubernetes in Production

Kubesimplify

4.2K views

Published: Jan 1, 2025

Comments • 12

@JohnCodes 7 months ago ⁺⁶
Thanks for having me on, Saiyam!! It was a lot of fun to show you how we use vLLM at OpenSauced!! Happy to answer any questions people might have here!
@DestinoDello 3 months ago
Could you share the YAML for the deployment, please?
@aireddy 5 months ago ⁺²
This is an absolutely wonderful session for understanding how to deploy LLMs in production on a Kubernetes cluster!!
@kubesimplify 5 months ago
@aireddy glad it was helpful!
@nickytonline 4 months ago
Great video and great breakdown @JohnCodes!
@umeshjaiswal5298 7 months ago
Thanks for this tutorial, Saiyam.
@kubesimplify 7 months ago
Glad it's useful! Are you building something with LLMs?
@DaewonSuh 5 months ago
Thanks for the wonderful demo!
I was wondering why you deploy the vLLM pod through a DaemonSet rather than a Deployment.
With a DaemonSet, you can only run one pod per node, and that pod occupies a single GPU.
Considering that nodes usually have multiple GPUs attached, I am afraid that using a DaemonSet might leave a lot of GPUs idle.
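To put numbers on this concern, here is a minimal Python sketch of the GPU accounting. It assumes, as the comment does, that a DaemonSet runs exactly one vLLM pod per node. The cluster size, GPUs per node, and the helper names `idle_gpus_daemonset` and `idle_gpus_deployment` are hypothetical and do not come from the video. The sketch only counts GPU requests and ignores scheduler details such as taints, affinity, and other workloads on the node.

```python
# Back-of-the-envelope GPU accounting for the DaemonSet vs. Deployment question.
# Hypothetical assumptions: every node has the same number of GPUs, and each
# vLLM pod requests a fixed number of whole GPUs (1 by default).


def idle_gpus_daemonset(nodes: int, gpus_per_node: int, gpus_per_pod: int = 1) -> int:
    """A DaemonSet schedules one pod per node, so the node's remaining GPUs stay idle."""
    if gpus_per_pod > gpus_per_node:
        raise ValueError("a pod cannot request more GPUs than a node has")
    return nodes * (gpus_per_node - gpus_per_pod)


def idle_gpus_deployment(nodes: int, gpus_per_node: int, replicas: int, gpus_per_pod: int = 1) -> int:
    """Deployment replicas can share a node as long as it still has free GPUs."""
    pods_per_node = gpus_per_node // gpus_per_pod      # whole pods that fit on one node
    scheduled = min(replicas, nodes * pods_per_node)   # extra replicas would stay Pending
    return nodes * gpus_per_node - scheduled * gpus_per_pod


if __name__ == "__main__":
    # Hypothetical cluster: 4 nodes with 8 GPUs each (32 GPUs total).
    print(idle_gpus_daemonset(nodes=4, gpus_per_node=8))                # 28
    print(idle_gpus_deployment(nodes=4, gpus_per_node=8, replicas=32))  # 0
```

A DaemonSet pod can also request several GPUs (for example, to shard one model across them), which the `gpus_per_pod` parameter lets you explore; whether that suits a particular model is outside what this sketch models.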
@shivangsharma1 5 months ago ⁺¹
Loved it...❤
@kubesimplify 5 months ago
Glad you found it useful!
@divyamchandel8734 6 months ago
Hi John / Saiyam. In the last part you mentioned that "in a lot of cases it could be cheaper."
In which cases is hosting the model yourself cheaper, and in which is using OpenAI cheaper?
Does it just depend on the load we will have (requests per day and peak requests per minute)?
@matrix9083 5 months ago
OpenAI charges $0.50 per million tokens for GPT-3.5, for example. If you rent a GPU server for that same amount, you can generate tens or hundreds of millions of tokens in one hour, depending on which text-generation model you choose. Models like Mistral 7B, the Phi-3 series, Llama 3 8B, and Gemma 2B all deliver about the same results as GPT-3.5, if not better, and they all fit on a GPU server that costs 44 cents per hour on RunPod (the A5000 GPU server, for example).
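The arithmetic in this reply can be framed as a break-even calculation. The sketch below uses only the two prices quoted in the comment ($0.50 per million tokens for GPT-3.5 and $0.44 per hour for an A5000 server on RunPod); both may be out of date. The throughput in the usage line is a hypothetical input, not a benchmark, and the helper names are illustrative. It also assumes the rented GPU is billed for the full hour whether or not it is busy, and it ignores engineering and operations costs.

```python
from decimal import Decimal

# Prices quoted in the comment above; illustrative only, not current pricing.
API_PRICE_PER_MILLION_TOKENS = Decimal("0.50")  # USD, GPT-3.5 (per the comment)
GPU_PRICE_PER_HOUR = Decimal("0.44")            # USD, A5000 on RunPod (per the comment)


def break_even_tokens_per_hour(gpu_price_per_hour: Decimal, api_price_per_million: Decimal) -> Decimal:
    """Tokens per hour the rented GPU must generate to match the API's cost."""
    return gpu_price_per_hour / api_price_per_million * 1_000_000


def self_hosted_price_per_million(gpu_price_per_hour: Decimal, tokens_per_hour: int) -> Decimal:
    """Effective USD per million tokens when the GPU is paid for by the hour."""
    return gpu_price_per_hour / (Decimal(tokens_per_hour) / 1_000_000)


if __name__ == "__main__":
    print(break_even_tokens_per_hour(GPU_PRICE_PER_HOUR, API_PRICE_PER_MILLION_TOKENS))  # 880000.00
    # Hypothetical sustained throughput of 10 million tokens per hour:
    print(self_hosted_price_per_million(GPU_PRICE_PER_HOUR, 10_000_000))  # 0.044
```

At the quoted prices, the rented GPU is only cheaper if it sustains more than about 880,000 tokens per hour (roughly 244 tokens per second); below that, paying per token is cheaper. Read this way, the answer to the earlier question about load depends on how much token volume you can keep the GPU busy with over each billed hour.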