Optimizing Model Deployments with Triton Model Analyzer

  • Published: 1 Oct 2024
  • How do you identify the batch size and number of model instances that give optimal inference performance? Triton Model Analyzer is an offline tool that can evaluate hundreds of configurations to meet the latency, throughput, and memory requirements of your application.
    Get started with model analyzer here: github.com/tri...
    #Triton #Inference #ModelAnalyzer #AI
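
    The configuration sweep described above is typically driven by a YAML config file passed to `model-analyzer profile`. A minimal sketch is shown below; the key names follow the Model Analyzer repository's documented config schema at the time of writing, and the model name and paths are placeholders — verify against the repo linked above.

    ```yaml
    # Hypothetical Model Analyzer profile config (paths/model name are placeholders).
    model_repository: /models
    output_model_repository_path: /output/model_repository

    # Models to profile; Model Analyzer generates config variants
    # (batch sizes, instance counts, dynamic batching) for each.
    profile_models:
      - add_sub

    # Bounds for the automatic config search.
    run_config_search_max_concurrency: 64
    run_config_search_max_instance_count: 5
    ```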

Comments • 6

  • @razalminhas6349 · 5 months ago

    Awesome overview. Better than 100 presentations for understanding what Triton Inference Server and Model Analyzer are.

  • @siddharthsharma4072 · 1 year ago

    I was facing issues when I followed the steps specified in the video. It seems that along with the model repository, we also need to bind the config volume, e.g.:

    docker run -it --rm --gpus all -v /var/run/docker.sock:/var/run/docker.sock \
      -v /home/ec2-user/SageMaker/workspace/model_repository:/models \
      -v /home/ec2-user/SageMaker/workspace/output:/output \
      -v /home/ec2-user/SageMaker/workspace/model_config:/config \
      --net=host model-analyzer

    If we don't bind the config volume, Model Analyzer throws an error like "can't find config ...".

  • @qfz3711758 · 2 years ago +1

    Nice video👍👍👍

  • @ibrahimgul9716 · 1 year ago

    Hi, thanks for this video. I got the error below — do you have any suggestions?

    • @ibrahimgul9716 · 1 year ago

      Model add_sub_config_3 load failed: [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
      [Model Analyzer]
      [Model Analyzer] Creating model config: add_sub_config_4
      [Model Analyzer] Enabling dynamic_batching
      [Model Analyzer] Setting instance_group to [{'count': 5, 'kind': 'KIND_GPU'}]
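
      A likely cause of the `[StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled` error above: Model Analyzer loads and unloads its generated config variants (`add_sub_config_*`) over Triton's model-control API, which requires the server to run in explicit model-control mode rather than poll mode. A sketch of the relevant server flag, with placeholder paths, is below — this is a suggested fix, not a confirmed one from the thread:

      ```shell
      # Sketch: start Triton with explicit model control so clients
      # (such as Model Analyzer) may load/unload models on demand.
      # The repository path is a placeholder.
      tritonserver \
        --model-repository=/models \
        --model-control-mode=explicit
      ```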

  • @nneeerrrd · 2 years ago +1

    Echo 🤦‍♂️