Generative AI Inference Powered by NVIDIA NIM: Performance and TCO Advantage

  • Published: Sep 30, 2024
  • Visualize the impact of high-performance generative AI inferencing with NVIDIA NIM microservices. This video showcases how NIM's prebuilt, optimized microservices outperform popular alternative inferencing engines, delivering up to 3x more tokens per second of throughput on the same NVIDIA accelerated infrastructure.
    The video demonstrates the benefits of NIM microservices through a crossword puzzle-solving application powered by LLMs, scaling concurrent LLM requests from 50 to 200. Watch the throughput advantage grow as the inference workload increases: processing more tokens per second on the same infrastructure means more generative AI applications served at a lower overall TCO.
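    For readers who want to reproduce a similar concurrency sweep themselves, here is a minimal sketch, not NVIDIA's benchmark harness. It assumes a NIM (or other OpenAI-compatible) endpoint at http://localhost:8000/v1; the base URL, model name, and prompt are placeholders you would swap for your own deployment.
    ```python
    # Minimal concurrency-scaling sketch: fire N simultaneous chat requests
    # and report aggregate tokens/second. Endpoint and model are assumptions.
    import asyncio
    import time

    from openai import AsyncOpenAI

    client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

    async def one_request(prompt: str) -> int:
        """Send one chat completion and return its completion-token count."""
        resp = await client.chat.completions.create(
            model="meta/llama3-8b-instruct",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            max_tokens=128,
        )
        return resp.usage.completion_tokens

    async def measure(concurrency: int) -> float:
        """Run `concurrency` requests in parallel; return tokens/second."""
        start = time.perf_counter()
        counts = await asyncio.gather(
            *[one_request("Give a 5-letter word meaning 'fast'.")
              for _ in range(concurrency)]
        )
        elapsed = time.perf_counter() - start
        return sum(counts) / elapsed

    async def main() -> None:
        # Mirrors the 50 -> 200 concurrent-request sweep shown in the video.
        for concurrency in (50, 100, 200):
            tps = await measure(concurrency)
            print(f"concurrency={concurrency}: {tps:.1f} tokens/s")

    asyncio.run(main())
    ```
    Running the same sweep against two engines on identical hardware gives a like-for-like tokens-per-second comparison of the kind the video visualizes.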
    0:15 - Value of optimizing generative AI inference for maximum performance
    0:28 - Overview of NIM microservices (nvda.ws/4bZLY9E)
    0:44 - Demo of a crossword puzzle-solving application deployed with NIM and popular alternative inferencing software
    1:33 - 2.4x more tokens per second when solving nearly 50 crosswords
    1:41 - 3x more tokens per second when solving 225 crosswords
    2:03 - Impact on business productivity
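    To make the TCO claim concrete, here is a hypothetical back-of-the-envelope calculation. The workload size and per-GPU baseline throughput below are assumptions for illustration; only the up-to-3x speedup comes from the video.
    ```python
    # Hypothetical TCO illustration: at 3x tokens/s on the same GPUs,
    # a fixed workload needs roughly one third of the GPU capacity.
    workload_tokens_per_s = 30_000   # assumed steady-state demand
    baseline_tps_per_gpu = 500       # assumed alternative-engine throughput
    speedup = 3.0                    # the up-to-3x figure from the video

    gpus_baseline = workload_tokens_per_s / baseline_tps_per_gpu          # 60 GPUs
    gpus_nim = workload_tokens_per_s / (baseline_tps_per_gpu * speedup)   # 20 GPUs
    print(f"GPUs needed: {gpus_baseline:.0f} -> {gpus_nim:.0f} "
          f"({1 - gpus_nim / gpus_baseline:.0%} fewer)")
    ```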
    Get started today at ai.nvidia.com: nvda.ws/3Y2Po7U
    Developer resources:
    ▫️ Learn more about NIM: nvda.ws/3yqsuNw
    ▫️ Join the NVIDIA Developer Program: nvda.ws/3OhiXfl
    ▫️ Access downloadable NIM microservices on the API catalog: nvda.ws/4bZLY9E
    ▫️ Read the Mastering LLM Techniques series to learn about inference optimization, LLM training, and more: resources.nvid...
    #inferencemicroservices #inferenceoptimization #api #selfhosting #modeldeployment #aimodel #LLM #generativeai #aimicroservices #nvidianim #generativeaideployment #aiinference #productiongenai #enterprisegenerativeai #acceleratedinference #nvidiaai #apicatalog
