vLLM Office Hours - Multimodal Models in vLLM with Roblox - August 8, 2024

  • Published: Jan 16, 2025

Comments • 2

  • @hari000-f6y · 4 months ago

    I have a question! I'm serving a quantized multimodal model (InternVL2) on vLLM on an L4 GPU, and a single request takes ~5-6 seconds. When multiple requests arrive at the same time, it takes much longer, ~30 seconds, to complete them all. How can I handle this so that concurrent requests also finish in ~5 seconds? I have only a limited understanding of batch requesting and related topics.
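A minimal sketch of one common approach to the situation described above, assuming a vLLM OpenAI-compatible server (the base URL, model name, and image URLs below are placeholders, not taken from the comment): vLLM's continuous batching groups whatever requests are in flight at the same moment, so submitting requests concurrently from the client lets them share a batch instead of queuing behind one another.

```python
# Sketch: send requests concurrently so the server's continuous batching can
# process them together instead of one after another. All names are placeholders.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def one_request(image_url: str) -> str:
    # Each request is awaited independently; the server batches in-flight requests.
    resp = await client.chat.completions.create(
        model="OpenGVLab/InternVL2-8B",  # placeholder: use the actual served model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        max_tokens=128,
    )
    return resp.choices[0].message.content

async def main() -> None:
    urls = [f"https://example.com/img_{i}.jpg" for i in range(8)]  # placeholder URLs
    # Launch all requests at once; with batching, the total wall-clock time can be
    # much closer to a single request's latency than to eight sequential requests.
    outputs = await asyncio.gather(*(one_request(u) for u in urls))
    for out in outputs:
        print(out)

if __name__ == "__main__":
    asyncio.run(main())
```

Server-side limits still apply: options such as `--max-num-seqs` and the available GPU memory cap how many requests can share a batch, so concurrency beyond those limits will again queue.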

  • @shumshvenhiszali · 5 months ago

    They say the code is open source, but where is it?