DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference

  • Published: Feb 2, 2025

Comments • 3

  • @SC-uv5le · 17 days ago

    Saves me from reading the entire paper on a Friday 😂! Thanks, very insightful! Great questions as well.

  • @kartikramesh9649 · 18 days ago

    really cool talk, thanks!

  • @PyTorch · 3 months ago

    Slides available at: drive.google.com/file/d/1MDw6zBzQFc2mkgUCy09ORwFRZYb-UuyU/view