Serving 100s of LLMs on 1 GPU with LoRAX - Travis Addair | Stanford MLSys #84

  • Published: 28 Dec 2024
  • Science

Comments • 8

  • @voncolborn9437
@voncolborn9437 a year ago +2

Great presentation. It is interesting to see the practical side of running a bunch of LLMs. Ops makes it happen. Coming from the old, really old, school of computing with massive multi-user, time-share systems, it is interesting to see how, no matter how much computing changes, aspects of it remain the same. Throughput, latency, caching, and scheduling are still central. All that seems to have changed is the problem domain. We do, indeed, live in interesting times.

  • @conan_der_barbar
@conan_der_barbar a year ago +1

Great talk! Still waiting for the open-source release 👀

  • @Gerald-iz7mv
@Gerald-iz7mv 9 months ago

Hi, do you have any links to benchmarks that measure latency and throughput for different models and frameworks?
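
    (As a minimal, hedged sketch of the kind of measurement asked about here: the snippet below times requests against a generic HTTP generate endpoint and reports median latency and request throughput. The URL and payload shape are assumptions for illustration; LoRAX, vLLM, and TGI each expose their own request formats, and dedicated benchmark harnesses are more rigorous.)

        import time
        import statistics
        import requests

        # Hypothetical endpoint and payload; adjust for your serving framework.
        URL = "http://localhost:8080/generate"
        PAYLOAD = {"inputs": "Hello!", "parameters": {"max_new_tokens": 64}}

        def benchmark(n_requests: int = 20) -> None:
            latencies = []
            start = time.perf_counter()
            for _ in range(n_requests):
                t0 = time.perf_counter()
                resp = requests.post(URL, json=PAYLOAD, timeout=60)
                resp.raise_for_status()
                latencies.append(time.perf_counter() - t0)
            elapsed = time.perf_counter() - start
            # Median per-request latency and aggregate request throughput.
            print(f"p50 latency: {statistics.median(latencies) * 1000:.1f} ms")
            print(f"throughput:  {n_requests / elapsed:.2f} req/s")

        if __name__ == "__main__":
            benchmark()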

  • @suleimanshehu5839
@suleimanshehu5839 a year ago

Please create a video on fine-tuning a MoE LLM with LoRA adapters, such as the Mixtral 8x7B MoE LLM, within your framework.

  • @fastcardlastname3353
@fastcardlastname3353 a year ago

This could change the landscape of multi-agent systems if it delivers on its promise.

  • @mohamedfouad1309
@mohamedfouad1309 a year ago

GitHub link 😅

  • @nithinrao7191
@nithinrao7191 a year ago

    Second

  • @absbi0000
@absbi0000 a year ago

    First