Great presentation. It is interesting to see the practical side of running a bunch of LLMs. Ops makes it happen. Coming from the old, really old, school of computing with massive multi-user, time-share systems, it is interesting to see how no matter how much computing changes, aspects of it remain the same. Throughput, latency, caching, and scheduling are still central. All that seems to have changed is the problem domain. We do, indeed, live in interesting times.
great talk! still waiting for the open source release 👀
hi, do you have any links to benchmarks you can run to measure latency and throughput for different models, frameworks, etc.?
Please create a video on fine-tuning an MoE LLM using LoRA adapters, such as the Mixtral 8x7B MoE LLM, within your framework.
This could change the landscape of multi-agent systems if it delivers on what's promised.
GitHub link 😅
Second
First