Videos: 324
Views: 509,988

Ray Summit 2024

We started Ray Summit 2024 off with a look at all of the amazing things the #Ray community is doing with AI. AI is built by Ray and powered by you!

Views: 777

Videos

Optimizing LLM Inference with AWS Trainium, Ray, vLLM, and Anyscale

416 views, 21 days ago

Webinar Details: Organizations are deploying LLMs for inference across many workloads. A common challenge is how to scale and productionize these workloads cost-effectively. In this webinar with Anyscale and AWS, you will learn how to leverage AWS accelerator instances, including AWS Inferentia, to reliably serve LLMs at scale using vLLM and Ray, all hosted on Amazon EKS. You’ll also...

Scalable and Cost Efficient AI Workloads with AWS and Anyscale

294 views, 2 months ago

Organizations are already making significant investments in the GenAI and LLM space. Here at Anyscale, we work closely with leading companies like OpenAI, Canva, and DoorDash to enable their ML workloads. A common challenge is how to scale and productionize GenAI and LLM workloads cost-effectively. In this webinar with Anyscale and AWS, you will learn how to leverage cutting-edge ...

Anyscale Job Queues

166 views, 2 months ago

Newly available, Anyscale Job Queues enable multiple Ray Jobs to be executed on a shared cluster for batch “offline” workloads like data processing, model training, or batch inference. Job Queues make it easier than ever to streamline job scheduling and optimize resource allocation. Get started on Anyscale: console.anyscale.com

The Anyscale Unified Log Viewer

171 views, 2 months ago

With the Unified Log Viewer, you can access and search logs to debug and optimize Ray applications. The Anyscale Unified Log Viewer gives users continuous, persistent access to logs, simplifies the user interface, and integrates a scalable centralized system to reduce complexity and setup time. Enhanced with searchable attributes like instance ID or task / actor ID, simplifying searching and resolving is...

Anyscale Replica Compaction

266 views, 2 months ago

Learn how Anyscale Replica Compaction increases utilization and lowers cost by avoiding resource fragmentation. Resource fragmentation occurs when scaling activities from online model serving and inferencing lead to uneven resource utilization across nodes. As models scale up, new nodes may be launched. When traffic decreases and models scale down, some nodes may become underutilized, increasi...
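
The fragmentation scenario in this description can be illustrated with a small sketch. It is not Anyscale's implementation, which the description does not detail. It assumes a hypothetical `compact` helper that repacks replicas with a first-fit-decreasing heuristic, and the node names, CPU sizes, and node capacity are invented for illustration.

```python
from typing import Dict, List


def compact(nodes: Dict[str, List[int]], capacity: int) -> Dict[str, List[int]]:
    """Repack replica CPU requests onto as few nodes as possible.

    Hypothetical helper: uses first-fit decreasing, a generic bin-packing
    heuristic, not the algorithm Anyscale actually uses.
    """
    # Flatten every replica's CPU request and place the largest first.
    replicas = sorted((r for reps in nodes.values() for r in reps), reverse=True)
    packed: List[List[int]] = []
    for cpus in replicas:
        for node in packed:
            if sum(node) + cpus <= capacity:
                node.append(cpus)
                break
        else:
            # No existing node has room, so keep another node alive.
            packed.append([cpus])
    return {f"node-{i}": reps for i, reps in enumerate(packed)}


# After a scale-down, three 8-CPU nodes each hold only a little work.
fragmented = {"node-a": [4], "node-b": [2], "node-c": [2]}
compacted = compact(fragmented, capacity=8)
print(len(fragmented), "->", len(compacted), "nodes")
```

In this toy case the three partially used nodes collapse into one fully used node, which is the utilization gain the description refers to. A real system would also have to migrate live replicas without dropping traffic, which this sketch ignores.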

Fast and Scalable Model Training with PyTorch and Ray

545 views, 3 months ago

Organizations are making substantial investments in GenAI and LLMs, and Anyscale is at the forefront of this innovation. Our Virtual AI Tutorial Series introduces core concepts of modern AI applications, emphasizing large-scale computing, cost-effectiveness, and ML models. In this webinar, we focus on distributed model training with PyTorch and Ray. You'll learn how to migrate your code from pu...

End-to-End LLM Workflows with Anyscale

1.2K views, 3 months ago

Webinar to explore how a modern platform can support every stage of the AI app development lifecycle. Learn to build and scale end-to-end LLM workflows with Anyscale. Gain insight into the complete LLM lifecycle with fully runnable code, covering: 1. Data processing 2. Model fine-tuning 3. LLM evaluations and offline inference 4. Online inference for production traffic Blog post instructions to...

Meetup: Evaluating LLMs: Needle in a Haystack

1.4K views, 7 months ago

LLM evaluation is a discipline where confusion reigns and foundation model builders are effectively grading their own homework. Building on the viral threads on X/Twitter, Greg Kamradt, Robert Nishihara, and Jason Lopatecki discuss highlights from Arize AI's ongoing research on how major foundation models - from OpenAI’s GPT-4 to Mistral and Anthropic’s Claude - are stacking up against each ot...

Build a chat assistant fast using Canopy from Pinecone and Anyscale Endpoints

1.1K views, 9 months ago

This webinar will explore the challenges of building a chat assistant and how Canopy and Anyscale Endpoints provide the fastest and easiest way to build your RAG-based applications for free. We will go through the architecture, a real live example, and a guide on how to get started with building your own chat assistant. Canopy is a flexible framework built on top of the Pinecone vector database...

Elevate Your AI Applications with Anyscale and Ray: Simple, Scalable, Secure

1.1K views, 10 months ago

🚀 The AI Challenge: Explore the increasing scale and complexity needs in AI. 🌐 Anyscale Solutions: Introducing Anyscale Endpoints, Anyscale Private Endpoints, and the Anyscale Platform, each designed for different stages of AI adoption. 💡 Starting with Anyscale Endpoints: Learn how this API integrates popular AI models into your applications, offering customization and cost efficiency. 🛡️ Growi...

Ray Train: A Production-Ready Library for Distributed Deep Learning

2.6K views, 10 months ago

With the growing complexity of deep learning models and the emergence of Large Language Models (LLMs) and generative AI, scaling training efficiently and cost-effectively has become an urgent need. Enter Ray Train, a cutting-edge library designed specifically for seamless, production-ready distributed deep learning. In this talk, we will take a deep dive into the architecture of Ray Train, emph...

Gismo for Ray: A Multi-Node Shared Memory Object Store That Accelerates Ray Workloads

842 views, 11 months ago

Ray is a powerful distributed computing framework. However, as data sets grow and computation requirements become more complex, managing memory usage across multiple computing nodes becomes increasingly challenging. Issues that slow down performance include the data copying between the computing nodes, data spilling out of memory into storage, and the data skew among computing nodes. We'll intr...

How to simplify execution of cloud-native model training & validation with CodeFlare: A HandsOn Demo

343 views, 11 months ago

Join us for a hands-on demo of the CodeFlare-SDK, an open-source project that simplifies cloud-native data pre-processing, model training and validation with an intuitive Python interface to Ray, PyTorch/TorchX, and Kubernetes. With the CodeFlare-SDK, you can easily manage your cloud resources, submit jobs, and monitor job status, without worrying about the complexities of DevOps and cloud infr...

Building an Instant-On Serverless Platform for Large-Scale Data Processing Using Ray

390 views, 11 months ago

AWS Glue has been a pioneer in automating ETL processes by providing a fully managed serverless data integration service. This service is a simple and cost-effective way for customers to categorize their data, clean it, enrich it, and move it swiftly and reliably between various data stores. AWS Glue is made up of a Data Catalog (i.e., a metadata store), sophisticated ETL engines wi...

Developing and Serving RAG-Based LLM Applications in Production

20K views, 11 months ago

NLP And The Future of Search With You.com

1.1K views, 11 months ago

From Spark to Ray: An Exabyte-Scale Production Migration Case Study

2.3K views, 11 months ago

Ray Scalability Deep Dive: The Journey to Support 4,000 Nodes

1K views, 11 months ago

Ray Observability 2.0: How to Debug Your Ray Applications with New Observability Tooling

698 views, 11 months ago

Modernizing DoorDash Model Serving Platform with Ray Serve

1.3K views, 11 months ago

Deploying Many Models Efficiently with Ray Serve

4.3K views, 11 months ago

How Spotify Built a Robust Ray Platform with a Frictionless Developer Experience

750 views, 11 months ago

Scaling AI Health Assistants: Challenges and Solutions

257 views, 11 months ago

Forecasting Covid Infections for the UK's National Health Service using Ray and Kubernetes

167 views, 11 months ago

Supercharging self-driving algor dev w/ Ray: scaling sim workloads and democratizing autotuning@Zoox

243 views, 11 months ago

AI Factory Accelerating Solutions with Ray

493 views, 11 months ago

How Ray Empowered Ant Group to Deliver a Large-Scale Online Serverless Platform

258 views, 11 months ago

Python-centric AI Application Building in Minutes with Lepton and Ray

1.5K views, 11 months ago

On-Demand Ray Clusters in ML Workflows via KubeRay & Sematic

573 views, 11 months ago

Comments

@RollandWensman-s3y 7 hours ago
Stroman Walks
@ChaplinBobby-g7n 1 day ago
Kunze Junctions
@LeonardBuck-s3l 4 days ago
Declan Mews
@HelenJackson-r6n 4 days ago
Lebsack Light
@SydneyThomson-p3y 5 days ago
Tyrell Mountain
@KennethWilson-g4d 6 days ago
Ella Burgs
@RafaelaKrahulec 8 days ago
470 White Branch
@CarllyleLynn-b4y 9 days ago
Thurman Terrace
@PhilippWillms 13 days ago
Inspiring talk how to bring RL into industrial practice, thanks for sharing!
@FitzGeraldMamie-d6f 13 days ago
Lenora Isle
@MaryTaylor-d8r 15 days ago
Ettie Road
@MadgePapiernik-c6d 17 days ago
Fae Harbors
@fenderbender28 20 days ago
Excellent talk
@WyattWayne-g8w 22 days ago
Magnus Ridges
@hugosonnery 28 days ago
Thank you very much for this !
@jeevanbeniwal3019 28 days ago
this talk can't be more good. Thanks Hao!!!
@felicialynch35663 29 days ago
This was a really insightful presentation. It made me think about how different tools approach information retrieval. I've been using Myko Assistant recently, and its deep internet search feature really helps me find accurate answers quickly, especially compared to some others like Perplexity.
@keithschaub7863 1 month ago
is the PDF all text? Does it have images, tables, graphs? And if so, how well does it convert?
@gabrielpreciado5699 1 month ago
Impressive
@keshmesh123 1 month ago
It was great. Thank you!
@talfranji 1 month ago
The first code slide contains an error. Profing your code example when aiming at software engineers is important :)
@Simeon1337 1 month ago
Great vid
@MrEmbrance 2 months ago
no thanks
@Mohsenghq 2 months ago
Thanks for explaining simple, I migrated to RLlib and it's really efficient.
@ndamulelosbg8887 2 months ago
Great presentation. Just one question: What is relevance_score in this case? Is it an aggregation of grounding metrics for all reference examples?
@sherlockho4613 2 months ago
very helpful and distinguish presentation!
@elephantum 2 months ago
It should be noted, that since this talk, Anyscale deprecated Ray LLM and now recommend vLLM
@mattgrant4143 3 months ago
Hi! What are the chances we can get the source code for this demo!! Trying to learn about KubeRay myself, have a k8s cluster (with gpu and taints setup, but cant get kuberay to play nicely with them. my job just gets stuck w/ pending)
@tunglee4349 3 months ago
great content! thanks a lot!
@Emerson1 3 months ago
great video, that's a lot of useful features !
@JavierTorres-st7gt 3 months ago
How to protect a company's information with technology ?
@fantasyxpress7966 3 months ago
Thanks but what about scanned pdfs any way to handle the exceptions
@ReflectionOcean 4 months ago
By YouSum Live
00:00:11 Future of search innovation.
00:01:17 Linear story vs. reality of innovation.
00:02:37 Building solutions for personal challenges.
00:05:28 Importance of personal product usage.
00:06:02 Learning through personal experiences.
00:09:19 Iterative improvement and momentum.
00:10:34 Enhancing search with live knowledge integration.
00:13:20 Introduction of Copilot for interactive browsing.
00:18:06 Transition to a comprehensive research platform.
00:19:49 Importance of orchestration in complex systems.
00:20:41 Challenges in plugin and API integration reliability.
00:21:21 Evaluation metrics for generative search engines.
00:23:01 Continuous iteration and improvement for product success.
00:23:22 Fusion of Wikipedia and chat for deep topic exploration.
00:23:55 Development of faster and efficient llama models.
00:24:42 Customization of models for improved performance.
00:30:25 Balancing between open-source and proprietary models.
00:31:19 Control over pricing and business perspective in model development.
By YouSum Live
@simbasrv30 4 months ago
If my models are unrelated and have no functional requirements to run together in a single application, can I still use Model composition in Ray serve to deploy multiple model in a single application providing a unified API endpoint (with different route for each model) for better resource utilisation and easier deployment? Is it a good practice? What about the security aspects and user authentications?
@Inceptionxg 4 months ago
I love the way how he shared the story
@Karthikprath 4 months ago
How do we calculate memory used by kv cache in paged attention.Example for input 500 and output 1000
@ashrafamad-ds8er 4 months ago
you are reading what is on the screen, do you think that you are "useful" ? or "informative "??! not at all
@TheAIEpiphany 4 months ago
Great talk and amazing work guys!
@StoryTimeWithX 4 months ago
Neat.
@vaporeon2822 4 months ago
Interesting sharings. Curious about the underlying implementation for KV blocks sharing part you have a copy-on-write mechanism, but how does it avoid dirty-read condition, where both request reads that ref count is 2 and both request copies the block simultaneously.
@LiangyueLi 5 months ago
great work
@ailabinsev 5 months ago
Great explanation!
@antonidabrowski4657 5 months ago
Good content, thanks for your research
@dave_by_day7632 5 months ago
When I use ray.init() it starts both the dashboard and a job at the same time. Is there a way to start the dashboard, then a separate command to start individual jobs?
@AnnerdeJong 5 months ago
Considering the 300g ray vs spark comparison (~15m30s-18m30s) - the spark side seems to save all the prediction outputs (`...write...save()`), but I don't see that on the ray side (`for _ in ds.iter_batches(..: pass`). Does ray's `iter_batches()` automatically dump outputs somewhere? (e.g. when specifying `batch_format='pyarrow'` does it get automatically cached or sth in the ray object store, or sth similar?) If not - I'd argue it's not an entirely fair apples-to-apples comparison?
@fenderbender28 5 months ago
In his code, the spark writer is using format(“noop”) which means it’s not also persisting the outputs anywhere
@carrocesta 5 months ago
I really like your approach, some libraries like keras are difficult to use
@arnony1 6 months ago
Excellent, very educating
@sennetor 6 months ago
First Impressions! So human. :)
@yukewang3164 6 months ago
awesome talk, with useful insights!