Anyscale
  • 324 videos
  • 509,988 views
Ray Summit 2024
We started Ray Summit 2024 off with a look at all of the amazing things the #Ray community is doing with AI. AI is built by Ray and powered by you!
777 views

Videos

Optimizing LLM Inference with AWS Trainium, Ray, vLLM, and Anyscale
416 views · 21 days ago
Webinar details: Organizations are deploying LLMs for inference across many workloads. A common challenge is how to scale and productionize these workloads cost-effectively. In this webinar with Anyscale and AWS, you will learn how to leverage AWS accelerator instances, including AWS Inferentia, to reliably serve LLMs at scale using vLLM and Ray, all hosted on Amazon EKS. You’ll also...
Scalable and Cost Efficient AI Workloads with AWS and Anyscale
294 views · 2 months ago
Organizations are already making significant investments in GenAI and LLMs. Here at Anyscale, we work closely with leading companies like OpenAI, Canva, and DoorDash to enable their ML workloads. A common challenge is how to scale and productionize GenAI and LLM workloads cost-effectively. In this webinar with Anyscale and AWS, you will learn how to leverage cutting-edge ...
Anyscale Job Queues
166 views · 2 months ago
Newly available, Anyscale Job Queues enable multiple Ray Jobs to be executed on a shared cluster for batch “offline” workloads like data processing, model training, or batch inference. Job Queues make it easier than ever to streamline job scheduling and optimize resource allocation. Get started on Anyscale: console.anyscale.com
The Anyscale Unified Log Viewer
171 views · 2 months ago
With the Unified Log Viewer, you can access and search logs to debug and optimize Ray applications. The Anyscale Unified Log Viewer gives users continuous, persistent access to logs, simplifies the user interface, and integrates a scalable, centralized system to reduce complexity and setup time. It is enhanced with searchable attributes like instance ID or task / actor ID, simplifying searching and resolving is...
Anyscale Replica Compaction
266 views · 2 months ago
Learn how Anyscale Replica Compaction increases utilization and lowers cost by avoiding resource fragmentation. Resource fragmentation occurs when scaling activities from online model serving and inference lead to uneven resource utilization across nodes. As models scale up, new nodes may be launched. When traffic decreases and models scale down, some nodes may become underutilized, increasi...
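The description does not say how Replica Compaction decides where replicas go, so the following is only a toy illustration of why consolidation lowers cost. It repacks replicas onto as few nodes as possible with first-fit decreasing bin packing; the `Node` class and `compact` function are hypothetical, and a real system would also have to weigh migration cost and drain in-flight traffic before moving a replica.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with a fixed capacity (e.g., GPUs) hosting model replicas."""
    name: str
    capacity: int
    replicas: list[int] = field(default_factory=list)  # resource demand of each replica

    @property
    def used(self) -> int:
        return sum(self.replicas)


def compact(nodes: list[Node]) -> list[Node]:
    """Repack every replica onto as few nodes as possible (first-fit decreasing).

    Returns the nodes that still host replicas; nodes left empty could be released.
    Migration cost and in-flight requests are ignored in this toy model.
    """
    demands = sorted((r for n in nodes for r in n.replicas), reverse=True)
    # Try the largest nodes first so that fewer nodes end up occupied.
    targets = [
        Node(n.name, n.capacity)
        for n in sorted(nodes, key=lambda n: n.capacity, reverse=True)
    ]
    for demand in demands:
        for target in targets:
            if target.used + demand <= target.capacity:
                target.replicas.append(demand)
                break
        else:
            raise ValueError(f"replica needing {demand} does not fit on any node")
    return [t for t in targets if t.replicas]


# After a scale-down, four 4-GPU nodes each host a single 1-GPU replica.
fragmented = [Node(name, 4, [1]) for name in "abcd"]
packed = compact(fragmented)
print([(n.name, n.replicas) for n in packed])  # [('a', [1, 1, 1, 1])]
```

In this example three of the four nodes end up empty and could be scaled down, which is the utilization gain the description refers to.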
Fast and Scalable Model Training with PyTorch and Ray
545 views · 3 months ago
Organizations are making substantial investments in GenAI and LLMs, and Anyscale is at the forefront of this innovation. Our Virtual AI Tutorial Series introduces core concepts of modern AI applications, emphasizing large-scale computing, cost-effectiveness, and ML models. In this webinar, we focus on distributed model training with PyTorch and Ray. You'll learn how to migrate your code from pu...
End-to-End LLM Workflows with Anyscale
1.2K views · 3 months ago
A webinar exploring how a modern platform can support every stage of the AI app development lifecycle. Learn to build and scale end-to-end LLM workflows with Anyscale. Gain insight into the complete LLM lifecycle with fully runnable code, covering: 1. data processing; 2. model fine-tuning; 3. LLM evaluations and offline inference; 4. online inference for production traffic. Blog post instructions to...
Meetup: Evaluating LLMs: Needle in a Haystack
1.4K views · 7 months ago
LLM evaluation is a discipline where confusion reigns and foundation model builders are effectively grading their own homework. Building on the viral threads on X/Twitter, Greg Kamradt, Robert Nishihara, and Jason Lopatecki discuss highlights from Arize AI's ongoing research on how major foundation models - from OpenAI’s GPT-4 to Mistral and Anthropic’s Claude - are stacking up against each ot...
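A needle-in-a-haystack test hides a single fact (the "needle") at a chosen relative depth inside long filler text and checks whether the model retrieves it when asked. The sketch below shows only the harness structure under stated assumptions: `ask_model` stands for a real LLM call, `toy_model` is a hypothetical stand-in so the example runs offline, and the needle text is illustrative. Real runs also sweep the total context length, not just the depth.

```python
from __future__ import annotations

from typing import Callable


def build_haystack(filler: list[str], needle: str, depth: float) -> str:
    """Insert `needle` among the filler sentences at a relative depth in [0, 1]."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    pos = round(depth * len(filler))
    return " ".join(filler[:pos] + [needle] + filler[pos:])


def run_eval(ask_model: Callable[[str, str], str], filler: list[str], needle: str,
             question: str, expected: str, depths: list[float]) -> dict[float, bool]:
    """Return {depth: passed}, where passed means the reply contains `expected`."""
    results = {}
    for depth in depths:
        context = build_haystack(filler, needle, depth)
        reply = ask_model(context, question)
        results[depth] = expected.lower() in reply.lower()
    return results


def toy_model(context: str, question: str) -> str:
    """Hypothetical stand-in for an LLM: returns the first sentence containing 'best thing'."""
    for sentence in context.split(". "):
        if "best thing" in sentence:
            return sentence
    return "I don't know."


filler = [f"Filler sentence number {i}." for i in range(100)]
needle = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
print(run_eval(toy_model, filler, needle, "What is the best thing to do in San Francisco?",
               "Dolores Park", [0.0, 0.5, 1.0]))  # {0.0: True, 0.5: True, 1.0: True}
```

Plotting pass/fail over a grid of (context length, depth) pairs gives the heatmaps these evaluations are usually reported with.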
Build a chat assistant fast using Canopy from Pinecone and Anyscale Endpoints
1.1K views · 9 months ago
This webinar will explore the challenges of building a chat assistant and how Canopy and Anyscale Endpoints provide the fastest and easiest way to build your RAG-based applications for free. We will go through the architecture, a live example, and a guide on how to get started with building your own chat assistant. Canopy is a flexible framework built on top of the Pinecone vector database...
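In a RAG-based chat assistant, the retrieval step finds the stored text chunks most similar to the user's question and places them in the prompt sent to the LLM. The sketch below illustrates only that step, using a toy bag-of-words cosine similarity in place of a real embedding model and vector database. It is not Canopy's or Pinecone's API; `embed`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (a vector DB does this at scale)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Augment the user question with the retrieved context before calling the LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Ray Serve deploys models behind HTTP endpoints.",
    "Pinecone is a vector database.",
    "Bananas are yellow.",
]
print(build_prompt("Which vector database stores the embeddings?", docs, k=1))
```

Swapping `embed` for a neural embedding model and `retrieve` for a vector-database query keeps the same shape while scaling to millions of chunks.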
Elevate Your AI Applications with Anyscale and Ray: Simple, Scalable, Secure
1.1K views · 10 months ago
🚀 The AI Challenge: Explore the increasing scale and complexity needs in AI. 🌐 Anyscale Solutions: Introducing Anyscale Endpoints, Anyscale Private Endpoints, and the Anyscale Platform, each designed for different stages of AI adoption. 💡 Starting with Anyscale Endpoints: Learn how this API integrates popular AI models into your applications, offering customization and cost efficiency. 🛡️ Growi...
Ray Train: A Production-Ready Library for Distributed Deep Learning
2.6K views · 10 months ago
With the growing complexity of deep learning models and the emergence of Large Language Models (LLMs) and generative AI, scaling training efficiently and cost-effectively has become an urgent need. Enter Ray Train, a cutting-edge library designed specifically for seamless, production-ready distributed deep learning. In this talk, we will take a deep dive into the architecture of Ray Train, emph...
Gismo for Ray: A Multi-Node Shared Memory Object Store That Accelerates Ray Workloads
842 views · 11 months ago
Ray is a powerful distributed computing framework. However, as data sets grow and computation requirements become more complex, managing memory usage across multiple computing nodes becomes increasingly challenging. Issues that slow down performance include the data copying between the computing nodes, data spilling out of memory into storage, and the data skew among computing nodes. We'll intr...
How to simplify execution of cloud-native model training & validation with CodeFlare: A HandsOn Demo
343 views · 11 months ago
Join us for a hands-on demo of the CodeFlare-SDK, an open-source project that simplifies cloud-native data pre-processing, model training and validation with an intuitive Python interface to Ray, PyTorch/TorchX, and Kubernetes. With the CodeFlare-SDK, you can easily manage your cloud resources, submit jobs, and monitor job status, without worrying about the complexities of DevOps and cloud infr...
Building an Instant-On Serverless Platform for Large-Scale Data Processing Using Ray
390 views · 11 months ago
AWS Glue has pioneered ETL automation by providing a fully managed, serverless data integration service. This service is a simple and cost-effective way for customers to categorize their data, clean it, enrich it, and move it swiftly and reliably between various data stores. AWS Glue is made up of a Data Catalog (i.e., a metadata store), sophisticated ETL engines wi...
Developing and Serving RAG-Based LLM Applications in Production
20K views · 11 months ago
NLP And The Future of Search With You.com
1.1K views · 11 months ago
From Spark to Ray: An Exabyte-Scale Production Migration Case Study
2.3K views · 11 months ago
Ray Scalability Deep Dive: The Journey to Support 4,000 Nodes
1K views · 11 months ago
Ray Observability 2.0: How to Debug Your Ray Applications with New Observability Tooling
698 views · 11 months ago
Modernizing DoorDash Model Serving Platform with Ray Serve
1.3K views · 11 months ago
Deploying Many Models Efficiently with Ray Serve
4.3K views · 11 months ago
How Spotify Built a Robust Ray Platform with a Frictionless Developer Experience
750 views · 11 months ago
Scaling AI Health Assistants: Challenges and Solutions
257 views · 11 months ago
Forecasting Covid Infections for the UK's National Health Service using Ray and Kubernetes
167 views · 11 months ago
Supercharging self-driving algor dev w/ Ray: scaling sim workloads and democratizing autotuning@Zoox
243 views · 11 months ago
AI Factory Accelerating Solutions with Ray
493 views · 11 months ago
How Ray Empowered Ant Group to Deliver a Large-Scale Online Serverless Platform
258 views · 11 months ago
Python-centric AI Application Building in Minutes with Lepton and Ray
1.5K views · 11 months ago
On-Demand Ray Clusters in ML Workflows via KubeRay & Sematic
573 views · 11 months ago

Comments

  • @RollandWensman-s3y · 7 hours ago

    Stroman Walks

  • @ChaplinBobby-g7n · 1 day ago

    Kunze Junctions

  • @LeonardBuck-s3l · 4 days ago

    Declan Mews

  • @HelenJackson-r6n · 4 days ago

    Lebsack Light

  • @SydneyThomson-p3y · 5 days ago

    Tyrell Mountain

  • @KennethWilson-g4d · 6 days ago

    Ella Burgs

  • @RafaelaKrahulec · 8 days ago

    470 White Branch

  • @CarllyleLynn-b4y · 9 days ago

    Thurman Terrace

  • @PhilippWillms · 13 days ago

    Inspiring talk how to bring RL into industrial practice, thanks for sharing!

  • @FitzGeraldMamie-d6f · 13 days ago

    Lenora Isle

  • @MaryTaylor-d8r · 15 days ago

    Ettie Road

  • @MadgePapiernik-c6d · 17 days ago

    Fae Harbors

  • @fenderbender28 · 20 days ago

    Excellent talk

  • @WyattWayne-g8w · 22 days ago

    Magnus Ridges

  • @hugosonnery · 28 days ago

    Thank you very much for this !

  • @jeevanbeniwal3019 · 28 days ago

    this talk can't be more good. Thanks Hao!!!

  • @felicialynch35663 · 29 days ago

    This was a really insightful presentation. It made me think about how different tools approach information retrieval. I've been using Myko Assistant recently, and its deep internet search feature really helps me find accurate answers quickly, especially compared to some others like Perplexity.

  • @keithschaub7863 · 1 month ago

    is the PDF all text? Does it have images, tables, graphs? And if so, how well does it convert?

  • @gabrielpreciado5699 · 1 month ago

    Impressive

  • @keshmesh123 · 1 month ago

    It was great. Thank you!

  • @talfranji · 1 month ago

    The first code slide contains an error. Profing your code example when aiming at software engineers is important :)

  • @Simeon1337 · 1 month ago

    Great vid

  • @MrEmbrance · 2 months ago

    no thanks

  • @Mohsenghq · 2 months ago

    Thanks for explaining simple, I migrated to RLlib and it's really efficient.

  • @ndamulelosbg8887 · 2 months ago

    Great presentation. Just one question: What is relevance_score in this case? Is it an aggregation of grounding metrics for all reference examples?

  • @sherlockho4613 · 2 months ago

    very helpful and distinguish presentation!

  • @elephantum · 2 months ago

    It should be noted, that since this talk, Anyscale deprecated Ray LLM and now recommend vLLM

  • @mattgrant4143 · 3 months ago

    Hi! What are the chances we can get the source code for this demo!! Trying to learn about KubeRay myself, have a k8s cluster (with gpu and taints setup, but cant get kuberay to play nicely with them. my job just gets stuck w/ pending)

  • @tunglee4349 · 3 months ago

    great content! thanks a lot!

  • @Emerson1 · 3 months ago

    great video, that's a lot of useful features !

  • @JavierTorres-st7gt · 3 months ago

    How to protect a company's information with technology ?

  • @fantasyxpress7966 · 3 months ago

    Thanks but what about scanned pdfs any way to handle the exceptions

  • @ReflectionOcean · 4 months ago

    By YouSum Live 00:00:11 Future of search innovation. 00:01:17 Linear story vs. reality of innovation. 00:02:37 Building solutions for personal challenges. 00:05:28 Importance of personal product usage. 00:06:02 Learning through personal experiences. 00:09:19 Iterative improvement and momentum. 00:10:34 Enhancing search with live knowledge integration. 00:13:20 Introduction of Copilot for interactive browsing. 00:18:06 Transition to a comprehensive research platform. 00:19:49 Importance of orchestration in complex systems. 00:20:41 Challenges in plugin and API integration reliability. 00:21:21 Evaluation metrics for generative search engines. 00:23:01 Continuous iteration and improvement for product success. 00:23:22 Fusion of Wikipedia and chat for deep topic exploration. 00:23:55 Development of faster and efficient llama models. 00:24:42 Customization of models for improved performance. 00:30:25 Balancing between open-source and proprietary models. 00:31:19 Control over pricing and business perspective in model development. By YouSum Live

  • @simbasrv30 · 4 months ago

    If my models are unrelated and have no functional requirements to run together in a single application, can I still use Model composition in Ray serve to deploy multiple model in a single application providing a unified API endpoint (with different route for each model) for better resource utilisation and easier deployment? Is it a good practice? What about the security aspects and user authentications?

  • @Inceptionxg · 4 months ago

    I love the way how he shared the story

  • @Karthikprath · 4 months ago

    How do we calculate memory used by kv cache in paged attention.Example for input 500 and output 1000

  • @ashrafamad-ds8er · 4 months ago

    you are reading what is on the screen, do you think that you are "useful" ? or "informative "??! not at all

  • @TheAIEpiphany · 4 months ago

    Great talk and amazing work guys!

  • @StoryTimeWithX · 4 months ago

    Neat.

  • @vaporeon2822 · 4 months ago

    Interesting sharings. Curious about the underlying implementation for KV blocks sharing part you have a copy-on-write mechanism, but how does it avoid dirty-read condition, where both request reads that ref count is 2 and both request copies the block simultaneously.

  • @LiangyueLi · 5 months ago

    great work

  • @ailabinsev · 5 months ago

    Great explanation!

  • @antonidabrowski4657 · 5 months ago

    Good content, thanks for your research

  • @dave_by_day7632 · 5 months ago

    When I use ray.init() it starts both the dashboard and a job at the same time. Is there a way to start the dashboard, then a separate command to start individual jobs?

  • @AnnerdeJong · 5 months ago

    Considering the 300g ray vs spark comparison (~15m30s-18m30s) - the spark side seems to save all the prediction outputs (`...write...save()`), but I don't see that on the ray side (`for _ in ds.iter_batches(..: pass`). Does ray's 'iter_batches()` automatically dump outputs somewhere? (e.g. when specifying `batch_format='pyarrow'` does it get automatically cached or sth in the ray object store, or sth similar?) If not - I'd argue it's not an entirely fair apples-to-apples comparison?

    • @fenderbender28 · 5 months ago

      In his code, the spark writer is using format(“noop”) which means it’s not also persisting the outputs anywhere

  • @carrocesta · 5 months ago

    I really like your approach, some libraries like keras are difficult to use

  • @arnony1 · 6 months ago

    Excellent, very educating

  • @sennetor · 6 months ago

    First Impressions! So human. :)

  • @yukewang3164 · 6 months ago

    awesome talk, with useful insights!
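The paged-attention question in the comments (KV-cache memory for a 500-token input and a 1,000-token output) can be answered with a back-of-the-envelope estimate: bytes = 2 (key and value) × layers × KV heads × head_dim × bytes per element × tokens, with the token count rounded up to whole blocks. The sketch below assumes a Llama-2-7B-style model (32 layers, 32 KV heads, head dimension 128, fp16) and a block size of 16 tokens. The function name is hypothetical, and the estimate ignores framework overhead and any prefix sharing between sequences.

```python
import math


def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2, block_size: int = 16) -> int:
    """Estimate the KV-cache memory held by one sequence under paged attention.

    Each token stores a key and a value vector (factor 2) for every layer and KV head.
    Paged attention allocates memory in fixed blocks of `block_size` tokens, so the
    token count is rounded up to a whole number of blocks.
    """
    blocks = math.ceil(num_tokens / block_size)
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return blocks * block_size * bytes_per_token


# Assumed Llama-2-7B-style config: 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes).
total = kv_cache_bytes(500 + 1000, num_layers=32, num_kv_heads=32, head_dim=128)
print(total, total / 2**20)  # 788529152 752.0
```

Under these assumptions each token costs 0.5 MiB, and 1,500 tokens round up to 94 blocks (1,504 token slots), about 752 MiB at the end of generation. The cache grows one block at a time as output tokens are produced, so usage is lower earlier in the request.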