DeepSpeed: All the tricks to scale to gigantic models

  • Published: 8 Jan 2025

Comments • 20

  • @Emily-p8e5q
    @Emily-p8e5q 1 year ago +1

    Thanks Mark! You have been helping me understand concepts better.

  • @darrenbrien
    @darrenbrien 3 years ago +5

    Thanks Mark, great vid. Good update on the SOTA in distributed training since Horovod.

  • @mekaneeky
    @mekaneeky 1 year ago +3

    Thanks Mark! Quite a thorough and useful explanation.

  • @randolphzeng6051
    @randolphzeng6051 1 year ago +2

    Thanks for such an inspiring and insightful video. What a knowledge feast to enjoy!

  • @sandraviknander7898
    @sandraviknander7898 3 years ago +3

    If you just add a pair of aviator sunglasses then this is a Yannic Kilcher video. Instant 100k sub upgrade.
    Jokes aside, this was a great explanation of a great library!

  • @saratbhargavachinni5544
    @saratbhargavachinni5544 1 year ago +1

    Great video, Mark! One correction: the A100 is available in 40 GB and 80 GB variants.

  • @shivangsharma1
    @shivangsharma1 1 month ago

    Loved it, thanks for making it!

  • @adriangabriel3219
    @adriangabriel3219 2 years ago +3

    Hi Mark, great vid. Could you make a video on how to fine-tune large transformer models (e.g. T5-11B) without running into CUDA errors? (See the sketch after this thread.)

    • @marksaroufim
      @marksaroufim 2 years ago +4

      Great suggestion! Yes I’ll do it

    • @adriangabriel3219
      @adriangabriel3219 2 years ago +1

      @@marksaroufim great! There is a lot of information about fine-tuning T5-base, but not about fine-tuning models larger than T5-base.

    • @JordanArsenaultYT
      @JordanArsenaultYT 2 years ago

      @@adriangabriel3219 Did you ever get t5-11b working?
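
For readers of this thread: below is a minimal sketch of one way to fine-tune a model the size of T5-11B on limited GPU memory, using the Hugging Face Trainer with its DeepSpeed integration. `train_dataset` and the "ds_config.json" file (a ZeRO stage 3 config; a sketch of one appears further down the page) are placeholders, so treat this as an illustration rather than a verified recipe.

```python
# Minimal sketch: fine-tuning T5-11B with the Hugging Face Trainer and
# DeepSpeed ZeRO stage 3. `train_dataset` and "ds_config.json" are
# placeholders the reader must supply.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "t5-11b"  # ~11B parameters: too large for one GPU without sharding
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

args = TrainingArguments(
    output_dir="t5_finetune",
    per_device_train_batch_size=1,   # keep activation memory small
    gradient_accumulation_steps=16,  # recover a useful effective batch size
    gradient_checkpointing=True,     # recompute activations instead of storing them
    fp16=True,                       # half-precision weights and activations
    deepspeed="ds_config.json",      # delegate sharding and offload to DeepSpeed
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```

Launched with the `deepspeed` launcher (e.g. `deepspeed train.py`), each GPU then holds only its shard of the parameters, gradients, and optimizer state.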

  • @vini8123
    @vini8123 4 months ago

    I tried to train a model with an embedding layer of vocab size 100 million and embedding dim 128 on 3 A100 80 GiB GPUs with DeepSpeed (ZeRO stage 3, offloading parameters and optimizer state to CPU), but it fails with a CUDA out-of-memory error 😢
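
For context, a 100M × 128 embedding is about 12.8B parameters on its own, so the fp16 weights alone are ~25.6 GB and the fp32 Adam states are several times larger; CPU offload is mandatory, and OOM can still occur if too many parameters are materialized on a GPU at once. Here is a sketch of the ZeRO stage 3 + CPU offload config being described, with illustrative (assumed) values:

```python
# Sketch of a DeepSpeed ZeRO stage 3 config with CPU offload, built as a
# Python dict and written to JSON. All values here are illustrative
# assumptions, not a verified fix for the OOM above.
import json

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer state across GPUs
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Cap how many parameters may be live on a GPU at once; lowering
        # this can help when a single huge layer triggers OOM.
        "stage3_max_live_parameters": 100_000_000,
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```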

  • @user-wp8yx
    @user-wp8yx 1 year ago

    Nice explanation, but how do you do this in ooba?

  • @limitlesslife7536
    @limitlesslife7536 1 year ago

    amazing!

  • @Georgesbarsukov
    @Georgesbarsukov 1 year ago

    You're looking at RAM, not VRAM, btw.

  • @AndersOland
    @AndersOland 1 year ago

    A 2080 Ti with 30 gigs? 🤭 If only my 4090 had that much RAM 😅

  • @juliusvalentinas
    @juliusvalentinas 3 months ago

    An A100 GPU costs ~$30k USD, so is all this offloading theoretical nonsense? Where are the apps that let you run an actual Llama 3.1 on one or two 3090s, offloading unused weights to an NVMe SSD?
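
On the NVMe question: DeepSpeed's ZeRO-Infinity can stage parameters and optimizer state on an NVMe SSD rather than in CPU RAM, roughly as sketched below. The mount path and the aio values are assumptions for illustration; real throughput is bounded by the drive, so this shows the mechanism, not a performance claim.

```python
# Sketch of a ZeRO-Infinity-style config offloading to NVMe instead of CPU.
# "/local_nvme" is a hypothetical mount point; the aio values are
# illustrative tuning knobs, not recommendations.
nvme_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
    # Asynchronous I/O settings used when streaming tensors to and from the SSD.
    "aio": {"block_size": 1048576, "queue_depth": 8},
}
```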