
FlashAttention-3 is Here

  • Published: 15 Aug 2024
  • This video introduces FlashAttention-3, which uses asynchrony to perform multiple operations at the same time. By overlapping the main operations, such as matrix multiplications and softmax, it keeps the GPU busy and efficient (a sketch of the underlying blocked attention loop follows this description).
    🔥 Buy Me a Coffee to support the channel: ko-fi.com/fahd...
    🔥 Get 50% Discount on any A6000 or A5000 GPU rental, use following link and coupon:
    bit.ly/fahd-mirza
    Coupon code: FahdMirza
    ▶ Become a Patron 🔥 - / fahdmirza
    #flashattention #flashattention3 #softmax #matmul
    PLEASE FOLLOW ME:
    ▶ LinkedIn: / fahdmirza
    ▶ RUclips: / @fahdmirza
    ▶ Blog: www.fahdmirza.com
    RELATED VIDEOS:
    ▶ Resource pytorch.org/bl...
    All rights reserved © Fahd Mirza
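
    A minimal sketch, assuming single-head attention, of the blocked online-softmax loop that FlashAttention builds on; the function name blocked_attention and the block size are illustrative, not the actual CUDA kernel. Per the video, FlashAttention-3's contribution is overlapping the two matmuls and the softmax of successive blocks asynchronously on Hopper GPUs, whereas this plain PyTorch version runs them one after another:

    ```python
    import torch

    def blocked_attention(q, k, v, block=128):
        """Attention computed block by block with an online softmax.
        q: (N, d), k/v: (M, d). Each iteration does two matmuls and a
        softmax update; FlashAttention-3 overlaps these steps across
        blocks on Hopper, here they simply run serially."""
        scale = q.shape[-1] ** -0.5
        out = torch.zeros_like(q)
        row_max = torch.full(q.shape[:-1], float("-inf"), device=q.device)
        row_sum = torch.zeros(q.shape[:-1], device=q.device)

        for start in range(0, k.shape[0], block):
            kb, vb = k[start:start + block], v[start:start + block]
            s = (q @ kb.T) * scale                      # matmul 1: Q·K^T for this block
            new_max = torch.maximum(row_max, s.max(dim=-1).values)
            p = torch.exp(s - new_max[:, None])         # softmax exponentials for this block
            correction = torch.exp(row_max - new_max)   # rescale previous partial results
            row_sum = row_sum * correction + p.sum(dim=-1)
            out = out * correction[:, None] + p @ vb    # matmul 2: P·V for this block
            row_max = new_max
        return out / row_sum[:, None]

    # Quick check against the naive formula
    q, k, v = torch.randn(256, 64), torch.randn(512, 64), torch.randn(512, 64)
    ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
    assert torch.allclose(blocked_attention(q, k, v), ref, atol=1e-5)
    ```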

Comments • 4

  • @xspydazx · A month ago

    Maybe you should do a video on mass compute and how to set it up for training etc.? It has the machines, but it seems expensive. Is that what you use for your tutorials?
    Does it have a Colab, or is it VS Code?

  • @MrMoonsilver · A month ago

    The question is... will it work on a 3090?

    • @alfinal5787 · A month ago

      No, it's Hopper-only. The algorithm is completely different and relies on hardware features introduced with newer GPUs.
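
      A quick, generic PyTorch check (not part of the FlashAttention package itself) for whether the current GPU is Hopper-class, i.e. compute capability 9.x, which is what FlashAttention-3 targets; a 3090 (Ampere) reports 8.6:

      ```python
      import torch

      # Hopper GPUs (e.g. H100) report compute capability 9.x.
      # FlashAttention-3 targets this architecture; earlier cards
      # such as the RTX 3090 (8.6) fall back to FlashAttention-2.
      major, minor = torch.cuda.get_device_capability()
      if major >= 9:
          print(f"Compute capability {major}.{minor}: Hopper-class, FlashAttention-3 eligible")
      else:
          print(f"Compute capability {major}.{minor}: not Hopper, use FlashAttention-2 instead")
      ```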