Sliding Window Attention (Longformer) Explained

  • Published: 29 Jul 2024
  • In this video we talk about sliding window attention, dilated sliding window attention, and global + sliding window attention, as introduced in the Longformer paper. We take a look at the main disadvantage of the classical attention mechanism introduced in the Transformer paper (i.e. its quadratic time complexity) and how sliding window attention proposes to solve this issue. A short illustrative code sketch of the sliding window idea is included after the description below.
    References
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    Transformer Self-Attention Mechanism Explained: • Transformer Self-Atten...
    "Longformer: The long-document transformer" paper: arxiv.org/abs/2004.05150
    Related Videos
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    BART model explained: • BART Explained: Denois...
    Why Language Models Hallucinate: • Why Language Models Ha...
    Grounding DINO, Open-Set Object Detection: • Object Detection Part ...
    Detection Transformers (DETR), Object Queries: • Object Detection Part ...
    Wav2vec2: A Framework for Self-Supervised Learning of Speech Representations - Paper Explained: • Wav2vec2 A Framework f...
    Transformer Self-Attention Mechanism Explained: • Transformer Self-Atten...
    How to Fine-tune Large Language Models Like ChatGPT with Low-Rank Adaptation (LoRA): • How to Fine-tune Large...
    Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained: • Multi-Head Attention (...
    LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p: • LLM Prompt Engineering...
    Contents
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    00:00 - Intro
    00:26 - Original attention mechanism
    00:50 - Sliding window attention
    01:56 - Dilated sliding window attention
    02:40 - Global + Sliding window attention
    03:31 - Outro
    Follow Me
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    🐦 Twitter: @datamlistic
    📸 Instagram: @datamlistic
    📱 TikTok: @datamlistic
    Channel Support
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    The best way to support the channel is to share the content. ;)
    If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
    ► Patreon: / datamlistic
    ► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
    ► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
    ► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
    ► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
    #slidingwindowattention #longformer #attentionmechanism
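    Code Sketch
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    A minimal, illustrative sketch (not code from the video or the Longformer repository) of the sliding window idea: each token attends only to positions within w steps of itself, so the number of scored pairs grows as O(n*w) instead of O(n^2). The function name sliding_window_attention and the window size w are placeholders chosen for illustration.

    # Illustrative sketch only: sliding window attention via a banded mask.
    import numpy as np

    def sliding_window_attention(Q, K, V, w=2):
        """Scaled dot-product attention where each position attends only to
        positions within +/- w of itself (the sliding window idea)."""
        n, d = Q.shape
        # Dense (n, n) scores are computed here for clarity; a real
        # implementation computes only the band to get the O(n*w) benefit.
        scores = Q @ K.T / np.sqrt(d)
        idx = np.arange(n)
        band = np.abs(idx[:, None] - idx[None, :]) <= w  # banded (windowed) mask
        scores = np.where(band, scores, -np.inf)         # drop out-of-window pairs
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V

    # Tiny usage example with random token embeddings
    rng = np.random.default_rng(0)
    Q = K = V = rng.normal(size=(8, 16))  # 8 tokens, model dim 16
    print(sliding_window_attention(Q, K, V, w=2).shape)  # (8, 16)

    Note that this sketch still builds the full score matrix and then masks it; the point of the Longformer variants (sliding, dilated sliding, global + sliding) is that only the windowed entries need to be computed at all.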

Comments • 4

  • @datamlistic
    @datamlistic  3 months ago +2

    "Transformer Self-Attention Mechanism Explained" video link: ruclips.net/video/u8pSGp__0Xk/видео.html

  • @miketoreno8371
    @miketoreno8371 3 months ago

    Best

  • @mutantrabbit767
    @mutantrabbit767 3 months ago

    this was super helpful, thanks so much!

    • @datamlistic
      @datamlistic  3 months ago

      You're welcome! Happy to hear you found it helpful! :)