RoPE Rotary Position Embedding to 100K context length

  • Published: 22 May 2024
  • RoPE (Rotary Position Embedding) explained in simple terms: calculating self-attention in Transformers with a relative position encoding for the extended context lengths of LLMs.
    All rights w/ authors:
    ROFORMER: ENHANCED TRANSFORMER WITH ROTARY POSITION EMBEDDING (RoPE)
    arxiv.org/pdf/2104.09864
    #airesearch
    #aiexplained
  • Science
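For readers skimming past the description: the mechanism the video covers can be sketched in a few lines. This is a minimal illustration based on the RoFormer paper, not the video's own code; it assumes the standard base of 10000 and the interleaved-pair convention (some implementations rotate the two halves of the vector instead).

```python
import math

def rope_rotate(x, pos, base=10000.0):
    """Apply RoPE to one token vector x (even length) at position pos.

    Each consecutive pair (x[2i], x[2i+1]) is rotated by the angle
    pos * theta_i, where theta_i = base ** (-2*i / d).
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)          # frequency of this dimension pair
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        # standard 2-D rotation of the pair (x[i], x[i+1])
        out.extend([x[i] * c - x[i + 1] * s,
                    x[i] * s + x[i + 1] * c])
    return out

q = rope_rotate([1.0, 0.0, 1.0, 0.0], pos=3)  # rotated query vector
```

Because each pair is rotated by an angle proportional to the position, the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n, which is what makes the encoding relative.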

Comments • 8

  • @desmur36 • a month ago

    Amazing content! The explanations are SO clear! Thank you!

  • @LamontCranston-qh2rv • a month ago +1

    Thank you SO MUCH for providing such high quality content! Very much enjoying all your many videos! If you have a chance, I'd love to see you discuss the recent work in giving AI spatial reasoning, i.e. artificial "imagination". (In its natural form, very much a core feature of human thought.) Perhaps one might think about the creation of a "right brain" to go along with the "left brain" language models we have now? (Please forgive the over-simplification of human neuroscience.) Thanks again! All the best to you sincerely!

  • @AYUSHSINGH-db6ev • a month ago

    Hi Sir! Really love your videos! How can we access your presentation slides?

  • @mshonle • a month ago

    If one rotation is good, how about going into three dimensional rotations and using quaternions? Is there any work using that?

  • @paratracker • a month ago +1

    Maybe it's obvious to YOU that the solution is that complex exponential, but I wish you hadn't assumed that WE would all find it as self-evident as you do.

    • @code4AI • a month ago +9

      I see what you mean. You know, I spent some days finding simple explanations for the not-so-self-explanatory RoPE algorithm, especially since I will build on this in my second video, where we examine more complex, more recent ideas about RoPE. I decided on an approach that will enable my audience to understand the main ideas and methods and go from there. I recorded 90 minutes for the second part, and I am currently cutting it down to a maximum of 60 minutes, striking a balance of providing insights for all my viewers. I'll try harder ....

  • @hangjianyu • 10 days ago

    There is a mistake: the smaller dimensions change more quickly, and the larger dimensions change more slowly.
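The commenter's point can be checked directly from the RoPE frequency schedule theta_i = base ** (-2*i/d): the lowest dimension pairs get the largest angular frequency and therefore rotate fastest. A small sketch, assuming the standard base of 10000:

```python
def pair_frequencies(d, base=10000.0):
    """Angular frequency theta_i = base ** (-2*i/d) for each of the d//2 dimension pairs."""
    return [base ** (-2 * i / d) for i in range(d // 2)]

freqs = pair_frequencies(8)
# freqs[0] equals 1.0 (the fastest-rotating pair); the frequencies then
# decay geometrically toward 1/base for the highest dimension pairs.
```

So the first (smallest-index) dimensions sweep through their rotation angles quickly as the position grows, while the last dimensions rotate very slowly, which is exactly the behavior the comment describes.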