LongRoPE & Theta Scaling to 1 Mio Token (2/2)

  • Published: 30 Sep 2024
  • LongRoPE & Theta Extrapolation Scaling of RoPE for extreme context lengths, explained in scientific detail. To increase the context lengths of modern LLMs, we evaluate the performance and methods of LongRoPE and Theta Extrapolation/Scaling for extreme context-length extension, from an 8K to a 4M context length for a Llama 3-7B LLM.
    RoPE encoding works well within the training context length, but it struggles when the sequence length at inference exceeds the training length, leading to a drop in performance. This is primarily because the positional encodings become out-of-distribution (OOD), destabilizing the attention mechanism.
    To overcome this issue, theta scaling is introduced. The idea is to adjust the "rotary base," a key parameter in RoPE. By increasing this base value, the model can extend its effective context length and handle longer sequences more accurately. The adjustment aligns the positional encodings with the longer input texts, improving the model's ability to extrapolate and maintain performance.
    Interestingly, decreasing the rotary base can also enhance the model's extrapolation capabilities: the positional encodings are packed more tightly, ensuring that the model fully learns the positional patterns within the training context and therefore generalizes better to sequences beyond its training data. Both increasing and decreasing the rotary base thus offer ways to extend the context length that RoPE-based models can handle effectively, providing a versatile way to improve their performance on longer texts (see the sketch below this description).
    #airesearch
    #aieducation
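
    To make the rotary-base adjustment concrete, here is a minimal NumPy sketch of how RoPE rotation angles depend on the base, how raising the base keeps the slowest rotation at a 4M target length within the angular range seen during an 8K training window, and how lowering it packs full periods into that window. The NTK-style scaling rule, the specific lengths, and all names are assumptions chosen for illustration, not details taken from the video or from the LongRoPE method itself.

```python
# Sketch: RoPE rotation angles and rotary-base ("theta") scaling.
# All names and numbers below are illustrative assumptions, not the exact
# setup from the video or the LongRoPE paper.
import numpy as np

def rope_angles(position: int, head_dim: int, base: float = 10_000.0) -> np.ndarray:
    """Rotation angles for one token position, one per (even, odd) dimension pair."""
    # Per-pair inverse frequencies: base^(-2i/d) for i = 0 .. d/2 - 1
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    return position * inv_freq

def apply_rope(x: np.ndarray, position: int, base: float = 10_000.0) -> np.ndarray:
    """Rotate a query/key vector (length head_dim) by its positional angles."""
    angles = rope_angles(position, x.shape[-1], base)
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

head_dim = 128
train_len, target_len = 8_192, 4_194_304   # 8K training window, ~4M target

# Slowest-rotating pair: within the training window its angle stays small,
# but at the ~4M target it reaches values never seen in training (OOD).
print(rope_angles(train_len - 1, head_dim)[-1])    # ~0.95 rad
print(rope_angles(target_len - 1, head_dim)[-1])   # ~485 rad, far out of range

# Increasing the base (NTK-style heuristic: scale it by s^(d/(d-2)) for a
# length ratio s) brings the slowest pair at the target length back into
# roughly the angular range covered during training.
s = target_len / train_len
scaled_base = 10_000.0 * s ** (head_dim / (head_dim - 2))
print(rope_angles(target_len - 1, head_dim, base=scaled_base)[-1])  # ~0.95 rad again

# Decreasing the base instead packs rotations more tightly, so even the slowest
# pair completes full periods inside the training window and the positional
# pattern is fully observed during training.
print(rope_angles(train_len - 1, head_dim, base=500.0)[-1])  # ~18 rad > 2*pi
```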

Comments • 4

  • @MattJonesYT
    @MattJonesYT 4 months ago +4

    Cutting edge stuff, this is great!!

  • @joelvalim
    @joelvalim 4 months ago +1

    It seems they are doing the very opposite of quantization. (I am being very visual here, ok?) Quantization is a kind of squashing that preserves proportions and shape. LongRoPE seems to act as a kind of holographic projection... and a little bit of a hammer to adjust the edges... The final fine-tuning would be a way to fill the voids created by the projection, which is imperfect by nature, since it can only project a shadow, not a perfect picture. The final fine-tuning fills these voids, connecting the points in that faint blueprint created by the rescaled new hyperdimensional space.

  • @manslaughterinc.9135
    @manslaughterinc.9135 4 months ago +1

    On the topic of attention and context, I would love to see a video on needle-in-a-haystack and multi-needle-in-a-haystack performance of these different kinds of context-extension approaches.

  • @simonstrandgaard5503
    @simonstrandgaard5503 4 months ago +1

    Excellent topic. Fine-tuning with a longer context length.