Lovely work. Keep 'em coming!!
Thanks! Surely will do! Stay tuned for new content. :)
Whoever figured out that one needs to get PAID. Possibly billions saved.
Most likely those who discover such things are paid generously; not billions, but still. :)
Good walkthrough! One thing I don't understand is how they can backpropagate through that RoundClip function when it's not differentiable.
Thanks for the feedback! I don't think that's mentioned in the paper, but this Reddit post explains what happens during backprop: www.reddit.com/r/MachineLearning/comments/1b22izk/comment/ksjphhj/?
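In case it helps others: from what I've read, the usual trick for non-differentiable quantizers like this is a straight-through estimator (STE), where the forward pass uses the rounded values but the backward pass treats the operation as the identity. Here's a minimal PyTorch-style sketch of that idea (the function names are just mine for illustration, not taken from the paper):

```python
import torch

def round_clip(x, a=-1.0, b=1.0):
    # Forward: round and clip to {-1, 0, 1}; not differentiable on its own.
    return torch.clamp(torch.round(x), a, b)

def round_clip_ste(x):
    # Straight-through estimator: the forward value is the quantized one,
    # but gradients flow through as if this op were the identity.
    return x + (round_clip(x) - x).detach()

w = torch.randn(4, requires_grad=True)
y = round_clip_ste(w).sum()
y.backward()
print(w.grad)  # all ones: the gradient "passes straight through" the rounding
```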
Can someone smart explain why we evaluate log2(3)?
Well, surely I'm not the smartest one around here, but here's my explanation: in information theory, the entropy (see the first equation here: en.wikipedia.org/wiki/Entropy_(information_theory)) measures the expected amount of information, in bits, needed to represent a random variable.
In BitNet 1.58, the random variable can take three values {-1, 0, 1}, each equally likely. If you put the numbers into the entropy equation, you get -1/3*log2(1/3) - 1/3*log2(1/3) - 1/3*log2(1/3) = -log2(1/3) = log2(3) ≈ 1.58 bits necessary to represent that random variable, which is where the "1.58" in the name comes from.
Hope that helps! :)
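If you want to sanity-check the arithmetic, here's a quick Python snippet (just my own illustration, not from the paper):

```python
import math

# Entropy of a uniform distribution over the three values {-1, 0, 1}
p = [1/3, 1/3, 1/3]
entropy = -sum(pi * math.log2(pi) for pi in p)

print(entropy)       # ~1.585 bits
print(math.log2(3))  # same value, hence the "1.58" in BitNet 1.58
```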
A binary number containing N bits can represent 2^N possible values. So if you want to represent 3 possible values (3 = 2^N) then N = log2(3).
10:48 It doesn't outperform LLaMA, they just bolded their results, which is a bit misleading.
Thanks for noticing that and pointing it out! Well, maybe not on all tasks, but the average does seem higher for BitNet 1.58. Still, the fact that they bolded only their own results is kinda misleading; I genuinely didn't notice that.
The Paper Explained series can be found here: ruclips.net/p/PL8hTotro6aVHhn5QUB3HDJTu3rPJ48LeP