Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)

  • Published: 5 Oct 2024
  • Science

Comments • 37

  • @mshonle · 12 hours ago · +17

    I tried reading this paper three times but then decided it would have been more optimal if they doubled the number of scientists writing it…

  • @tingtingin · 4 hours ago · +5

    He's alive!

  • @kikijuju4809 · 11 hours ago · +3

    Long time no see

  • @akanjiemmanuel4807 · 13 hours ago · +2

    Interesting paper

  • @existenceisillusion6528 · 23 minutes ago · +1

    Are we sure a* is not a typo that should have been y*?
    Also: best-of-N weighted, beam search, or majority voting?
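
  For reference, those are the answer-selection strategies the paper compares: best-of-N (weighted), beam search, and majority voting. A minimal sketch of weighted best-of-N versus plain majority voting, assuming "samples" is a list of (answer, verifier_score) pairs produced by a learned verifier (the verifier itself is not shown):

      from collections import defaultdict

      # Minimal sketches of two of the paper's answer-selection strategies.
      # Verifier scores are assumed to come from a learned reward model.

      def majority_vote(samples):
          # Pick the most frequent final answer, ignoring verifier scores.
          counts = defaultdict(int)
          for answer, _ in samples:
              counts[answer] += 1
          return max(counts, key=counts.get)

      def weighted_best_of_n(samples):
          # Sum verifier scores across samples that share a final answer,
          # then pick the answer with the highest total score.
          totals = defaultdict(float)
          for answer, score in samples:
              totals[answer] += score
          return max(totals, key=totals.get)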

  • @ChocolateMilkCultLeader · 12 hours ago · +2

    My goat is back

  • @Veptis · 9 hours ago

    I'll have to check the whole video later, but I think IBM had a somewhat similar paper recently, about the learning rate changing based on epoch/mini-batch performance on a benchmark or something. It's called "scheduler" something.

  • @MasamuneX · 13 hours ago · +3

    What if we use Monte Carlo tree search on tree-of-thought LLMs, then keep only the highest-quality outputs, train a new foundation model on that synthetic data, and repeat until ASI? (Sketched after this thread.)

    • @montymemoladi8067 · 12 hours ago · +2

      Sounds like a promising approach, and I think it's reasonably close to what the big labs are planning to do.

    • @AtAtaylor · 11 hours ago

      People have already done this

    • @scoffpickle9655 · 11 hours ago · +1

      Or just use something similar to "Thinker: Learning to Plan and Act" to predict a few tokens ahead, which might increase quality.

    • @Adhil_parammel · 24 minutes ago

      An oracle is required to guide it and reach ASI.
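
  Roughly, the loop proposed in this thread is: spend test-time compute on search, keep the best traces, retrain, repeat (often called expert iteration). A minimal sketch, where generate_with_tree_search, quality_score, and finetune are hypothetical stand-ins, not any lab's actual pipeline:

      # A minimal sketch of the search-then-distill loop suggested above
      # (expert iteration). All helper functions are hypothetical stand-ins.

      def self_improvement_loop(model, prompts, rounds=3, keep_fraction=0.1):
          for _ in range(rounds):
              # 1. Spend test-time compute: tree search over chains of thought.
              outputs = [generate_with_tree_search(model, p) for p in prompts]
              # 2. Keep only the highest-scoring outputs as synthetic data.
              outputs.sort(key=quality_score, reverse=True)
              synthetic_data = outputs[: int(len(outputs) * keep_fraction)]
              # 3. Train the next model on that data, then repeat the loop.
              model = finetune(model, synthetic_data)
          return model

  The open question, as the last reply notes, is where quality_score comes from without an oracle.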

  • @benedictsmith2415 · 1 minute ago

    Equation 1 just serves as a theoretical foundation for the "compute-optimal" concept, but it cannot be directly used for optimization because of:
    Intractability: finding the truly optimal hyperparameters θ*_{q,a*(q)}(N) across all possible prompts and compute budgets would require an exhaustive search.
    Unknown ground truth: in a real-world setting we don't know the ground-truth correct answer y*(q) for an unseen prompt, so directly optimizing the indicator function is impossible.
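
  For reference, Equation 1 as this comment describes it (a reconstruction from the definitions discussed here, where Target(θ, N, q) is the output distribution induced by strategy hyperparameters θ at compute budget N on prompt q, and y*(q) is the ground-truth answer; the a*(q) in the paper's subscript appears to denote that same ground truth, per the typo question above):

      \theta^{*}_{q, y^{*}(q)}(N)
        = \arg\max_{\theta}
          \; \mathbb{E}_{y \sim \mathrm{Target}(\theta, N, q)}
          \left[ \mathbb{1}_{\, y = y^{*}(q)} \right]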

  • @islandfireballkill · 13 hours ago · +4

    Wake up, babe. New Yannic video just dropped.

  • @MinecraftJuiceHD · 3 hours ago

    Isn't beam search done per token? Why does Yannic say that they grade the answers?
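
  In the paper, beam search operates over solution steps rather than tokens: at each step the model proposes several candidate next steps, a process reward model (PRM) grades each partial solution, and only the top-scoring beams survive, which is why whole (partial) answers get graded. A minimal sketch, where sample_next_step, prm_score, and is_final are hypothetical stand-ins for the LLM sampler and the learned verifier:

      # Step-level beam search guided by a process reward model (PRM).
      # Candidates are whole reasoning steps, not individual tokens.

      def beam_search_steps(prompt, beam_width=4, expand=4, max_steps=10):
          beams = [[]]  # each beam is a list of reasoning steps so far
          for _ in range(max_steps):
              candidates = []
              for steps in beams:
                  for _ in range(expand):
                      step = sample_next_step(prompt, steps)  # hypothetical LLM call
                      candidates.append(steps + [step])
              # Grade each partial solution with the PRM; keep the best beams.
              candidates.sort(key=lambda s: prm_score(prompt, s), reverse=True)
              beams = candidates[:beam_width]
              if all(is_final(steps[-1]) for steps in beams):
                  break  # every surviving beam has produced a final answer
          return beams[0]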

  • @KadeemSometimes · 13 hours ago · +1

    Nice

  • @TheAIEpiphany · 5 hours ago

    21:48 What can be unburdened by what has been

  • @gileneusz · 12 hours ago

    he's the best

  • @aa-xn5hc · 4 hours ago

    Please bring the news back!

  • @LysergicKids · 12 hours ago

    It can't be, a new paper that's not 98% marketing wank? Is the world healing, brothers?

  • @csabaczcsomps7655 · 46 minutes ago

    Take it for what you will: when a kid sees you put down one apple and then one more, he will answer that we have 2, so we write 1 + 1 = 2. From then on he treats the notation as always true, without recalling the apple scene. This means some training needs two modules: video, then video-to-notation association. And actually using the notation is probably a third step. My noob opinion.

  • @nineteenfortyeight6762 · 9 hours ago · +2

    Why in the name of all that's holy are we asking an LLM to do arithmetic?? 😭

    • @hunterkudo9832 · 7 hours ago · +1

      Because being able to do arithmetic is a good indicator of being able to reason. We want LLMs to be good reasoners because a lot of tasks in the real world will require LLMs and soon AI agents to reason like a human can.

    • @HUEHUEUHEPony · 3 hours ago · +1

      Because not all of us are interested in roleplay slop

  • @fontenbleau · 12 hours ago · +1

    Python is just a dead-end pathway. One guy on YouTube writes a neural network in low-level assembly and it's 500 times faster than PyTorch on a single CPU core on the same task. We need a full rewrite of networks and models.

    • @scoffpickle9655 · 11 hours ago · +1

      Please tell me who made that. It seems so interesting

    • @scoffpickle9655 · 11 hours ago · +1

      Also, yeah, C or C++ is better for actually useful and fast models; Python is good for modularity and prototyping, but god it is so fucking slow.

    • @biomerl · 11 hours ago · +2

      Wat? 99 percent of training is done on the GPU, which is already C++.

    • @scoffpickle9655 · 10 hours ago

      @biomerl Yeah, sorry, I don't have much knowledge of low-level ML.

    • @kennycommentsofficial · 9 hours ago

      @scoffpickle9655 The easiest starting place is to search YouTube for matrix multiplication with CUDA (basically just C code).
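
  The gap this thread describes is real, but it lives in the interpreter, not in Python-the-frontend: PyTorch already dispatches to compiled GPU/CPU kernels. A quick sketch of the difference on a plain matrix multiply, comparing pure-Python loops against NumPy's compiled BLAS call:

      import time
      import numpy as np

      # The same matrix multiply two ways: interpreted loops vs. a compiled kernel.
      n = 256
      a = np.random.rand(n, n)
      b = np.random.rand(n, n)

      def matmul_python(x, y):
          # Every multiply-add here goes through the Python interpreter.
          out = [[0.0] * n for _ in range(n)]
          for i in range(n):
              for k in range(n):
                  xik = x[i][k]
                  for j in range(n):
                      out[i][j] += xik * y[k][j]
          return out

      t0 = time.perf_counter()
      matmul_python(a.tolist(), b.tolist())
      t1 = time.perf_counter()
      a @ b  # NumPy dispatches this to an optimized BLAS kernel
      t2 = time.perf_counter()
      print(f"pure Python: {t1 - t0:.3f}s  NumPy: {t2 - t1:.5f}s")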

  • @ozordiprince9405 · 13 hours ago

    200 views in 15 minutes. Bro fell off