Great video. Deffo think model size is what is hampering meaningful results.
Yeah, I’m going to try ablating with bigger models. This is a key question.
Waiting for this!
Very good content, as usual.
Many thanks, Loki.
Top-notch content. Thank you.
Cheers, many thanks
Very good explanation of current RL approaches, a SOTA video on the topic. Ideas for improvement: you could do some kind of presentation giving more insight into RL and fine-tuning. I remember you did something like that in the past, but maybe an updated version covering the DeepSeek approaches. Perhaps the stages of how to build a model and then fine-tune it so it is able to reason, in a no-code presentation format.
Good points, yeah, I'll aim to do that in the follow-on.
@TrelisResearch There was some news that Berkeley researchers recreated the "aha moment" for $30; you could also do a video on that (just sharing video ideas, not demanding them lol).
Your in-depth content surpasses that of many other YouTubers. A tutorial demonstrating computer-use model training using reinforcement learning and a simulated UI would be highly valuable. Would a GRPO approach be suitable for this image-inclusive data? Finally, to enhance the reasoning process, could we incorporate a "tool_call" tag enabling LLMs to utilize tools during reasoning, rather than solely in the answer phase?
That’s a cool idea and I’ll add it to my list of potential ideas.
Yeah, you can add tools. It does make evaluation a bit harder because there can now be stochasticity in the tool, but broadly it's a good idea.
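For illustration, here is a minimal sketch of how tool calls during reasoning could work at inference time. The <tool_call>/<tool_result> tag format, the tool registry, and the `generate(text, stop)` interface are all assumptions made for this example, not a fixed standard or any particular library's API.

```python
import json

# Hypothetical tool registry; the calculator is just for illustration.
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),  # never eval untrusted input in practice
}

def run_reasoning_with_tools(generate, prompt, max_tool_calls=5):
    """Sketch of an inference loop that lets the model call tools mid-reasoning.

    `generate(text, stop)` is an assumed interface: it continues generation from
    `text` and halts when the stop string is produced (stop string assumed
    excluded from the returned completion).
    """
    text = prompt
    for _ in range(max_tool_calls):
        completion = generate(text, stop="</tool_call>")
        text += completion
        if "<tool_call>" not in completion:
            return text  # model finished its reasoning without (further) tool use
        # Parse the JSON payload of the last tool call and execute the tool.
        payload = completion.rsplit("<tool_call>", 1)[-1]
        call = json.loads(payload)
        result = TOOLS[call["name"]](call["arguments"])
        # Feed the result back so the model can keep reasoning with it.
        text += f"</tool_call><tool_result>{result}</tool_result>"
    return text
```

Because the tool output (and any randomness inside it) gets appended to the trace, two rollouts of the same prompt can diverge, which is the evaluation stochasticity mentioned above.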
Nice, man, much needed after DeepSeek. I'm going to watch it and do the hands-on work. Hey, do you have any jobs for an AI engineer, or maybe someone in your network does? Please let me know, I want to do remote work.
Check trelis.com for developer collaborations, which are the pathway to joining the Trelis team.
Can you please make a video on applying RL to vision LLMs?
Good idea. Will add it to my list.
Could ORPO’s balance of cross-entropy and odds ratios make it a more stable alternative to PPO-based RLHF? Also, does the beta parameter generalize across models, or does it require fine-tuning?
I'll talk more about this in the next video, but the cross-entropy term serves a role similar to the KL divergence in GRPO or PPO (it keeps the model grounded towards the original weights).
Beta does not generalise all that well in my experience and needs tuning, somewhere between 0.2 and 0.5.
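For reference, a minimal sketch of the ORPO objective from the ORPO paper (Hong et al., 2024): the SFT cross-entropy term is what anchors the model (the role the KL term plays in PPO/GRPO), and beta is the weight on the odds-ratio term that needs per-model tuning as noted above. The function signature and the use of length-normalised (mean per-token) log-probs are assumptions for illustration, not a specific library's implementation.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, beta=0.2):
    """Sketch of the ORPO objective: L = L_SFT + beta * L_OR.

    chosen_logps / rejected_logps: length-normalised (mean per-token) log-probs
    of the chosen and rejected responses under the policy, shape (batch,).
    """
    # Standard SFT loss: negative log-likelihood of the chosen response.
    sft_loss = -chosen_logps

    # log odds(y|x) = log p - log(1 - p), computed stably from log-probs.
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio term: pushes the odds of the chosen response above the rejected one.
    or_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)

    return (sft_loss + beta * or_loss).mean()
```

Raising beta pushes harder on the preference margin relative to staying close to the supervised behaviour, which is why values that work on one model can destabilise another.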