Distillation of Transformer Models

  • Published: 14 Dec 2024

Comments • 17

  • @gregmeldrum 2 months ago +3

    Thank you for consistently producing such in-depth, informative content. Your long-format videos are a treasure trove of knowledge. Really appreciate the effort you put into making these detailed explanations!

  • @sergerylenberg8711 2 months ago

    Thank you! This is fascinating AND instructive. You have a true talent for explaining complex ideas.

  • @loicbaconnier9150 2 months ago

    Always an excellent share, congratulations

  • @EternalKernel 2 months ago +1

    Nice work. So glad you do such in-depth processes. Question: in this video you go over distillation with the goal of keeping as much of the original model's knowledge and functionality as possible. But what if you are really only interested in a smaller domain of that functionality? I would assume that instead of using 2% of whatever dataset, you could use even fewer samples from a compatible dataset? You would end up with a very small, very specialized model that may be better than the original at your specific domain?
    Even better if I could train locally on a single 3090.

    • @TrelisResearch 2 months ago

      Yes, perhaps.
      The thing is that the background knowledge may provide useful scaffolding for your smaller subset of knowledge.
      My guess is that you should distill on the 2% plus your subset of data (a rough sketch of such a run follows below).
      And yes, if you are doing models under 1B, then distilling on local hardware is possible. Much bigger is hard, although perhaps, with GaLore approaches or Adafactor, you could do a 4-5B model.
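
      For illustration, here is a minimal sketch of what such a domain-focused distillation run could look like, using a softened KL term against the teacher's logits alongside the usual language-modelling loss. The model names, the mixing ratio, and all hyperparameters are assumptions for the sketch, not settings from the video.

      ```python
      # Minimal logit-distillation sketch (assumed models and hyperparameters).
      import torch
      import torch.nn.functional as F
      from transformers import AutoModelForCausalLM, AutoTokenizer

      teacher_id = "Qwen/Qwen2.5-7B-Instruct"    # assumed teacher
      student_id = "Qwen/Qwen2.5-0.5B-Instruct"  # sub-1B student, trainable on a single 3090

      tokenizer = AutoTokenizer.from_pretrained(teacher_id)
      teacher = AutoModelForCausalLM.from_pretrained(
          teacher_id, torch_dtype=torch.bfloat16, device_map="auto"
      ).eval()
      student = AutoModelForCausalLM.from_pretrained(
          student_id, torch_dtype=torch.bfloat16, device_map="auto"
      )

      def distill_loss(batch, temperature=2.0, alpha=0.5):
          """Blend a softened KL term against the teacher with the student's LM cross-entropy."""
          with torch.no_grad():
              t_logits = teacher(**batch).logits
          out = student(**batch, labels=batch["input_ids"])
          s_logits = out.logits
          # Guard against padded-vocab mismatches between teacher and student heads.
          vocab = min(s_logits.size(-1), t_logits.size(-1))
          kl = F.kl_div(
              F.log_softmax(s_logits[..., :vocab] / temperature, dim=-1),
              F.softmax(t_logits[..., :vocab] / temperature, dim=-1),
              reduction="batchmean",
          ) * temperature**2
          return alpha * kl + (1 - alpha) * out.loss
      ```

      A training loop would then draw batches from the small general-data slice plus the domain-specific subset discussed above.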

  • @danieladama8105 2 months ago

    Nice 🔥🔥🔥

  • @EternalKernel 2 months ago +1

    How would distillation compare to architecture search when you are only concerned with a smaller domain? For instance, in T2I, only pictures of animals. Would it be less compute in total to find and train a NOVEL 100M-param architecture vs a 4B-param distilled model?
    I feel like there is more work to be done in model architecture.

    • @TrelisResearch 2 months ago +1

      Well, if the task you're developing a model for is novel, you may not be able to distill.
      However, maybe you could distill and then fine-tune, or fine-tune and then distill from that.

    • @EternalKernel 2 months ago

      @TrelisResearch Thank you. The purpose of the exercise would mainly be to find a new layer or sub-layer architecture for the same task as the original model.

  • @btaranto 2 months ago

    Hi! What models do you recommend for coding that fit in under 48 GB? Have you fine-tuned any?

    • @TrelisResearch 2 months ago +1

      Check the latest Qwen and DeepSeek models (a rough loading sketch follows below).
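
      As an illustration of the 48 GB constraint, a ~32B coding model quantised to 4-bit fits comfortably within that budget. The exact model name below is an assumption for the sketch, not a recommendation from the video.

      ```python
      # Hedged loading sketch: an assumed ~32B coding model in 4-bit (~20 GB of weights).
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

      model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"  # assumed choice

      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          quantization_config=BitsAndBytesConfig(
              load_in_4bit=True,
              bnb_4bit_compute_dtype=torch.bfloat16,
          ),
          device_map="auto",
      )
      ```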

  • @SiD-hq2fo 2 months ago

    Very helpful, thanks Trelis.
    Also, is there a Discord server we can join to get connected?

    • @TrelisResearch 2 months ago

      There is, but - fair warning - it's paid lifetime access. You can find some free and paid options for support at trelis.com/about, though.