Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  • Published: Oct 1, 2024
  • This video explores the T5 large-scale study of transfer learning. The paper takes apart many different factors of the pre-train-then-fine-tune pipeline for NLP: auto-regressive language modeling vs. BERT-style masked language modeling vs. a token-deshuffling objective, as well as the impact of dataset composition and size, and how best to use more computation (a toy sketch of these objectives follows the links below). Thanks for watching, and please check out Machine Learning Street Talk, where Tim Scarfe, Yannic Kilcher and I discuss this paper!
    Machine Learning Street Talk: / @machinelearningstreet...
    Paper Links:
    T5: arxiv.org/abs/...
    Google AI Blog Post on T5: ai.googleblog....
    Train Large, Then Compress: arxiv.org/pdf/...
    Scaling Laws for Neural Language Models: arxiv.org/pdf/...
    The Illustrated Transformer: jalammar.github...
    ELECTRA: arxiv.org/pdf/...
    Transformer-XL: arxiv.org/pdf/...
    Reformer: The Efficient Transformer: openreview.net...
    The Evolved Transformer: arxiv.org/pdf/...
    DistilBERT: arxiv.org/pdf/...
    How to generate text (HIGHLY RECOMMENDED): huggingface.co...
    Tokenizers: blog.floydhub....
    Thanks for watching! Please Subscribe!
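    As a rough illustration of what these pre-training objectives look like, here is a toy Python sketch (my own, with hypothetical helper names; not code from the T5 paper or its repository) of the three unsupervised objectives compared in the video:

    import random

    random.seed(0)
    tokens = "Thank you for inviting me to your party last week".split()

    def prefix_lm(toks, split=5):
        # Autoregressive prefix LM: condition on a prefix, predict the rest.
        return toks[:split], toks[split:]

    def bert_style(toks, rate=0.15):
        # BERT-style MLM: corrupt ~15% of tokens, reconstruct the original
        # sequence. (Simplified: BERT also swaps some tokens for random ones.)
        corrupted = ["<M>" if random.random() < rate else t for t in toks]
        return corrupted, toks

    def deshuffle(toks):
        # Deshuffling (cited to SummAE in the T5 paper): shuffle the input,
        # recover the original order.
        shuffled = toks[:]
        random.shuffle(shuffled)
        return shuffled, toks

    for name, fn in [("prefix LM", prefix_lm), ("BERT-style", bert_style), ("deshuffle", deshuffle)]:
        inp, tgt = fn(tokens)
        print(f"{name}: input = {' '.join(inp)!r}, target = {' '.join(tgt)!r}")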

Comments • 27

  • @connor-shorten
    @connor-shorten  4 years ago +7

    2:00 Pushing the NLP State-of-the-Art
    2:40 Text-to-Text Framework
    3:28 Factors of Variation Explored
    5:00 Value of Pre-Training
    5:25 Attention Masking
    6:18 Architecture Results
    7:02 Denoising Objectives
    8:47 Span Corruption Strategy
    9:45 Self-Supervised Learning Study Overview
    11:14 Datasets
    12:24 Dataset Size
    12:56 Fine-Tuning Strategy
    14:25 Task Imbalance
    15:20 Pre-Train, then Fine-Tune
    16:26 How should we use extra computation?
    18:47 Scaling up to 11B parameters
    19:30 What Didn’t Make the List
    22:08 Context-Free Question Answering

  • @vatsalkrishna5627
    @vatsalkrishna5627 1 year ago +3

    I never expected to learn so much from one single video. Amazing work presenting the paper in such a nuanced way!

  • @emanuelgerber
    @emanuelgerber 3 years ago +5

    Thank you! This helped me a lot to understand all the different aspects of T5

    • @connor-shorten
      @connor-shorten  3 years ago

      Thanks Emanuel, really glad to hear that!

  • @rohitghule9437
    @rohitghule9437 4 years ago +3

    Why so fast?

  • @forcedlevy
    @forcedlevy 3 years ago +2

    Watch at 0.75 speed

  • @BiancaAguglia
    @BiancaAguglia 4 years ago +1

    You're getting better and better at explaining these papers, Connor. Great job. Also, I enjoyed the conversation on the Machine Learning Street Talk channel. Looking forward to seeing more videos there too. 😊
    I've decided to start studying NLP in a more organized manner (right now I have some intuition about how it works, but not much theoretical or practical knowledge.) I'll be watching your NLP videos when I need a productive break from my studies. 😊
    P.S. I'm embarrassed to admit that only today I found out your first name was Connor. For some reason I thought it was Henry.

  • @justinmilner8
    @justinmilner8 1 year ago

    Is 'deshuffling' really an accurate description of the XLNet pre-training objective? To me, deshuffling implies predicting the order of tokens within the text, which doesn't match my understanding of XLNet's pre-training objective.

    • @justinmilner8
      @justinmilner8 1 year ago

      Yes, I can now confirm that the deshuffling objective referred to in the T5 paper is not referencing XLNet's permutation masking objective. (Deshuffling is cited to SummAE in the T5 paper.)
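      To make that distinction concrete, a toy contrast (my own sketch, not code from either paper): deshuffling shuffles the input tokens and asks the model to reproduce the original text, whereas XLNet's permutation LM leaves the input sequence intact and predicts the original tokens under a randomly sampled factorization order.

      import random

      random.seed(1)
      tokens = "the cat sat on the mat".split()

      # Deshuffling (SummAE-style, as cited in the T5 paper):
      shuffled = tokens[:]
      random.shuffle(shuffled)
      print("input :", " ".join(shuffled))  # shuffled text
      print("target:", " ".join(tokens))    # recover the original order

      # XLNet-style permutation LM: no shuffled input; sample a factorization
      # order over positions and predict each token given the tokens already
      # seen in that order (position information is retained).
      order = list(range(len(tokens)))
      random.shuffle(order)
      for step, pos in enumerate(order):
          seen = {p: tokens[p] for p in sorted(order[:step])}
          print(f"predict position {pos} ({tokens[pos]!r}) given {seen}")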

  • @TimScarfe
    @TimScarfe 4 years ago +3

    Amazing job Connor!

  • @heinsaar
    @heinsaar 4 years ago

    Thanks for sharing! It would be wonderful if you could get a better mic though. The laptop mic has a very unpleasant echo.

  • @tommykelly6840
    @tommykelly6840 2 years ago

    What is the difference between i.i.d. mask tokens and BERT-style mask tokens?

  • @L33TNINJA51
    @L33TNINJA51 4 years ago +3

    A little hard to follow as someone who hasn't learned much about AI, but I still enjoy your videos!

    • @hoangnhatpham8076
      @hoangnhatpham8076 4 years ago

      The target audience of these kinds of videos isn't supposed to be someone who hasn't learned much about AI anyway.

    • @L33TNINJA51
      @L33TNINJA51 4 years ago

      @hoangnhatpham8076 I guess I just need to get to learning and stop being just a fanboy :)

    • @bikideka7880
      @bikideka7880 4 years ago +1

      I think the videos are hard to follow for someone who doesn't regularly encounter research papers (like me).

  • @MakerBen
    @MakerBen 4 years ago +1

    Thanks for posting this! This is super helpful!

    • @connor-shorten
      @connor-shorten  4 years ago

      Thank you so much! Glad you found this useful!

  • @---kt8cs
    @---kt8cs 4 years ago

    Thank you, sir, your videos are gold!

  • @SantoshGupta-jn1wn
    @SantoshGupta-jn1wn 4 years ago

    These videos are amazing, thanks Henry

  • @taku8751
    @taku8751 4 years ago

    amazing

  • @salimbo4577
    @salimbo4577 3 years ago +1

    How much time does it take you guys to read a research paper, and which parts do you read? Every time I try to read one I start losing focus. Any tips? Please help.

    • @zeinramadan
      @zeinramadan 3 years ago +3

      Read the paper in 3 different passes:
      1) In your first pass, read the following sections of the paper: title, abstract, and figures.
      2) The second pass entails reading the introduction and conclusion, taking another pass through the figures, and scanning the rest of the content.
      ⁃ The introduction and conclusion of a paper contain clear, concise statements of the paper's content and a summary of any findings. These sections usually leave out supplementary detail and include only the key information, which gives you as a reader the vital information needed to proceed to the other sections of the paper.
      3) The third pass involves reading all sections of the paper, but skipping any complicated math or technical formulations that might be alien to you.
      ⁃ During this pass, you can also skip any terms and terminology that you don't understand or aren't familiar with.
      Check out this article by Andrew Ng about how to efficiently read papers: towardsdatascience.com/how-you-should-read-research-papers-according-to-andrew-ng-stanford-deep-learning-lectures-98ecbd3ccfb3

  • @dislike__button
    @dislike__button 2 years ago

    I still don't understand how they combined training on the C4 dataset with all the task-specific datasets (SQuAD etc.).
    What role did the C4 dataset play? How did they turn the raw text of C4 into an input/output task to train on?
    Would be grateful if someone could explain, thanks.
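    In case it helps: C4 supplies the unlabeled text for pre-training, and raw text is turned into input/target pairs by the span-corruption objective (8:47 in the video); supervised tasks like SQuAD are cast into the same text-to-text format with a task prefix and mixed into training (the mixing proportions are the "Task Imbalance" discussion at 14:25). A minimal sketch of span corruption, assuming T5's sentinel-token convention (my reconstruction, not the paper's code):

    tokens = "Thank you for inviting me to your party last week".split()
    spans_to_drop = [(2, 4), (8, 9)]  # (start, end) spans; sampled at random in practice

    inputs, targets = [], []
    cursor, sentinel = 0, 0
    for start, end in spans_to_drop:
        inputs += tokens[cursor:start] + [f"<extra_id_{sentinel}>"]
        targets += [f"<extra_id_{sentinel}>"] + tokens[start:end]
        cursor, sentinel = end, sentinel + 1
    inputs += tokens[cursor:]
    targets += [f"<extra_id_{sentinel}>"]  # final sentinel terminates the target

    print("input :", " ".join(inputs))
    # input : Thank you <extra_id_0> me to your party <extra_id_1> week
    print("target:", " ".join(targets))
    # target: <extra_id_0> for inviting <extra_id_1> last <extra_id_2>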