OUTLINE:
0:00 - Intro
1:35 - Paper Overview
7:35 - Offline Reinforcement Learning as Sequence Modelling
12:00 - Input Embedding Alignment & other additions
16:50 - Main experimental results
20:45 - Analysis of the attention patterns across models
32:25 - More experimental results (scaling properties, ablations, etc.)
37:30 - Final thoughts
Paper: arxiv.org/abs/2201.12122
Code: github.com/machelreid/can-wikipedia-help-offline-rl
My Video on Decision Transformer: ruclips.net/video/-buULmf7dec/видео.html
Thanks for bringing this format back. I much prefer listening to Yannic explain-rambling a paper than him preparing me for an interview with the authors. Please keep being critical/opinionated, even if in the back of your mind you know you'll "face" the authors later on.
Really like this format of interviewing the creators of the paper after review.
It's almost like the peer review process, but better
It reminds me of when we discussed the Perceiver paper, which could handle multiple modalities (including RL) but treated them as separate tasks. Maybe the approach we should follow is to throw literally every problem we can think of at transformers and let them learn information across multiple domains.
Good idea, need a lab to do that.
I really like having the medium or long paper overview followed by an interview with the author. Whether it's split into two videos or not doesn't matter too much to me--might as well do whichever is better for making the algorithm happy. ;)
I love that with this new format, you can explore a lot of fringe papers and test the boundaries more, but in a constructive way!
Yannic knows we love to binge his videos over the weekend!
Perfect format, one of a kind on the internet at large. 👌
Since these reviews are so useful, I'd suggest releasing them as soon as they're ready rather than waiting for the authors to watch them first and arranging an interview with them.
The decay on lambda_1 seems reasonable, as the goal is to create linear projections of state, reward, and action and match them to the embedding space of the transformer. So the decay is used here to make sure that the updates to the input projections don't run for the full training loop, ensuring that similar inputs end up with similar input embeddings.
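A minimal sketch of how such a decayed alignment term could look (the cosine-similarity objective, the linear schedule, and the function names here are assumptions for illustration, not the authors' exact setup):

```python
import torch.nn.functional as F

def alignment_weight(step, total_steps, lam_init=1.0):
    # Hypothetical linear decay of lambda_1: strong early on, so the fresh
    # state/return/action projections get pulled toward the pretrained
    # embedding space, then fading out so the main objective takes over.
    return lam_init * max(0.0, 1.0 - step / total_steps)

def alignment_loss(input_embs, lm_token_embs):
    # Hypothetical auxiliary term: push each projected input token (N, d)
    # toward its nearest pretrained language-token embedding (V, d),
    # measured by cosine similarity.
    sim = F.normalize(input_embs, dim=-1) @ F.normalize(lm_token_embs, dim=-1).T
    return (1.0 - sim.max(dim=-1).values).mean()

# Usage sketch:
# total = task_loss + alignment_weight(step, total_steps) * alignment_loss(x, E)
```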
The benefit appears to reside in the initialization values. Pretraining creates a smoother manifold which can then be morphed smoothly to adapt to a new task.
But then why would Image-GPT and CLIP perform worse than GPT?
Shorter videos without authors -- 😍
Yellow pre-comments and green comments -- 🔥
Thanks!
Since they tried it on CLIP and that didn't work so well, I'd love to see how CM3 would do in this regard, since it combines structured seq2seq modelling of website language and images.
Off the bat, this idea sounds fun!
This works because language modelling is sequence modelling. Use a video transformer and it will work well.
Okay, I don't know if you guys know but the main author is only 17 years old and this paper won a best paper award at EMNLP.
I don't think this is correct. EMNLP 2022 hasn't happened yet (it's due in December 2022), and the 2021 winning papers are here: 2021.emnlp.org/blog/2021-10-29-best-paper-awards
Machel Reid and Yutaro Yamada have a nontrivial number of papers from 2020 and are mature enough to not require "age" as a differentiating factor.
This is interesting. According to the attention analysis in Figure 2, the action basically only attends to previous states. So what if we just throw away all previous actions and rewards and keep only the previous states? 😉 23:19
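To make that ablation concrete, a hypothetical sketch (not from the paper) of feeding the model a states-only context, assuming all inputs are already projected to the model's embedding size:

```python
import torch

def build_context(returns_to_go, states, actions, states_only=False):
    # All inputs: (T, d) tensors already projected to embedding size d.
    if states_only:
        # Commenter's suggested ablation: context is just the past states.
        return states                                                # (T, d)
    # Decision-Transformer-style interleaving: r_1, s_1, a_1, r_2, s_2, a_2, ...
    return torch.stack([returns_to_go, states, actions], dim=1).flatten(0, 1)  # (3T, d)
```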
If I remember correctly, there is a paper suggesting that a frozen transformer pre-trained on text also works for image classification?
Yes, which is why it is a bit surprising that they found it didn't work when they froze the transformer, and also that there seems to be limited transfer from iGPT.
Wouldn't that mean that language is a function that models (approximates) reality itself?
Is it possible that the Wikipedia anchors or page tags somehow help supervise the language model, and vice versa? The formulaic, encoded approach to wiki publication may be leading the model in some latent ways.
Yeah this is one of the reasons for the Hutter Prize specifying Wikipedia clear back in 2006. We have been arguing ever since then with people who don't believe that language models are all that relevant to modeling the physical world. But this is a consequence of using algorithmic information (Kolmogorov Complexity) approximation for unsupervised model selection. If any of the big boys were serious they would back the Hutter Prize with orders of magnitude more money.
I grew up with Wikipedia, so I am interested in what the paper will show.
Can you use machine intuition to select a model from a set based on the task, so as to converge cheaply to a solution?
The question that comes to my mind is: what if you pretrain with one of the language models that are primarily trained to fill in words rather than trained autoregressively? Like BERT, I think?
Assuming that would even make any sense, which I don’t know.
Like, can you try to use BERT autoregressively even though it wasn’t trained for it and get something which isn’t completely garbage?
Like, if you just mask out all the future tokens, even though BERT expects to have only one or a few tokens masked out?
What I’m saying here might be confused.
It's not stupid. Many people have looked into using BERT for text generation, either just decoding autoregressively, or actually training it like that, but results are not very good.
@YannicKilcher Thanks!
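For anyone curious, a rough sketch of the idea in this thread: decode BERT left-to-right by repeatedly appending a [MASK] and letting the model fill it greedily (the prompt, length, and crude wordpiece handling are arbitrary; this is not what BERT was trained for, which is why results tend to be poor):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

text = "the robot picked up the"
for _ in range(10):
    # Append a [MASK] after the current text and ask BERT to fill it.
    ids = tok(text + " " + tok.mask_token, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    mask_pos = (ids == tok.mask_token_id).nonzero()[0, 1]
    next_id = logits[0, mask_pos].argmax().item()
    # Crude: wordpieces are just appended with a space.
    text += " " + tok.decode([next_id])
print(text)
```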
If their goal is to assess whether language model pretraining is better than image pretraining, then they should be using the same architecture for both! Comparing GPT-2 to iGPT is useless. The idea is cool, but this paper is a letdown.
These types of papers require so much compute that only places like Google can foot the bill. You'd think they'd spend a little more time ironing out their argument before they crank up all those TPUs...
OK, now I can better imagine that the transformer can treat language similarly to playing games.
AI gets plus 50 buff 💪 points because pretraining with Wikipedia was very effective 🥇
This sounds like one of those ideas born drunk in a pub.
Criticisms are easy to come by, but these results are absolutely unreliable. These guys just thought "eh, what do we have to lose" and made a paper out of it. It's like training 2D CNNs pretrained on ImageNet for audio classification: it works, but it's very unreliable. Hard pass on this paper.
#Blackhistorymonth
The title is clickbait. I imagined ML learning a world model from Wikipedia?
Title should be "Can LM Pretraining Help Offline Reinforcement Learning?"