The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
- Published: 29 Sep 2024
- The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
arxiv.org/abs/...
The other video of mine that I mentioned:
• The Pitfalls of Next T...
Support my learning journey either by clicking the Join button above, becoming a Patreon member, or a one-time Venmo!
/ tunadorable
account.venmo....
Discuss this stuff with other Tunadorks on Discord
/ discord
All my other links
linktr.ee/tuna...
That finding is mildly disconcerting. Doesn't it imply that even at the higher layers of abstraction it doesn't glean the concept of identity or simple IS-A relationships? If that's the case, then what else *isn't* it understanding?
12:22 I actually really appreciate the quick breakdowns. I've been learning all about the guts of neural networks and the math behind them, but only in an ad hoc way through YouTube (cog sci background). The in-line banter and commentary is wonderful :)
Harrison Kinsley mentioned this when GPT-3.5 was released: he first asked it "Who is Harrison Kinsley?" and it did not know, but when he asked "Who is Sentdex?" it mentioned it's a channel run by Harrison Kinsley. So it's probably safe to assume it's the reversal curse.
lovin what you’re laying down on these paper overviews. Very interesting selections.
Don't know yet how masking works, I still need to study that one. But great video as always. I didn't know the Reversal Curse was a thing before this vid.
This is so cool! Makes me excited for what's possible!
I think explanation is good when things haven't been covered before. Thanks for the video.
This further convinces me that we ought to be incorporating some sort of knowledge graph into LLMs.
Without any-order enhanced pretraining, you would still have the limitation if you consumed that KG using next-token prediction, though… but I definitely agree with this sentiment in general.
Combine them with graph neural networks that take knowledge as input. The AI could put relevant parts in the GNN input itself.
@@BooleanDisorder true. I've seen research on linearizing and tokenizing KGs… with any-order optimized pretraining, you would get the same benefit as combining an LM + GNN, with the added advantage of the LM's scaling behavior.
Last paragraph on p3 implies that “entity pre-parsing” would be one way around the issue. Does that mean training the model on parse trees instead of linear text order?
oi
Thanks for sharing! This solution, along with a dynamic tokenizer that is allowed to have multiple tokens or multi-symbol representations in its vocabulary and to learn them on the fly as it sees new input, would be the way to go. I think the tokenizer could even learn things at the level of lexical units, so that the model only has to see the abstractions it must solve.
The Wikireversal table of results was enlightening.
1. The fact that the MLM-U-trained model had a much more similar backwards vs. forwards score gives me confidence that its learning was probably more conceptual and relational, versus the pure memorization we would expect if the learning were strongly influenced by the direction or chain of events.
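For anyone curious what "any-order" training looks like in practice: here's a minimal sketch of the uniform-rate masking idea behind MLM-U as I understand it (sample a masking rate uniformly, then mask each token independently at that rate, so the model must predict tokens in arbitrary order rather than strictly left-to-right). Function and variable names are my own, not from the paper.

```python
import random

def mlmu_mask(tokens, rng):
    """Sketch of a uniform-rate masking objective (MLM-U-style).

    Sample a masking rate uniformly in (0, 1), then mask each token
    independently with that probability. Returns the corrupted sequence
    and a dict of {position: original token} the model would be trained
    to predict.
    """
    rate = rng.random()  # uniform masking rate, unlike BERT's fixed 15%
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            masked.append("[MASK]")
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

rng = random.Random(0)
sent = "Tom Cruise 's mother is Mary Lee Pfeiffer".split()
masked, targets = mlmu_mask(sent, rng)
print(masked)
print(targets)
```

Because the mask can land anywhere in the sentence, the model sometimes has to recover "Tom Cruise" from "Mary Lee Pfeiffer" and sometimes the reverse, which is exactly the symmetry the Wikireversal numbers suggest.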