Which Tokens You Predict Underlie the Reversal Curse and More

  • Published: 29 Sep 2024
  • The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
    arxiv.org/abs/...
    The other video of mine that I mentioned:
    • The Pitfalls of Next T...
    Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!
    / tunadorable
    account.venmo....
    Discuss this stuff with other Tunadorks on Discord
    / discord
    All my other links
    linktr.ee/tuna...

Comments • 19

  • @andrewsilber
    @andrewsilber 3 months ago +4

    That finding is mildly disconcerting. Doesn’t it imply that even at the higher layers of abstraction it doesn’t glean the concept of identity or simple IS-A relationships? If that’s the case, then what else *isn’t* it understanding?

  • @OpenSourceAnarchist
    @OpenSourceAnarchist 3 months ago +2

    12:22 I actually really appreciate the quick breakdowns. I've been learning all about the guts of neural networks and the math behind them, but only in an ad hoc way through YouTube (cog sci background). The in-line banter and commentary is wonderful :)

  • @mickelodiansurname9578
    @mickelodiansurname9578 3 months ago +1

    Harrison Kinsley mentioned this when GPT-3.5 was released: he first asked it "Who is Harrison Kinsley?" and it did not know, but when he asked it "Who is SentDex?" it mentioned it's a channel run by Harrison Kinsley. So it's probably safe to assume it's the reversal curse.

  • @kevon217
    @kevon217 2 months ago

    lovin what you’re laying down on these paper overviews. Very interesting selections.

  • @jakeaustria5445
    @jakeaustria5445 3 months ago

    I don't know yet how masking works; I still need to study that one. But great video as always. I didn't know the Reversal Curse was a thing before this vid.
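
For anyone else new to masking: below is a minimal, illustrative sketch of the general idea behind masked prediction, written as plain Python over token ids. It is rough BERT-style fixed-rate masking, not necessarily the exact MLM-U recipe from the paper, and the MASK_ID and data are made up.

```python
import random

MASK_ID = 0  # illustrative id; real vocabularies reserve a dedicated mask token

def mask_tokens(token_ids, mask_rate=0.15, seed=None):
    """Hide a random subset of tokens and record what the model must predict.

    Returns (masked_ids, targets): targets holds the original id at masked
    positions and None elsewhere, so the training loss only applies where
    tokens were hidden. Unlike next-token prediction, the hidden positions
    can sit anywhere in the sequence, not just at the end.
    """
    rng = random.Random(seed)
    masked_ids, targets = [], []
    for tok in token_ids:
        if rng.random() < mask_rate:
            masked_ids.append(MASK_ID)
            targets.append(tok)    # predict this original token
        else:
            masked_ids.append(tok)
            targets.append(None)   # no loss at this position
    return masked_ids, targets

# Made-up ids standing in for a tokenized sentence.
ids = [11, 12, 13, 14, 15, 16, 17, 18]
print(mask_tokens(ids, mask_rate=0.3, seed=0))
```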

  • @Tolken00
    @Tolken00 3 months ago +1

    This is so cool! Makes me excited for what's possible!

  • @marcfruchtman9473
    @marcfruchtman9473 3 months ago

    I think the explanation is good when things haven't been covered before. Thanks for the video.

  • @andybrice2711
    @andybrice2711 3 months ago +6

    This further convinces me that we ought to be incorporating some sort of knowledge graph into LLMs.

    • @alexanderbrown-dg3sy
      @alexanderbrown-dg3sy 3 months ago +2

      Without any-order enhanced pretraining, you would still have the limitation if you consumed that KG using next-token prediction though… but I definitely agree with this sentiment in general.

    • @BooleanDisorder
      @BooleanDisorder 3 months ago

      Combine them with graph neural networks that take knowledge as input. The AI could put relevant parts in the GNN input itself.

    • @alexanderbrown-dg3sy
      @alexanderbrown-dg3sy 3 months ago

      @BooleanDisorder True. I've seen research on linearizing and tokenizing KGs… with any-order optimized pretraining, you would get the same benefit as combining an LM + GNN… with the added benefit of LMs' scaling.
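
On linearizing a KG: a rough sketch of what turning (subject, relation, object) triples into plain training text could look like. The relation templates and helper below are hypothetical illustrations, not taken from the research mentioned in this thread.

```python
# Hypothetical linearization of knowledge-graph triples into plain text
# that a language model can train on; the templates are made up.
TEMPLATES = {
    "capital_of": "{s} is the capital of {o}.",
    "mother_of": "{s} is the mother of {o}.",
}

def linearize(triple):
    s, r, o = triple
    template = TEMPLATES.get(r, "{s} {r} {o}.")
    return template.format(s=s, r=r, o=o)

kg = [
    ("Paris", "capital_of", "France"),
    ("Mary Lee Pfeiffer", "mother_of", "Tom Cruise"),
]

for triple in kg:
    print(linearize(triple))

# Trained left-to-right, a model mostly learns the stated direction of each
# sentence; an any-order / masked objective can also be asked to fill in the
# subject given the object, which is the benefit discussed in this thread.
```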

  • @aboubenadhem9066
    @aboubenadhem9066 3 months ago

    Last paragraph on p3 implies that “entity pre-parsing” would be one way around the issue. Does that mean training the model on parse trees instead of linear text order?

  • @sikunowlol
    @sikunowlol 3 months ago

    oi

  • @wwkk4964
    @wwkk4964 3 months ago +1

    Thanks for sharing! This solution, along with a dynamic tokenizer that is allowed to have multi-token or multi-symbol representations in its vocabulary and learn them on the fly as it sees new input, would be the way to go. I think the tokenizer could even learn things at the level of lexical units, so that the model only has to see the abstractions it must solve.

    • @wwkk4964
      @wwkk4964 3 months ago +1

      The WikiReversal table of results was enlightening. The fact that the MLM-U trained model had a much more similar backward vs. forward score gives me confidence that its learning was probably more conceptual and relational, versus the pure memorization we would expect if the learning were strongly influenced by the direction or chain of events.
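
For concreteness, here is a hypothetical sketch of what comparing backward vs. forward scores boils down to, assuming paired question/answer sets and exact-match scoring; the stub model and function names are illustrative, not the paper's evaluation code.

```python
# Illustrative forward/backward scoring over paired facts.
# `model` is any callable mapping a question string to an answer string;
# the stub below only "knows" the forward direction, mimicking the curse.

def exact_match_accuracy(model, qa_pairs):
    correct = sum(model(q).strip() == a for q, a in qa_pairs)
    return correct / len(qa_pairs)

def reversal_gap(model, forward_pairs, backward_pairs):
    fwd = exact_match_accuracy(model, forward_pairs)
    bwd = exact_match_accuracy(model, backward_pairs)
    return fwd, bwd, fwd - bwd  # a small gap suggests direction-agnostic recall

def stub_model(question):
    return "Mary Lee Pfeiffer" if "Tom Cruise" in question else "?"

forward = [("Who is Tom Cruise's mother?", "Mary Lee Pfeiffer")]
backward = [("Who is Mary Lee Pfeiffer's son?", "Tom Cruise")]
print(reversal_gap(stub_model, forward, backward))  # (1.0, 0.0, 1.0)
```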

  • @Proprogrammer001
    @Proprogrammer001 3 months ago +1

    oi

  • @shrokompany4611
    @shrokompany4611 3 months ago

    oi

  • @waveFunction25
    @waveFunction25 3 months ago

    Oi