Hanna Hajishirzi (AI2) - OLMo: Findings of Training an Open LM

  • Published: 28 Apr 2024
  • Talk from the Open-Source Generative AI Workshop at Cornell Tech.
    Speaker: homes.cs.washington.edu/~hann...
    Slides - drive.google.com/file/d/1BlHJ...
  • Science

Comments • 2

  • @AM-yk5yd  20 days ago

    Not a fan until they bring something new to the architecture. Compare it to Striped Hyena/Mamba/RWKV. Whatever kneading they do with the training datasets, in the end it's worse than the Apache-licensed Mistral.
    There are hundreds of new papers and thousands of older papers that have not been implemented in big text-gen models. Yet we see yet another decoder-only transformer model.

    • @srush_nlp  20 days ago  +1

      While I'm interested in different architectures, my guess is that they will end up performing similarly to decoder-only Transformers at the end of the day. Changes in the data and the amount of training seem to have a larger impact on the model's actual performance. While Mistral / Llama 3 are extremely good, we do not really know why; presumably it is due to their data ingestion processes.