Hanna Hajishirzi (AI2) - OLMo: Findings of Training an Open LM
- Published: 28 Apr 2024
- Talk from the Open-Source Generative AI Workshop at Cornell Tech.
Speaker: homes.cs.washington.edu/~hann...
Slides - drive.google.com/file/d/1BlHJ...
Not a fan until they bring something new to the architecture. Compare it to Striped Hyena/Mamba/RWKV. Whatever kneading they do with the training datasets, in the end it's worse than the Apache-licensed Mistral.
There are hundreds of new papers and thousands of older papers whose ideas were never implemented in big text-generation models. Yet here we see yet another decoder-only Transformer model.
While I'm interested in different architectures, my guess is that they will end up performing similarly to decoder-only Transformers at the end of the day. Changes in data and the amount of training seem to have a larger impact on the actual performance of the model. While Mistral / Llama 3 are extremely good, we do not really know why. Presumably it is due to their data ingestion processes.