Do not use Llama-3 70B for these tasks ...
- Published: May 12, 2024
- A detailed analysis of one million votes cast by the AI community reveals new insights into the areas where individual LLMs excel, and the areas where you are better off not using a particular LLM and opting for a higher-performing one instead.
All rights with the authors:
What’s up with Llama 3? Arena data analysis
lmsys.org/blog/2024-05-08-lla...
#airesearch #ai #newtechnology
This is a great video! Really amazing explanation.
One of the best comments today! 😊
“of course, those people were wrong”…..hahahaha.
Finally, someone is laughing! Success! 😂
Summarization might score low because of Llama 3's context length; that's my best guess. I'll have to test it more, as I like using LLMs to summarize YouTube videos (though I watched this one). I have found some areas where Llama 3 works well and use it for those. One is creative writing / poems; feeding the result into creative lists for other tasks works really well.
If an open-source LLM performs well for your particular use case then, for me, it will always have my preference over a big monolithic closed-source LLM from ClosedAi!
Love how your critiques shred the populist AI community while providing useful info.
I couldn't care less about friendliness. We can get that from low-param models and use them to rework texts. Larger models should just care about reasoning above all else.
Now I know you are tripping. Unless I can't read that graph properly, you are trying to tell us that a 44-45% win rate is a big loss!
Especially as this is a 70B open-weights model, while the others are all closed-weights.
And as another commenter noted, Llama 3 has only a 4k context window, so of course it will be poor at summarisation and other tests that rely on a long context.
We will be getting longer-context versions from Meta, multimodal and with huge parameter counts.
Llama 3 was trained on 8192 tokens 😂
@@code4AI OK, it has an 8k token length; GPT-4 Turbo has 128k, Claude 200k, Gemini 1000k+, so 16 times longer. My point still stands.
And I notice how you did not address my first point. Like I said, you are tripping.
I found it essentially useless and a waste of my time. I gave it a dataset of 10,000 lines with 22 variables and asked for summary statistics in cumulative blocks of 1,000, so 10 blocks in total. I re-posed this question about 8 times over hours, and each time the answer was DRIVEL. And that was a very easy task. Imagine giving it a slightly more difficult task like time-series modelling. I will check the alternatives.
Maybe you should choose an appropriate tool for the task.
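For what it's worth, the cumulative-block statistics the earlier commenter describes are a few lines in pandas rather than an LLM task. A minimal sketch, using a synthetic stand-in for the dataset (10,000 rows, 22 numeric columns, all names hypothetical):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the commenter's dataset: 10,000 rows, 22 numeric variables.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(10_000, 22)),
                  columns=[f"var{i}" for i in range(22)])

# Cumulative blocks of 1,000: rows 0..999, 0..1999, ..., 0..9999 (10 blocks total).
block = 1000
summaries = {
    end: df.iloc[:end].agg(["mean", "std", "min", "max"])
    for end in range(block, len(df) + 1, block)
}

print(len(summaries))           # 10 cumulative blocks
print(summaries[10_000].shape)  # (4, 22): four statistics per variable
```

Each entry in `summaries` is a small DataFrame of per-variable statistics over the first `end` rows, which is exactly the "cumulative blocks" output the commenter asked the model for.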