Did one test: uploaded a PDF and asked it to summarize the document. It spit out gibberish, not even on the same topic as the paper. So obviously, how could I begin to trust anything from this if it fails something very simple?
It does seem to be pretty bad, frankly.
16:00 synthetic as in synthetic data?
Synthetic benchmarks like MMLU
What *top program are you running under WSL2?
will you test the 11b one?
What are you running in the powershell in the top right that shows your GPU status?
I'm actually SSH'd into the host computer (guide here: ruclips.net/video/TmNSDkjDTOs/видео.html) and running nvtop.
How can you quantize that model? And also, can we fine-tune a model we downloaded from Ollama?
This is a good jumping-off point for fine-tuning. It answers your questions: in short, you can, and the -q flag in Ollama can create the specific quant you want. The Hugging Face fine-tune adapter framework and safetensors adapters are a thing for Llama 3.1, so I would assume the same holds for 3.2. That's a good starting point for you. github.com/ollama/ollama/blob/main/docs/import.md
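For anyone following up on this: a minimal sketch of what that quantize step looks like on the command line. The model name, Modelfile path, and quant level here are placeholders I picked for illustration, not something from the video:

```shell
# Assumes a Modelfile whose FROM line points at your FP16 weights,
# e.g. a safetensors directory or a GGUF file you downloaded.

# Create a quantized copy; --quantize (short form -q) sets the level,
# e.g. q4_K_M for a common 4-bit quant.
ollama create my-llama-q4 -f Modelfile --quantize q4_K_M

# Then run it like any other local model
ollama run my-llama-q4
```

The import.md doc linked above lists the supported quantization levels and the Modelfile format.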
It got it right for me... (or so I thought, lol)
If A is equal to number 0, what is the number of M, S and z?
llama3.2:latest
Based on the standard ordering of the alphabet, where A is indeed equal to 0:
M = 13
S = 19
Z = 26
Let me know if you have any further questions!
If A=0, then B=1...M=12...S=18...z=25
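The zero-indexed mapping the thread is arguing about is deterministic, so here is a quick Python sketch of the ground truth the models should be matching (just an illustration, not anything from the video):

```python
# Zero-indexed alphabet: A=0, B=1, ..., Z=25
def letter_index(ch: str) -> int:
    """Return the position of a letter when A counts as 0 (case-insensitive)."""
    return ord(ch.upper()) - ord("A")

for ch in "MSz":
    print(ch, "=", letter_index(ch))
```

This prints M = 12, S = 18, z = 25; llama3.2's 13/19/26 answer is the one-indexed mapping, i.e. it ignored the A=0 premise.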
@@DigitalSpaceport hahahaha....yeah. Oh well.
I was surprised that a few other models did get this right, and I've noticed it tracks closely with the letter-counting questions too.
I went back and asked it again, same answer. Tried 3.1, it could not get it either. However, I gave it one clue (A=0 and B=1) and boom, it got it. Probably too easy of a clue, but I'm surprised it could not answer without it. I asked Claude and ChatGPT... they couldn't get it either... very odd. Good question!
Qwen 2.5 here - ruclips.net/video/dOrgIn2ztvY/видео.htmlsi=mb33EAbMjXk55YC3&t=555
This is the result of AI inbreeding, aka training on synthetic data. I have a prompt that gets counting etc. consistently accurate on Llama 3.1 8B. However, the 3.2 models get things wrong all the time.
Oh, that's a great term! AI inbreeding 😅
You sure it wasn't because of 3B vs 8B?
@@mayankmaurya8631 Nope, tried my prompts on the 3.2 11B and 3.2 90B as well. They're just inferior and keep getting things wrong. I get consistently correct responses from Llama 3.1 8B using my special prompts.
I agree that the results don't seem to match the benchmarks in real-world performance... maybe it's something everyone else is missing?
@@DavidVincentSSM I'm not trying to larp as a pro or anything, but I am interested in what makes for a good product. I'm thinking less and less that benchmarks make for a good product.
Is there any possibility to earn money with this model?
You're prompting the model wrong. The "strawberry" tests fail due to the _tokenization_ method of the given model. Prompt it as if you wanted to place the sentence into an array, then ask for the second letter of the third word. It won't fail.
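For context, the deterministic version of that prompt structure (put the sentence into an array, then index into it) looks like this in Python. The sentence is just an example I made up, not one from the video:

```python
sentence = "The quick brown fox jumps over the lazy dog"
words = sentence.split()        # "place the sentence into an array"

# "third word, second letter" -- using the 1-based counting a person would use
third_word = words[2]           # "brown"
second_letter = third_word[1]   # "r"
print(third_word, second_letter)

# The classic tokenizer-trap question itself is also trivial in code:
print("strawberry".count("r"))  # prints 3
```

The point of the array framing is to push the model toward operating on individual characters instead of whole tokens, which is where the counting errors come from.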
@@_s.i.s.u. Other models I've tested, including qwen 2.5, can and do nail that exact question. Copy pasted. If a question has to be asked in a specific way to elicit a correct response, that is a failure.