Great video. When you started the English-Korean translation, it reminded me of a recent video about LCMs, which Meta has been researching. Might be worth doing a session on them.
Can you show us how to run LLMs across multiple machines, maybe as a cluster of 4?
Hello Matt, results in Open WebUI with Ollama are very unstable when using two or more models at the same time. I was always running 3 or 4 models simultaneously and got very frustrated, so I moved back to OpenAI. Once my Ollama server was a test unit again, I noticed that one model at a time was providing amazing answers.
Ollama handles multiple models at the same time very well, especially when you have multiple GPUs. The machine I am using has 8 H100 GPUs, so it performs very well.
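For what it's worth, here is a minimal Python sketch of querying two models concurrently against a local Ollama server (assuming the default http://localhost:11434 endpoint; the model names are just examples). Whether the requests actually run in parallel or get queued depends on OLLAMA_NUM_PARALLEL, OLLAMA_MAX_LOADED_MODELS, and how much VRAM you have.

```python
# Sketch: send the same prompt to two Ollama models at once.
# Assumes the default local server and that both models are already pulled.
from concurrent.futures import ThreadPoolExecutor

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(model: str, prompt: str) -> str:
    """Send a non-streaming generate request to one model and return its answer."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

models = ["llama3.1", "qwen2.5"]  # example model names
prompt = "Translate 'good morning' into Korean."

# Fire both requests at once; Ollama decides whether to run them in parallel.
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    for model, answer in zip(models, pool.map(lambda m: ask(m, prompt), models)):
        print(f"--- {model} ---\n{answer}\n")
```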
Some obvious answers to the questions. Time travel: obviously the one guy got spread over two timelines, therefore arriving both before and after the other person (probably not intact, though). Lightbulb: many of the lightbulbs being replaced were previously burnt out.
I think the benchmark has to align with your own task; if no benchmark aligns, then you have to create your own, testing what you want to accomplish. Speed is easy, but maybe you have some specialized task, like doing taxes in some complicated country, that no benchmark would cover. Or maybe you are looking for an engaging conversation where the scientific accuracy of the answers is secondary to you just having fun - like a starship command AI story game.
The question for me, though, is how do I automate it? Do I use an LLM backend, then some Python and Excel? Or is there already a harness for that?
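There are existing harnesses (for example EleutherAI's lm-evaluation-harness, or promptfoo for prompt-level testing), but a homegrown one can be a few dozen lines of Python. Here is a minimal sketch, assuming a local Ollama server at http://localhost:11434; the test cases and the naive substring check are placeholders for whatever scoring your own task needs - no Excel required, just a CSV.

```python
# Sketch: run a small custom benchmark against one local Ollama model
# and write pass/fail results to a CSV file.
import csv

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.1"  # example model name

# Each case: a prompt plus a string the answer must contain to count as a pass.
CASES = [
    {"prompt": "What reduced VAT rate applies to books in Germany? Answer briefly.",
     "expect": "7"},
    {"prompt": "Translate 'thank you' into Korean.",
     "expect": "감사"},
]

def run_case(prompt: str) -> str:
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

passed = 0
with open("benchmark_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt", "expected", "answer", "passed"])
    for case in CASES:
        answer = run_case(case["prompt"])
        ok = case["expect"] in answer
        passed += ok
        writer.writerow([case["prompt"], case["expect"], answer, ok])

print(f"{passed}/{len(CASES)} cases passed")
```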
"Lies, damn lies and statistics" 😊
Nothing ever works in a live demo. Murphy's law at its finest.
Except I would say that it all worked very well.
How is that even possible? That amount of RAM is completely nuts 🧠 😀