It's the first open model that has perfectly solved a logic puzzle I've asked a lot of models. I also like the very verbose answers; that way you can verify it didn't just get to the answer by a lucky guess. As for the inconsistency, I think that's because of the very long responses. A few low-probability tokens early on are probably sending it far off course, so it should probably be run at a very low temperature.
Oh, I didn't adjust my temp on it, good call! This is by far the best assistive model for thoughtful explorations I've found. Very correctable, and it almost feels like I'm working with a human.
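For anyone who wants to pin the temperature down explicitly, here is a minimal sketch using Ollama's REST API from Python. The model tag "qwq" and the 0.1 temperature are assumptions; use whatever tag you pulled and tune the value to taste.

    # Minimal sketch: ask Ollama for a completion at a very low temperature.
    # Assumes Ollama is serving on its default port and the model is tagged "qwq"
    # (both are assumptions; adjust to your setup).
    import json
    import urllib.request

    payload = {
        "model": "qwq",
        "prompt": "Work through this logic puzzle step by step: ...",
        "stream": False,
        "options": {"temperature": 0.1},  # keep early low-probability tokens in check
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])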
Very cool that this kind of model is open sourced and can be run locally given sufficient resources. I think this bodes well for the future: as we get more specialized chips in our computers, we could have very competent local, personalized models for e.g. coding. It's also very interesting, from a geopolitical point of view, to see an open Chinese model perform like this.
Yes, this being open is pretty wild. The commitment of the Qwen team is awesome. I'm eager for Llama 4 also.
We need to try Aider in architect mode, with Qwen Coder 32B/72B as the coder and QwQ 32B as the architect. What do you think?
This sounds interesting, and Aider looks approachable too. I'm going to try to get it running.
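A sketch of what that pairing might look like from the command line, wrapped in Python. The flag names (--architect, --model, --editor-model) and the ollama/ model prefixes are assumptions based on Aider's documentation, so check aider --help on your install before relying on them.

    # Sketch: drive Aider's architect mode with local Ollama models.
    # Flag names and model tags below are assumptions; verify against `aider --help`.
    import subprocess

    subprocess.run([
        "aider",
        "--architect",                                 # QwQ plans the change...
        "--model", "ollama/qwq",                       # ...as the architect model
        "--editor-model", "ollama/qwen2.5-coder:32b",  # Qwen Coder writes the edits
        "main.py",                                     # hypothetical file to work on
    ])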
You should try this out with LM Studio. It’s always worked best for me and is much easier to customize, especially when it comes to loading the model. Open WebUI has some issues and the connection to Ollama, especially at the start, can be pretty laggy.
Great analysis! Good insight to see the 3090s running at almost 2x the speed of the M4 Max. Also interesting to see that the QwQ context allocates about the same amount of VRAM as the model itself: for the 32B Q8 it's roughly 34+34 GB, and for the 32B Q4 it's 20+20 GB. That's way more than the Qwen Coder 2.5 32B context consumes! Any thoughts on why that is?
@andrepaes3908 I don't have any firm insight as to why, but there is variation I've seen among models, just not like this. I did try setting num_gpu to 2 and running the Q8, but it spilled out. Could be a software thing, but it's notable. If you observe something different, let me know. I'm always suspicious of a potential software issue.
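One plausible contributor is the KV cache, which grows linearly with context length. A back-of-the-envelope sketch, assuming QwQ-32B-Preview keeps Qwen2.5-32B's layout (64 layers, 8 KV heads, 128-dim heads, all assumptions here) and an fp16 cache:

    # Rough KV-cache sizing. The layer/head/dim numbers are assumptions based on
    # Qwen2.5-32B; they are illustrative, not confirmed QwQ specs.
    layers, kv_heads, head_dim = 64, 8, 128
    bytes_per_value = 2                  # fp16 cache
    ctx = 32768                          # tokens of context

    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K + V
    total_gib = per_token * ctx / 2**30
    print(f"{per_token / 1024:.0f} KiB per token -> ~{total_gib:.0f} GiB at {ctx} tokens")
    # ~256 KiB per token -> ~8 GiB at 32768 tokens, before compute buffers

That alone doesn't reach the 20-34 GB reported above, so per-GPU compute buffers or a software-side over-allocation seem plausible on top of it.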
With the 8-bit model on an M1 Ultra with mlx-lm:
2024-11-29 20:22:25,189 - DEBUG - Prompt: 147.551 tokens-per-sec
2024-11-29 20:22:25,189 - DEBUG - Generation: 14.905 tokens-per-sec
2024-11-29 20:22:25,189 - DEBUG - Peak memory: 35.314 GB
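For anyone wanting to reproduce numbers like these, a minimal sketch with the mlx-lm Python API. The model name is an assumption; point it at whichever MLX 8-bit quant you actually downloaded.

    # Minimal sketch: generate with mlx-lm and print throughput/memory stats.
    # The repo name below is an assumption; substitute your local MLX 8-bit quant.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/QwQ-32B-Preview-8bit")
    generate(
        model,
        tokenizer,
        prompt="Explain, step by step, why the sky is blue.",
        max_tokens=512,
        verbose=True,  # prints prompt/generation tokens-per-sec and peak memory
    )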
I played with QwQ a little bit. I don't know what to think of it quite yet; Qwen Coder seems to work better for coding. But yeah, QwQ is kind of lively in its thinking process.
OMG, that PowerShell GPU monitor is so cool. Any chance you can share what program/script it is?
It's the nvtop command. I'm not sure if it runs in PowerShell, but let me know if you find out. It's shown here running in Linux via my SSH terminal.
For the P40 crowd, Q8 with 2x P40 gives me 8 t/s.
Did the full model fit into the two at 32768 context?
@DigitalSpaceport I have a tiny RTX A2000 12GB in there for larger models, but it would fit without it, because nvidia-smi reports the VRAM usage as 16 GB of 24 for both P40s and 8 out of 12 GB for the A2000.
On an M1 Max: 15.5 t/s at 4-bit, 9.3 t/s at 8-bit (LM Studio, Qwen_QwQ-32B-Preview_MLX-8bit).
@thaifalang4064 Thanks for adding more data points. Did you observe the RAM allocation? Seems like a very RAM-hungry model.
I've read that someone found a way to string together multiple 4090s using PCIe (they don't support NVLink). Would that configuration be possible to set up on consumer motherboards and PSUs?
The ollama/llama.cpp software does it automagically over PCIe for inference workloads. You need NVLink for training, but not really for inference. These 3090s are just running off the PCIe bus.
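If you ever want to control the split yourself instead of relying on Ollama's automatic placement, llama-cpp-python exposes the same knobs llama.cpp uses. The GGUF path and the 50/50 split below are assumptions for illustration.

    # Sketch: split a GGUF model across two GPUs over plain PCIe (no NVLink needed
    # for inference). Path and split ratios are assumptions; tune for your cards.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./qwq-32b-preview-q4_k_m.gguf",  # hypothetical local file
        n_gpu_layers=-1,           # offload all layers to GPU
        tensor_split=[0.5, 0.5],   # proportion of the model placed on each GPU
        n_ctx=32768,
    )
    out = llm("Q: Why doesn't inference need NVLink? A:", max_tokens=128)
    print(out["choices"][0]["text"])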
Athene-V2 is a 72B-parameter model that is much better, and it's available in Ollama. I can run it locally on my 48GB M3 Max (the 72b-q3_K_L version).
The camera was shaking so much in the intro it almost gave me motion sickness, lol. But cool content!
@thingX1x Sorry, should have fed the camerawife first.
I keep thinking that properly optimized small models are best.
This is really good for a 32B Q4, imho.
Imagine running this Chinese AI model on a Chinese Moore Threads GPU. If Nvidia keeps stalling on VRAM, perhaps we'll see that soon.
I didn't think about that until now, but you have a good point. The VRAM moat is understandable in practice, but definitely not secure for Nvidia.
Runs on just a CPU!
Super slow from what I saw, but yeah, you can also run a 405B low quant on a CPU provided you have the RAM. Just too slow to be useful.
No APUs will be able to beat a 3090/4090 for at least 10 years.