One thing to note: the time to first token as discussed here is incorrect. It represents the overall response time of the LLM rather than the time until the first token arrives. This was due to a bug in the benchmark itself.
I have 3 P102-100 GPUs. One by itself runs great; however, with larger models combined they do struggle. For example, an 8B Q8 model runs at 32 tk/s but a 27B Q6 runs at 6 tk/s. Also, Ollama uploads the models into memory sequentially, meaning you have to wait 10 seconds per GPU over PCIe 1.0 x4 before processing starts. Still, these are cheap to play around with.
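The "10 seconds per GPU" figure checks out roughly against the link speed. A minimal sketch (the model sizes and the ~250 MB/s per-lane figure are my assumptions, not from the comment):

```python
# PCIe 1.x gives roughly 250 MB/s of usable throughput per lane,
# so the P102-100's x4 link moves about 1 GB/s.
PCIE1_LANE_GBPS = 0.25
LANES = 4

def sequential_load_seconds(model_gb: float) -> float:
    """Total upload time when GPU shards load one after another (as Ollama does).

    Adding GPUs does not help here: the uploads are sequential, so the
    total is just model size divided by the single link's throughput.
    """
    link_gbps = PCIE1_LANE_GBPS * LANES  # ~1 GB/s for an x4 Gen 1 link
    return model_gb / link_gbps

# Assumed on-disk sizes: 8B Q8 ~ 8.5 GB, 27B Q6 ~ 22 GB.
print(round(sequential_load_seconds(8.5), 1))   # ~8.5 s
print(round(sequential_load_seconds(22.0), 1))  # ~22 s total across 3 GPUs
```

At ~22 GB split over three cards, that works out to roughly 7 seconds per GPU, in the same ballpark as the 10 seconds observed.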
I have 2 P40 units and one Tesla M40, as well as an RTX 3060 12 GB for AI, spread around my lab. I love the P40s as they both have 24GB of VRAM.
Please, how did you use 2 P40s and one M40 in the same computer? I didn't understand why you need the M40. Thanks
@@bourgogneguillaume I did not use them in the same computer; I used the 2 P40s in one machine, and the M40 and RTX 3060 together in another computer.
Cool video, novel thoughts! Thank you!
Thank you!
I just bought a 3090 because I was frustrated at not being able to do everything with my Vega 56. For my usage, the P102 would have made more sense, I think.
Hi. Interesting. A couple of questions, if you're interested in doing a follow up. What's the power usage per operation like? How does it do on non-AI compute? How well does NVLink work? Here's my thinking. These might be decent for something like a personal Blender render farm. You could make a rig with maybe 2-4 of them. Especially for the RAM, 10 GB seems reasonable for the price.
these are great questions, thank you for them
power usage per operation is really interesting and something i would love to follow up on in the future. chipsandcheese.com probably has the most extensive information on things like this currently, but i am not sure if they measure in that much depth. will see what i can do, but it may take some time
the p102 does not have nvlink, just the older SLI. definitely curious to test it and see how it does
right now i am not looking at too many other workloads, but i could be interested in exploring them. i just don't work directly with them so my knowledge would be a bit less. if you point me at the workloads you'd like benchmarks for, i'll do what i can
I saw that someone tried to unlock x16 by soldering additional capacitors and hacking the BIOS. Do you know anything about this?
no, this is super interesting! would be really cool to see
I heard that it is impossible to unlock x16 on this device because some connections are missing inside the chip
I'll share my experience in case it helps other people :) I modded my M40: I installed a 980 Ti heatpipe cooler with three fans at 4k RPM and it works; the max temp is 65 °C ;) Look on the web to see if someone has modded your Tesla GPU. Now I'm trying to mod my Tesla P100 (it has a special chip; no other NVIDIA GPU has the same one) and I will install an AIO water cooler like the ones I've seen on the internet. Sorry for my bad English
It has a PCIe 1.0 x4 link. What about a multi-GPU setup and transfers between the cards? How big a bottleneck will it be?
Will follow up and test soon! Are you thinking mostly inference, or something else?
@@cj-pais inference, training, fine-tuning. What would be the best config for a multi-GPU setup with them? What would be the performance hit with row/layer split? Something like that. It would be useful for large LLMs and training: since each card only needs an x1 link, we can pack up to 16 into one PCIe x16 slot for 160GB of VRAM.
@@blarhblerh3436 this all sounds great, will see what's possible. i suspect training will be impacted fairly heavily by the lack of pcie bandwidth, but i'd love to test it and find out for real!
I have the same setup with 8 GPUs. At PCIe 1.0 x4 the initial upload of data to the GPUs is so slow :(
@@denismaleev3848 OK. I know it may be slow to load weights due to the PCIe speed. That should not be an issue because you do not change models very often. But what about inference speed?
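On the inference-speed question, a rough back-of-envelope suggests the slow link matters far less once the weights are loaded. With layer split, each generated token only moves one small activation vector across a GPU boundary. A minimal sketch (the hidden size and link speed are my assumptions, not measurements):

```python
# Assumed: a ~7-8B model with hidden size 4096, fp16 activations,
# and ~1 GB/s for the PCIe 1.0 x4 link.
HIDDEN = 4096
BYTES_PER_VALUE = 2           # fp16
LINK_BYTES_PER_S = 1e9        # ~1 GB/s

# With layer split, one activation vector crosses each GPU boundary per token.
per_token_bytes = HIDDEN * BYTES_PER_VALUE        # ~8 KB
transfer_us = per_token_bytes / LINK_BYTES_PER_S * 1e6

print(round(transfer_us, 2))  # ~8.19 microseconds per boundary crossing
```

A few microseconds per boundary is negligible next to the tens of milliseconds a single token takes to compute, which is why layer split tends to tolerate slow links for single-stream inference; row split moves data per layer and is hit much harder.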
What library/program did you use for inference on this GPU?
For language, vision, and speech-to-text I used the ggml-based llamafile and whisperfile
For diffusion it was done with ComfyUI
I'm from Indonesia, how much would it cost to get a p102-100?
I am not sure of the best way in Indonesia; here in the US we buy them off eBay
What is your system setup? Which drivers did you use to get it to work?
The benchmarking rig is:
* AMD EPYC 7352 24-core CPU
* 128GB RAM
* Ubuntu 22.04
* Kernel: 6.5.0-41-generic
* Drivers: NVIDIA 555.42.06
Drivers were installed via: sudo apt-get install -y cuda-drivers (proprietary, and supports the older GPUs)
@@cj-pais No other software or modifications required to get these cards to work? They are locked out of performing some things, as far as I know
@@def7782 nope! worked just like any nvidia card does, for CUDA applications at least. for other applications it may be different, but the workloads tested are all just compute, so it just worked
nice landing huahuaha
ahahahaha face first
That is not a good card: 5GB VRAM and 250W with only 3200 CUDA cores. There are better options than this. I'd even recommend a couple of V100s, as they're $150 CAD each with 300W and 5120 CUDA cores. So for nearly 2x the price you get 3x the VRAM at the same wattage. Also, setting up old NVLink servers is cheap.
It is almost a P40, but with 10GB of VRAM once the firmware is flashed. Best part: you can find them for $50. The V100 is good for ExLlama; the P40 or this card is better in Ollama.
Look into mini PCs with an AMD CPU/GPU; some can use 96GB of RAM,
and you can assign 16 to 42GB of that RAM to the GPU.
This is the best value for AI, hands down.
See Alex Ziskind's "Cheap mini runs a 70B LLM"