Downgrading My GPU For More Performance
- Published: 14 Nov 2024
- Checking out an older Nvidia Tesla card that can meet my needs for AI.
○○○ LINKS ○○○
Nvidia Tesla M40 ► ebay.us/ED5oqB
Nvidia Tesla P40 ► ebay.us/HWpCZO
○○○ SHOP ○○○
Novaspirit Shop ► teespring.com/...
Amazon Store ► amzn.to/2AYs3dI
○○○ SUPPORT ○○○
💗 Patreon ► goo.gl/xpgbzB
○○○ SOCIAL ○○○
🎮 Twitch ► / novaspirit
🎮 Pandemic Playground ► / @pandemicplayground
▶️ novaspirit tv ► goo.gl/uokXYr
🎮 Novaspirit Gaming ► / @novaspiritgaming
🐤 Twitter ► / novaspirittech
👾 Discord chat ► / discord
FB Group Novaspirit ► / novasspirittech
○○○ Send Me Stuff ○○○
Don Hui
PO BOX 765
Farmingville, NY 11738
○○○ Music ○○○
From Epidemic Sounds
patreon @ / novaspirittech
Tweet me: @ / novaspirittech
facebook: @ / novaspirittech
Instagram @ / novaspirittech
DISCLAIMER: This video and description contain affiliate links, which means that if you click on one of the product links, I’ll receive a small commission.
I picked one up on eBay for $45 shipped. I also had an FTW 980 Ti cooler lying around. As long as the cooler fits the stock PCB of any 970 to Titan X card, you can just swap it. You may need to cut out or re-solder the 12V power connector in the other orientation though; in my case I moved it from the back to the top. I also thermal-glued heatsinks onto the backplate, because not being in a server case means the VRAM gets warm.
holy moly bro, $45? any link or tip to get one that cheap? ty and hope u enjoy it x)
@@yungdaggerdikkk Newegg has them at about that price
Running Stable Diffusion, does it run out of VRAM at 12GB or at 24GB?
The tech docs claim it is really two systems, each with its own CUDA cores and VRAM, etc...
Turing, not TURNing lol
He's a Pro, don't tell him he's wrong 😂
you just turinged him on
😂
SD, GPT, and other AI apps _still_ not taking advantage of AI Tensor cores...
Literally what they were invented for.
As far as I know, llama.cpp can use tensor cores
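For anyone curious, here is a minimal sketch of GPU offload with the llama-cpp-python bindings, assuming a CUDA-enabled build; the model path is a placeholder for whatever GGUF file you actually have:

# Sketch: offload a GGUF model to the GPU with llama-cpp-python (CUDA build assumed).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-13b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # -1 offloads every layer; a 13b 4-bit model fits comfortably in 24GB
    n_ctx=2048,       # context window
)
out = llm("Q: What is a Tesla P40? A:", max_tokens=64)
print(out["choices"][0]["text"])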
I'm using a trio of P40s in my headless Z840, kinda risking running into the PSU's power limit, but there's nothing like having a nearly real-time conversation with a 13b or 30b parameter model like Meta's LLaMA.
I am looking into buying a Z840 also, how are you able to keep the P40s cool enough?
@@jaffmoney1219 Air ducting and cranking the PCIe zone intake fans to 100%. If you buy the HP-branded P40s, supposedly their BIOS will tell the motherboard to ramp the fans automatically. I'm using a pair supposedly from PNY, so I don't know.
@@KiraSlith Hello! Can you make a short video on how this works for you, on the hardware side and with a language model such as LLaMA?
If you can't or don't want to make a video, could you briefly describe your hardware configuration here, and what is best to buy for this?
I'm looking at an older platform: LGA 2011-v3 with an 18-22 core CPU, a gaming motherboard from ASUS or ASRock, and 128/256GB of DDR4 ECC RAM. At first I wanted to buy a modern video card from the RTX 30xx/40xx line, but then I came across Tesla server accelerators, which have a large amount of VRAM (16/24/32 GB)
and go for about 150/250/400 euros here.
Unfortunately there is very little information, and the videos you find on YouTube mostly run Stable Diffusion, which gives quite poor results even on a Tesla V100, which an RTX 3060 outperforms.
Thanks in advance!
@@strikerstrikerson8570 Sure, when it comes down for maintenance next. It's currently training a model. If you want new cards only and don't have a fat wallet to spend from, you're stuck with Consumer cards either way. Otherwise, what you want depends entirely on what your primary goal is. Apologies in advance for the sizable wall of text you're about to read, but it's necessary to understand how to actually pick a card.
I'll start by breaking it down by task demand:
- image recognition and voice synthesis models want fast CUDA cores but still benefit from higher core counts, and the larger the input or output, the more VRAM they need.
- Image generation and voice recognition models also want fast CUDA cores, but their VRAM demands expand exponentially faster.
- LLMs want enough VRAM to fit the whole model uncompressed and lots of CUDA cores. They aren't as affected by core speed but still benefit.
- Model training always requires lots of VRAM and CUDA cores to complete in a reasonable amount of time. Doesn't really matter what the model you're training does.
Some models bottleneck harder than others (though the harshest bottleneck is always VRAM capacity), but ALL CUDA Compute capable GPUs (basically anything made after 2016) are able to run all models to some degree. So I'll break it down by their degree of capability, within their same generation and product tier.
- Tesla cards have the most CUDA cores and VRAM, but have the slowest cores and require your own high CFM cooling solution to keep them from roasting themselves to death. They're reliably the 2nd cheapest card option for their performance used and the only really "good" option for training models.
- Tesla 100 variants trade VRAM capacity for faster HBM2 memory, but don't benefit much from that faster memory outside enterprise environments with remote storage. They're usually the 2nd most expensive card in spite of that.
- Quadro cards strike a solid balance between Tesla and Consumer. Fewer CUDA than Tesla but more than Consumer. Faster CUDA cores than Tesla but slower than Consumer. More VRAM than consumer, but usually less than Tesla. Thanks to "RTX Experience" providing solid gaming on these cards too, they're the true "Jack of all trades" option and appropriately end up with a used price right in the middle.
- Quadro "G" variants (eg GP100) trade their VRAM advantage over consumer for HBM2 VRAM at absurd clock speeds, giving them a unique advantage in Image generation (and video editing). They're also reliably the most expensive card in their tier.
- Consumer cards are the best used option for the price if you want bulk image generation, voice synthesis, and voice recognition. They're slow with LLMs, and if you try to feed them a particularly big model (30b or more) they'll bottleneck even more harshly on their lacking VRAM (be it capacity or speed), with the potential to bottleneck even further by paging out to significantly slower system RAM. (A rough VRAM fit check is sketched below.)
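As a rough companion to the breakdown above (my own rule of thumb, not from the comment): a small Python sketch that estimates whether a model's weights fit in a card's VRAM, using PyTorch only to read the card's capacity.

# Rough sketch: will an LLM fit in this card's VRAM?
# Assumption: weights take params * bytes-per-param, plus ~20% overhead for the
# KV cache and activations. Real usage varies by backend and context length.
import torch

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def fits_in_vram(params_billions, quant, device=0):
    need = params_billions * 1e9 * BYTES_PER_PARAM[quant] * 1.2  # +20% overhead
    total = torch.cuda.get_device_properties(device).total_memory
    print(f"need ~{need / 1e9:.1f} GB, card has {total / 1e9:.1f} GB")
    return need < total

fits_in_vram(30, "int4")  # a 30b model in 4-bit on a 24GB M40/P40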
Stuffed a z440 mobo into a 3u case, will be putting 2x p40s in here shortly.
I also got myself an M40 a few months ago, but cooling it with air is not really a good solution in my opinion. I was lucky enough to get a Titan X (Maxwell) water block from EK for 40€/~44 USD. With it, the card runs perfectly and tops out at 60°C / 140°F under full load.
If you are not so lucky, I would still recommend using these AiO CPU to GPU adapters (e.g. from NZXT).
Air cooling is comparatively huge and extremely loud (most of the time).
I did get myself a P40 for 170€. RTX 2080 gaming performance and 24GB GDDR5 at 694.3 GB/s. Stable Diffusion on my 2080 runs around 5-10x faster than on the P40. But it would make a good price/performance cloud gaming GPU.
They are going for $50 currently. Get a server rack and fill them up!
If you put a p40 with a 3090 will it be bottlenecked at p40 speeds or will it be an average?
So according to Nvidia's own specs, the M40 uses the same board as the Titan X and the 900 series. So theoretically, any cooling system that works for either of those two should also work on the M40.
Great explanation. Basically gamers vs AI hackers. The AI models want to fit into VRAM, but are huge, so the 8GB or 12GB VRAM cards can't run them. Getting a new, huge-VRAM GPU is hella expensive right now, so an older card with lots of VRAM works. Also, gamers tend to overclock/overheat, but the Tesla and Quadro cards are usually datacenter liquidations, so there's less risk of getting a fried GPU. BTW: the P40 is a newer version of the M40.
I just bought an RTX 4090 last night and all the parts for a new desktop (i9-13900K, MSI MEG Z790, 128GB DDR5, 4 Samsung 990 Pros) just to do SD and AI, maybe overkill
Dude you’re loaded 😁$
Definitely not overkill if it's for professional use
Tesla P40 24gb cards are on ebay for sub $200 now. Considering one for my server
For anyone wanting to do this: I found the best cooling solution is a Zotac GTX 980 AMP Edition 4GB model. It has the exact same footprint, the circuit board is nearly identical, and it bolts right on with very few modifications. You will need to use parts from both the Tesla and the Zotac GPU to make it work. Been running mine for a while now without issue.
I have been planning this move as well, since the M40 is dirt cheap on eBay. But I worry about one thing you did not touch on in this video (or at least I did not notice if you did): how did you solve the power cabling issue? I believe the M40 does not take a regular PCIe GPU power cable but needs something different, an 8-pin cable?
That's right, the Tesla M40 and P40 use an EPS (aka "8-pin CPU") cable, which can thankfully be resolved using an adapter cable. Just a note, the 6-pin PCI power to 8-pin EPS cables some chinese sellers offer should ONLY be used with a dedicated cable run from the PSU to avoid cable meltdowns! Thankfully this isn't an issue if you're using a HP Z840 (which also conveniently solves the airflow issue too), or a custom modular PSU with plenty of PCI power connections, but it can quickly become an issue for something like a Dell T7920.
Guys, have you ever heard about mining GPUs? If you have, you have a solution for maybe 10 or 20 cards at once, so why complain about 2 or 3 cards?
You say you need a newer motherboard to use the P40. Does any motherboard with PCIe x16 3.0 work?
Yes, as long as it supports above 4g decoding
Also don't forget the fan.
Got my Tesla M40 a while back, and now have a fan cooling on it (EVGA SC GTX 980ti cooler) to mess around with, but just seeing the power consumption 😅😅
I just purchased a Tesla P4 some weeks ago, and I'm having a blast with it. The low-profile card even fits in the QNAP 472XT chassis. Passthrough works fine (minor tweaks). Currently compiling a kernel to get support for vGPU (if I ever succeed).
I got to ask. Why do you say it needs PCIE Gen 4 and a newer motherboard? Documentation says it's PCIe 3
So, I picked up a P40 after watching this video... Thanks! Do you have any videos that talk about loading these LLMs, or if I should go with linux/windows/etc... maybe install Jetpack from the Nvidia downloads? I've screwed around a little with hugging face, and that made me want to get the card to run better models, but rabbit hole after rabbit hole, I'm questioning my original strategy.
I'm glad you were able to pick up a P40 and not the M40, since the Pascal arch can run 4-bit modes, which covers most LLM models. LLMs change so rapidly I can't even keep up myself, but I have been running the Docker container from github.com/Atinoda/text-generation-webui-docker . But yes, this is a deep rabbit hole, I feel your pain
Easiest out-of-box apps for running local LLMs are GPT4All and AnythingLLM. Huggingface requires lots of hugging to not sink into rabbit holes :) The apps like I mention keep things simple. Both have active Discord channels that are helpful too.
Remember how much it was at the time?
@@l0gic23 180 bucks locally here off of fb marketplace.
How does the p40 perform for video editing and 3D design programs like Blender?
You can use both cards simultaneously. There will be two CUDA devices.
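A quick sketch of what those two CUDA devices look like from PyTorch; which card ends up as cuda:0 depends on the system:

# Sketch: list the CUDA devices a P40 + 3090 box exposes, then target one explicitly.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i} -> {props.name}, {props.total_memory / 1e9:.1f} GB")

x = torch.randn(1024, 1024, device="cuda:1")  # place work on the second card (assumes 2+ GPUs present)
print(x.device)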
The Tesla K80 with 24GB VRAM is described as a setup of 2 systems, each with its own CUDA cores and VRAM. When running Stable Diffusion, does it behave as one GPU with 24GB or does it behave as 2? Does it run out of VRAM at 12GB or 24GB in image production?
That's exactly my question.
2:21 Actually, that Tesla card has 1150 more cuda cores than that 2070...
3,072-1,922= 1150
The only thing I'm curious about is how well it can mine. 🤔
If anything, why the hell wouldn't you just get a 3090ti? It has 10,496 CUDA cores, which is far and beyond the Tesla in both work and gaming capability.
If it's due to sheer price, I get it, but the specs are still beyond what you currently have.
Cost:Performance...
I have an ASUS H410 HDV M.2 (Intel chipset); is compatibility good with the Tesla M40?
ty
But wait, I was under the impression that both the M40 and the P40 are dual-GPU cards, so the 24GB of VRAM is split between the two GPUs. Or am I mistaken? When I look up the specs it looks like only 12GB per GPU.
The M40 and P40 are single-GPU cards.
I think you are talking about the K80 GPU.
I went with a K80, but Stable Diffusion only runs with torch 1.12 and CUDA 11.3, and right now it only uses 12GB, half the memory and one of the two GPUs, because the K80 is dual-GPU. The M40 should allow a modern CUDA and Nvidia driver, and no workaround is needed to access the full 24GB like on the K80.
Thank you, I have been looking for this info
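One common way to pin a job to a single GPU (for example, one half of a dual-GPU K80) is the CUDA_VISIBLE_DEVICES environment variable; a small sketch, set before anything CUDA-related is imported:

# Sketch: restrict a process (e.g. a Stable Diffusion launcher) to one GPU.
# Must be set before torch/CUDA initialises; "0" is the index of the GPU to keep visible.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())   # now reports 1
print(torch.cuda.get_device_name(0))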
Does it use the whole 24GB of VRAM? Because it's basically multiple GPUs put together, is the VRAM working as one?
You can't just plug that card in and go. There are driver issues. Did you get it working?
Owner of P40 and 3090 in the same PC.
No problems whatsoever, just install Studio driver
Excellent video.
In my case I have a workstation with an msi X99A TOMAHAWK motherboard with an Intel Xeon E5-2699 v3 processor, (and I currently use 3 monitors). Because of this I installed a GPU, AMD firepro w7100 which works very well for me in Solidworks.
The RAM is Non-ECC 32 gigabytes.
The problem is that I am learning to use ANSYS, and this software is married to Nvidia. For GPU calculation acceleration, looking at the ANSYS GPU compatibility lists, I see that the K80 is supported, and taking into account the second-hand price, I am interested in purchasing one.
How can I configure my system to install an Nvidia Tesla K80 and have the AMD GPU keep driving my monitors as it currently does? The Nvidia K80 has 24 GB of RAM; can this be affected when using it in conjunction with the AMD GPU that only has 8 GB? Would the K80 be restricted to the RAM of the FirePro W7100?
My PSU is 700 watts.
Thank you.
Great video. Thanks!!
Home Depot has free delivery.
😂
With them having the choice of worst wood, no thanks
The P40 is good, but my only concern is that it has probably been used for mining
Mining has been proven to not cause any more significant wear than regular duty cycles..
In fact, in some situations the mining rig would be a cleaner and safer environment than in a PC case, on the floor in some persons home with toddlers sloshing their chocky milk around, for example 😂
Just buy a used RTX 3090 for $500. Works great with generative art, LLMs, etc.
After watching your video I tried to do the same, but I had a problem... I have an HP DL380 server and I purchased the Nvidia Tesla P100 16GB, but I can't find the power cable.
Watching other people, I am afraid to buy the wrong one and fry my server... can you please tell me the right cable to buy?
A PC doesn't need an HDMI output to boot. Any display interface is OK: VGA, DVI, or DP.
What's the idle power consumption of the M40? I'm thinking of using it in my server but can't find details on the internet. Thanks
What is the "idle" power draw of that?! If it's on 24/7 in a server, can it power down? Can't find info on that online.
My M40 idled at ~30 watts, P40 is closer to 20
So P40 > M40?
Yes
@@b_28_vaidande_ayush93 For training or FP16 inference get the P100; it has decent FP16 performance. The P40 is horrible at that, it was specialised for INT8 inference.
The P40 is already way better for the price; also, if you wanted more CUDA cores you could have gotten 2 K80s for the same price
My 4090 takes 2 seconds to make a 512x512 at 25 steps. It only has 24GB of VRAM, which means I can only make like 2000x2000 images with no upscaling
How does this compare to an RTX A2000?
Can you suggest a desktop workstation that can include a Tesla M40? Thank you so much
Look for an HP Z840, but buy the GPU separately, because you are probably going to pay way more if it's included.
I have one of these cards; how do I use it on an Ubuntu 22.04 computer?
m40?
What about HBCC on a Vega 64 for an "unlimited" boost into system RAM? Albeit a little slower, but with video out etc.
I have a server that I'm going to repurpose as a video renderer with a multiple storage drive bay (24). I wanted to know if this is possible? Would I need Proxmox etc.? Would the P40 model be sufficient?
I have a video on this topic using Tdarr
I'm about to try it with a 32 gb Radeon Instinct Mi 50.
Is the 8GB low-profile card good for my SFF Dell? I like my RX 550, but I could play a lot more stuff. I bet I could play Starfield at 1080p on low on the 8GB M4... is it worth the 90 bucks?
Now, I have Fit, what’s its comparison? 🤭
The Ford F150 of graphics cards. Slick!
That is a heck of a lot cheaper than the 3090s
Kind of sad that the price of these cards in my region is ridiculous... it's actually cheaper to get an RTX 3090 second-hand rather than getting the P40... and the M40 is double the price compared to the one in this video...
P100 would be better for stable diffusion
isn't 4090 faster?
Finally, I will be able to fine-tune and upgrade my gynoid. BTW the 3090 has 10,496 CUDA cores, and it's about $850 at the cheapest on the market brand new.
CUDA cores, my friend. I have this card on my table right now.
You could've made an actual comparison.
Was anyone able to get this server graphics card to play video games? Or is it only able to work the way you have it, running tasks? It's a "smart" card, like how cars are able to drive themselves.
All Tesla cards can play games; the problem with them is cooling, because there is no heatsink fan. You have to either buy your own 3D-printed shroud or have a server that blows air across the chassis.
I never knew. THANKS.
Great video👍
Can you run a stable diffusion test and show us how to set it up please!
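Not from the video, but a minimal sketch of such a test using Hugging Face's diffusers library; the checkpoint ID is a placeholder for whichever SD 1.5 model you use, and FP32 is chosen because Pascal cards like the M40/P40 have weak FP16 (swap in torch.float16 on newer cards):

# Minimal Stable Diffusion smoke test with diffusers (a sketch, not the video's exact setup).
# pip install torch diffusers transformers accelerate
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # placeholder: any SD 1.5 checkpoint
    torch_dtype=torch.float32,          # FP32: Pascal FP16 is very slow
).to("cuda")
pipe.enable_attention_slicing()         # trims peak VRAM, roughly what --medvram does in the webui

image = pipe("a tesla p40 gpu on a workbench", num_inference_steps=25).images[0]
image.save("test.png")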
I have an NVIDIA GeForce GTX 1080 Ti with 3584 CUDA cores, and I was thinking it is so old lol
Bought a Maxwell and he's bragging about it. At least get a Pascal...
Yay rmiddle
Second!
First!
Try --medvram or --lowvram. 24GB should be able to get 2048x2048 with --lowvram.