Gaming on NVIDIA Tesla GPUs - Part 2 - NVIDIA Pascal

  • Published: Nov 15, 2024

Comments • 228

  • @zeroforkgiven (5 months ago, +59)

    Price at the launch of this video for the Tesla P4 is ~$105 on eBay. Very curious what it will be tomorrow.

    • @CraftComputing (5 months ago, +40)

      Do I post a pre-emptive sorry, or wait until they're $250?

    • @logan_kes (5 months ago, +12)

      @CraftComputing Something has to have driven up P4 prices in the last month or so. In December last year I picked up a pair of P4s for $85 each (trending $75-$80 from China or $80-$90 from U.S. sellers), and now they've shot up to close to $110 from U.S. sellers and $95 from China sellers. I'd imagine your video will bump these up even more 😅 Glad I just got 2x Tesla P40s last week before those go up too 😂
      Keep up the great content. As you get more and more popular, you'll mess with the used enterprise equipment market more and more, to the point you'll need to put a disclaimer in your videos stating *pre-video pricing* lol

    • @zeroforkgiven (5 months ago, +7)

      @CraftComputing LOL, it's the Craft Effect. I don't mind, as I already own 2 of them (the best Plex hardware card IMO) and the prices will fall back down in a few weeks.

    • @garthkey (5 months ago, +4)

      Yeah, after he posts videos they spike. I just bought the ASUS gaming server from a couple videos ago. The original price was $175; then it spiked to $250.

    • @JPDuffy (5 months ago, +5)

      I bought one for $50 in February. It's excellent, but I don't think it's worth $100+ considering the extra work to set up. I have it running in a 4790K Dell and it plays 1080p games at max settings at 60+ fps without breaking a sweat.

  • @ProjectPhysX (5 months ago, +48)

    The main difference between the P100 and P40 is not the VRAM. The P100 has a 1:2 FP64:FP32 ratio; for the P40 (and all other Pascal GPUs) it's 1:32, basically incapable of FP64.
    The P100 is much better for certain computational physics workloads that need the extra precision, like molecular dynamics or orbital mechanics.
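
    If you want to see the ratio on your own card, here's a rough sketch with PyTorch (assuming a CUDA build of torch; the matrix size and iteration count are arbitrary):

    ```python
    import time
    import torch

    def matmul_tflops(dtype, n=4096, iters=20):
        # Time n x n matmuls on the GPU; each one is 2*n^3 FLOPs.
        a = torch.randn(n, n, dtype=dtype, device="cuda")
        b = torch.randn(n, n, dtype=dtype, device="cuda")
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            a @ b
        torch.cuda.synchronize()
        return 2 * n**3 * iters / (time.perf_counter() - t0) / 1e12

    fp32 = matmul_tflops(torch.float32)
    fp64 = matmul_tflops(torch.float64)
    # Expect roughly 2:1 on a P100 and closer to 32:1 on a P40.
    print(f"FP32 {fp32:.2f} TFLOP/s | FP64 {fp64:.2f} TFLOP/s | {fp32 / fp64:.0f}:1")
    ```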

    • @OliverKr4ft (5 months ago, +11)

      All games use FP32 though, so the additional FP64 FPUs on the P100 make no difference.

    • @sidichochase (5 months ago, +8)

      @OliverKr4ft For gaming, yes. But for people who want a nice cheap GPGPU, the P100 is the better choice.

    • @TheLibertyfarmer (4 months ago, +1)

      The P100 can do a 2:1 FP16:FP32 ratio too, which makes it much faster at half-precision training than the P40, as well as having NVLink support, and thus more efficient for training in general.

    • @gg-gn3re (4 months ago, +1)

      @OliverKr4ft Yeah, gaming isn't "molecular dynamics or orbital mechanics", if you didn't know.

    • @MiG82au (4 months ago, +1)

      @OliverKr4ft The claim in the video is that the unique chip is for HBM2, which is arguably wrong and at best half the story, because the biggest difference is the huge count of FP64 execution units.
      Whether games use FP64 or not is irrelevant to why the GP100 chip exists.

  • @BrinkGG (5 months ago, +28)

    I've been waiting for this one! Was holding out on buying a P40 or P100 until this came out. Thanks Jeff. :D

  • @DMS3TV (4 months ago, +1)

    My big takeaway from this is just how impressive integrated graphics are now. These dedicated GPUs were once the bee's knees, and now a 7840U can outpace them in games. Really cool times we live in!

  • @forsaken1776 (4 months ago, +1)

    I've watched many of these types of videos, not to mention most of your other vids. What I'm not sure about is how your VMs are set up. Are they just Windows VMs with the game(s) installed, or is there a way to install the game directly in a VM without the overhead of a Windows or Linux OS?

  • @Americancosworth (5 months ago, +42)

    Hurray! More good ideas for my poor life decisions (building a cloud gaming server)

  • @OMGPOKEMON47 (5 months ago, +1)

    Was the P4 tested at PCIe x16 or x8? If I understand correctly, using 8 of the P4s (vs. 4 double-slot cards) on this server would result in each slot running at x8 bandwidth. Not sure if that would make a difference in gaming performance 🤷‍♂
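
    If you want to check what link width a card actually negotiated, here's a small sketch using the NVML Python bindings (assuming the nvidia-ml-py package is installed; device index 0 is arbitrary):

    ```python
    import pynvml  # pip install nvidia-ml-py

    pynvml.nvmlInit()
    h = pynvml.nvmlDeviceGetHandleByIndex(0)
    gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(h)
    width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(h)
    print(f"PCIe link: gen{gen} x{width}")
    pynvml.nvmlShutdown()
    ```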

  • @novantha1 (4 months ago, +3)

    With regards to AI tests:
    It might be an unexpectedly sagacious decision to avoid jumping into them at the moment. We're at what is simultaneously a crossroads and a wild west, and I can only see it getting crazier.
    In the simplest possible terms: raw FP16 compute sort of doesn't lie. Given a sufficient quantity of it (and the memory bandwidth to feed it), it's pretty straightforward to multiply two matrices. But there's a problem: TOPS.
    Dedicated TOPS don't operate on the same principle as FP16 compute (and I'm giving companies' marketing divisions the credit of assuming they're talking about tensor operations when they say TOPS, which is not always true), so it can be hard to draw an equivalence between, for instance, the FP16 compute of a Pascal card and the tensor performance (which is often the majority of the AI performance) of a modern Nvidia GPU... to say nothing of extended instruction sets in the x86, ARM, or RISC-V space (I would love to start a YouTube channel talking about those at some point; a lot of people misunderstand CPU AI performance, including Ampere, and now Intel's Sierra Forest marketing department).
    And then it gets even harder. Do you compare memory access patterns or raw performance? If you compare raw performance, a Pascal GPU might hold up surprisingly well, because in the end FP16 and memory bandwidth will get you most of the way there. On the other hand, something like a CPU with VNNI extensions (Zen 4, and I think Intel's server P-cores, but not consumer) might actually perform more efficiently for its memory bandwidth, in the sense that it can do lower-precision AVX compute at a faster rate per unit of bandwidth thanks to fused instructions, but it might have a slower absolute rate of operation. Which one is better? Well, it depends on your use case.
    Plus, all of this ignores more exotic things like Tenstorrent's lineup (very sexy) or Hailo M.2 accelerators (very accessible).
    So when you add it all together...
    At what precision do you evaluate? Some accelerators (notably CPUs and NPUs) perform at an outsized rate at lower precisions, particularly integer operations like INT8. Common high-performance AI models are not trained with those precisions in mind, so there is an accuracy loss (and some of those losses only show up experientially, not on standard benchmarks). Is it fair to compare an accelerator using block floating point 8 against the full FP16 of another accelerator?
    How much customization is allowed in the pipeline? Is it fair to compare image generation on Nvidia and AMD using the Automatic1111 web UI, when AMD is a second-class citizen there? Do you compare Automatic1111 on Nvidia to nodeshark on AMD?
    How do you compare an accelerator with more RAM at a slower speed to one with fast RAM but little of it? Some people favor accuracy/quality, some favor responsiveness, and some have crazy workflows that depend on huge numbers of generations from models whose quality almost doesn't matter. The first accelerator would just be better because it can run the higher-quality model, but that might not be what everyone wants.
    Is the evaluation on training or inference?
    If training, with which framework?
    Are tensor cores used?
    Do you use "pure" primitives like ResNet, or off-the-shelf production-grade models and pipelines?
    Do you measure at large batch sizes indicative of peak performance, similar to how we do CPU evaluations in gaming benchmarks, or do you test single-user latency, which reflects how an end user actually engages with the product? (See the sketch at the end of this comment.)
    Do you focus on objective, timeless evaluation, such as peak performance (people would have had a really bad time buying hardware for AI if they bought before quantization and flash attention changed the game pretty drastically), or do you take into account the current state and usability of the hardware (people would have had a really bad time buying an unsupported AMD GPU assuming "oh, AI's a big deal, it'll all get supported eventually")?
    Honestly, at the moment it's a bit of a mess. There are no real industry standards, every option you test could have default settings or customizations that vary by hardware, making comparisons potentially unfair, and there's just not a lot of collective industry wisdom on how to do it right.
    To be honest, I'm not sure why I typed this out. I'm not sure it's going to be terribly useful for anyone, lol.
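
    To illustrate the batch-size question above, here's a toy sketch with PyTorch (assuming a CUDA build of torch; the single Linear layer is a stand-in for a real model, not a real benchmark):

    ```python
    import time
    import torch

    # Toy stand-in for one transformer layer.
    layer = torch.nn.Linear(4096, 4096).half().cuda()

    for batch in (1, 32):
        x = torch.randn(batch, 4096, dtype=torch.float16, device="cuda")
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(100):
            layer(x)
        torch.cuda.synchronize()
        dt = (time.perf_counter() - t0) / 100
        # Per-step latency barely moves, so throughput scales with batch size.
        print(f"batch={batch}: {dt * 1e3:.3f} ms/step, {batch / dt:.0f} samples/s")
    ```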

    • @CraftComputing (4 months ago, +3)

      LOL, I read it. I've done some research and talked to a number of colleagues about AI performance testing, and you summed up a number of points nicely.
      Every model is built a bit differently. Every GPU has its own strengths and weaknesses, with its own hardware, configuration, available features, etc.
      My 2¢: oftentimes, orgs that want to run a specific AI model will purchase the hardware that model was built for.
      Me running generic benchmarks isn't really an accurate assessment of performance, as each model will take advantage of specific GPU architecture features. As you mentioned, INT8, FP16, Tensor/RTX, etc. performance will wildly affect the speed of running a specific model, but that's really down to software selection of what you WANT to run, not the hardware you're running it on.
      It's chicken-and-egg, but with LLMs and GPUs: you choose one and it decides the other.

  • @BlitzFingers (1 month ago, +1)

    Looking forward to that first LLM review!

  • @renobodyrenobody (27 days ago, +1)

    Bought a P40 for my research in AI. Never thought it was doable to play games on it!

  • @edgecrush3r (5 months ago, +7)

    I've been running the P4 24x7 for almost a year and absolutely love this card. I have so many projects running on this thing now that my NAS doesn't qualify as a NAS anymore 😅 It's more of a vGPU emulation server, with the whole family enjoying Mario Kart on many connected devices 😂 It's just so dang cheap now it's impossible to beat, and great for inferencing LLMs (the P100 would be better due to faster memory). I'm now hoping the T4 will drop in price.

  • @DerrangedGadgeteer (5 months ago, +2)

    I'm so glad you ran these benchmarks! I'm elbows-deep building a multipurpose virtualization/AI server out of a 2nd-gen Threadripper and P100s. It's good to know what to expect, and also that my expectations weren't way off base when I started.

  • @samthedev32 (5 months ago, +7)

    I have been waiting for this video for so long!
    I was planning to get a P4, and now I want it even more :)

  • @aaronburns2858 (4 months ago, +1)

    Do you think LGA3647 machines are relevant? I just ended up with a Supermicro X10SPM-TF and a Xeon Gold 6232. I got it dirt cheap and I'm curious if you think it'd be good enough to run a couple machines for the kids to play Minecraft and for me to run a few other games (Fallout, Cyberpunk, Hogwarts Legacy, and mostly old titles).

  • @bigearsinc.7201 (23 days ago, +1)

    The Tesla P4 performs exactly the same as the RX 580, at about half the wattage. Pretty insane.

  • @StevenWilliams-lb9tf (4 months ago, +1)

    Jeff, have you tried the RTX 4000? I've thought of getting one, as the Quadro wiki claims it's close to a mobile RTX 2080, but TechPowerUp claims it's more like an RX 6600. I'm wondering: do I save for the RTX 4000 or just get a P100? Single slot at 160W vs. half the price at 250W. Thanks.

    • @igordasunddas3377 (25 days ago, +1)

      I can't give you an answer, but I know that neither the RTX A4000 nor the RTX 4000 Ada (SFF or not) supports vGPU. You'd need to shell out for an A5000 or higher to do that.
      If you're not interested in vGPU, it probably depends on your workload, really.

  • @DustinShort (5 months ago, +3)

    I was really surprised by the P4. I may have to try it as an energy-efficient VDI solution. At two VMs per GPU it should be more than powerful enough for light CAD work, but I bet you could squeeze in 4 VMs if you aren't working with large assemblies.

  • @davidfarnham3548 (5 months ago, +17)

    Really curious to see how a T4 performs vs. a P4.

    • @KiraSlith (5 months ago, +5)

      Ehhh... as I understand it, the P4 has two working NVENC engines on-die, but the T4 is a custom compute-targeted die from the word go. You'll get more for your money from the P4 if you're using it for virtualization/transcode, especially since the P4 is still staying sub-$130 while the T4 is hovering around $600 at the moment.
      If, however, you're looking for FP16 compute specifically (like for AI tasks), the T4 is fast enough to compete with a 3090 while staying at 75W. It's a spectacular monster within that specific arena only; its FP32 is pretty miserable for its price, however, and that's what games make the most use of.

  • @k9man163 (4 months ago, +1)

    Would you be interested in testing these cards for local LLM performance? I'm curious what impact the HBM2 memory will have over GDDR5.

  • @gustersongusterson4120 (4 months ago, +1)

    Great video and I love the series! Though it would be a lot easier to visualize the data in bar-graph form rather than just a matrix of values.

  • @dectoasd3644 (5 months ago, +6)

    1 minute in and I'm already excited about my 2x P40s.

    • @Satori-Automotive (5 months ago)

      How does a single one perform in rendering and editing compared to something like a 1080 Ti?

  • @criostasis (3 months ago, +1)

    I designed and developed a RAG-based LLM chatbot for my university using GPT4All, LangChain, and TorchServe. Testing with my 16GB RTX 5000 laptop, it performed on par with just my 13900K, producing answers with memory and chat context in about 40-50 seconds. On my server with an RTX 4080 it was blazing fast; answers came in about 5-10 seconds. I'm sure a 4090 would be a bit faster, but I didn't have one to test. Concurrency on a single GPU is where you can really hit a bottleneck: you have to set up a queue and locks to handle it, and with one GPU it gets slow. That's why OpenAI and others have thousands and thousands of GPUs to handle concurrent workloads. That, and some magic code sauce I didn't get around to implementing before handing it off.

  • @m4nc1n1 (4 months ago, +2)

    I still have my 1080 Ti on a shelf. Used it for years!

  • @smccrode (1 month ago, +1)

    Market update: Tesla P40s are around $300-350 now on eBay.

  • @Prophes0r (5 months ago, +4)

    I know it's comparing completely different families, but I'm interested in comparing the P4 to an A380 in straight passthrough.
    The A380 can be had new for $120ish for the half-height cards. That's in the same ballpark.
    I know we will never get SR-IOV on the Arc cards (except maybe the A770 with major hacks), but I do think it has the possibility of being really interesting.
    Plus, there are other comparisons to be made. The Nvidia cards likely have a gaming performance advantage, but QuickSync is SO much better than NVENC that it might make a big difference when it comes to encoding those remote desktop video streams.

  • @daidaloscz (4 months ago, +1)

    Would love to see how you set up Sunshine + Moonlight next, especially on headless systems with no GPU output.

  • @ICanDoThatToo2 (5 months ago, +1)

    I've been learning LLMs on my R720, and found something interesting: my old 1050 Ti 4GB card runs AI about 3x faster than all 16 CPU cores together (2x E5-2667 v2 chips). While neither of those options is fast in absolute terms, and the RAM is very limiting, basically _any_ GPU is better for AI than CPU only.
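
    A crude way to reproduce that kind of comparison, sketched with PyTorch (the matrix-vector product is only a rough proxy for an LLM decode step; sizes are arbitrary):

    ```python
    import time
    import torch

    def steps_per_sec(device, d=4096, iters=50):
        # One d x d matrix-vector product as a stand-in for a decode step.
        w = torch.randn(d, d, device=device)
        x = torch.randn(d, 1, device=device)
        if device == "cuda":
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            w @ x
        if device == "cuda":
            torch.cuda.synchronize()
        return iters / (time.perf_counter() - t0)

    print(f"cpu : {steps_per_sec('cpu'):.0f} steps/s")
    if torch.cuda.is_available():
        print(f"cuda: {steps_per_sec('cuda'):.0f} steps/s")
    ```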

  • @ShooterQ (4 months ago, +1)

    Just threw an unused Tesla P4 8GB into my Frigate NVR for video decoding. The thing shoots all the way up to 105C and crashes. Added an 80mm Arctic P8 with some custom ducting and have it working along at a constant 71C now. Doing great for the $100 price tag.
    Dell OptiPlex 3080 SFF, so it's the biggest card I could fit, and it works well with the available power from that slim PSU.

  • @playeronthebeat (4 months ago, +1)

    Will you do one more video for Turing/Volta cards (essentially the 20xx series), too, or are those still out of reach (budget-wise etc.)?
    Would be interesting to me if they're not too expensive.

    • @CraftComputing (4 months ago, +3)

      Yep! I've got some V100 and A5000 GPUs lined up. Not sure if I'll cover Turing, as those are prohibitively expensive still.

    • @playeronthebeat (4 months ago, +1)

      @CraftComputing Ah, that's unfortunate.
      Would still love to see it, honestly. The V100 doesn't seem too expensive on its own. Still, they'd definitely stretch the budget quite a bit, going for ~€700 here for the 16GB SXM2 and roughly €1k more for the 32GB SXM3.
      For someone like me, toying with the idea of having at most one or two systems on there, it'd be quite cool. But eight systems (4 GPUs) could be a bit harsh regarding the price.

  • @Yuriel1981 (5 months ago, +1)

    I think the main problem with switching to an Epyc platform is finding a board that can accommodate the 8 GPUs. The best and most affordable option I see on eBay is a 7551P and an ASRock EPYCD8 board with 4 PCIe 3.0 x16 and 3 PCIe 3.0 x8 slots. Since the last slot is an x16, you could (if you can find a case big enough, or modify one) use a P100 or P40 that suffers the double-VM affliction. And the newer platform may make up some of the difference. Ad postings are around $450, not sure if that actually includes the CPU though; most similar full Epyc boards with CPU and various RAM combos range from $500-850. Might be more doable than you think.

    • @CraftComputing (5 months ago, +1)

      There are a couple servers with a very similar design to the ESC4000 that accommodate either 4 or 8 GPUs. They're just insanely expensive.

  • @anthonyguerrero4612 (5 months ago, +2)

    Wow, I wasn't expecting this, thank you for further experimenting. 😊

  • @win7best (4 months ago, +1)

    As someone who has owned a P100 and still owns a P40 (24GB), I can say the P40 gives the better experience. The P100 only has 16GB, and I don't think the HBM2 memory will save it.

  • @blendpinexus1416 (4 months ago, +1)

    Got a 12GB 2060 and I'm happy with its performance. I've thought about getting Tesla T4 GPUs (the Turing version of the P4), but the 12GB 3060 is also a runner-up for that. Similar efficiency too.

  • @TheRogueBro (5 months ago, +3)

    Random power-related question: if you were to run multiple P4s and you turn off a VM, does it "power down" the card?

    • @CraftComputing (5 months ago, +3)

      All of the GPUs have idle power draw, because they're still being used by the host. There is a host driver for monitoring and partitioning the GPU. The P40 and P100 were around 12-15W. The P4 was closer to 8-10W.
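
      To watch this on your own cards, here's a minimal sketch using the NVML Python bindings (assuming nvidia-ml-py is installed; it loops over every GPU the host driver can see):

      ```python
      import pynvml  # pip install nvidia-ml-py

      pynvml.nvmlInit()
      for i in range(pynvml.nvmlDeviceGetCount()):
          h = pynvml.nvmlDeviceGetHandleByIndex(i)
          watts = pynvml.nvmlDeviceGetPowerUsage(h) / 1000  # NVML reports mW
          temp = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
          print(f"GPU {i}: {watts:.1f} W, {temp} C")
      pynvml.nvmlShutdown()
      ```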

    • @igordasunddas3377 (25 days ago)

      @CraftComputing I'd so love to know what power draw the T4 has, because I'm possibly interested in one, but I've read somewhere that it only supports the P0 power state... which would be a bummer.
      Great video though, Jeff! Many thanks!

  • @jamb312 (5 months ago, +1)

    Have a couple of Quadro T400s for Plex and VMs. Glad I got a P4, as it's been a powerhouse for running LLMs, recognition, etc.
    By the way, an Epyc 7302 is what I'm running, and I love it, other than it being a little heater.
    Iron Horse Brewery has its main staples, like Quilter's Irish Death, but they play with many others. I was up in their taproom last week, and the Cookie Death was only $3.

  • @pkt1213 (5 months ago, +3)

    I just put a P4 in my server. Four transcodes were using ~25W. I may pick up a P40 or P100 if I want to run AI locally.

    • @pkt1213 (5 months ago)

      I also took the front plate off the heatsink and zip-tied a 40mm Noctua fan over the die. Haven't seen it much over 50C.

  • @TheAnoniemo (4 months ago, +1)

    How were the temperatures on the P4? I know they have some very specific airflow requirements due to the small, restrictive heatsink.

    • @CraftComputing (4 months ago)

      This server is specifically designed for passive GPUs. The P4 ran at ~45C. The P40 and P100 ran between 55-62C.

    • @TheAnoniemo (4 months ago, +1)

      @CraftComputing Thanks for the reply. I was wondering because we had some issues at work when installing a single T4 and no other expansion cards. The perforated back of the chassis provided too little restriction, so all the air just went around the T4 instead of being forced through it. It would subsequently throttle like crazy...

  • @LinHolcomb (4 months ago, +1)

    I'd still love to see tokens/sec running a few mainstream LLMs. I run 2 P40s in an AMD 5950 with 64GB RAM; truly, the processor is not used to its potential. Going to send you a pickle beer.

  • @tylereyman5290 (5 months ago, +4)

    I somehow managed to snag a P100 last month for $40 off eBay. That may have been the greatest deal I have ever scored.

  • @calebgrefe8922 (4 months ago, +1)

    I get so excited thinking about an ITX gaming build with the P4 =)

  • @Leetauren (5 months ago, +3)

    AI benchmarks for home labs are relevant. Please include some.

  • @KeoniAzuara (5 months ago, +2)

    Still rocking the M40 with 12GB and the NZXT water-cooling bracket.

  • @hi-friaudioman (4 months ago, +1)

    Oh baby, he's dropping the E5 v4s! We gaming now, boys!

  • @xmine08 (4 months ago, +2)

    LLMs are, in my opinion, becoming a huge thing in homelabs. For everyone? No. But then, many homelabbers have maybe two Raspberry Pis, and yet videos like yours exist where you have full-blown real server hardware (albeit old and thus affordable). I appreciate your honesty, however, in not wanting to produce numbers you don't feel qualified for!

  • @SpacelsFake (4 months ago, +1)

    Jeff, is it possible to run dual 30-series Nvidia cards for Stable Diffusion machine learning?
    I'm currently using a 3060 12GB, and I'm waiting on a 3080 10GB to come in the mail. Is it possible to run them at the same time? (I know you can't combine them and run them as one.) Can I somehow make them both work for my desired use?
    Or is it just better to run it with the 3080 and leave the 3060 for something else?
    Edit: I have a 5950X, and I'm using an X570 Crosshair VIII Extreme motherboard, 64GB of G.Skill Trident 3600 MT/s RAM, a Seasonic Platinum 1000W power supply, and a Cooler Master Cosmos C700M case.

  • @pidojaspdpaidipashdisao572 (4 months ago, +1)

    I always had only one question for you: why do you drink beers (or whatever that is) from a glass? Why not the bottle, or the can in this case? I feel like less of a man when I drink it out of a glass.

    • @CraftComputing (4 months ago, +1)

      Glossing over the strange identity crisis you seem to be having: a glass lets you smell the beer far better than a can or bottle. Secondly, pouring a beer with a head brings out more flavors. A nucleated glass also helps refresh the head, making your beer more enjoyable for longer.
      As for your latter comment, I think it's queer to let others' opinions of you define your identity. Next time you're at a bar, order that Cosmo you've always wanted.

    • @pidojaspdpaidipashdisao572 (4 months ago)

      @CraftComputing Making a science of an orange juice that you drink, mfw. Nobody defines me; we all know who drinks out of a glass. What is a Cosmo?

    • @CraftComputing (4 months ago)

      Who drinks out of a glass?

  • @SpoonHurler (5 months ago, +9)

    I agree with you on benchmarking LLMs and AI (or advanced logic generation). Many benchmarks will also be irrelevant in a year (my opinion, not a fact), and I wouldn't waste time producing possibly bad results in such a chaotic environment unless I was very well equipped to do so.
    I do think a video of playing around with and learning LLMs could be interesting, though... with no comparative numbers, just a journey episode.

    • @CraftComputing (5 months ago, +4)

      Yeah, I did a couple videos on Stable Diffusion last year, where I explored running it in my homelab.

  • @rklauco (4 months ago)

    Maybe a stupid question: when you calculated the price, did you include a Windows license? I'm not sure my information is correct, but I thought you need a special (and quite expensive) Windows 11 license to run it in a VM. But it's possible I'm wrong and there is some option to get it without the $100+ license...

    • @CraftComputing (4 months ago)

      When I'm running tests like this, I often run Windows without a license key. No sense purchasing a Windows license for a VM that won't exist in two months. For long-term deployment, grab an OEM license key. They're possible to snag for $10-15.

    • @rklauco (4 months ago)

      @CraftComputing I thought these OEM keys are not in line with MS licensing; their license (while technically working) doesn't allow you to virtualize the machine and should only run on bare metal. But again, I'm not a Windows licensing expert.

  • @sjukfan (5 months ago, +2)

    Hm... is there an x16 to x8/x8 splitter with external power that can drive two 75W cards? Then you could run two P4s in one x16 slot 😛

  • @thedeester100 (4 months ago)

    Been using a Quadro P4000 GPU for over a year. It was half the price of a 1080 Ti on eBay at the time. I don't game a great deal anymore, but it's never failed at anything I've thrown at it.

  • @buddybleeyes (5 months ago, +2)

    Let's goo! Love this cloud gaming series 😄

  • @AdamKemp-k1w (22 days ago, +1)

    Perhaps consider a slightly newer platform for testing. I recently picked up a Lenovo ThinkStation P920: dual Xeon Gold 6138s, 256GB DDR4-2666 ECC registered DRAM, an Nvidia RTX 2080, 500GB and 1TB SSDs, and an Intel 82599 10Gbps fiber-optic NIC. My total investment was about $1,200. Another similar system is the Dell Precision T7920.

  • @Majesticwalker77 (4 months ago, +1)

    Thanks for keeping the info within your knowledge; I definitely appreciate it.

  • @KomradeMikhail (5 months ago, +4)

    I run into significant app crashes and issues when using an HBM2 graphics card through PCIe passthrough to a VM.
    Most noticeably with KiCad and Deep Rock Galactic. They run fine on the same hardware bare-metal.
    First encountered it with a Radeon VII, then tested a Titan V to compare. Same results for team red and team green.
    Tested on a Broadwell Xeon workstation, slimmed down from what Jeff runs in this video.
    Anybody else have issues passing through HBM2?

    • @CraftComputing (5 months ago, +4)

      I've had no issues at all. I've done testing on the P100 and V100, and haven't had any problems.

    • @OliverKr4ft (5 months ago, +3)

      Have you stress-tested the cards on bare metal? The memory type shouldn't have any effect on stability when passed through.

  • @mrsrhardy (4 months ago)

    The cards don't have video out, so you need an onboard GPU (say, Intel's). So how do you get Windows 10/11 to use the GPU for gaming (assuming Steam)? I know Intel's QuickSync is good, but in apps like DaVinci Resolve, can the Nvidia GPU alternative be selected? I ask because I know you use these often in VM environments and do passthrough for HW/GPU support, so obviously it's selectable from a level-1 hypervisor. But for plebs like us mere mortals with an SFF desktop and integrated Intel graphics, is the P4 a nice affordable boost or more trouble than it's worth?

  • @clintsuperhero (3 months ago)

    I had seen Titan XPs for cheap ($180), so I bought one out of childhood dreams. The performance I've gotten from it has been great compared to the 1080 I used to have: less stuttering in some games and overall better for 1440p, which both my main screens are.

  • @gabrielramirezorihuela6935 (3 months ago, +1)

    The tiny Tesla is hilarious.

  • @zr0dfx (4 months ago)

    I'd like to see an update on the wee home server you did in that pre-Jonsbo-style case! I made a very similar build but used an LSI 9300i and a 10Gb M.2 adapter with TrueNAS Scale (could not get PCIe passthrough to work either).

  • @SpacelsFake (4 months ago)

    Is a used second-gen Threadripper good for machine learning and AI? I was considering building a system with one, or trying to get an Epyc CPU. Are those CPUs any better than regular Ryzen for those purposes?

  • @ronaldvanSluijs (4 months ago)

    I have a Dell R730 with a recently bought GRID K2 card in it and have been struggling forever with it. It's recognized in Proxmox and in my Windows Server 2019 VM, and it shows up in Plex as a transcoder option, but somehow Plex doesn't want to use the video card and transcodes with the CPU instead. I see you have a lot of experience with this; did you find a solution to this with your previous build?

  • @carbongrip2108 (5 months ago, +2)

    How did a single Volta GPU perform when running 2x VMs? We know you tested it 😉

  • @Sunlight91 (5 months ago)

    From what I've heard, machine learning is best done at FP16 to halve the memory requirements and speed up computation. Some even do it in INT8. This means old architectures are not recommended, particularly pre-Turing.
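
    The memory-halving part is easy to sanity-check; here's a tiny sketch (CPU-only, assuming PyTorch):

    ```python
    import torch

    # FP32 stores 4 bytes per value, FP16 stores 2: casting halves weight memory.
    x32 = torch.zeros(1024, 1024, dtype=torch.float32)
    x16 = x32.half()
    print(x32.element_size() * x32.nelement())  # 4194304 bytes
    print(x16.element_size() * x16.nelement())  # 2097152 bytes
    ```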

  • @michaelstowe3675 (4 months ago, +3)

    Good choice on the beer! Local to me!

    • @CraftComputing (4 months ago, +1)

      Quilter's Irish Death is one of my top 20 beers. So good!!

  • @MiG82au (4 months ago)

    Surely there's a mistake in the Fire Strike results? The P100 x2 and P4 physics and combined scores are higher than the P40 and single-VM P100.

  • @HPTRUE (1 month ago, +1)

    Great video!

  • @montecorbit8280 (4 months ago, +1)

    At 25:55:
    "....better at 50 degrees Fahrenheit than 35 degrees Fahrenheit...."
    I remember reading somewhere that the optimum temperature for beer to be served is 40 degrees Fahrenheit; anything colder and you will "freeze out" the flavor. That information comes from a time before "artisan brews" were a thing, though.
    I take it this is no longer correct... or was it ever correct??

    • @CraftComputing (4 months ago, +1)

      That's a very generic statement. Different flavors are better and worse depending on temperature. I enjoy IPAs starting at 35F and letting them warm up to 50F while drinking, as you get a whole range of flavor and experience.
      Stouts and other dark beers are typically much better starting at 45F and letting them warm even up to room temp.
      Domestic lagers and pilsners, well, they're advertised ice cold because they're absolute garbage above 40F 😂

    • @montecorbit8280 (4 months ago)

      @CraftComputing I have never particularly liked light beer, so I was curious. Thank you!!

  • @matthewsan4594 (4 months ago)

    As people may use the cards for other things, like video editing, conversion, and animation, could you please do that sort of testing as well??

  • @chadbotting8425 (2 days ago)

    Will this work with VMware ESXi rather than Proxmox? I have an Asus Z10PE-D8 WS with 128GB of RAM. The current video card is an NVIDIA 1070 Ti running Win11, but I was thinking of replacing Win11 with VMware ESXi and then using an NVIDIA Tesla P100.

  • @KHITTutorials (4 months ago)

    They have fewer cores, but the E5-2687W v4s come quite close to desktop gaming chips. Most likely it will help with the "bottleneck", but it will impact how many machines you can run. Would be interesting to see what improvements come from it.

  • @SoftwareRat (4 months ago)

    Old GeForce NOW instances used the Tesla P40, shared between two instances.

  • @mastermoarman (4 months ago)

    I wonder how well the three work for transcoding with Plex/Jellyfin and running CodeProject.AI for security camera image recognition.

  • @ccleorina (5 months ago, +1)

    I've been waiting for a P100 or P40 setup and vGPU guide, since I still can't get it to run with Proxmox 7 or 8. Still waiting for a new vGPU guide.

    • @insu_na (4 months ago)

      What problems are you experiencing? I've been running Proxmox with P100 vGPUs for a year and P40 vGPUs for months.

  • @spicyandoriginal280 (5 months ago)

    I know that you can't test everything, but I would love to know if 5C/10T makes a noticeable improvement. It opens up the possibility of a 6x P4 system with dual 16-core Xeons (2.6GHz base clock).

  • @cgrosbeck (3 months ago)

    Do you have a how-to for setting up your hardware? Specifically the OS, drivers, and networking out to terminals like Raspberry Pis.

  • @frankenstein3163 (4 months ago)

    A little off subject: how do you send the cloud gaming stream around 200 ft?

  • @Agent_Clark (4 months ago)

    Where and how might I get more information on a server like this? I'm interested in building one but only have experience with consumer hardware.

  • @al.waliiid (4 months ago, +1)

    What about rendering, 3D, and montage times, and how smooth Adobe Premiere Pro is?

  • @kenzieduckmoo (4 months ago, +1)

    I support your new channel Cookie Computing

    • @CraftComputing (4 months ago, +1)

      Today's show is brought to you by the letter "C"

  • @drakkon_sol (5 months ago)

    I have my P4 sitting in my PE-T110-II as my decoder for Plex.
    (My PE-T110-II is my NAS, Plex, MC, and BeamNG server. Total cost for this 32TB server: $200 CAD.)

  • @logan_kes (5 months ago)

    I just got a pair of P40s in last week and have begun benchmarking them on my Dell 14G servers running Skylake and Cascade Lake Xeons. I might throw them in my old 13G with Broadwell Xeons to see if the performance of Scalable makes a noticeable jump over the massive price increase of the platform in a "cloud gaming" situation.

  • @cyklondx (4 months ago)

    You should disable ECC VRAM on either of those cards; on the P100 with ECC enabled, it suffers some 30% performance loss.
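
    For anyone wanting to try this: ECC is toggled through nvidia-smi. A sketch wrapped in Python to keep it copy-pasteable (needs root, and the change only takes effect after a GPU reset or reboot):

    ```python
    import subprocess

    # Disable ECC on GPU 0, then print nvidia-smi's ECC section to confirm.
    subprocess.run(["nvidia-smi", "-i", "0", "-e", "0"], check=True)
    out = subprocess.run(["nvidia-smi", "-i", "0", "-q", "-d", "ECC"],
                         capture_output=True, text=True)
    print(out.stdout)
    ```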

  • @spotopolis (4 months ago)

    With how old the P4 is at this point, how would an Intel Arc A310 stack up to it? It has half the VRAM, but its clock speeds are double those of the P4. Do you think the lower-powered card with a newer architecture would have a chance?

    • @CraftComputing (4 months ago)

      Oof... The A310 and A380 don't hold up well for rasterization performance. They absolutely win when it comes to video encode/decode though. Depending on your needs, they're a solid option.

  • @JoshWolabaugh (5 months ago)

    I might have to drop a P4 in my Dell R720 and give it a go. Thanks, Jeff.

  • @DamonKwong (1 day ago)

    How do these compare to RTX cards on FPS performance?

  • @blehbop4268 (5 months ago)

    Would you be able to test your store of GPUs, both gaming and professional, with BOINC GPU tasks with power consumption and production in mind?

  • @DanielPersson (5 months ago)

    I have benchmarks for newer cards. If you want to collab on a video about AI inference or training, I could help out.

  • @haylspa (4 months ago, +1)

    Can you put Tesla P40s or P10s in SLI with a Titan Xp or X??? This is a question I have because I am building a Godlike MSI X99 platform.

    • @CraftComputing (4 months ago, +2)

      No

    • @haylspa (4 months ago)

      @CraftComputing Thank you! Have a blessed day!!

    • @VinnyG919 (4 months ago)

      You may be able to with different SLI auto settings.

  • @bobylapointe-l4r (4 months ago)

    I used a P4 on Proxmox for an AI VM. It's just good for "building" your VM, which is a very long journey: getting all the libs, drivers, and venvs. Once done, I quickly understood that self-hosted AI is all about trial and error, and waiting on the P4 became painful. Also very, very important: vGPU builds are OK for gaming but NOT for AI. It's nearly impossible to get CUDA working while using vGPU, at least not with these homemade setups. Not to mention AI is all about VRAM, and the vGPU VRAM split has a direct, dramatic impact. I ended up removing the whole vGPU setup and sticking to PCIe passthrough; that was the only way to have a multi-purpose home server for both gaming VMs and AI VMs.

    • @VinnyG919 (4 months ago)

      exllama runs fine on vGPU here, less than 10% overhead loss.

  • @mrsittingmongoose (4 months ago)

    Is the stuttering in every single game just the video, or are they actually that stuttery?

  • @masoudakbarzadeh8393 (3 months ago)

    I bought a Tesla K80 and an RX 580. Can I use them at the same time?

  • @robe_p3857 (4 months ago)

    Looking forward to AI benchmarks. Trying to decide whether to be creative or just grab a 5090.

  • @elpanaqute (4 months ago)

    Are you still using the v14.0 Nvidia vGPU drivers, like it says in your text file on Google Drive?
    Because I'm having trouble with this configuration:
    P40
    Proxmox 8.2 (kernel 6.8.8)
    Linux Nvidia driver 17.1 (550.54.16), patched, with the XML replaced from 16.5 (as it says in the polloloco manual)
    mdevctl profile 52 (12Q)
    Up to this point everything is fine. The problem:
    on the Windows VM, the only driver that works is the RTX/Quadro 552.55, and it's limiting to 15fps after 20 minutes.
    What am I doing wrong?

    • @CraftComputing (4 months ago, +1)

      No, I'm using 16.4. Different versions of the GRID drivers will only compile on specific kernel versions.
      Check out the link in the description to the Proxmox vGPU install script. It'll set everything up automatically, including drivers.

    • @elpanaqute (4 months ago)

      @CraftComputing For some reason, the first time I tried the script two days ago it was a complete failure.
      But now I tried again on a fresh install of PVE 8.2 and it went flawlessly.
      Thank you so much.

  • @cyklondx (4 months ago)

    Hi, disable ECC memory on the P100.

  • @michaelwillman5342 (4 months ago)

    You don't account for the x8 vs. x16 slot for the P4s. You'd need to run the cards at x8 if you have 8 of them, but could that bottleneck them (or not)?

    • @CraftComputing (4 months ago)

      I was running the P4 on an x8 slot. And trust me, there's more than enough bandwidth on an x8 for that GPU.

    • @michaelwillman5342 (4 months ago)

      @@CraftComputing 1 in 1 slot is not the same as 8 in 8 slots.

    • @CraftComputing (4 months ago)

      The (4) x8 slots on each side of the server are directly wired to each of the two CPUs. It's not a shared bus or PLX-split lanes; every slot is a dedicated x8.
      So yes, there is plenty of bandwidth.

    • @protator (4 months ago)

      @michaelwillman5342 That server has 80 (!) PCIe lanes from the CPUs alone, plus the chipset. Where do you see the risk of a bottleneck in this setup?
      With server- and workstation-class CPUs you don't have to worry about bandwidth limitations like you do on gaming/consumer platforms, where 16 or 20 lanes are spread over the entire board via bridge chips.
      I run a similar setup with two E5-2696 v3s, with one card running full x16 and 6 accelerators at x8. It makes no difference whether I put load on a single component or decide to go full bore and draw 1200W... as long as the chosen CPUs can keep up in terms of performance per core/thread, such a setup works fine.

  • @yokunz1837 (5 months ago)

    Can I run a Tesla M40 with BlueStacks or LDPlayer?

  • @ewenchan1239 (4 months ago)

    There isn't a standard way of benchmarking GPUs for AI that's meaningful for homelabbers.
    You can run the HumanEval benchmark, for example, but the score is practically meaningless (as it is used more for benchmarking the MODELS rather than the hardware a given model runs on).

  • @AwSomeNESSS (5 months ago, +1)

    Now I'm wondering how these run on Chinese X99 with Turbo Boost Unlock on Xeon V3 CPUs. E.g., a 2699 V3 runs at 3.2-3.4GHz under full load with TBU; 18c/36t would give four 4c/8t equivalent machines with 2c/4t to spare for the bare-metal OS, and 128GB = 28GB per VM with 16GB for the bare metal. Could have the base system up and running for ~$450ish + the cost of GPUs.

    • @CraftComputing (5 months ago, +1)

      ruclips.net/video/ngW_FI4PPZk/видео.html

    • @AwSomeNESSS (5 months ago, +1)

      @CraftComputing Man, that's quite a throwback! Peak of when you were reviewing Chinese parts every few videos.
      Hopefully Turing comes down in price in the next couple of years; it would be interesting to do a revisit with more GPU grunt down the road. Top-end Tesla/Quadro Turing is still ~$2000 CAD.

    • @CraftComputing (5 months ago, +1)

      No idea why Turing GPUs are still so expensive. You can snag an A5000 for less than $1200 for 2x the performance of an RTX 6000.

    • @AwSomeNESSS (5 months ago, +1)

      @CraftComputing That is weird. It must be connected to contract pricing or the like (e.g., not enough Turing supply has hit the market yet), and can probably be expected to bottom out as more companies move to Lovelace/Hopper/Blackwell. A single A5000 + Chinese X99 V3 setup would be an interesting proposition for an all-in-one server: two 8c/16t VMs with half an A5000 each, plus a 2c/4t Proxmox server. Add in a cheap A310 for Plex and you'd have a decent home lab setup started.

    • @CraftComputing (5 months ago, +3)

      I've got a pair of A5000s, and you'll be seeing them shortly here on the channel ;-)

  • @LA-MJ (5 months ago)

    Where the heck do you find those prices?!

    • @marcogenovesi8570 (4 months ago)

      eBay, often from China. They are sitting on crates of those cards over there.

  • @TheRogueBro (5 months ago)

    Could the lows also be due to RAM speed? I see that being a big game-changer more often than not.

    • @CraftComputing (5 months ago, +1)

      No. 8 channels of DDR4-2400 is NOT slow.

    • @TheRogueBro (5 months ago, +1)

      @CraftComputing I forgot about the 8-channel part lol.

  • @Adam130694 (3 months ago)

    Just put two 2696 v3s in there (for $50-70 a piece), unlock them, and have ~3.6GHz-clocked CPUs with the same 72 threads?

    • @CraftComputing (3 months ago)

      The unlock is still power limited. Under full load, the CPUs would still likely struggle to hit 2.8GHz or higher.

    • @Adam130694 (3 months ago)

      @CraftComputing I've seen them hitting 3.5-3.6 in games quite frequently... but you being someone with more experience, I believe you tested that. Good job anyway, and as always!