Dell PowerEdge R720XD GPU Upgrade: Installing Tesla P40 with NVIDIA Drivers

  • Published: Oct 21, 2024
  • Learn how to turbocharge your Dell PowerEdge R720XD server with a Tesla P40 GPU in this step-by-step installation guide. We cover everything from the necessary hardware components to loading the NVIDIA drivers. Whether you're a tech enthusiast or a data science/IT professional, this tutorial walks you through a smooth GPU installation. Harness the power of your server with the Tesla P40 and enhance your computing capabilities. Subscribe for more tech upgrade tutorials and stay ahead in the world of AI/ML/DL optimization.
    📚 Additional Resources:
    AI/ML/DL GPU Buying Guide 2023: Get the Most AI Power for Your Budget
    AI/ML/DL with the Dell PowerEdge R720 Server - Energy, Heat, and Noise Considerations
    Throttle No More: My Strategy for GPU Cooling in Dell PowerEdge
    Dell PowerEdge R720 GPU Deep Learning Upgrade: Installing Dual Tesla P40s with NVIDIA Drivers
    Installing Tesla P100 GPU on Dell PowerEdge R720 Server with Driver Installation
    Installing DUAL Tesla P100 GPU on Dell PowerEdge R720 Server with Driver Installation
    Other RUclips Video That Describes Cabling Issues In More Detail
    • Discussing power cavea...
    Links to Parts I Used
    BETTER CABLING OPTION: a.co/d/hccc8m8
    www.amazon.com...
    www.amazon.com...
    HOW TO GET IN TOUCH WITH ME 👋
    For the most up-to-date contact details, please visit my RUclips bio. Open to any and all inquiries, collaborations, questions, or feel free just to say hello! Thanks for your interest!
    HOW TO SUPPORT MY CHANNEL 🙏
    If you found this content useful, please consider buying me a coffee at the link below. This goes a long way in helping me through grad school and allows me to continue making the best content possible.
    Buy Me a Coffee
    www.buymeacoff...
    As a cryptocurrency enthusiast, I warmly welcome donations in crypto. If you're inclined to support my work this way, please feel free to use the following addresses:
    Bitcoin (BTC) Address: bc1q3hh904l4uttmge6p58kjhrw4v9clnc6ec0jns7
    Ethereum (ETH) Address: 0x733471ED0A46a317A10bf5ea71b399151A4bd6BE
    Should you prefer to donate in a cryptocurrency other than Bitcoin or Ethereum, please don't hesitate to reach out, and I'll provide you with the appropriate wallet address.
    Thanks for your support!

Comments • 64

  • @swiftlabbuildstuff 10 months ago +4

    Wanted to show my thanks here for the video. I'd all but given up on trying to get GPUs into my R720s. I have a bunch of them, as they are very cheap on ebay and elsewhere. I picked up one of these Tesla P40 cards for $175 and installed it exactly as your video shows. Ubuntu 22.04 installed the drivers in a snap, a quick reboot and nvidia-smi showed everything was up and running. I immediately started training an AI model for text to speech (using piper) and found the performance to be pretty good. I challenge anyone reading this to show me a GPU this powerful for $175. I was previously training on an RTX 3060 with 6GB of vRAM. The 24GB of vRAM in the P40 allowed me to increase the batch size and also allowed the longer voice samples to fit into the training. I was so impressed that I bought two more P40s for the other R720s in the rack. All in with cables and such, it's costing me over $550 but again, I don't think you can get this much GPU compute in R720 for $550 any other way. Thank you for the video and the confidence to pull the trigger.

    • @TheDataDaddi 10 months ago +1

      This is amazing feedback. Thank you so much for sharing your experience. This is exactly how I felt as well. I wanted the most performance for the money, and I truly don't believe you can do better any other way at this point in time. The only thing I will say here is that the R720s are EOL for Dell support, but I don't really see this being an issue for home lab use. I have also been really impressed with the performance. The extra VRAM makes a huge difference for me in my work. It actually took me a while to build up the confidence to try this myself because I could not find any videos on this particular area. Anyway, so glad this helped you, and I wish you all the best on your projects!

    • @swiftlabbuildstuff 10 months ago +1

      @@TheDataDaddi I didn't mention that I was also able to install the NVIDIA container runtime for Docker like a champ. After all the issues I've had trying to do GPU passthrough for VMs, I was braced for hours of tinkering around. I was wrong: the NVIDIA container runtime installed as easy as can be. No reboot of the host needed. Within 5 minutes, I had Docker containers running and using the GPU just fine, specifically handbrake-cli GPU encoding a stack of DVDs. Now I'm considering containerizing my Plex server and relocating it to this server. I know containers aren't the right fit for everything, but I can make good use of them. If you want some inspiration for a future video, may I recommend installing Proxmox on that R720 and passing through the P40 to a VM. I'm sure folks would appreciate it. I know I would.

    • @TheDataDaddi 10 months ago

      Gotcha. Yeah, I run just about everything I do in various Docker containers. I have not used traditional VMs for a while now. I also do not have a ton of experience with Proxmox, but I have actually been interested in toying around with it. This is a great video idea. I will definitely give it a go when I have some time! Thanks again. @@swiftlabbuildstuff

  • @bowlochili 9 months ago +2

    Just got an R720XD; this stuff will help! Keep up the great work!

    • @TheDataDaddi 9 months ago

      Awesome man! That is super exciting. Let me know if there is anything specific you would like to see or if you need help getting something set up.

  • @minime9400 9 months ago +2

    Thanks for the vid!
    I just bought a P40 and am assembling a PC to house it. Good thing™ I watched this vid and read the comments before hooking it up. The PSU is the modular type. I hope it has a 12V EPS connector, or I'll have to decide whether to order these cables or rewire a cable.

    • @TheDataDaddi 9 months ago +1

      Hi there. So glad you found this video helpful! Please let me know how it works out for you.

    • @minime9400 9 months ago +1

      I found one cable that is right pin-wise, but the connectors (two ATX 12V, or combined into one EPS12V) have latches that need to be trimmed to fit the P40. So time for some Dremel 🙃 To be cont'd

    • @minime9400 9 months ago +1

      Btw, you really had to cram those cables in place. In your opinion, did it hamper airflow?

    • @TheDataDaddi 9 months ago

      I am sure it probably did. The ideal solution would be to make your own cable. However, I was trying to see if I could use readily available parts. Not everyone is willing or able to make their own parts. It was a much tighter fit than I would have liked, for sure. Unfortunately, I could not really find a better way to do it short of making something custom. @@minime9400

    • @TheDataDaddi 9 months ago

      Oh nice! Please let me know how that works out for you. I am always looking for better solutions. Best of luck! @@minime9400

  • @Demoxx1 1 month ago +1

    What dual power supplies should you be running with these GPUs? Are dual 750 W units enough?

    • @TheDataDaddi 1 month ago

      Hi there! Thanks so much for the question.
      Dual 750 W PSUs should be plenty for the system.

    • @Demoxx1 1 month ago

      @@TheDataDaddi Cool, thanks!

  • @The-Weekend-Warrior 10 days ago +1

    I would have routed the power cable UNDER the GPU rather than block off airflow to it. Wouldn't there be enough space under it to route the cable?

    • @TheDataDaddi 3 days ago

      There might be enough space, but I was more worried about damaging some of the sensitive components under the GPUs. I would recommend using the alternative cabling option that should be listed in the video description.

  • @IntenseGrid 9 months ago +1

    Would be interesting also to walk us through starting from Linux install to models run. I'm not sure where to start.

    • @TheDataDaddi 9 months ago

      So I actually have a whole series on server setup.
      ruclips.net/video/Cw5za9DtaVo/видео.html
      This should help you get the server set up for use.
      This current video should show you how to install the GPU and drivers.
      I will also do a video on how to run some basic models once everything is set up.
      Next time I set up a server I will try to do a video on the whole process from start to finish.
      Anyway, I hope this helps for now!

    • @IntenseGrid 9 months ago +1

      @@TheDataDaddi I'm more interested in how to get started with AI. I've got somewhere around 200 Xeon cores running currently, but don't know where to start with AI. Can I start with my Xeons without getting GPUs while I figure out what I actually need?

    • @TheDataDaddi 9 months ago

      Yeah, absolutely. I started out using only my CPU for financial reasons, lol. This will be much slower than with even one small GPU, but it is certainly possible. Also, most open-source code for AI/ML/DL assumes that it will be running on a machine with at least one GPU, so you may have to tweak some of the code you use to work only with your CPU. If you want to reach out to me at skingutube22@gmail.com with your current hardware specs and project goals, I can try to help you more precisely to find what you need. @@IntenseGrid
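The CPU-only tweak mentioned in this reply usually comes down to choosing the device at runtime instead of hard-coding a GPU. A minimal sketch, assuming the code in question uses PyTorch (the thread does not name a framework, and `pick_device` is a hypothetical helper name):

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        # No PyTorch installed at all: nothing GPU-related can run here.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Running on: {pick_device()}")
```

In PyTorch code, the returned string would typically be passed to `.to(...)` on models and tensors in place of a hard-coded `"cuda"`.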

  • @TuMusicaTV 3 months ago +2

    I can't find the custom cable guy's link :(

    • @TheDataDaddi 3 months ago

      Hi there. Thanks for letting me know!
      Here is the link to the cable that has worked best for me:
      a.co/d/0317ir6Y

  • @ieneseliulian9294 8 months ago +1

    Hey, I have the same server and I installed a P2000 in it, but Unraid doesn't detect it. I know that you use Linux in this case, but did you change any setting in the BIOS so that the OS can detect it?

    • @TheDataDaddi 8 months ago +1

      So I didn't have to change anything in the BIOS from what I remember.
      Here are some things you can try though:
      1) The first step is to verify that your server's BIOS is configured to recognize and enable the GPU. Depending on your server model, you might need to:
      - Enable Above 4G Decoding: This setting is crucial for systems with multiple PCIe devices, like GPUs, and can help in their detection.
      - Check PCIe Slot Configuration: Ensure that the PCIe slot used for the P2000 is set up correctly in the BIOS. Some servers allow you to configure the slot's bandwidth or enable/disable slots.
      - Update BIOS: If your BIOS is outdated, it may not properly support newer hardware. Check for any available updates from your server's manufacturer that could improve compatibility.
      2) In Unraid, the detection and utilization of GPUs can sometimes require additional steps:
      - NVIDIA Driver Plugin: If you haven't already, install the NVIDIA Driver plugin for Unraid. This plugin is essential for Unraid to properly utilize NVIDIA GPUs.
      - Tools and Drivers: Ensure that you have the correct drivers and tools installed for your P2000. The NVIDIA Driver plugin should handle this, but it's good to verify that it's installed and updated.
      3) Since Unraid is Linux-based, you can use command-line tools to check if the system recognizes the GPU. SSH into your Unraid server and use commands like lspci to see if the P2000 is listed as a PCIe device. If it's listed, it's a good sign that the hardware is recognized at the system level, and any issues are more likely related to software or drivers.
      4) Ensure the GPU is properly seated in the PCIe slot and that any required power connectors are securely attached. It might also be worth testing the GPU in another system to rule out any hardware issues.
      Hope this helps!
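The `lspci` check in step 3 can be scripted. This is a sketch under stated assumptions, not a tool from the video: the helper names are hypothetical, and it filters on the vendor string rather than on "VGA" because Tesla-class cards are commonly listed as a "3D controller" rather than a "VGA compatible controller".

```python
import subprocess
from typing import List


def find_nvidia_devices(lspci_output: str) -> List[str]:
    """Return the lspci lines that mention an NVIDIA device."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "nvidia" in line.lower()
    ]


def run_lspci() -> str:
    """Run lspci (from pciutils) and return its text output."""
    result = subprocess.run(
        ["lspci"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the server itself, `find_nvidia_devices(run_lspci())` returning an empty list would suggest a seating, power, or BIOS problem rather than a driver problem.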

  • @SuccessDynamics 6 months ago

    Thank you very much for the idea 💡 My R730 and 2x P40s are up and running. I see I need liquid cooling, because it's very noisy 😂

    • @TheDataDaddi 6 months ago

      Hi there. Thanks for the comment.
      Liquid cooling is certainly an option and depending on the environment where the server lives this may be necessary. I have a video specifically on cooling the GPUs. I would advise you to watch it before you go that route. I created a python script to adjust the fans based on GPU temperature. It seems to work pretty well to keep the GPUs from throttling and might save you some money and time installing liquid cooling.
      ruclips.net/video/RUW3Ay5rsCY/видео.html
      Let me know if you have any questions!
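The script itself is not shown in this thread, so the following is only a sketch of the general idea (map GPU temperature to a fan duty cycle), not the author's code. The thresholds in `FAN_CURVE` are made-up illustration values, the helper names are hypothetical, and the `ipmitool raw 0x30 0x30 ...` byte sequences are the community-documented ones for this PowerEdge generation rather than an official Dell API, so verify them on your own iDRAC before relying on them.

```python
import subprocess
from typing import List

# Hypothetical fan curve: (GPU temperature limit in °C, fan duty in percent).
FAN_CURVE = [(50, 20), (60, 35), (70, 55), (80, 80)]
MAX_DUTY = 100


def fan_percent(temp_c: int) -> int:
    """Return the duty for the first limit the temperature is below."""
    for limit, duty in FAN_CURVE:
        if temp_c < limit:
            return duty
    return MAX_DUTY


def manual_mode_cmd() -> List[str]:
    """Command that switches the fans to manual control (assumed bytes)."""
    return ["ipmitool", "raw", "0x30", "0x30", "0x01", "0x00"]


def fan_cmd(percent: int) -> List[str]:
    """Command that sets all fans to the given duty, clamped to 0-100."""
    percent = max(0, min(100, percent))
    return ["ipmitool", "raw", "0x30", "0x30", "0x02", "0xff", f"0x{percent:02x}"]


def hottest_gpu_temp() -> int:
    """Read every GPU temperature via nvidia-smi and return the highest."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return max(int(value) for value in out.split())
```

A control loop would run `manual_mode_cmd()` once, then periodically run `fan_cmd(fan_percent(hottest_gpu_temp()))` through `subprocess.run`.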

  • @strangegriffin 22 days ago +1

    I have the same server. How did you get around the fact the r720xd doesn't support video cards? I have tried a 1080ti and a 1660ti and neither of them give me a video output or work.

    • @TheDataDaddi 3 days ago

      Hi there! Thanks for the question.
      That is interesting. In my case, I am not actually using the video cards for video output. I am only using them for machine learning applications. I also did not do anything special to get them to work. I simply installed them, downloaded the nvidia drivers and they worked. I am thinking either those cards are not compatible with this server or the cards themselves may be bad. Does the machine recognize them at all? Have you tried something like:
      lspci | grep -i vga
      Try that and lmk if the machine is able to recognize them at all and we can go from there.

    • @strangegriffin 3 days ago +1

      @@TheDataDaddi Yes, I have done that. The machine recognizes the cards, and the cards are good because I put them in another machine and they work just fine. Maybe it's the fact that these cards have video output and are designed differently from the other ones. I will have to order some different cards and try again. Thank you for your reply.

    • @TheDataDaddi 3 days ago

      @@strangegriffin Gotcha. Yeah, sorry I can't be of more help here. Unfortunately, I have never worked with those cards directly, so I can't say anything for sure. What I can tell you is that servers can be extremely particular at times about what hardware they will support. My gut feeling is that the R720 just does not support those cards. A couple of other possibilities: you may need to try an older NVIDIA driver for them to be recognized. It is also possible that some settings in the BIOS are incorrect and are keeping them from being properly utilized. It's strange, though, because these GPUs are relatively new, so I would have thought they would be supported.

  • @MattJonesYT 8 months ago +3

    At ruclips.net/video/lX2718vURCU/видео.html the cabling is blocking all the airflow to the intake of the card's duct. It will run hot.

    • @TheDataDaddi 8 months ago +2

      Yep, it does run a little hotter because of this. However, I have since found a better solution and fixed it. I just have not had a chance to make an update video. Here is the link to the better cabling:
      www.amazon.com/dp/B08N4BJL2J?psc=1&ref=ppx_yo2ov_dt_b_product_details

  • @IntenseGrid 9 months ago

    Did you ever show us NVLink, and how training on, say, your 2 P40s compares to an A100 40GB? How about 2 P100s (with 16GB of HBM2 each) compared to an A100 40GB?

    • @TheDataDaddi 9 months ago

      So I have not yet, because I have not been able to find out which exact version of NVLink or NVSwitch will work with the P100 or P40.
      These are NVIDIA's official docs for it. They say that the P100 is supported for first-gen NVLink, but later in the doc they say that only the Volta architecture is supported...
      www.nvidia.com/en-us/data-center/nvlink/
      Anyway, I have been trying to figure that out a bit better before I buy anything. However, that video and several other benchmarking type videos are definitely on my radar.

  • @MrZoidbergg 9 months ago +1

    What PSUs do you have in the R720?

    • @TheDataDaddi 9 months ago

      Hi there! Thank you for your question.
      PSU: Dell 1100 Watts 100-200V Power Supply
      MPN: YT39Y
      Hope this helps!

  • @caizza3 11 months ago

    I have a 720xd and was thinking of doing the same thing. Would you recommend picking up this GPU?

    • @TheDataDaddi 11 months ago +1

      I would definitely recommend it. I really wish I had been able to fit in 2. I think long term 24GB of VRAM may be limiting for some of my projects. However, for the money you can't beat it.

    • @ddrci88 11 months ago

      Any possibility of using NVLink or something similar with them for LLMs? Thanks

    • @TheDataDaddi 11 months ago +1

      @@ddrci88 Hi there! Great question. To the best of my knowledge, the P40 does not support NVLink or SLI, although I have never tested it myself so I can't be sure. I have read that the P100 supports the first generation of NVLink. However, two of them will only get you to 32 GB of VRAM, unfortunately. I might have to see about finding a first-gen NVLink bridge and testing it out.

    • @swiftlabbuildstuff 10 months ago +1

      Read my comment on this video above. I'd definitely recommend picking up a Tesla P40. If you find a GPU with more compute capability for less than $175, I wanna hear about it. I was so impressed with how easy it was to get up and running that I bought 2 more P40s for the other R720s in my rack.

  • @Rico0333 5 months ago

    Does the GPU have a CPU RAM requirement? I'm getting a "not enough resources" error in Windows.

    • @TheDataDaddi 5 months ago

      Hi there. Thanks for the question!
      So in theory there is no hard-and-fast CPU RAM requirement for a particular GPU. That said, you need enough RAM to hold whatever it is you are trying to load onto your GPUs. For example, if you have 16GB of CPU RAM and 32GB of GPU VRAM and you try to load a 24GB model onto the GPUs, you will likely get errors: even though you have enough GPU VRAM to hold the model, you can't fit it into CPU RAM first in order to load it.
      I am not sure if this is your problem. If not, please let me know, and I can dig deeper to try to help you figure out where the error may be coming from.
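The 16GB / 32GB / 24GB example from this reply can be written as a quick check. This is a sketch of the simplified picture described above, in which the whole model is staged in system RAM before it is copied to the GPU; some loaders stream or memory-map weights and need less. The function names and the 2 GB OS overhead default are illustrative assumptions.

```python
def fits_in_vram(model_gb: float, vram_gb: float) -> bool:
    """True if the model fits in total GPU memory."""
    return model_gb <= vram_gb


def can_stage_in_ram(model_gb: float, cpu_ram_gb: float,
                     os_overhead_gb: float = 2.0) -> bool:
    """True if the model fits in system RAM after an assumed OS overhead."""
    return model_gb <= cpu_ram_gb - os_overhead_gb


# The numbers from the reply: 24 GB model, 16 GB CPU RAM, 32 GB GPU VRAM.
model_gb, cpu_ram_gb, vram_gb = 24.0, 16.0, 32.0
print("fits in VRAM:", fits_in_vram(model_gb, vram_gb))
print("can be staged in RAM:", can_stage_in_ram(model_gb, cpu_ram_gb))
```

With these numbers the model fits in VRAM but cannot be staged in RAM, which is the failure case the reply describes.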

  • @bopal93 9 months ago

    What is the idle power consumption of one P40? I'm planning to put one in my server.

    • @TheDataDaddi 9 months ago

      Hi there! I just checked and it looks like about 10 W per P40. Please let me know if you have any other questions.
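The reply does not say how the idle figure was measured. One way to check it yourself is to read the per-GPU power draw from `nvidia-smi`; the sketch below assumes the driver reports a numeric `power.draw` value for the card, and the helper names are hypothetical.

```python
import subprocess
from typing import Dict


def parse_power_draw(csv_text: str) -> Dict[int, float]:
    """Parse "index, watts" lines into a {gpu_index: watts} mapping."""
    readings = {}
    for line in csv_text.strip().splitlines():
        index, watts = (field.strip() for field in line.split(","))
        readings[int(index)] = float(watts)
    return readings


def query_power_draw() -> Dict[int, float]:
    """Ask nvidia-smi for the current power draw of every GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,power.draw",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_power_draw(out)
```

Calling `query_power_draw()` while nothing is running on the GPUs gives the idle draw in watts per card.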

    • @bopal93 9 months ago

      @@TheDataDaddi Hi, thanks for the reply. 10 W idle seems great to me. I'm planning to use it in a Proxmox server and share it across different VMs. Another question: how do you cool it? Are you using those fan shrouds or anything?

    • @TheDataDaddi 9 months ago

      Yep, it was quite a bit less than I was expecting as well. So I actually have a whole video on that. I have written a program that uses iDRAC to adjust the internal fans once the GPUs get over a certain temperature. You could also (most likely) get some small snail blower fans and put them in as well.
      Here is the link to the video:
      ruclips.net/video/RUW3Ay5rsCY/видео.htmlsi=JP3IMRCFe23yy4RG
      @@bopal93

    • @bopal93 9 months ago

      @@TheDataDaddi Thanks for the video link! Great stuff.
      Looks like that's exactly what I was looking for. The servers that will be using the GPU don't need it all the time, so spinning the fans 24/7 is not ideal, and you've already figured that out. Just ordered the P40; excited for the new server build.

    • @TheDataDaddi 9 months ago +1

      So glad it's helped you! Yep, I hope that cooling solution works for you. If you have any issues, let me know.
      Super excited for you. Let me know how it turns out! @@bopal93

  • @skrillexdj79 5 months ago

    Is the power cabling in the server the same for the Tesla K80?

    • @TheDataDaddi 5 months ago

      Hi there. Thanks for the question!
      Yes, the same cabling should work for the K80s as well.

  • @DragonsR4Ever2 3 months ago

    Has anyone attempted connecting more than 4 Tesla P40s to the R720XD? I've got 4 connected now, but when I originally had one of them connected to the lower PCIe slot on riser 2, I got a PCIe training error. 2 on riser 1 and 1 each on risers 2 and 3 seems to work fine. I've got another P40 in the mail that I will attempt to connect to the last slot on riser 1, but it would be awesome if I could have 6 😀 or even 8. However, this machine does not support bifurcation, and adding PCIe switches to the 2 full-bandwidth PCIe slots is too expensive.

    • @TheDataDaddi 2 months ago

      Hi there! Thanks so much for the question.
      I am assuming you are connecting these externally? Otherwise, how are you fitting in 4 GPUs? Another thing to keep in mind is that after 2 GPUs the number of PCIe lanes for each GPU will be reduced (I honestly forget what is supported by the GPU and mobo), so this is something you might want to check to make sure it does not become a bottleneck for your operations.

  • @iasplay224 4 months ago +1

    How did you install Ubuntu on the Dell server? I put Ubuntu on a bootable USB drive and then installed it on an SSD, but I cannot boot from that drive. Did you have that issue, or does someone know how to fix it?

    • @TheDataDaddi 4 months ago +1

      Hi there. Thanks so much for your comment.
      The process is normally as easy as:
      1) Create a bootable USB drive with whatever OS you want to install.
      2) Plug in the USB drive.
      3) Power on the machine and boot into the BIOS.
      4) Adjust the BIOS settings so that the USB device is the first boot choice (although many times you don't even need to do this).
      5) Let the machine boot normally, and you should be prompted to install the OS.
      6) Install the OS.
      7) Restart the machine and boot into the BIOS.
      8) Adjust the BIOS settings so that the new drive where the OS is installed is the first boot option.
      9) Let the machine boot normally, and you should have a working fresh install of whatever OS you chose.

    • @TheDataDaddi 4 months ago +1

      Let me know if this does not work for you. I can try to help you troubleshoot from there.

    • @iasplay224 4 months ago

      @@TheDataDaddi I was able to solve the issue. I originally tried to plug a 2.5" SSD into the back of the server, and that didn't work for booting. I was able to install the OS on it, but after restarting, the BIOS said no bootable drive was found. My solution was to use a USB-to-SATA adapter plugged into the internal USB port, and with that the machine could boot. The odd part is that the drives in the back are detected in Ubuntu and I can save files to them, so it's weird.

    • @tranquillitydysfunction8142 2 months ago

      I have an R720XD. My first fix was to install on an SSD connected via USB-to-SATA like you, but the internal USB is very slow.
      Next I set up the SSD as a RAID 0 by entering the RAID BIOS (Ctrl+R) after boot, when it shows the RAID prompt. I initialised the drive by ticking "advanced" and then "initialise". 3 seconds later, done.
      I then set it as bootable in the Lifecycle Controller (reboot, F10), under the RAID controller option at the bottom (I could not find the option in the RAID BIOS).
      Rebooted with F10, went to the UEFI BIOS, and chose it as the boot disk.
      Next I rebooted with F11 and picked the front USB disk with the install image.
      Removed the USB drive after the install and rebooted with no F keys; everything was good. Straight into Linux.

  • @DragonsR4Ever2 3 months ago

    I recently got an R720XD and fitted a few P40s externally. All was well until I added a TeamGroup 4TB SSD of the QX variety. On reboot the created virtual disk disappears and the physical disk shows as failed. It works fine on a fresh start, but I still get a blinking yellow light on the drive and a few error symbols. Should I use software RAID or a different card, or are there any other workarounds? Anyone have this problem? I'd hate to have to buy a different SSD 😢

    • @TheDataDaddi 3 months ago

      Hi there. Thanks so much for your comment!
      I have a video related to this topic. Link is below:
      ruclips.net/video/oQqcXyS2WQ4/видео.htmlsi=iisyvWNAJlWJwQQD
      If it is just an error light on the drive, you should be fine. I have been using TeamGroup drives for a while now and have been very happy with them. If there are issues booting after installation, there is likely a problem with the drive and you might want to have it replaced. I believe TeamGroup offers a 3-year warranty at minimum on most of their products, so it might be worth reaching out to them.

    • @DragonsR4Ever2 3 months ago

      @@TheDataDaddi I ended up getting it working by flashing the PERC H710 Mini D1 with LSI IT-mode firmware and installing the boot images using the guide on fohdeesha. It's working great, and I now have 4 P40s mounted on a GPU rack above the case. I still haven't gotten the fans to stop running at 100%, but when I get some extra time I'll continue digging into ipmitool. I've gotten it to return the fan info, but it doesn't seem to respond to turning off automatic fan control or manually controlling the fan speed.

    • @DragonsR4Ever2 3 months ago

      I resolved my fan problem by using RACADM. I cleared the iDRAC logs and disabled the PCI fan response. Fans are running at 10% at idle now. I watched your video on installing the iDRAC Service Module, and I'd like to note that version 10 point something has an Ubuntu-specific version, so you don't have to do any converting. I am running Ubuntu version 20, though, so all I did was run the shell script and voila, all done.