I wonder if the 00 is the code the BIOS is reporting to the IPMI. Then I'd hazard a guess it's possibly a cracked solder ball under the CPU. A reflow might fix it. I have no real experience with that, it's just a random guess based on other things I've seen.
I think you're supposed to have TWO separate Ceph networks, one for their "private" and one for their "public" -- plus 10 gig for your proxmox vm network, making three, then a separate gig network for corosync and yet another gig network for proxmox system (separate from vm network). I'm just looking into this now though and I have read that putting the two ceph networks on one NIC is usually fine for most people. But I'm just figuring this out myself too.
Probably the better way to set up Ceph would have been to get 2 switches in an MLAG and do LACP with the 2 ports to get 20Gbit for Ceph and VMs. Since you're in a non-production environment where your servers aren't getting hit with 10G of incoming traffic from the internet, Ceph gets more resources and failover capability.
Man please, I'm from Spain. Could you tell me where to buy that kind of server chassis, so I can build my own high-availability cluster servers? I have to build some of them for my clients and me. I'll appreciate any kind of help!!! Have a really good one!! And by the way, keep up with all this awesome content.
I concur, those Silicon Power (SPCC) SATA SSDs do suck. I discovered the company hardcoded the SMART data! They are all fixed to display 40 C, no matter what the actual temperature is. This seems to be a response to a review on Amazon saying their SSDs were running as high as 60 C and failing prematurely; that reviewer noted the replacements all read 40 C. I bought a 1 TB and a 2 TB SATA SSD and both of mine never waver from 40 C, even when cold booting at a much lower ambient temperature or testing them under CrystalDiskMark. F-cking Amazon pulled my review down with my findings! Buyer beware.
I bought the same motherboard, but for the life of me it won't connect to the internet. I've tried many settings. No IPv4 or IPv6; it sends but won't receive. Any ideas would really help. Thanks!
I have a similar setup. For those who want a 3-node cluster but don't want to splash out on a 10G switch, you can use dual 10Gig NICs in a full mesh network so each node has a direct connection to each other node. Works well and removes a single point of failure (the switch)!
re: GPU. It's either going to be:
- you need vfio_iommu_type1.allow_unsafe_interrupts=1 on your kernel command line (or in a modprobe conf), or
- you need hugepagesz=1G default_hugepagesz=2M in your GRUB config plus hugepages: 2 and balloon: 0 in your /etc/pve/qemu-server/VMID.conf, or
- your hardware just can't handle outputting to the physical ports on the GPU, in which case if you turn off the default GPU in its hardware settings it will still work, but only over VNC/for computation, which is probably not that useful in your case (it's way more useful if you're importing GPUs to hardware encode/decode for Plex/Jellyfin).
I am glad you did not break your motherboard. Please read the MBD-X10SDV-4C-TLN2F manual (it can be downloaded from the Supermicro home page): that 4-pin is not for ATX 4-pin power, it's for a dedicated DC supply. You need to use either the 24-pin ATX or the 4-pin. DO NOT use them both! Quote from the manual: "Do not use the 4-pin DC power at PJ1 when the 24-pin ATX Power at JPW1 is connected to the power supply. Do not plug in both PJ1 and JPW1 at the same time."
Oh neat! 🙃
RTFM - The truest words to abide by.
@@LND3947 it's too long
@@LND3947 * insert Michael Scott NOOOOO gif here *
@@RaidOwl Another is PEBKAC
I'm gonna be honest as well. I DO NOT NEED a lot of my homelab stuff. However, i do love cosplay. Specifically cosplaying as a sysadmin/automation engineer.
Finally someone honest
The PCIe passthrough definitely works on those boards. If you've set the vfio stuff in /etc/modules and "intel_iommu=on" in PVE and it's still not booting, make sure the BIOS has the VT-d extension and IOMMU enabled. Thanks for the tour of the new cluster!
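For anyone following along, a minimal sketch of the usual Proxmox passthrough prep the comment above is describing (module names and kernel parameters as commonly documented; exact names can vary by PVE/kernel version, so treat this as a starting point, not gospel):

```shell
# /etc/modules - load the VFIO modules at boot
# vfio
# vfio_iommu_type1
# vfio_pci

# /etc/default/grub - enable the IOMMU on Intel
# GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# then apply and verify after a reboot:
update-grub
reboot
# after reboot, check the IOMMU actually came up:
dmesg | grep -e DMAR -e IOMMU
```

If `dmesg` shows nothing IOMMU-related after this, that usually points at the BIOS-level VT-d/IOMMU switch the comment mentions.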
Thanks for putting together this video and the previous one showing ProxMox VE HCI with less expensive hardware.
The two 10GbE switches shown are each a single point of failure. To upgrade the networking to HA, these could be replaced with two switches configured with MLAG. VLANs can be used to create the two logical networks shown: Host and Ceph.
For maintenance like internal drive or part replacement, having four nodes instead of the minimum three would allow one node to be safely removed at any time to perform orderly maintenance and upgrades.
When one of only three nodes is intentionally made unavailable to perform maintenance, the two remaining nodes are in a degraded state for some services (including Ceph), and if anything unlucky happens to one of the two remaining nodes during the maintenance window, there is no longer a cluster.
Great tour! I love that you include the mistakes and it's not a "do this and the HA gods will bless you" tutorial. I also had a devil of a time getting iGPU passthrough to work on Proxmox, although I'm running it on a Dell 3930 (with USB-C-only iGPU display output). I had to use cpu=host, q35, Virtio-GPU, PCI passthrough with PCI-e & x-vga, USB port passthrough (for the dummy dongle to work). I would still get error 43, but a quick disable/enable cycle in Windows gets things back in order.
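For anyone fighting the same iGPU battle, the combination listed above roughly corresponds to a VM config like this (the VMID, PCI address, and USB port here are made-up examples; yours will differ, and the exact option set depends on your PVE version):

```shell
# /etc/pve/qemu-server/<VMID>.conf (excerpt, example values only)
# cpu: host
# machine: q35
# hostpci0: 0000:00:02.0,pcie=1,x-vga=1   # iGPU at its PCI address
# usb0: host=1-2                           # port with the dummy display dongle
```

The `x-vga=1` flag is what makes the passed-through device the primary display, which is usually the piece people miss when they still get error 43.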
I really hope to see you do MORE videos about Proxmox!
"cyberbullied by some neckbeard" - priceless. That kept me smiling right to the end and then some. Thanks owl.
8:59 THANK YOU!!!! I was looking all around to find out which order this went in!
Excellent presentation. Thank you. I have been testing this Proxmox stuff in my home lab. I have built a few full-on NVMe ESXi/vSAN setups and I am really amazed how good Proxmox is, especially for the home lab price.
Would love a non-neckbeard approach/mindset to Ceph/CephFS/Rook setup on that cluster as a follow-up. Fighting thru that on my own setup.
Yeah I got some stuff to try
@@RaidOwl Awesome. More responses for the RUclips algorithm overlords.
Good thing you can directly use your Ceph cluster as a CSI backend! And if you create a CephFS and MONs you can even use RWX PVCs
I'm still in the "just got my first HP EliteDesk mini PC as my new Docker host instead of my NAS" phase of my homelab journey, but I still enjoy watching these videos even though I don't ever expect to rackmount my servers (unless perhaps it's a very small one that can mount mini PCs).
Damn dude, that's a badass cluster setup. I just bought 3 2U PowerEdge R730s, which is clearly less efficient, and yours is really high performance. Me likey.
This is essentially the same route I took, except I used 5 Dell 7050s 1L SFF PCs, NVMe for ProxMox, 1 TB SSDs on each node for Ceph and a dedicated backhaul network for Ceph. All in, it cost about $750 and my only regret was not waiting until Prime Day to get 2TB SSDs for about what I paid for the 1TB drives. 🤦♂️
I haven't made it the primary cluster yet though and am still trying to figure out what I really want to do with it. 😅
Great video though!
I'm just here taking up space - For PV(C) in your k3s Cluster, I recommend using "Rook with an external Ceph cluster", i.e. the Ceph storage provided by ProxMox.
Someone else mentioned Rook. Ima look into it for sure
I'm just here taking up space!
I'm seriously contemplating ordering a few of the Topton Intel 8505 router boxes and running them in a cluster like this... More powerful processor with lower power draw than the SuperMicro you're using, 6x 2.5Gbps NICs for direct host-to-host connectivity without a switch (and enough extra ports for network connectivity), and they're completely passive on cooling. Only real downside is there's no PCIe expansion to speak of, and it's not Xeon / ECC, but for the price (around the same price as you spent for each of these, if not a touch cheaper) they would make for an awesome cluster!
I'm currently running one for my router and it's been rock solid, and I've got a couple of Chenbro 1U servers that are due for replacement.
Please make a follow-up video on this setup when it completes 1 year with your new learnings along the way. 🙂
Agreed
U took me back to my youth with Alvin and the chipmunks proxmox server naming 😊
Regarding PCI Passthrough:
Some vendors literally block it from working properly, HP ProLiant servers for example. I've been banging my head against it for literal weeks with them.
After trying it with some Lenovo servers it worked instantly for me.
Just a heads up that sometimes it's literally impossible to get it to work
"I'm just here so I won't get fined."
- Someone else
Okay Marshawn
I ran a mirrored set of 870 EVOs in my Proxmox cluster and the performance was okay, as long as I didn't update more than one VM at a time or download and install large packages/binaries. I/O delay would cause random VMs to become unresponsive and caused general instability.
Proxmox and ZFS really need enterprise drives with larger caches and high endurance.
Thanks for the video. Missing one thing though - simulated failure for one of the nodes
The cool thing about those inwin cases is that you can swap the position of the PSU and front I/O ports, swap the rack ears to the other side and then you have all your motherboard I/O and PCI slots at the front of the rack while leaving power at the back.
I just love your content...hmmm feels like home ❤
I'm just here taking up space, but I'm trying to understand the high availability part -- what sort of failures are we trying to protect against? The shared Ceph pool seems to be the single point of failure that would take down the entire cluster? A single drive failure isn't an issue with RAID, but what about a hardware failure that isn't a drive?
An entire server could blow up and everything would keep running
@@RaidOwl I think I've mixed something up then. Is there a 4th server that holds all the drives doing the ceph storage ( the larger 4U under the 3 nodes ), or is the ceph storage replicated across, and exists on, the drives in the node servers?
@@JonathanDavisJJ Ceph runs on each of the 3 nodes using each of the Micron SSDs in each of the nodes. So yeah, the ceph storage is replicated across all 3 nodes.
"I'm just here taking up space" - me, 2024
I would recommend replacing those Silicon Power SSDs. I had a few of these in the datacenter running only as Proxmox boot disks and all died after a few months.
I'm just here taking up space - and laughing. Thanks for the entertainment :)
Liking this video 😊 appreciated a year later... if you were building today, what motherboard/CPU/RAM combo would you use in this case?
I'm looking at replacing an existing vSphere Enterprise with a shared-storage enterprise "grade" virtualization platform. Maybe I missed it, but I seem to be having a problem finding anyone who can demonstrate High Availability (HA) of the hypervisor nodes in the three scenarios below. Everyone has videos on setting up the cluster and live migration, but I'm not seeing anyone doing actual tests of a complete or partial failure of one of the cluster nodes.
1) Complete node fail -- just pull the power plug(s) out to simulate. How does Proxmox handle dozens of VMs powering on? Does it have a DRS type function where it will distribute the VMs across the remaining nodes? Is there an ability to have specified VMs prioritized over other VMs? Also, the ability to restart VMs in a specific order?
2) Partial fail where the hypervisor is in some sort of hung state and the VMs are down but the storage is still accessible and any file locks (if applicable) are still held?
3) Host isolation. What happens when the Proxmox host is unreachable from the management side but the VMs running are still accessible? Will it allow VMs to still run? Will it provide an option to restart VMs on other nodes?
Thanks.
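For what it's worth, Proxmox addresses part of scenario 1 through HA groups and resource priorities (node names, group name, and VM IDs below are made up for illustration; check the pve-ha-manager docs for the exact semantics on your version, and note that DRS-style automatic load balancing is a separate, newer feature):

```shell
# Create an HA group that prefers node1 but can fail over to node2/node3
ha-manager groupadd prefer-node1 --nodes "node1:2,node2:1,node3:1"

# Register VMs as HA resources; restart/relocate limits control recovery behavior
ha-manager add vm:100 --state started --group prefer-node1
ha-manager add vm:101 --state started --max_restart 2 --max_relocate 1

# Inspect current HA state across the cluster
ha-manager status
```

This covers "where do VMs restart" and crude prioritization via groups, but as far as I know there is no built-in ordered-restart dependency chain, which is worth testing before migrating off vSphere.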
you deserved this and you probably want to follow this thread down the line - you will find you do need this once you get all the kinks out. Some follow-ups:
- do an HA OPNsense next - non-virtualized, total bare metal
- max out the RAM on these - more RAM equals more better - great that you have an upgrade path
- go all NVMe - that seems to be your weakest link
- please update with the cluster fabric 20G bonded, and add a USB 2.5G NIC for the management interface
- please explore other network filesystem options - NFS, ZFS, OCFS2, SSHFS, Gluster
Just a wonderful inspiring video thx 😊
Some suggestions for improvement:
First of all... You spent so much money on high availability, but ended up creating another single point of failure: the Ceph switch. Thing is... for a 3-node cluster you don't even need a switch at all. Instead you can make a direct connection from each node to each other node without a switch in between (plenty of info in the Proxmox forum on how to do this), and as a result you get minimum latency and no single point of failure in the Ceph network. Of course you need to use both of the 10G connections the Supermicro board has for that, but using one of them for a 1G connection is a terrible waste anyway. So use the PCIe slot for a 4-port 1G card, which are available very cheap second-hand. From those 4 additional 1G ports, you can use 2 to apply the same principle for cross-connecting the nodes directly and use that exclusively for the cluster backend network.
The only downside of this would be that it won't be so easy to expand the cluster with more nodes, compared to a solution that uses switches for the Ceph and cluster backend networks.
Of course you could ramp things up even further by using a 4x10G or even 4x25G PCIe card instead of a 4x1G. Although that will add another few bucks to the bill, it would open up the possibility of getting a card with RDMA (remote direct memory access) support, which is also supported by Proxmox and known to be very beneficial for Ceph performance.
You're welcome. :)
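To make the switchless full-mesh idea above concrete, here is roughly what it looks like on one node. Interface names and the 10.15.15.x addressing are invented for illustration; the Proxmox wiki's "Full Mesh Network for Ceph Server" page documents the real variants (routed, RSTP, and broadcast setups), so verify against that before copying anything:

```shell
# /etc/network/interfaces on node1 (excerpt) - routed full mesh, example values
#
# auto eno1
# iface eno1 inet static
#     address 10.15.15.1/32
#     # direct cable to node2
#     up ip route add 10.15.15.2/32 dev eno1
#
# auto eno2
# iface eno2 inet static
#     address 10.15.15.1/32
#     # direct cable to node3
#     up ip route add 10.15.15.3/32 dev eno2
```

Each node carries the same pattern with its own /32 address and two point-to-point routes, so losing any one cable degrades only that pair, not the whole Ceph network.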
If you have NVMe drives and a need for bandwidth, I'd probably pick 25GbE or so NICs for the available PCIe slot.
Another option would be to populate that PCIe slot with a multi-NVMe PCIe card; I think Sonnet makes a new one with 8 NVMe bays on PCIe 4.0 x16, which is wild :D
The Micron 7400 drives are NVMe. Faster in Ceph than any consumer drive can dream of. And the quad-core can't handle more than a single NVMe at full load anyway.
I’d link my guide but it’s not the first result on google
Doesn’t count then
I'm sorry to say...
I love your videos. Very well edited.
I’m sorry but….thank you
What is your energy use under load and at rest? This might be just what I need.
Under load they pull just over 100W. At idle it’s like 85
NICE! I used this same chassis in a firewall build. Good case, but the back I/O shields were a PITA!
Lol yeah I just avoided those
@@RaidOwl I saw :) Nice setup sir !
I am running 2 Dell R620s with 10c/20t and 64GB RAM each. I need 1 more to make a matching trio. The 2 Dells pull about 130W. I am using Harvester right now but really thinking about switching to Proxmox because that's where the cool kids hang (that's where the projects and tutorial videos are). It's so hard to find Harvester content.
No no no, I just got rid of one of my 2U servers to save on power and noise. I don't need three 1Us for a cluster to run k8s like I've wanted to..... how loud are these.... asking for a friend.
Actually verrrrry quiet. I forgot to put my sound test in here but I used some low adapters from Noctua and they work great.
I also don’t need these & want to know how loud they are… for … a… “friend”… haha
I wanted to mimic this setup but for the switch that has the dedicated CEPH network...if that switch needs to reboot say for an update....would that wreck a lot of stuff since all 3 hosts lose communication to each other over CEPH? Have you tested that?
You can put the Ceph cluster in maintenance mode or just let it pause on its own. Source: lost two switches powering a cluster.
Great video love this style and subject
I would like to see some benchmarks. Ideally with database usage :)
Some filesystem Benchmarks would be nice
I built a proxmox cluster using Supermicro M11SDV-8C-LN4F AMD Epyc 3251 board. 8-core, 16-thread, 65W total usage with 4 sticks of ECC ram and a SATA SSD boot disk under load. Another Mini ITX board, and though it's Zen 1, its power usage is the reason I chose it. I need fast networking too, but storage is handled differently for high availability in my network. So the 1 PCI-e slot is used for a 10 Gbit nic, because though Epyc 3xx1 supports 10Gigabit networking on chip, this board doesn't have 10gig ports.
10:09 holy banana. no standoffs? :S
Hei, good setup, and very interesting video, thanks ^^
lol I love how the official Ceph documentation questions the need for multiple networks, but the users went nuts all over the internet and demand that you use it
Love it
How does the secondary network for the CEPH storage work? Is it not at all connected to the main network? If so, do I have to manually assign IP addresses to the systems?
It’s connected to the main network but it has its own VLAN with proper DHCP addressing
@@RaidOwl It's not recommended to use DHCP for the Ceph network (Ceph is tightly bound to IP addresses). Yeah, it doesn't really matter in a 3-node Ceph cluster, but it's really bad practice
@@Mcs1v reservations make everything possible and just fine.
I love these cases. They are very hard to get hands on.
Does anyone have a link to the U.3 to NVME adapter cable? I'm having trouble finding how the U.3 drive connects to an M.2/NVME slot on a motherboard.
These are what I used: amzn.to/3uTXR1b
Wow! thanks for the quickly reply. Can't wait to try it it out. I have the 2 of the XeonD-1540 boards and it might be time to get a 3rd ! @@RaidOwl
Wish I woulda gone with the 1540s haha
Anything to be gained by running GlusterFS on the 3x 1TB SSDs?
Kinda weird request, and I know I most definitely should not be buying hardware based on aesthetics, but could you let me know what these chassis look like with rack mounted ubiquiti gear? Do the two silvers look good together, or do they clash?
1 amp for all three? What's the voltage in the US? 220?
120
I've just finished creating a pair of Proxmox servers for myself hosting my original 6x rPi's rebuilt as Debian VMs.
Each machine has the following specs :
- Inter-tech K-125L 1U rackmount case
- Akyga 200W PSU
- ASRock J5040-ITX M/board
- 32GB DDR4 2400 RAM (2x16GB)
- 1x 500GB Samsung 870 EVO SSD for the boot drive (overkill, as I originally ordered 256GB but they weren't in stock, so the retailer supplied this one for the same price)
- 1x 4TB Samsung 870 EVO SSD for VM storage (again overkill, but I have plenty of available space and they're cheap)
- 3x Noctua 40mm NF-A4x20 FLX 5000 fans
Both machines run super cool and quiet and have plenty of power for my current needs, with each only using
I don't know about proxmox, but with ESXi you can add variables to the VM to fix the code 43 error with GPUs. Although from my experience that error only came up with older NVIDIA cards. Basically you gotta tell the VM it isn't a VM.
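For reference, the ESXi tweak described above usually comes down to a one-line addition to the VM's .vmx file that hides the hypervisor from the guest (older NVIDIA drivers refuse to load with error 43 when they detect a VM):

```
hypervisor.cpuid.v0 = "FALSE"
```

Proxmox/QEMU has an equivalent idea via CPU flags on the VM, though the exact setting differs between the two hypervisors.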
I must admit, when I first came across your channel, I found you and/or your method of presentation somewhat annoying. But the overlords at YouTube and their algo kept pushing your content to my feed, and after watching more of your videos, I have actually started taking a liking to your awkward sense of humor. I also enjoy that you share all of your mistakes and blunders with us, which any homelabber can relate to. So I guess I'll hit that subscribe button!
Praise to the almighty YouTube overlords 🙏🏼
Did you try to pass through the GPU without the riser cable?
I did not 🤔
Tried the spell too.
Now how do I revert, there's no snapshot for that.
what is the read and write performance of your Ceph Cluster?
Connect a SAS card in that slot and an LTO tape drive for backup
My x10 gets quite warm. Have you done anything to the cooling?
Nah they have active cooling though. Get about mid 70s under load
I would love to have a Proxmox cluster for my home lab. I could not get GPU passthrough to work with my setup either. Does Proxmox do load balancing, where it would move a VM to another host if that host is less busy? Thanks for sharing.
Not sure if I missed it .. but power consumption (idle/load) per node would have been nice too - otherwise cool video!
About 30-35W per node
@@RaidOwl That would be quite a lot for me... is that idle or load? In the case you showed, 1 node can run the VMs and stuff while the other 2 nodes are basically fully idle (until node 1 crashes). Are the 30-35W idle (for nodes 2 and 3) or load (node 1)? Thanks for the reply
I wonder if the 00 is the code the BIOS is reporting to the IPMI. Then I'd hazard a guess it's possibly a cracked solder ball under the CPU. A reflow might fix it. I have no real experience in that, it's just a random guess based on other things I've seen.
I am just here taking up space - you 😂
Would you still recommend the Zima board?
For sure, just manage your expectations
I think you're supposed to have TWO separate Ceph networks, one for their "private" and one for their "public" -- plus 10 gig for your proxmox vm network, making three, then a separate gig network for corosync and yet another gig network for proxmox system (separate from vm network). I'm just looking into this now though and I have read that putting the two ceph networks on one NIC is usually fine for most people. But I'm just figuring this out myself too.
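The two Ceph networks mentioned above map to two settings in ceph.conf. A minimal sketch, with placeholder subnets:

```
[global]
    # client/monitor traffic ("public")
    public_network  = 10.10.20.0/24
    # OSD replication and heartbeat traffic ("cluster"/private)
    cluster_network = 10.10.30.0/24
```

If cluster_network is left unset, Ceph simply runs all traffic over the public network, which is the "one NIC is usually fine" case.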
Probably the better way to set up Ceph would have been to get 2 switches in an MLAG and do LACP with the 2 ports to get 20Gbit for Ceph and VMs.
Since you're in a non-production environment where your servers aren't getting hit with 10G of incoming traffic from the internet, Ceph gets more resources and failover capability
Did you use thermal paste?
I am here for the chat!! Always easy to learn from
I’m here just taking up space 🙌
-some dude
I watched the Livestream already, so I'm definitely just here taking up space
Man please, I'm from Spain, could you tell me where to buy that kind of server chassis so I can build my own high-availability cluster servers? Please, I have to build some of them for my clients and myself. I'll appreciate any kind of help!!! Have a really good one!! And by the way, keep up all this awesome content
I’m just here taking up space!!
Has it enough CPU power to encode /transcode 4K videos for Plex?
I'm Just Here Taking Up Space...but love the content.
How loud is it?
thanks for another great video, awesome. have a great day
Im here just taking up space 😂
I concur, those Silicon Power (SPCC) SATA SSDs do suck. I discovered the company hardcoded the SMART data! They are all fixed to display 40 C, no matter what the actual temperature is. This seems to be a response to a review on Amazon saying their SSDs were running as high as 60 C and failing prematurely; that reviewer noted the replacements all read 40 C. I bought a 1 and a 2 TB SATA SSD, and both of mine never waver from 40 C, even when cold booting at a much lower ambient temperature or testing them under CrystalDiskMark. F-cking Amazon pulled my review down with my findings! Buyer beware.
Love the realness, linux server nerd bro
"I am here just taking up space"
I bought the same motherboard, but for the life of me it won't connect to the internet. Tried many settings; no IPv4 or 6. It sends but won't receive. Any ideas would really help. Thanks!!!!!!
I have a similar setup. For those who want a 3-node cluster but don't want to splash out on a 10G switch, you can use dual 10Gig NICs in a full mesh network so each node has a direct connection to each other node. Works well and removes a single point of failure (the switch)!
Or go the quad 25G NIC in the PCIe slot route and use them in a mesh network with DAC/fiber, leaving the 10G copper for outbound networking.
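The full-mesh idea can be done with the routed approach from the Proxmox wiki ("Full Mesh Network for Ceph Server"). A sketch of /etc/network/interfaces on one node of three; interface names and addresses are placeholders:

```
# Node 1 (10.15.15.1); ens19 cabled directly to node 2, ens20 to node 3
auto ens19
iface ens19 inet static
    address 10.15.15.1/24
    up   ip route add 10.15.15.2/32 dev ens19
    down ip route del 10.15.15.2/32

auto ens20
iface ens20 inet static
    address 10.15.15.1/24
    up   ip route add 10.15.15.3/32 dev ens20
    down ip route del 10.15.15.3/32
```

Each node mirrors this, pointing a /32 route at whichever peer sits on the far end of each cable, so no switch is involved at all.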
Gotta love those musical rodents
Sure do
I am surprised you didn't go for the 8 core version
These were much more easily available
I’m just here taking up space 😂
Funny story.. those chassis are used by one of our client's vendors as NVRs
I’m just here taking up space and mouth breathing
Re: GPU. Either you need vfio_iommu_type1.allow_unsafe_interrupts=1 on your kernel cmdline, or you need hugepagesz=1G default_hugepagesz=2M in your grub config plus hugepages: 2 and balloon: 0 in your /etc/pve/qemu-server/VMID.conf. Or your hardware just can't handle outputting to the physical ports on the GPU, in which case, if you turn off the default GPU setting in its hardware config, it will still work but only over VNC/for computation, which is probably not that useful in your case (it's way more useful if you're importing GPUs to hardware encode/decode for Plex/Jellyfin).
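Laid out as config, the two variants described above look roughly like this (illustrative only; "VMID" and the exact flag combination are placeholders to adapt to your setup):

```
# /etc/default/grub - variant 1: unsafe-interrupts workaround
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1"

# /etc/default/grub - variant 2: hugepages instead
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on default_hugepagesz=2M hugepagesz=1G"

# /etc/pve/qemu-server/VMID.conf - goes with variant 2
hugepages: 2
balloon: 0
```

Run update-grub and reboot after editing either grub variant.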
Just here taking up space
I'm just here taking up space :)
I am here just taking up space :P
I would stick another network adaptor in the pci-e slot
100G
@@RaidOwl 400G
If you have a time machine you could go back in time to stop yourself, but if not, then: I am just taking up space.
I'm just here taking up space, I won't spend that kind of money.
im just here taking up space c:
I'm just here taking up space.
You know the rules, so do I
I’m just here taking up space