No no no I just got rid of one of my 2U servers to save on power and noise. I don't need three 1Us for a cluster to run k8s like I've wanted to.....how loud are these....asking for a friend.
Kinda weird request, and I know I most definitely should not be buying hardware based on aesthetics, but could you let me know what these chassis look like with rack mounted ubiquiti gear? Do the two silvers look good together, or do they clash?
I wanted to mimic this setup but for the switch that has the dedicated CEPH network...if that switch needs to reboot say for an update....would that wreck a lot of stuff since all 3 hosts lose communication to each other over CEPH? Have you tested that?
I'm just here taking up space, but I'm trying to understand the high availability part -- what sort of failures are we trying to protect against? The shared Ceph pool seems to be the single point of failure that would take down the entire cluster? A single drive failure isn't an issue with RAID, but what if the hardware failure isn't a drive?
@@RaidOwl I think I've mixed something up then. Is there a 4th server that holds all the drives doing the ceph storage ( the larger 4U under the 3 nodes ), or is the ceph storage replicated across, and exists on, the drives in the node servers?
@@JonathanDavisJJ Ceph runs on each of the 3 nodes using each of the Micron SSDs in each of the nodes. So yeah, the ceph storage is replicated across all 3 nodes.
I would love to have a Proxmox cluster for my home lab. I could not get GPU passthrough to work with my setup either. Does Proxmox do load balancing where it would move a VM to the other host if it is less busy than the current host? Thanks for sharing.
I think you're supposed to have TWO separate Ceph networks, one for their "private" and one for their "public" -- plus 10 gig for your proxmox vm network, making three, then a separate gig network for corosync and yet another gig network for proxmox system (separate from vm network). I'm just looking into this now though and I have read that putting the two ceph networks on one NIC is usually fine for most people. But I'm just figuring this out myself too.
I must admit, when I first came across your channel, I found you and/or your method of presentation to be somewhat annoying. But the overlords at YouTube and their algo kept on pushing your content to my feed and after watching more of your videos, I have actually started taking a liking to your awkward sense of humor. I also enjoy that you share all of your mistakes and blunders with us, which any homelabber can relate to. So I guess I’ll hit that subscribe button!
I am glad you did not break your motherboard. Please read the MBD-X10SDV-4C-TLN2F manual (it can be downloaded from the Supermicro home page): that 4-pin connector is not for ATX 4-pin power, it's for a dedicated DC supply. You need to use either the 24-pin ATX or the 4-pin. DO NOT use them both! Quote from manual: "Do not use the 4-pin DC power at PJ1 when the 24-pin ATX Power at JPW1 is connected to the power supply. Do not plug in both PJ1 and JPW1 at the same time."
Oh neat! 🙃
RTFM - The truest words to abide by.
@@longnamedude3947 its too long
@@longnamedude3947 * insert Michael Scott NOOOOO gif here *
@@RaidOwl Another is PEBKAC
I'm gonna be honest as well. I DO NOT NEED a lot of my homelab stuff. However, I do love cosplay. Specifically cosplaying as a sysadmin/automation engineer.
Finally someone honest
The PCIe passthrough definitely works on those boards. If you’ve set the vfio stuff in /etc/modules and “intel_iommu=on” in PVE and it's still not booting, make sure the BIOS has the VT-d extension and IOMMU enabled. Thanks for the tour of the new cluster!
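For anyone wanting to double-check the two things mentioned above, here is a rough Python sketch that looks at the running kernel's command line for the `intel_iommu=on` flag and counts the IOMMU groups the kernel exposes under `/sys/kernel/iommu_groups`. The function names are made up for illustration, it only looks for the Intel flag quoted in the comment, and it is a quick sanity check under those assumptions rather than an official Proxmox tool.

```python
from pathlib import Path


def iommu_requested(cmdline: str) -> bool:
    """Return True if 'intel_iommu=on' appears as a kernel command-line parameter."""
    return "intel_iommu=on" in cmdline.split()


def count_iommu_groups(root: Path = Path("/sys/kernel/iommu_groups")) -> int:
    """Count IOMMU group directories; 0 usually means the IOMMU is not active."""
    if not root.is_dir():
        return 0
    return sum(1 for entry in root.iterdir() if entry.is_dir())


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    # On a non-Linux machine /proc/cmdline does not exist; fall back to an empty string.
    cmdline = proc_cmdline.read_text() if proc_cmdline.exists() else ""
    print("intel_iommu=on on kernel command line:", iommu_requested(cmdline))
    print("IOMMU groups found:", count_iommu_groups())
```

If the flag is present but no groups show up, that would point at the BIOS setting (VT-d) rather than the boot configuration.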
Excellent presentation. Thank you. I have been testing this Proxmox stuff in my home lab. I have built a few full-on NVMe ESXi/vSAN setups and I am really amazed how good Proxmox is, especially for the home lab price.
Thanks for putting together this video and the previous one showing ProxMox VE HCI with less expensive hardware.
The two 10GbE switches shown are each a single point of failure. To upgrade the networking to HA, these could be replaced with two switches configured with MLAG. VLANs can be used to create the two logical networks shown: Host and Ceph.
For maintenance like internal drive or part replacement, having four nodes instead of the minimum three would allow one node to be safely removed at any time to perform orderly maintenance and upgrades.
When one of only three nodes is intentionally made unavailable to perform maintenance, the two remaining nodes are in a degraded state for some services (including Ceph), and if anything unlucky happens to one of the two remaining nodes during the maintenance window, there is no longer a cluster.
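The node-count argument can be put into numbers with a small Python sketch. It assumes simple majority voting with one vote per node (which is how cluster quorum is normally counted) and says nothing about Ceph placement or capacity; the function names are only illustrative.

```python
def quorum(total_nodes: int) -> int:
    """Votes needed for a majority when every node has exactly one vote."""
    return total_nodes // 2 + 1


def tolerable_failures(total_nodes: int) -> int:
    """How many nodes can be offline while the rest still hold a majority."""
    return total_nodes - quorum(total_nodes)


if __name__ == "__main__":
    for nodes in (3, 4, 5):
        print(
            f"{nodes} nodes: quorum={quorum(nodes)}, "
            f"can lose {tolerable_failures(nodes)} and stay quorate"
        )
```

Under that assumption, three nodes need two votes, so one node in maintenance leaves no margin. A fourth node keeps three members running during maintenance, but by vote count alone it still only tolerates one node being down; five nodes tolerate two.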
Great tour! I love that you include the mistakes and it's not a "do this and the HA gods will bless you" tutorial. I also had a devil of a time getting iGPU passthrough to work on Proxmox, although I'm running it on a Dell 3930 (with USB-C-only iGPU display output). I had to use cpu=host, q35, Virtio-GPU, PCI passthrough with PCI-e & x-vga, USB port passthrough (for the dummy dongle to work). I would still get error 43, but a quick disable/enable cycle in Windows gets things back in order.
I really hope to see you do MORE videos about Proxmox!
I'm still in the "just got my first HP Elitedesk mini PC as my new docker host instead of my NAS" phase of my homelab journey but I still enjoy watching these videos even though I don't ever expect to rackmount my servers (unless perhaps it's a very small one that can mount mini PCs).
8:59 THANK YOU!!!! I was looking all around to find out which order this went in!
"cyberbullied by some neckbeard" - priceless. That kept me smiling right to the end and then some. Thanks owl.
I'm just here taking up space!
Damn dude, that’s a badass cluster setup. I just bought 3 2U PowerEdge R730s, which is clearly less efficient, and yours is really high performance. Me likey.
This is essentially the same route I took, except I used 5 Dell 7050s 1L SFF PCs, NVMe for ProxMox, 1 TB SSDs on each node for Ceph and a dedicated backhaul network for Ceph. All in, it cost about $750 and my only regret was not waiting until Prime Day to get 2TB SSDs for about what I paid for the 1TB drives. 🤦♂️
I haven't made it the primary cluster yet though and am still trying to figure out what I really want to do with it. 😅
Great video though!
Please make a follow-up video on this setup when it completes 1 year with your new learnings along the way. 🙂
Agreed
Would love a non-neckbeard approach/mindset to Ceph/CephFS/Rook setup on that cluster as a follow-up. Fighting thru that on my own setup.
Yeah I got some stuff to try
@@RaidOwl Awesome. More responses for the YouTube algorithm overlords.
I'm just here taking up space - For PV(C) in your k3s Cluster, I recommend using "Rook with an external Ceph cluster", i.e. the Ceph storage provided by ProxMox.
Someone else mentioned Rook. Ima look into it for sure
U took me back to my youth with Alvin and the chipmunks proxmox server naming 😊
Good thing you can directly use your Ceph cluster as a CSI backend! And if you create a CephFS and mons you can even use RWX PVCs.
you deserved this and you probably want to follow this thread down the line - you will find you do need this once you get all the kinks out - do a ha opnsense next - non virt total bare metal... you will want to max the ram out on these - more ram equals more better - great you have an upgrade path - you will want to go all nvme - that seems to be your weakest link - pls update with and make the cluster fabric 20g bonded and add a usb 2.5 for mgt inf - please explore other netfs options - nfs, zfs, ocfs2, sshfs, gluster
The cool thing about those inwin cases is that you can swap the position of the PSU and front I/O ports, swap the rack ears to the other side and then you have all your motherboard I/O and PCI slots at the front of the rack while leaving power at the back.
Thanks for the video. Missing one thing though - simulated failure for one of the nodes
I ran a mirrored set of 870 EVOs in my Proxmox cluster and the performance was okay, as long as I didn't update more than one VM at a time or download and install large packages/binaries. I/O delay would cause random VMs to become unresponsive and general instability.
Proxmox and ZFS really need enterprise drives with larger caches and high endurance.
I'm seriously contemplating ordering a few of the Topton Intel 8505 router boxes and running them in a cluster like this... More powerful processor with lower power draw than the SuperMicro you're using, 6x 2.5Gbps NICs for direct host-to-host connectivity without a switch (and enough extra ports for network connectivity), and they're completely passive on cooling. Only real downside is there's no PCIe expansion to speak of, and it's not Xeon / ECC, but for the price (around the same price as you spent for each of these, if not a touch cheaper) they would make for an awesome cluster!
I'm currently running one for my router and it's been rock solid, and I've got a couple of Chenbro 1U servers that are due for replacement.
"I'm just here so I won't get fined."
- Someone else
Okay Marshawn
I've just finished creating a pair of Proxmox servers for myself hosting my original 6x rPi's rebuilt as Debian VMs.
Each machine has the following specs :
- Inter-tech K-125L 1U rackmount case
- Akyga 200W PSU
- ASRock J5040-ITX M/board
- 32GB DDR4 2400 RAM (2x16GB)
- 1x 500GB Samsung 870 EVO SSD for the boot drive (overkill, as I originally ordered 256GB but they weren't in stock, so the retailer supplied the 500GB for the same price)
- 1x 4TB Samsung 870 EVO SSD for VM storage (again overkill, but I have plenty of available space and they're cheap)
- 3x Noctua 40mm NF-A4x20 FLX 5000 fans
Both machines run super cool and quiet and have plenty of power for my current needs, with each only using
I just love your content...hmmm feels like home ❤
I built a proxmox cluster using Supermicro M11SDV-8C-LN4F AMD Epyc 3251 board. 8-core, 16-thread, 65W total usage with 4 sticks of ECC ram and a SATA SSD boot disk under load. Another Mini ITX board, and though it's Zen 1, its power usage is the reason I chose it. I need fast networking too, but storage is handled differently for high availability in my network. So the 1 PCI-e slot is used for a 10 Gbit nic, because though Epyc 3xx1 supports 10Gigabit networking on chip, this board doesn't have 10gig ports.
I'm just here taking up space - and laughing. Thanks for the entertainment :)
Just a wonderful inspiring video thx 😊
"I'm just here taking up space" - me, 2024
Regarding PCI Passthrough:
Some vendors literally block it from properly working, HP ProLiant servers for example. I've been cracking my head for literal weeks with them.
After trying it with some Lenovo servers it worked instantly for me.
Just a heads up that sometimes it's literally impossible to get it to work.
Liking this video 😊 appreciate a year later... if you were building today? What motherboard/cpu/ram combo would you use in this case?
Some suggestions for improvement:
First of all... You spent so much money on high availability, but ended up creating another single point of failure, which is the Ceph switch. Thing is... For a 3 node cluster you don't even need a switch at all. Instead you can make a direct connection from each node to each other node without a switch in between (plenty of info in the Proxmox forum on how to do this) and as a result you get minimum latency and no single point of failure in the Ceph network. Of course you need to use both of the 10G connections that the Supermicro board has for that, but using the other one for a 1G connection is a terrible waste anyway. So use the PCIe slot to get a 4-port 1G card, which are available very cheap second hand. From those 4 additional 1G ports, you can use 2 to apply the same principle for cross-connecting the nodes directly and use that for the cluster backend network exclusively.
The only downside of this would be that it will not be so easy to expand the cluster with more nodes, compared to a solution that uses switches to connect the Ceph and cluster backend networks.
Of course you could ramp up things even further by using a 4x10G or even 4x25G PCIe card instead of a 4x1G. Although that will add another few bucks to the bill, it would open up a possibility to get a card with direct memory access support (RDMA), which is also supported by proxmox and known to be very beneficial for ceph performance.
You're welcome. :)
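To show why the switchless full mesh fits a 3 node cluster but gets awkward as it grows, here is a quick back-of-the-envelope sketch in Python. It only counts cables and ports for a mesh where every node is wired directly to every other node; the function names are illustrative, and it says nothing about routing or bonding setup.

```python
def mesh_links(nodes: int) -> int:
    """Number of point-to-point cables needed so every node reaches every other directly."""
    return nodes * (nodes - 1) // 2


def ports_per_node(nodes: int) -> int:
    """NIC ports each node must dedicate to the mesh."""
    return nodes - 1


if __name__ == "__main__":
    for nodes in (3, 4, 5):
        print(
            f"{nodes} nodes: {mesh_links(nodes)} cables, "
            f"{ports_per_node(nodes)} ports per node"
        )
```

With three nodes that is 3 cables and 2 ports per node, which matches the two onboard 10G ports. A fourth node already needs 6 cables and 3 ports per node, which is where a switch starts to look attractive again.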
NICE! Used this same chassis in a firewall build. Good case but the back I/O shields were a PITA!
Lol yeah I just avoided those
@@RaidOwl I saw :) Nice setup sir !
I have a similar setup. For those who want a 3 node cluster but don’t want to splash out on a 10G switch, you can use dual 10gig NICs in a full mesh network so each node has a direct connection to each other. Works well and removes a single point of failure! (The switch)
or go for the quad 25G nic in the pci-slot way and use them in mesh network with dac/fiber and leave the 10G copper for outbound networking.
Great video love this style and subject
I would recommend you replace those Silicon Power SSDs. I had a few of these in the datacenter running only as Proxmox boot disks and all died after a few months.
I love these cases. They are very hard to get hands on.
if you had NVME drive and need for bandwidth, I'd probably pick 25Gbe or so NICs for the available PCIe slot,
another option would be to populate that PCIe slot with multi-nvme PCIe board, I think Sonnet makes some new one with 8 nvme bays on PCIe 4.0 x16 which is wild :D
7400 Micron drives are NVMe. Faster in Ceph than any consumer drive can dream of. And the quad core can't handle more than a single NVMe at full load anyway.
I am just here taking up space - you 😂
I would like to see some benchmarks. Ideally with database usage :)
Some filesystem Benchmarks would be nice
I’m just here taking up space!!
Hei, good setup, and very interesting video, thanks ^^
connect a SAS card on that slot and an LTO tape drive for backup
I'm sorry to say...
I love your videos. Very well edited.
I’m sorry but….thank you
I’m here just taking up space 🙌
-some dude
I am running 2 Dell R620s with 10c/20t and 64GB RAM each. I need 1 more to make a matching trio. The 2 Dells pull about 130W. I am using Harvester right now but really thinking about switching to Proxmox because that's where the cool kids hang (that's where the projects and tutorial videos are). So hard to find Harvester content.
lol I love how the official Ceph documentation questions the need for multiple networks, but the users went nuts all over the internet and demand that you use it
Love it
"I am here just taking up space"
I watched the Livestream already, so I'm definitely just here taking up space
I'm Just Here Taking Up Space...but love the content.
Love the realness, linux server nerd bro
I concur, those Silicon Power (SPCC) SATA SSDs do suck. I discovered the company hardcoded the SMART data! They are all fixed to display 40 C. No matter what the actual temperature is. This seems to be a response to a review on Amazon that their SSDs were running as high as 60 C and failing prematurely. And that reviewer noted the replacements all read 40 C. I bought a 1 and 2 TB SATA SSD and both of mine never waver from 40 C. Even when cold booting at a much lower ambient temperature. Or testing them under CrystalDiskMark. F-cking Amazon pulled my review down with my findings! Buyer beware.
Probably the better way to set up Ceph would have been to get 2 switches in an MLAG and do an LACP bond with the 2 ports to get 20Gbit for Ceph and VMs.
Since you are not in a production environment where your servers get hit with 10G of incoming traffic from the internet, Ceph would have more resources and failover capability.
I’m just here taking up space 😂
I am here for the chat!! Always easy to learn from
Im here just taking up space 😂
Just here taking up space
I don't know about proxmox, but with ESXi you can add variables to the VM to fix the code 43 error with GPUs. Although from my experience that error only came up with older NVIDIA cards. Basically you gotta tell the VM it isn't a VM.
If you have a time machine you could go back in time to stop yourself but if not then. I am just taking up space.
I'm just here taking up space :)
I'm looking at replacing an existing vSphere Enterprise with a shared storage enterprise "grade" virtualization platform. Maybe I missed it, but I seem to be having a problem finding anyone who can demonstrate High Availability (HA) of the hypervisor nodes in these three scenarios below. Everyone has videos on setting up the cluster, live migration but I'm not seeing anyone doing actual tests of a complete or partial failure of one of the cluster nodes.
1) Complete node fail -- just pull the power plug(s) out to simulate. How does Proxmox handle dozens of VMs powering on? Does it have a DRS type function where it will distribute the VMs across the remaining nodes? Is there an ability to have specified VMs prioritized over other VMs? Also, the ability to restart VMs in a specific order?
2) Partial fail where the hypervisor is in some sort of hung state and the VMs are down but the storage is still accessible and any file locks (if applicable) are still held?
3) Host isolation. What happens when the Proxmox host is unreachable from the management side but the VMs running are still accessible? Will it allow VMs to still run? Will it provide an option to restart VMs on other nodes?
Thanks.
What is your energy use under load and at rest? This might be just what I need.
Under load they pull just over 100W. At idle it’s like 85
I’d link my guide but it’s not the first result on google
Doesn’t count then
I'm just here taking up space, I won't spend that kind of money.
Funny story.. those chassis are used by one of our client's vendors as NVRs
I am here just taking up space :P
thanks for another great video, awesome. have a great day
I wonder if the 00 is the code the BIOS is reporting to the IPMI. Then I'd hazard a guess it's a cracked ball joint under the CPU, possibly. A reflow might fix it. I have no real experience in that, it's just a random guess based on other things I've seen.
I’m just here taking space
re: gpu. It's going to be one of three things: you need this in your file: vfio_iommu_type1.allow_unsafe_interrupts=1; or you need hugepagesz=1G default_hugepagesz=2M in your grub, plus hugepages: 2 and balloon: 0 in your /etc/pve/qemu-server/VMID.conf; or your hardware just can't handle outputting to the physical ports on the GPU. In that last case, if you turn off default gpu in its hardware settings it will still work, but only over VNC or for computation, which is probably not that useful in your case (it's way more useful if you are importing GPUs to hardware encode/decode for Plex/Jellyfin).
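For anyone working through that checklist, here is a minimal Python sketch that reports which of the kernel command-line options named in the comment are absent from a boot string. The option strings are copied from the comment as written; the function name `missing_options` and the sample string are hypothetical, and this only checks for presence, not whether the values suit your hardware.

```python
# Option strings copied verbatim from the comment above; whether each one is
# actually needed depends on the hardware.
SUGGESTED_OPTIONS = (
    "vfio_iommu_type1.allow_unsafe_interrupts=1",
    "hugepagesz=1G",
    "default_hugepagesz=2M",
)


def missing_options(cmdline: str, wanted=SUGGESTED_OPTIONS) -> list[str]:
    """Return the wanted options that do not appear as whole tokens in cmdline."""
    present = set(cmdline.split())
    return [opt for opt in wanted if opt not in present]


if __name__ == "__main__":
    # Hypothetical example string; on a real Linux host the running kernel's
    # parameters can be read from /proc/cmdline instead.
    sample = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 quiet intel_iommu=on hugepagesz=1G"
    for opt in missing_options(sample):
        print(f"not set: {opt}")
```

The comment does not say which file the first option belongs in, so this sketch only looks at the kernel command line and will not notice an option set elsewhere.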
did you use eltro past
Anything to be gained by using GlusterFS on the 3 x 1TB SSDs?
im just here taking up space c:
I'm just here taking up bandwidth
Gotta love those musical rodents
Sure do
I’m just here, late, taking up space.
I'm just space taking up here
I'm just here taking up space.
You know the rules, so do I
If I comment, It's going to be valuable. I won't admit I am taking up space. I am a valuable person. And my comments will have value and be pertinent and everyone will give me lots of thumbs up and the trolls will get a shock from their device when they try and thumbs down. Everyone reading this should be glad about the time they spent reading it!
Wow such beauty
Not sure if I missed it .. but power consumption (idle/load) per node would have been nice too - otherwise cool video!
About 30-35W per node
@@RaidOwl that would be quite much for me .. is that idle or load? in the case you showed 1 node can run the VMs and stuff while the other 2 nodes are basically fully idle (until there is a crash of node 1). are the 30-35W idle (for node 2 and 3) or load (node 1)? thanks for the reply
Look up "Proxmox Kernel 5.15.60-1-pve Breaks PCI Passthrough". I spent days trying to get PCI passthrough working until I found out about the kernel issue.
Man, please, I'm from Spain. Could you tell me where to buy that kind of server chassis in order to build my own high-availability cluster servers? Please, I have to build some of them for my clients and I'd appreciate any kind of help!!! Have a really good one!! And by the way, keep us supplied with all of this awesome content.
"I'm just here taking up space"
-Me
well said
Im just here taking up space
No no no, I just got rid of one of my 2U servers to save on power and noise. I don't need three 1Us for a cluster to run k8s like I've wanted to.....how loud are these....asking for a friend.
Actually verrrrry quiet. I forgot to put my sound test in here but I used some low-noise adapters from Noctua and they work great.
I also don’t need these & want to know how loud they are… for … a… “friend”… haha
I’m just here taking up space
Kinda weird request, and I know I most definitely should not be buying hardware based on aesthetics, but could you let me know what these chassis look like with rack mounted ubiquiti gear? Do the two silvers look good together, or do they clash?
I wanted to mimic this setup but for the switch that has the dedicated CEPH network...if that switch needs to reboot say for an update....would that wreck a lot of stuff since all 3 hosts lose communication to each other over CEPH? Have you tested that?
you can put the ceph cluster in maint mode or just let it pause on its own. source, lost two switches powering a cluster.
Did you try to pass through the GPU without the riser cable?
I did not 🤔
I'm just here taking up space, but I'm trying to understand the high availability part -- what sort of failures are we trying to protect against? The shared Ceph pool seems to be the single point of failure that would take down the entire cluster? A single drive failure isn't an issue with RAID, but what if the hardware failure isn't a drive?
An entire server could blow up and everything would keep running
@@RaidOwl I think I've mixed something up then. Is there a 4th server that holds all the drives doing the ceph storage ( the larger 4U under the 3 nodes ), or is the ceph storage replicated across, and exists on, the drives in the node servers?
@@JonathanDavisJJ Ceph runs on each of the 3 nodes using each of the Micron SSDs in each of the nodes. So yeah, the ceph storage is replicated across all 3 nodes.
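To put a number on "an entire server could blow up", here is a rough Python sketch, assuming the cluster decides by simple majority with one vote per node. That voting model is an assumption about this setup, not something stated in the thread, and the function names are illustrative.

```python
def quorum(nodes: int) -> int:
    """Smallest number of votes that is a strict majority of `nodes`."""
    return nodes // 2 + 1


def tolerable_failures(nodes: int) -> int:
    """How many nodes can be lost while the rest still hold a majority."""
    return nodes - quorum(nodes)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} nodes: quorum {quorum(n)}, can lose {tolerable_failures(n)}")
```

Under that assumption, three nodes can lose one and keep a majority, while two nodes cannot lose any.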
1 amp for all three? What's the voltage in the US? 220?
120
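As a back-of-the-envelope check on those figures, here is a small Python sketch, assuming the 1 A reading covers all three nodes at 120 V. It treats volts times amps as the draw, which ignores power factor, so the real wattage may be lower. The function name is illustrative.

```python
def apparent_power(volts: float, amps: float) -> float:
    """Volts times amps, in volt-amperes; an upper bound on real watts."""
    return volts * amps


if __name__ == "__main__":
    total = apparent_power(120, 1)  # 1 A at 120 V, figures from the thread
    print(f"total: {total:.0f} VA, per node: {total / 3:.0f} VA")
```

That works out to roughly 40 VA per node, in the same range as the 30-35W per node quoted earlier in the thread.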
what is the read and write performance of your Ceph Cluster?
I would love to have a Proxmox cluster for my home lab. I could not get GPU passthrough to work with my setup either. Does Proxmox do load balancing where it would move a VM to another host if that host is less busy than the current one? Thanks for sharing.
I think you're supposed to have TWO separate Ceph networks, one for their "private" and one for their "public" -- plus 10 gig for your proxmox vm network, making three, then a separate gig network for corosync and yet another gig network for proxmox system (separate from vm network). I'm just looking into this now though and I have read that putting the two ceph networks on one NIC is usually fine for most people. But I'm just figuring this out myself too.
I must admit, when I first came across your channel, I found you and/or your method of presentation to be somewhat annoying. But the overlords at RUclips and their algo kept on pushing your content to my feed and after watching more of your videos, I have actually started taking a liking to your awkward sense of humor. I also enjoy that you share all of your mistakes and blunders with us, which any homelabber can relate to. So I guess I’ll hit that subscribe button!
Praise to the almighty RUclips overlords 🙏🏼
Does it have enough CPU power to encode/transcode 4K videos for Plex?
10:09 holy banana. no standoffs? :S
Tried the spell too.
Now how do I revert, there's no snapshot for that.
Would you still recommend the Zima board?
For sure, just manage your expectations