Hey Mitch, thanks for getting started on this video series so fast. I would like to see tuning for VM workloads first. So RBD. Thanks! Keep up the great work!
Great start, looking forward to the rest of this series. For me, RBD performance is most important.
First, congratulations and thank you for getting this Ceph Cluster benchmark series kicked off.
I really like the way you started this: since Ceph is a networked FS, we really need to ensure the foundation of the cluster - the CPU, the network and the disks - meets our requirements. The insight about disabling the deeper C-states used for CPU power saving was very helpful.
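Not the uploader, but for anyone who wants to try the C-state change themselves: on Linux you can disable the deeper idle states per CPU through the cpuidle sysfs interface (/sys/devices/system/cpu/cpuN/cpuidle/stateX). Here's a minimal sketch, assuming that standard sysfs layout and root access - the set of states to keep is just an illustrative choice, not a recommendation from the video:

    #!/usr/bin/env python3
    # Sketch: disable deep C-states (anything deeper than C1) via sysfs.
    # Assumes the Linux cpuidle sysfs interface and root privileges.
    from pathlib import Path

    KEEP = {"POLL", "C1"}  # idle states left enabled (illustrative choice)

    for state in Path("/sys/devices/system/cpu").glob("cpu*/cpuidle/state*"):
        name = (state / "name").read_text().strip()
        # Writing "1" to the "disable" file turns that idle state off for this CPU.
        (state / "disable").write_text("0" if name in KEEP else "1")
        print(f"{state}: {name} -> {'enabled' if name in KEEP else 'disabled'}")

Note this doesn't survive a reboot - a kernel cmdline option like intel_idle.max_cstate=1 is the persistent route - but the sysfs approach is handy for A/B benchmarking.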
I can see that a lot of us here are using Ceph to back VM workloads and yes, iSCSI setup, tuning and benchmarks would really be appreciated!
I might have missed a detail - to get those nice 10 Gbps numbers, was any custom network tuning done, e.g. jumbo frames (and maybe a discussion of the pros and cons)?
Jumbo frames will definitely help with throughput, but latency can also be a killer. If there are any networking issues, such as a bad cable or switch problems, jumbo frames will work against you, since the hosts have to retransmit the larger 9k frames instead of the 1.5k frames that a normal network uses.
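And a common jumbo-frame gotcha: a single switch port left at MTU 1500 breaks the whole path. Before trusting it, you can verify end to end with a don't-fragment ping. A small sketch wrapping that check - the target address is a placeholder for another node on your storage network:

    #!/usr/bin/env python3
    # Sketch: verify a jumbo-frame path with a don't-fragment ping.
    # 8972 bytes of payload + 28 bytes of IP/ICMP headers = a 9000-byte packet,
    # so this only succeeds if every hop really passes MTU 9000.
    import subprocess
    import sys

    TARGET = "10.0.0.2"  # placeholder: another Ceph node on the storage network

    def path_supports_mtu(host: str, mtu: int = 9000) -> bool:
        payload = mtu - 28  # subtract IP (20) + ICMP (8) header bytes
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "3", "-s", str(payload), host],
            capture_output=True,
        )
        return result.returncode == 0

    if __name__ == "__main__":
        ok = path_supports_mtu(TARGET)
        print(f"MTU 9000 path to {TARGET}: {'OK' if ok else 'BROKEN'}")
        sys.exit(0 if ok else 1)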
This is great. I'm going to take a look at tuning the Ceph cluster we built at work. Would love to see a top-end version of this with an all-NVMe setup.
Bring on Part 2!
Is there a part 2?
I'm looking for the second and maybe third videos from this series and can't seem to find them. Are they named differently?
I am debating running Ceph again on our 7-node Proxmox cluster. Right now it's running ZFS with replication to keep things simple. I'm more into HA, keeping all the VMs running when a node fails. I ran Ceph before on an older version of Proxmox three years ago and had serious performance issues when it rebalanced. I am thinking of having 4 compute nodes and 3 dedicated Ceph nodes on a dedicated 10-gig network, to separate things out instead of having all 7 nodes run the VMs and Ceph at the same time. I won't be using iSCSI, so it'll all be through RBD. Any thoughts?
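Not an authoritative answer, but the rebalance pain is usually tunable: Ceph exposes throttles that cap how aggressively recovery and backfill compete with client I/O. A minimal sketch, assuming the ceph CLI is available and these config keys exist on your release (the values are conservative illustrative starting points, and newer releases using the mClock scheduler handle recovery throttling differently):

    #!/usr/bin/env python3
    # Sketch: throttle Ceph recovery/backfill so a rebalance doesn't starve client I/O.
    # Assumes the `ceph` CLI is on the PATH and the keys below exist on your release.
    import subprocess

    THROTTLES = {
        "osd_max_backfills": "1",        # concurrent backfills per OSD
        "osd_recovery_max_active": "1",  # concurrent recovery ops per OSD
        "osd_recovery_sleep": "0.1",     # seconds to sleep between recovery ops
    }

    for key, value in THROTTLES.items():
        subprocess.run(["ceph", "config", "set", "osd", key, value], check=True)
        print(f"set osd/{key} = {value}")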
VM workloads and iSCSI, hopefully with VMware as well. And NFS (not for VMs), pretty please!
Would be very interested on this as well
I tried to tune our cluster with the tuned network-latency profile on Dell PowerEdge R730xd servers running Ubuntu 20.04.4 LTS, and our CPUs started to overheat, going from 50 °C to 65 °C and climbing, so I switched back to the default balanced profile. The C-states column was empty and Busy% was 100% on every single core.
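Those Busy% and C-state readings sound like turbostat output - the network-latency profile pins the cores out of idle, so the extra heat is expected. If anyone wants to watch temperatures while flipping profiles, a rough sketch that polls the kernel's hwmon sensors (assuming coretemp-style temp*_input files reporting millidegrees Celsius):

    #!/usr/bin/env python3
    # Sketch: poll CPU temperatures from sysfs while A/B testing tuned profiles.
    # Assumes coretemp-style hwmon sensors that report millidegrees Celsius.
    import time
    from pathlib import Path

    def read_temps_c():
        temps = []
        for sensor in Path("/sys/class/hwmon").glob("hwmon*/temp*_input"):
            try:
                temps.append(int(sensor.read_text()) / 1000.0)
            except (OSError, ValueError):
                continue  # some sensors can't be read; skip them
        return temps

    while True:
        temps = read_temps_c()
        if temps:
            print(f"max {max(temps):.1f} °C  avg {sum(temps)/len(temps):.1f} °C")
        time.sleep(5)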
Performance = More Watts = More Heat = More pressure on HVAC. There's no way to improve the performance without increasing the power consumption. (except free cooling in the winter) :)
@DanBroscoi Running this at a large data center shouldn't be an issue. Servers are built like tanks as they're designed to run 24/7, so I wouldn't worry too much about the CPU heat output.