Precisely on cue, the fourth compute node (c4) was working immediately after the stream ended. Sorry we weren't able to dig into why it wasn't working during the webinar! All we needed to do was `scontrol update nodename=c4 state=resume` to return the node to service after it was "unexpectedly" rebooted for the purposes of the demo. I even ran this command during the demo, as `scontrol update nodename=c[1-4] state=resume`, but that generated an error because c[1-3] were already "resumed." The error was just a distraction, though, and didn't stop the command from resuming c4, which is why I found it working after the demo. If I had just tried one more time, it would have worked.
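For anyone who wants to replay this at home, here is roughly what the recovery looked like; the node names c1-c4 are from the demo setup, so adjust for your own cluster:

# see which nodes Slurm thinks are down or drained
sinfo -N -l
# return just the rebooted node to service
scontrol update nodename=c4 state=resume
# or hit the whole range; already-idle nodes just produce a harmless error
scontrol update nodename=c[1-4] state=resume

After that, sinfo should show c4 back as idle.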
Excellent information. Particularly the part about leveraging the sysadmin side of containers in Warewulf to help sysadmins get more comfortable with containers.
Simply amazing! Could you demonstrate some examples in Python? It would also be interesting to see GPU examples with PyTorch and TensorFlow.
Here are links to a couple of our previous webinars along with time stamps. Feel free to reach out to us at info@ciq.co if you would like any additional information.
ruclips.net/video/JBQxdfcLC08/видео.html
TensorFlow Jupyter Notebook/Workflow Lifecycle/Fuzzball UI - 54:03 to 1:09:56
ruclips.net/video/Pbmxq3dg35E/видео.html
PyTorch Jupyter notebook - 37:00 to 1:03:45
Do you have any examples with InfiniBand devices? We have a 24-core head node (which must run OpenSM, so one IB card) and twelve 48-core compute nodes with dual 56 Gb InfiniBand cards, 10 GbE for file serving / SLURM, and an IPMI (really iDRAC) network for machine control. We're in a college environment and this is our first HPC cluster, and we're really struggling with the deployment phase. The video was very helpful, and I had never really thought of using containers vs. traditional HPC.
Hi Todd. For the control node running OpenSM, just set up the IB as you normally would on a stateful system; Warewulf doesn't interfere at all there. For the compute nodes, you would do something like: wwctl node set --netname infiniband --netdev ib0 --ipaddr x.x.x.x --netmask x.x.x.x n0000
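To sketch that out a bit further for a cluster like yours, assuming twelve compute nodes named n0000 through n0011 and a made-up 10.10.0.0/24 range for the IB fabric (the names and addresses here are placeholders, not recommendations):

# first IB port on the first two compute nodes; repeat with incrementing IPs for the rest
wwctl node set --netname infiniband --netdev ib0 --ipaddr 10.10.0.10 --netmask 255.255.255.0 n0000
wwctl node set --netname infiniband --netdev ib0 --ipaddr 10.10.0.11 --netmask 255.255.255.0 n0001
# the second 56 Gb port can be added the same way under its own network name, e.g. --netname infiniband2 --netdev ib1
# rebuild the overlays so the nodes pick up the new network config on their next boot
wwctl overlay build

Warewulf renders that network configuration into each node's overlay, so nothing needs to be hand-edited on the compute nodes themselves.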
Do you have a Rocky Linux container on ARM64 (aarch64) for Warewulf?
Not yet, but please check out github.com/hpcng/warewulf/issues/62 for updates!
@CtrlIQ Any news about ARM64 (aarch64)?
Are there any plans to provide prebuilt packages for Debian / Ubuntu LTS?
Hi severgun!
It wouldn't be too hard, as it's just Go, so it should run everywhere. The only nit might be making sure the generated tftp and dhcp server templates are compatible (there's a rough source-build sketch below).
I hope this helps! :)
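In the meantime, building from source on Debian/Ubuntu is a reasonable fallback. A rough sketch, assuming Go and the usual build tooling are already installed (make targets and paths can shift between releases, so check the repo docs first):

# grab the source
git clone https://github.com/hpcng/warewulf.git
cd warewulf
# build wwctl and supporting files, then install
make all
sudo make install
# Warewulf drives the distro's dhcp and tftp daemons, so install those too
sudo apt install isc-dhcp-server tftpd-hpa

The template compatibility mentioned above is the main thing to verify: the dhcpd and tftp configuration Warewulf generates tends to assume EL-style paths, so the Debian packages may need to be pointed at whatever Warewulf writes out.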