OpenHPC: Nodes provisioned, config file discussion, and issues resolved!

  • Published: Sep 16, 2024
  • Provisioning starts off a bit rough, but we go over how all the pieces are brought together with Warewulf so that nodes get deployed, and then show it off. Then we discuss how Slurm is configured on the nodes to talk to the headnode so it knows how to schedule jobs. (We'll do more on Slurm at a later date if wanted.)
    Thank you for all the support!
    All config files will be here: github.com/smi...

Comments • 8

  • @szbalogh 2 months ago +1

    Super useful series! Thank you! We just received our machine (4x80 cores). Can't wait to try out this build... but first I need to source two special power cords to be able to fire it up XD

  • @SyberPrepper 2 months ago +1

    Well, I really appreciate the videos and the time it took to make them. Rarely do we get this kind of detail on a topic as complex as this. I'm kind of shaking my head here wondering if I'll ever get something like this working. I would love to know which places are using this technology to run STAR-CCM+ or whatever. I hope your employer allows you to share more information. Thanks again for your work on this. It is inspiring.

    • @sysadminsean 2 months ago +1

      This is pretty much just the surface of it all. It can get even wilder. You can take software like Spack, and let people deploy Singularity for their jobs to run containers inside nodes, so that it's like nodes inside nodes, to do even more complex jobs and job arrays, plus even more stuff that we don't even begin to touch where I work because it's not really a focal point. I don't cover it in this series (because I can't afford any GPUs), but you can also do GPU-based or GPU-pinned HPC for all the AI, LLM, and even bigger computations.

    • @SyberPrepper 2 months ago +1

      @sysadminsean Sounds like I need a bigger team. :)

  • @AshokReddyM-m6k 2 months ago +1

    Really appreciate the time and effort you put in...
    I'm facing the same error: the compute nodes are picking up DHCP but not loading the OS?

    • @sysadminsean 2 months ago

      Check the following:
      1.) The firewall is on and the needed ports are open (80 for HTTP, which is how PXE offers up the image).
      2.) Verify that the httpd and tftpd services are running.
      3.) Verify that dhcpd.conf has the info for iPXE to tell the nodes where to go for their images.
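
The first and third checks above can be roughly automated. What follows is only a minimal Python sketch, not something from the video: the helper names `port_open` and `missing_pxe_directives` are made up for illustration, the headnode address 10.0.0.1 and the path /etc/dhcp/dhcpd.conf are assumptions about the setup, and looking for `next-server` and `filename` is just one plausible way to tell whether dhcpd.conf points nodes at a boot image.

```python
import re
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_pxe_directives(dhcpd_conf_text):
    """Return the boot directives that do not appear in the dhcpd.conf text."""
    # Drop everything after '#' so commented-out directives do not count.
    # (Naive: a '#' inside a quoted string would also be cut.)
    lines = [line.split("#", 1)[0] for line in dhcpd_conf_text.splitlines()]
    body = "\n".join(lines)
    wanted = ("next-server", "filename")
    return [
        name
        for name in wanted
        if not re.search(r"^\s*" + re.escape(name) + r"\b", body, re.MULTILINE)
    ]


if __name__ == "__main__":
    headnode = "10.0.0.1"            # assumed provisioning address of the headnode
    conf_path = "/etc/dhcp/dhcpd.conf"  # assumed location of the DHCP config

    print("HTTP (port 80) reachable:", port_open(headnode, 80))
    try:
        with open(conf_path) as handle:
            print("Missing directives:", missing_pxe_directives(handle.read()))
    except OSError as err:
        print("Could not read", conf_path, "-", err)
```

An empty "Missing directives" list only means the directives are present, not that they point to the right place. TFTP runs over UDP, so the TCP check here does not cover it; checking that the tftpd service is running still has to be done separately.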

  • @AshokReddyM-m6k 2 months ago +1

    The provisioning Ethernet interface should be ens19, right? Which one should we use?

    • @sysadminsean 2 months ago

      Yeah, this was confusing. On my machine it would be ens19, which has the 10.0.0.1 IP address. This is the flat network that carries traffic between the headnode and the compute nodes.
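
To make the "which interface?" question concrete, here is a small Python sketch of picking the interface that carries the provisioning address. Only ens19 and 10.0.0.1 come from the reply above; the /24 prefix length, the second interface ens18, its address, and the example node IP are all hypothetical and would differ on another machine.

```python
import ipaddress


def find_provision_interface(interfaces, headnode_ip):
    """Return the name of the interface whose address equals headnode_ip."""
    target = ipaddress.ip_address(headnode_ip)
    for name, cidr in interfaces.items():
        if ipaddress.ip_interface(cidr).ip == target:
            return name
    return None


def on_same_network(cidr, node_ip):
    """Return True if node_ip falls inside the network described by cidr."""
    return ipaddress.ip_address(node_ip) in ipaddress.ip_interface(cidr).network


if __name__ == "__main__":
    # Hypothetical interface table; in practice read it from `ip addr` output.
    interfaces = {
        "ens18": "192.168.1.50/24",  # assumed public/management side
        "ens19": "10.0.0.1/24",      # provisioning side (prefix length assumed)
    }
    name = find_provision_interface(interfaces, "10.0.0.1")
    print("Provisioning interface:", name)
    # A compute node should get an address on the same flat network.
    print("10.0.0.11 reachable on it:", on_same_network(interfaces[name], "10.0.0.11"))
```

The point is that the interface name itself does not matter; whichever interface holds the headnode's address on the flat compute network is the one to use for provisioning.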