SC'19: Tutorials: IB, Omni-Path, and HSE: Advanced Features, Challenges & Usage

  • Published: Feb 10, 2025
  • As InfiniBand (IB), Omni-Path, and High-Speed Ethernet (HSE) technologies
    mature, they are being used to design and deploy a range of High-End
    Computing (HEC) systems: HPC clusters with GPGPUs supporting MPI, storage
    and parallel file systems, cloud computing systems with SR-IOV
    virtualization, grid computing systems, and deep learning systems. These
    systems bring new challenges in performance, scalability, portability,
    reliability, and network congestion. Many scientists, engineers,
    researchers, managers, and system administrators want to learn about these
    challenges, the approaches used to solve them, and the associated
    impact on performance and scalability. This tutorial will start with an
    overview of these systems and then emphasize the advanced hardware and
    software features of IB, Omni-Path, HSE, and RoCE (RDMA over Converged
    Ethernet) that address these challenges. Next, we will focus on
    OpenFabrics RDMA and libfabric programming, together with the network
    management infrastructure and tools needed to use these systems
    effectively. A common set of challenges faced when designing these systems
    will be presented, followed by case studies of domain-specific challenges,
    their solutions, and sample performance numbers. Finally, hands-on
    exercises will be carried out with the OpenFabrics and libfabric software
    stacks and network management tools.
    sc19.supercomp...
