Cornelis Technical Documentation

6.9.3. AMD GPUs with ROCm

6.9.3.1. Prerequisites

To achieve optimal bandwidth and the lowest latency for ROCm-enabled GPU devices, ensure the following:

  • Hardware configuration

    Connect the AMD GPU and the CN5000 SuperNIC to the same CPU socket, or place both behind the same PCIe switch, so that traffic between them avoids crossing the inter-socket link.

  • MPI configuration

    Use an MPI implementation compatible with ROCm, such as Open MPI.

  • Compile Open MPI with ROCm support

    Include the ROCm path during Open MPI compilation. Example configure command:

    ./configure --with-rocm=/opt/rocm \
                --with-rccl=/opt/rocm \
                --with-cuda=no \
                --enable-orterun-prefix-by-default \
                LDFLAGS=-Wl,--enable-new-dtags \
                --with-ofi=/usr
  • After installation, update the following paths:

    • Add openmpi/bin to your PATH environment variable.

    • Include openmpi/lib and rocm/lib in your LD_LIBRARY_PATH.
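The path updates above can be sketched as shell exports. The prefixes /opt/openmpi and /opt/rocm are assumptions; substitute the install locations on your system:

```shell
# Assumed install prefixes; adjust to match your Open MPI and ROCm installations.
export MPI_HOME=/opt/openmpi
export PATH="$MPI_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$MPI_HOME/lib:/opt/rocm/lib:$LD_LIBRARY_PATH"
```

Adding these lines to your shell profile or job script ensures every launch picks up the matching mpirun and the ROCm runtime libraries.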

6.9.3.2. Application and Benchmark Setup

  • Use a ROCm-enabled benchmark or application, such as OMB (OSU Micro-Benchmarks) version 7.5.

  • Ensure the application is built with --enable-rocm and any additional settings specified in the application’s README documentation.
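For OMB, the build described above typically looks like the following configure invocation. This is a sketch: the source directory name and the /opt/rocm path are assumptions, and the OMB README remains the authoritative reference for your version:

```shell
# Build OSU Micro-Benchmarks 7.5 with ROCm-aware device buffers.
# Directory name, compiler wrappers, and ROCm path are assumed; adjust as needed.
cd osu-micro-benchmarks-7.5
./configure CC=mpicc CXX=mpicxx \
            --enable-rocm --with-rocm=/opt/rocm
make -j
```

Building with --enable-rocm is what lets the benchmarks accept device-buffer arguments, such as the D D passed to osu_bw in the run command below.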

Example run command:

mpirun -x FI_OPX_HFI_SELECT=0 -x HIP_VISIBLE_DEVICES=0 \
       -mca mtl ofi -x FI_PROVIDER=opx -mca btl self,vader \
       -np 2 -host node1,node2 ./osu_bw D D

As of the OPX Software 12.1.1 release, higher bandwidth can be achieved by enabling BTS. If the hfi1 driver is loaded with use_bulksvc=Y, you can enable BTS by setting FI_OPX_HFISVC=1 and FI_HMEM_ROCR_USE_DMABUF=1.
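A sketch of checking the driver setting and enabling BTS follows. The variable names come from the passage above; the sysfs path is the standard location for hfi1 module parameters, shown here as an assumption about how the driver exposes use_bulksvc:

```shell
# If present, this file reports whether hfi1 was loaded with bulk services
# enabled ("Y" when use_bulksvc=Y).
if [ -r /sys/module/hfi1/parameters/use_bulksvc ]; then
    cat /sys/module/hfi1/parameters/use_bulksvc
fi

# Enable BTS for the OPX provider and DMA-BUF handling for ROCm device memory.
export FI_OPX_HFISVC=1
export FI_HMEM_ROCR_USE_DMABUF=1
```

These exports can then be forwarded to the remote ranks with mpirun -x, like the other FI_* variables in the run command above.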