6.5.3. MPI Latency
MPI latency is most accurately measured between two nodes, using one MPI rank per node. This configuration isolates the communication path, providing a direct measurement of the fabric’s point-to-point latency.
6.5.3.1. Example: Running IMB PingPong with Intel MPI
mpirun -np 2 -ppn 1 -host hostA,hostB \
    -genv FI_PROVIDER=opx \
    -genv I_MPI_FABRICS=shm:ofi \
    -genv I_MPI_PIN_PROCESSOR_LIST=<local_numa_cores> \
    IMB-MPI1 PingPong
Sample output:
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
...
The t[usec] column reports latency in microseconds. The latency for 8-byte messages is typically used for performance comparisons.
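For post-processing, the 8-byte latency can be pulled out of the benchmark output with a short script. The following is a sketch, not part of the benchmark suite; it assumes data rows follow the "#bytes #repetitions t[usec] Mbytes/sec" layout shown above, and the sample output values are illustrative:

```python
# Extract the t[usec] value for a given message size from IMB-style
# PingPong output (sketch; assumes the column layout shown above).

def latency_for_size(output: str, size: int = 8) -> float:
    """Return the latency in microseconds for the given message size."""
    for line in output.splitlines():
        fields = line.split()
        # Data rows begin with an integer byte count; header and
        # comment lines start with '#' and are skipped by isdigit().
        if len(fields) >= 3 and fields[0].isdigit() and int(fields[0]) == size:
            return float(fields[2])
    raise ValueError(f"no row found for {size}-byte messages")

# Illustrative output fragment (values are hypothetical):
sample = """\
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.95         0.00
            8         1000         0.98         7.80
"""
print(latency_for_size(sample, 8))  # -> 0.98
```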
The equivalent OSU Micro-Benchmark is osu_latency.
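A comparable two-node run with the OSU benchmark might look like the following. This is a sketch under the same assumptions as the IMB example (hostA/hostB hostnames, Intel MPI with the opx provider); the path to the osu_latency binary depends on where the OSU Micro-Benchmarks are installed:

```shell
mpirun -np 2 -ppn 1 -host hostA,hostB \
    -genv FI_PROVIDER=opx \
    -genv I_MPI_FABRICS=shm:ofi \
    osu_latency
```

As with IMB PingPong, osu_latency reports one latency value per message size, so the same small-message comparison applies.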
For best results, ensure that MPI ranks are pinned to CPU cores local to the CN5000 SuperNIC (see Core Pinning).