Cornelis Technical Documentation

6.5.3. MPI Latency

MPI latency is most accurately measured between two nodes, using one MPI rank per node. This configuration isolates the communication path, providing a direct measurement of the fabric’s point-to-point latency.

6.5.3.1. Example: Running IMB PingPong with Intel MPI

mpirun -np 2 -ppn 1 -host hostA,hostB \
  -genv FI_PROVIDER=opx \
  -genv I_MPI_FABRICS=shm:ofi \
  -genv I_MPI_PIN_PROCESSOR_LIST=<local_numa_cores> \
  IMB-MPI1 PingPong

Sample output:

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
...
  • The t[usec] column reports one-way latency in microseconds (half the measured round-trip time).

  • Latency for 8-byte messages is typically used for performance comparisons.

  • The equivalent OSU Micro-Benchmark is osu_latency.
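To compare runs, the 8-byte latency can be pulled directly from captured benchmark output. A minimal sketch, assuming the PingPong output above was saved to a hypothetical file named imb_pingpong.log:

```shell
# Extract the t[usec] value (3rd column) from the 8-byte row of
# saved IMB PingPong output. imb_pingpong.log is a hypothetical
# capture of the benchmark's stdout; adjust the name to your setup.
lat_usec=$(awk '$1 == "8" { print $3; exit }' imb_pingpong.log)
echo "8-byte latency: ${lat_usec} usec"
```

The awk filter matches the row whose first field (#bytes) is 8 and exits after the first hit, so comment lines beginning with "#" are ignored automatically.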

For best results, ensure that MPI ranks are pinned to CPU cores local to the CN5000 SuperNIC (see Core Pinning).
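The same two-node, one-rank-per-node measurement can be repeated with the OSU suite. A sketch, assuming the same hostA/hostB hosts as above and that the osu_latency binary is on PATH (its install location varies by OSU Micro-Benchmarks build):

```shell
# OSU point-to-point latency between two nodes, one rank each,
# using the same OPX provider settings as the IMB example above.
mpirun -np 2 -ppn 1 -host hostA,hostB \
  -genv FI_PROVIDER=opx \
  -genv I_MPI_FABRICS=shm:ofi \
  -genv I_MPI_PIN_PROCESSOR_LIST=<local_numa_cores> \
  osu_latency
```

osu_latency prints one line per message size (size and latency in microseconds), so the 8-byte row is directly comparable to the IMB PingPong result.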