
6.5.7. Core Pinning

Core pinning can lead to significant performance improvements, for example in latency-sensitive MPI applications. If an MPI process accesses a remote NUMA node, it may suffer from increased memory/cache access latency and additional overhead from inter-NUMA traffic.

To reduce this performance penalty, it is recommended to pin MPI processes to CPU cores that are local to the SuperNIC's NUMA node. For example, if the SuperNIC is attached to NUMA node 1, whose local cores are 32-47, pin the MPI processes to cores 32-47.
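
If you need to determine which NUMA node the SuperNIC is attached to and which cores belong to that node, the following commands are one way to check. The device name hfi1_0 is only an example; use the name shown under /sys/class/infiniband/ on your system:

# Print the NUMA node the SuperNIC is attached to (example device name: hfi1_0)
cat /sys/class/infiniband/hfi1_0/device/numa_node

# List the CPU cores belonging to each NUMA node
lscpu | grep "NUMA node"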

Use the I_MPI_PIN_PROCESSOR_LIST environment variable to set the core affinity:

mpirun -np 2 -ppn 1 -host hostA,hostB \
  -genv FI_PROVIDER=opx \
  -genv I_MPI_FABRICS=shm:ofi \
  -genv I_MPI_PIN_PROCESSOR_LIST=32-47 \
  IMB-MPI1 PingPong

This approach ensures that MPI ranks are scheduled on cores local to the SuperNIC, helping reduce latency and improve overall communication performance.
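
To confirm that the ranks were actually pinned to the requested cores, Intel MPI can print its pinning decisions when I_MPI_DEBUG is set to 4 or higher. The following is a minimal sketch reusing the command above; the exact debug output format depends on the Intel MPI version:

mpirun -np 2 -ppn 1 -host hostA,hostB \
  -genv FI_PROVIDER=opx \
  -genv I_MPI_FABRICS=shm:ofi \
  -genv I_MPI_PIN_PROCESSOR_LIST=32-47 \
  -genv I_MPI_DEBUG=4 \
  IMB-MPI1 PingPong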