6.1.1. CN5000 Fabric Performance Tuning Quick Start
The sections below summarize tunings for CN5000 performance, grouped by MPI/OPX, Verbs, and IPoIB. Treat this as a rough guide only: individual clusters may require additional tunings, which are discussed in other sections of this guide.
6.1.1.1. Highest Priority Tunings
Review and apply BIOS Settings and Linux Settings.
Enable processor turbo mode, if possible.
Enable the “performance” governor with either the ACPI or the Intel P-State frequency driver:
> cpupower -c all frequency-set -g performance
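After applying the governor, it can be useful to confirm that every core picked it up. The following is a minimal sketch, assuming the standard Linux cpufreq sysfs layout under /sys/devices/system/cpu; the function name count_non_performance is hypothetical and not part of any Cornelis tool.

```shell
# Count CPUs whose scaling governor is not "performance".
# The optional first argument overrides the sysfs root (assumed layout:
# <root>/cpuN/cpufreq/scaling_governor), which is handy for dry runs.
count_non_performance() {
    root="${1:-/sys/devices/system/cpu}"
    count=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without a readable cpufreq entry (or an unmatched glob).
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" != "performance" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Prints the number of CPUs still using another governor.
count_non_performance
```

A result of 0 suggests the setting took effect on all CPUs that expose a cpufreq interface; any other value points to cores that may need attention.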
6.1.1.2. MPI Using the OPX Provider
Make sure the MPI is using libfabric (OFI) with the OPX Provider. See Intel MPI Library Settings.
Use the latest available Intel MPI Library for optimized application performance. Depending on the application, Open MPI may perform better in some cases.
Improved bandwidth is available by setting FI_OPX_HFISVC=1 in the MPI job. Note that this also requires loading the hfi1 driver with the module parameter use_bulksvc=Y. For details, see HFI1 Driver Parameters.
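As an illustration of how the job environment might be prepared, the sketch below exports FI_OPX_HFISVC=1 before launch. FI_PROVIDER=opx is the generic libfabric way to select a provider and is an assumption here; the MPI-specific settings in Intel MPI Library Settings take precedence. The launch line, application name, and rank count are placeholders.

```shell
# Ask libfabric to use the OPX provider (generic libfabric variable; assumed
# to be appropriate for this MPI, see Intel MPI Library Settings).
export FI_PROVIDER=opx

# Bandwidth tuning named in this section; only effective when the hfi1
# driver was loaded with use_bulksvc=Y.
export FI_OPX_HFISVC=1

# Hypothetical launch line; ./my_app and the rank count are placeholders.
# mpirun -np 2 ./my_app
```

Exporting the variables in the job script keeps the tuning scoped to that job. Whether the launcher forwards the environment to remote ranks depends on the MPI in use.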
6.1.1.3. Verbs
Improved bandwidth is available by loading the hfi1 driver with the module parameter use_bulksvc=Y. When using verbs, no additional flags are necessary. For details, see HFI1 Driver Parameters and Verbs.
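One common way to make a module parameter persistent on Linux is an options line in a file under /etc/modprobe.d/. The sketch below only prints such a line for review; the helper name hfi1_options_line and the file name hfi1.conf are assumptions, and HFI1 Driver Parameters remains the authoritative procedure.

```shell
# Print a modprobe "options" line for the hfi1 driver.
# The argument is the value for use_bulksvc (defaults to Y).
hfi1_options_line() {
    printf 'options hfi1 use_bulksvc=%s\n' "${1:-Y}"
}

# Preview the line. To apply it, write it as root to a file under
# /etc/modprobe.d/ (for example hfi1.conf, a name assumed here), then
# reload the hfi1 driver or reboot so the parameter takes effect.
hfi1_options_line
```

On systems that load hfi1 from the initramfs, the initramfs may also need to be rebuilt before the new parameter is seen at boot.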
6.1.1.4. IPoIB
Cornelis recommends using Connected Mode to achieve the highest single-threaded performance. The Connected Mode MTU size can be adjusted to 65520 bytes to achieve better bandwidth.
For the best multi-threaded bandwidth scaling, use Datagram Mode. For more information, see IPoIB Interface Configuration.
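To check which mode an interface is currently in, the standard Linux IPoIB driver exposes mode and mtu files in sysfs. The following is a minimal sketch under that assumption; the interface name ib0 and the helper name ipoib_status are illustrative only, and IPoIB Interface Configuration describes the supported way to make the setting persistent.

```shell
# Report the IPoIB mode and MTU of an interface.
# $1: interface name (ib0 is an assumed default)
# $2: sysfs root override, useful for dry runs
ipoib_status() {
    iface="${1:-ib0}"
    root="${2:-/sys/class/net}"
    if [ -r "$root/$iface/mode" ] && [ -r "$root/$iface/mtu" ]; then
        echo "$iface mode=$(cat "$root/$iface/mode") mtu=$(cat "$root/$iface/mtu")"
    else
        echo "$iface not found or not an IPoIB interface"
    fi
}

ipoib_status ib0

# To switch to Connected Mode with the larger MTU (as root, not persistent):
#   echo connected > /sys/class/net/ib0/mode
#   ip link set dev ib0 mtu 65520
# To switch back to Datagram Mode:
#   echo datagram > /sys/class/net/ib0/mode
```

Changes made through sysfs this way do not survive a reboot or interface restart, so they are best suited to quick comparisons between the two modes.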