6.8.8. IPoIB Bandwidth Benchmarking
Typically, the best performance is seen when running IPoIB interfaces in datagram mode with an MTU of 10236 bytes. Although single-threaded performance is lower than connected mode with an MTU of 65520 bytes, datagram mode scales better with multiple threads. For multi-threaded testing, iperf2 with the -P option is recommended rather than iperf3. Alternatively, the IMB-MPI1 or OSU benchmarks can be run over libfabric by setting FI_PROVIDER=tcp and FI_TCP_IFACE=ib0 (replace ib0 with the IPoIB device name).
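As a sketch of the libfabric approach described above, the following shows how the OSU bandwidth test might be launched over the TCP provider bound to the IPoIB interface. The hostnames node1/node2, the osu_bw path, and the Open MPI-style mpirun flags are placeholders; adapt them to your environment and MPI launcher.

```shell
# Route libfabric traffic over TCP on the IPoIB interface.
export FI_PROVIDER=tcp
export FI_TCP_IFACE=ib0     # replace ib0 with your IPoIB device name

# Launch a two-rank point-to-point bandwidth test between two nodes
# (hostnames and benchmark path are examples only).
mpirun -np 2 --host node1,node2 ./osu_bw
```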
On some platforms, reducing the PCIe Max Payload Size (MPS) and Max Read Request Size (MRRS) has been shown to improve IPoIB performance. For example, on Intel Emerald Rapids platforms, an MPS and MRRS of 256 bytes each is optimal.
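Before changing anything, it can help to confirm the current MPS and MRRS values for the adapter. A minimal sketch, assuming a placeholder PCI address (find yours with `lspci | grep -i mellanox`, or your vendor's name); these settings are typically changed through BIOS/UEFI options rather than at runtime:

```shell
# Show the negotiated Max Payload Size and Max Read Request Size for one
# device. 0000:3b:00.0 is a placeholder PCI address.
lspci -s 0000:3b:00.0 -vv | grep -E 'MaxPayload|MaxReadReq'
```

The values appear on the DevCtl line of the PCI Express capability, for example `MaxPayload 256 bytes, MaxReadReq 256 bytes`.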
For parallel throughput testing, use iperf2 rather than multiple instances of iperf3. Its --parallel (-P) option simplifies multi-stream testing and often delivers higher aggregate throughput.
6.8.8.1. Example: Running iperf2 with 16 Parallel Streams
On the server:
iperf2 -s
On the client:
iperf2 -c <server_ipoib_address> -P16 --len 1M
This runs 16 parallel streams, each with a 1 MB buffer length, to the server's IPoIB address.
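When run with -P, iperf2 prints one result line per stream followed by a [SUM] line with the aggregate throughput. A small sketch of pulling that aggregate figure out for logging, using an illustrative sample line (not real measurement data):

```shell
# Illustrative [SUM] line from an iperf2 multi-stream run (sample data only).
sample='[SUM]  0.0-10.0 sec   45.6 GBytes  39.2 Gbits/sec'

# The aggregate rate and its unit are the last two fields of the [SUM] line.
printf '%s\n' "$sample" | awk '/\[SUM\]/ {print "aggregate:", $(NF-1), $NF}'
# prints: aggregate: 39.2 Gbits/sec
```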
Note
When using high thread counts, iperf2 may encounter connection issues such as:
write failed: Connection reset by peer
If this error occurs, increasing the socket backlog size in the iperf2 source code may resolve the issue.
6.8.8.2. Fix: Modify Listener.cpp
Open src/Listener.cpp in the iperf2 source code and locate the following line:
rc = listen(mSettings->mSock, 5);
Change it to:
rc = listen(mSettings->mSock, 128);
Rebuild iperf2:
make clean && make
This change allows the server to handle a larger number of simultaneous incoming connections, reducing the chance of failures during high-parallelism testing.