3.5.4. Configure IPoIB Network Interface
The following instructions show how to manually configure the OpenFabrics Alliance (OFA) IPoIB network interface. Cornelis recommends using the CN5000 OPX Software Installation package for installation of the software, including setting up IPoIB.
Note
IPoIB child interfaces (for example, ib0.8001) must be placed on separate IP subnets from their parents (for example, ib0) in order to communicate, because by default they are placed on different fabric partitions from their parents. If a child and its parent share the same IP subnet but are on different fabric partitions, packets between them are not forwarded because of the PKey mismatch.
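The subnet rule in the note above can be checked before addresses are assigned. The following is a minimal sketch in POSIX shell and is not part of the OPX tools. The function names ip_to_int and same_subnet, and the child address 10.1.18.3, are hypothetical and used only for illustration.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Exit 0 when two addresses fall in the same subnet for a given prefix length.
same_subnet() (
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
)

# Hypothetical parent (ib0) and child (ib0.8001) addresses with a /24 prefix.
if same_subnet 10.1.17.3 10.1.18.3 24; then
    echo "same subnet: move the child interface to another subnet"
else
    echo "different subnets"
fi
```

The check only compares IP subnets. It does not inspect fabric partitions or PKeys.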
For larger clusters, Omni-Path tools can be used to automate the installation and configuration of many nodes. These tools automate the configuration of the IPoIB network interface.
This example assumes the following:
Shell is either sh or bash.
All required Omni-Path and OFA RPMs are installed.
Startup scripts have been run, either manually or at system boot.
The IPoIB network is 10.1.17.0, which is one of the networks reserved for private use, and thus not routable on the Internet. The network has an 8-bit host portion (a /24 prefix, netmask 255.255.255.0). In this case, the netmask must be specified.
The host to be configured has the IP address 10.1.17.3, no host files exist, and DHCP is not used.
Note
Instructions are only for this static IP address case.
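The ip command accepts the mask either as a dotted netmask or as a prefix length. As a minimal sketch, assuming a well-formed contiguous netmask, the conversion can be done in POSIX shell. The function name mask_to_prefix is hypothetical.

```shell
# Convert a dotted-quad netmask to a prefix length.
# Each octet is mapped to its number of leading one bits.
# The sketch does not verify that the one bits are contiguous across octets.
mask_to_prefix() (
    IFS=.
    set -- $1
    bits=0
    for octet in "$@"; do
        case $octet in
            255) bits=$((bits + 8)) ;;
            254) bits=$((bits + 7)) ;;
            252) bits=$((bits + 6)) ;;
            248) bits=$((bits + 5)) ;;
            240) bits=$((bits + 4)) ;;
            224) bits=$((bits + 3)) ;;
            192) bits=$((bits + 2)) ;;
            128) bits=$((bits + 1)) ;;
            0) ;;
            *) echo "invalid netmask octet: $octet" >&2; exit 1 ;;
        esac
    done
    echo "$bits"
)

mask_to_prefix 255.255.255.0   # prints 24
```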
Perform the following steps:
Add an IP address (as a root user):
ip addr add <ipaddress>/255.255.255.0 dev ib0
Bring up the link:
ip link set ib0 up
Verify the configuration:
ip addr show ib0
The output should be similar to:
ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:00:02:fe:80:00:00:00:00:00:00:00:11:75:01:01:6a:36:83 brd 00:ff:ff:ff:ff:12:40:1b:80:01:00:00:00:00:00:00:ff:ff:ff:ff
    inet <ipaddress>/24 brd <ipaddress> scope global ib0
       valid_lft forever preferred_lft forever
    inet6 fe80::211:7501:16a:3683/64 scope link
       valid_lft forever preferred_lft forever
Ping ib0:
ping -c 2 -b <ipaddress>
The output should be similar to the following, with a line for each host already configured and connected:
WARNING: pinging broadcast address
PING <ipaddress> (<ipaddress>) 517(84) bytes of data.
174 bytes from <ipaddress>: icmp_seq=0 ttl=174 time=0.022 ms
64 bytes from <ipaddress>: icmp_seq=0 ttl=64 time=0.070 ms
64 bytes from <ipaddress>: icmp_seq=0 ttl=64 time=0.073 ms
The IPoIB network interface is now configured.
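The steps above can be collected into a dry run that prints the commands for review instead of executing them. This is a minimal sketch, not an OPX tool. The function names broadcast_addr and ipoib_static_plan are hypothetical, and the sketch assumes that the address pinged with -b is the subnet broadcast address.

```shell
# Print the broadcast address for a dotted-quad address and prefix length.
broadcast_addr() (
    prefix=$2
    IFS=.
    set -- $1
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    bcast=$(( addr | (0xFFFFFFFF >> prefix) ))
    echo "$(( (bcast >> 24) & 255 )).$(( (bcast >> 16) & 255 )).$(( (bcast >> 8) & 255 )).$(( bcast & 255 ))"
)

# Print (do not run) the configuration commands for one interface.
# Usage: ipoib_static_plan INTERFACE ADDRESS PREFIX
ipoib_static_plan() {
    echo "ip addr add $2/$3 dev $1"
    echo "ip link set $1 up"
    echo "ip addr show $1"
    echo "ping -c 2 -b $(broadcast_addr "$2" "$3")"
}

# Example values taken from the assumptions in this section.
ipoib_static_plan ib0 10.1.17.3 24
```

Review the printed commands, then run them as root.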
3.5.4.1. Configuring IPoIB Driver
To configure the IPoIB driver using the command line, perform the following steps.
For each IP Link Layer interface, create an interface configuration file, for example, /etc/NetworkManager/system-connections/NAME.nmconnection, where NAME is the network interface name. Examples of a configuration file follow:
For RHEL:
[root@Node1 system-connections]# cat ib0.nmconnection
[connection]
id=ib0
uuid=c5ca41b5-becb-4c43-86a2-1fdd64676989
type=infiniband
autoconnect=false
interface-name=ib0

[infiniband]
transport-mode=datagram

[ipv4]
address=
method=auto

[ipv6]
addr-gen-mode=eui64
method=auto

DEVICE=ib0
TYPE=InfiniBand
BOOTPROTO=static
IPADDR=<ipaddress>
BROADCAST=<ipaddress>
NETWORK=<ipaddress>
NETMASK=255.255.252.0
ONBOOT=yes
CONNECTED_MODE=yes
For SLES:
DEVICE=ib0
TYPE=InfiniBand
BOOTPROTO=static
IPADDR=192.168.0.1
BROADCAST=192.168.0.255
NETWORK=192.168.0.0
NETMASK=255.255.255.0
STARTMODE=auto
IPOIB_MODE='connected'
MTU=65520
For Ubuntu:
Set the file /etc/netplan/ib1-infiniband.yaml:
network:
  version: 2
  renderer: networkd
  ethernets:
    ib1:
      infiniband-mode: connected
      addresses: [192.168.101.122/24]
      mtu: 65520
      dhcp4: false
      dhcp6: false
In the configuration file, the following options are listed by default:
ONBOOT=yes or STARTMODE=auto: This is a standard network option that tells the system to activate the device at boot time.
CONNECTED_MODE=yes, IPOIB_MODE='connected' or infiniband-mode: connected: This option controls IPoIB interface transport mode. Choosing the connected option sets Reliable Connection (RC) mode while the not connected option sets Unreliable Datagram (UD) mode.
Bring up the IPoIB interface with the following command:
ip link set <interface name> up
For example:
ip link set ib0 up
For Ubuntu:
netplan apply
To verify the configuration:
netplan status
Netplan does not recognize Omni-Path as an InfiniBand device, so infiniband-mode will not be taken into consideration. A workaround is to set it manually.
ip link set ib1 down
echo connected > /sys/class/net/ib1/mode
ip link set ib1 up
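After the workaround, the transport mode can be read back from the same sysfs file. The following is a minimal sketch. The function names ipoib_mode and check_ipoib_mode are hypothetical, and SYSFS_ROOT is a hypothetical override that exists only so the functions can be tried against a scratch directory. It defaults to /sys.

```shell
# Print the transport mode recorded for an IPoIB interface.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
ipoib_mode() {
    mode_file="${SYSFS_ROOT:-/sys}/class/net/$1/mode"
    if [ -r "$mode_file" ]; then
        cat "$mode_file"
    else
        echo "no mode file for $1" >&2
        return 1
    fi
}

# Compare the current mode with the expected one and report the result.
# Usage: check_ipoib_mode INTERFACE connected|datagram
check_ipoib_mode() {
    actual=$(ipoib_mode "$1") || return 1
    if [ "$actual" = "$2" ]; then
        echo "$1: $actual (ok)"
    else
        echo "$1: $actual (expected $2)"
        return 1
    fi
}

# Example (requires an existing IPoIB interface):
#   check_ipoib_mode ib1 connected
```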
3.5.4.2. Setting Router Advertisement for IPv6
In IPv6, Router Advertisements (RAs) provide the primary mechanism for hosts on a local network to automatically learn essential configuration parameters, including the default router, the on-link IPv6 prefix, and whether addressing should be performed via Stateless Address Autoconfiguration (SLAAC) and/or DHCPv6. Because IPv6 integrates this discovery process into its core networking model, RAs are typically required to establish functional end-to-end connectivity—most notably by enabling hosts to determine the correct next hop for traffic destined beyond the local link.
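With SLAAC and EUI-64 address generation (addr-gen-mode=eui64 in the NetworkManager example), a host combines the advertised /64 prefix with an interface identifier derived from its port GUID. The following is a minimal sketch of that derivation. It assumes the identifier is formed by setting the universal/local bit (0x02) in the first byte of the 8-byte GUID, which is consistent with the sample ip addr show ib0 output earlier in this section. The function name eui64_iid is hypothetical.

```shell
# Derive the 64-bit interface identifier from an 8-byte port GUID given as
# colon-separated hex bytes (the last 8 bytes of the link/infiniband address).
eui64_iid() (
    IFS=:
    set -- $1
    [ $# -eq 8 ] || { echo "expected 8 colon-separated hex bytes" >&2; exit 1; }
    # Assumption: the universal/local bit of the first byte is set.
    first=$(( 0x$1 | 0x02 ))
    printf '%x:%x:%x:%x\n' \
        $(( (first << 8) | 0x$2 )) \
        $(( (0x$3 << 8) | 0x$4 )) \
        $(( (0x$5 << 8) | 0x$6 )) \
        $(( (0x$7 << 8) | 0x$8 ))
)

guid=00:11:75:01:01:6a:36:83
echo "link-local: fe80::$(eui64_iid "$guid")"
echo "SLAAC:      2001:db8:100:5001:$(eui64_iid "$guid")"
```

The first line reproduces the fe80::211:7501:16a:3683 address from the sample output. The second line uses the documentation prefix from the radvd example in this section.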
To set up RA for IPv6, perform the following:
Select the Fabric Manager node to act as the IPv6 router. It sends RAs to all IPv6 hosts (client nodes) and responds to their Router Solicitations.
Enable IPv6 forwarding:
sysctl -w net.ipv6.conf.all.forwarding=1
For the setting to be persistent across reboots, add the following line to /etc/sysctl.conf:
net.ipv6.conf.all.forwarding=1
Install the Router Advertisement Daemon:
The following command uses dnf for RHEL. Use zypper for SLES or apt for Ubuntu.
sudo dnf install radvd
Configure /etc/radvd.conf with your IPoIB interface and IPv6 router settings.
For example:
interface ibp196s0d1
{
    AdvSendAdvert on;
    MaxRtrAdvInterval 30;
    prefix 2001:db8:100:5001::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
    AdvDefaultLifetime 1800;
};
Enable and start the daemon:
systemctl enable --now radvd
systemctl start radvd.service
Verify the service is running:
systemctl status radvd
Restart the Fabric Manager:
systemctl stop opafm
systemctl start opafm
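When several routers or interfaces need the same settings, the radvd.conf stanza shown in this section can be generated from two parameters. The following is a minimal sketch that only prints the stanza. The function name radvd_conf is hypothetical, and the option values are copied from the example in this section rather than tuned for any particular fabric.

```shell
# Print a radvd.conf stanza for one interface and one /64 prefix.
# Usage: radvd_conf INTERFACE PREFIX
radvd_conf() {
    cat <<EOF
interface $1
{
    AdvSendAdvert on;
    MaxRtrAdvInterval 30;
    prefix $2
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
    AdvDefaultLifetime 1800;
};
EOF
}

# Example values from this section.
radvd_conf ibp196s0d1 2001:db8:100:5001::/64
```

Redirect the output to a scratch file and review it before copying it to /etc/radvd.conf.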
3.5.4.3. IPoIB Bonding
IB bonding is a high-availability solution for IPoIB interfaces. It is based on the Linux Ethernet Bonding Driver and was adapted to work with IPoIB. IPoIB interfaces are supported only in active-backup mode. Other modes are not supported.
Parallel file systems such as GPFS and Lustre often use the IPoIB address to perform initial connection, maintenance, and management tasks, even though they use RDMA/Verbs for the bulk data transfers. To increase the reliability of a dual rail storage server, consider enabling IPoIB bonding on the two interfaces and use the bonded address when adding the server to the GPFS or Lustre configuration.
Note
DO NOT bond IPoIB interfaces on the same port where the SM is running. If you do so, failover will not occur because disabling the port disables the Fabric Manager, not just the network interface.
3.5.4.3.1. Interface Configuration Scripts
When using ib-bond to configure interfaces, the configuration is not saved. Therefore, whenever the primary interface or one of the standby interfaces is destroyed, the configuration needs to be restored by running ib-bond again (for example, after system reboot).
To avoid having to restore the configuration each time, create an interface configuration script for the ibX and bondX interfaces. This section demonstrates how to use the standard syntax to create the bonding configuration script for your OS.
3.5.4.3.2. Create RHEL IB Bonding Interface Configuration Script
Refer to the CN5000 OPX Software Release Notes for versions of RHEL that are supported by the OPX Software.
First, add the following lines to the RHEL file /etc/modprobe.d/hfi.conf:
alias bond0 bonding
To create the interface configuration script for IB Bonding, perform the following steps using NetworkManager:
Note
When bringing up or down interfaces, allow 5 seconds for the status to change.
Set the IPoIB mode of both network devices to connected.
nmcli con modify ib0 infiniband.transport-mode connected
nmcli con modify ib1 infiniband.transport-mode connected
nmcli connection down ib0
nmcli connection down ib1
nmcli connection up ib0
nmcli connection up ib1
Add a new bond interface with active-backup.
nmcli connection add type bond con-name bond0 ifname bond0 mode active-backup
Modify both ib interfaces to be slave connections of the new network bond.
nmcli connection modify ib0 connection.master bond0 connection.slave-type bond
nmcli connection modify ib1 connection.master bond0 connection.slave-type bond
nmcli connection down ib0
nmcli connection down ib1
nmcli connection up ib0
nmcli connection up ib1
Configure IP address and gateway on the bond interface with the IP address of ib interface.
nmcli con modify bond0 +ipv4.addresses <ib interface ipaddress>/<subnet mask>
nmcli connection modify bond0 ipv4.gateway <ib interface ipaddress>
nmcli connection modify bond0 ipv4.method manual
Bring up all interfaces.
nmcli connection up bond0
nmcli connection up ib0
nmcli connection up ib1
Check the bond interface file.
cat /proc/net/bonding/bond0
Note
Refer to the CN5000 Performance Tuning Guide for details on how to enable 10K MTU for the best throughput with IPoFabric.
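The NetworkManager steps above can be expressed as a dry run that prints the nmcli commands for a given bond name, address, and list of ports. This is a minimal sketch and not an OPX tool. The function name bond_nmcli_plan is hypothetical, the address 10.1.17.3/24 is only an example, and the gateway step and the intermediate down/up cycles are left out for brevity.

```shell
# Print (do not run) the nmcli sequence for an active-backup IPoIB bond.
# Usage: bond_nmcli_plan BOND ADDRESS/PREFIX PORT [PORT...]
bond_nmcli_plan() {
    bond=$1
    addr=$2
    shift 2
    echo "nmcli connection add type bond con-name $bond ifname $bond mode active-backup"
    for port in "$@"; do
        # Connected mode first, then attach the port to the bond.
        echo "nmcli connection modify $port infiniband.transport-mode connected"
        echo "nmcli connection modify $port connection.master $bond connection.slave-type bond"
    done
    echo "nmcli connection modify $bond +ipv4.addresses $addr"
    echo "nmcli connection modify $bond ipv4.method manual"
    echo "nmcli connection up $bond"
    for port in "$@"; do
        echo "nmcli connection up $port"
    done
}

bond_nmcli_plan bond0 10.1.17.3/24 ib0 ib1
```

Review the printed commands and allow time for each interface state change before running the next one.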
3.5.4.3.3. Create SLES IB Bonding Interface Configuration Script
Refer to the CN5000 OPX Software Release Notes for versions of SLES that are supported by OPX Software.
To create the interface configuration script for IB Bonding, perform the following steps:
Create a bond0 interface.
vi /etc/sysconfig/network/ifcfg-bond0
In the primary (bond0) interface script, add the following lines:
STARTMODE='auto'
BOOTPROTO='static'
USERCONTROL='no'
IPADDR='<IPADDRESS>'
NETMASK='255.255.252.0'
NETWORK='<IPADDRESS>'
BROADCAST='<IPADDRESS>'
BONDING_MASTER='yes'
BONDING_MODULE_OPTS='mode=active-backup miimon=100 primary=ib0 updelay=0 downdelay=0'
BONDING_SLAVE0='ib0'
BONDING_SLAVE1='ib1'
MTU=65520
Create a standby ib0 interface.
vi /etc/sysconfig/network/ifcfg-ib0
Add the following lines:
BOOTPROTO='none'
STARTMODE='off'
Create a standby ib1 interface.
vi /etc/sysconfig/network/ifcfg-ib1
Add the following lines:
BOOTPROTO='none'
STARTMODE='off'
Restart the network.
systemctl restart network.service
Note
Refer to the CN5000 Performance Tuning Guide for details on how to enable 10K MTU for the best throughput with IPoFabric.
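The NETWORK and BROADCAST values in ifcfg-bond0 follow from IPADDR and NETMASK. The following is a minimal sketch that prints the bond0 settings shown above with those two values computed. The function names to_int, to_quad, and sles_bond_ifcfg are hypothetical, and the address 10.1.17.3 is only an example.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Convert a 32-bit integer back to dotted-quad form.
to_quad() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print ifcfg-bond0 content; the first port listed becomes the primary.
# Usage: sles_bond_ifcfg IPADDR NETMASK PORT [PORT...]
sles_bond_ifcfg() {
    ip=$1
    mask=$2
    shift 2
    ip_n=$(to_int "$ip")
    mask_n=$(to_int "$mask")
    net_n=$(( ip_n & mask_n ))                       # network: host bits cleared
    bcast_n=$(( net_n | (~mask_n & 0xFFFFFFFF) ))    # broadcast: host bits set
    echo "STARTMODE='auto'"
    echo "BOOTPROTO='static'"
    echo "USERCONTROL='no'"
    echo "IPADDR='$ip'"
    echo "NETMASK='$mask'"
    echo "NETWORK='$(to_quad "$net_n")'"
    echo "BROADCAST='$(to_quad "$bcast_n")'"
    echo "BONDING_MASTER='yes'"
    echo "BONDING_MODULE_OPTS='mode=active-backup miimon=100 primary=$1 updelay=0 downdelay=0'"
    i=0
    for port in "$@"; do
        echo "BONDING_SLAVE$i='$port'"
        i=$((i + 1))
    done
    echo "MTU=65520"
}

sles_bond_ifcfg 10.1.17.3 255.255.252.0 ib0 ib1
```

For the example address and the 255.255.252.0 netmask, the sketch prints NETWORK='10.1.16.0' and BROADCAST='10.1.19.255'.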
3.5.4.3.4. Verifying IB Bonding Configuration
After the configuration scripts are updated, and the network service is restarted or the server is rebooted, use the following CLI commands to verify that IB bonding is configured.
cat /proc/net/bonding/bond0
ip
Example of cat /proc/net/bonding/bond0 output:
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: vX.X.X (mm dd, yyyy)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac)
Primary Slave: ib0
Currently Active Slave: ib0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: ib0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:04:04:fe:80

Slave Interface: ib1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:04:05:fe:80
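For scripted checks, the bonding status file can be reduced to the fields that matter for failover. The following is a minimal sketch that uses awk and assumes the field labels shown in the example output above. The function name bond_summary is hypothetical.

```shell
# Summarize a bonding status file: mode, active port, and per-port MII status.
# Usage: bond_summary [FILE]   (FILE defaults to /proc/net/bonding/bond0)
bond_summary() {
    awk -F': ' '
        /^Bonding Mode:/            { print "mode: " $2 }
        /^Currently Active Slave:/  { print "active: " $2 }
        /^Slave Interface:/         { slave = $2 }
        # The first MII Status line belongs to the bond itself and is skipped.
        /^MII Status:/ && slave != "" { print "slave " slave ": " $2 }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Example (requires a configured bond0):
#   bond_summary
```

A port that reports a status other than up, or an active port that differs from the primary, indicates that a failover has occurred or that a link needs attention.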
Example of ip output:
st2169:/etc/sysconfig ip
bond0 Link encap:InfiniBand HWaddr 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::211:7500:ff:909b/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:65520 Metric:1
RX packets:120619276 errors:0 dropped:0 overruns:0 frame:0
TX packets:120619277 errors:0 dropped:137 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:10132014352 (9662.6 Mb) TX bytes:10614493096 (10122.7 Mb)
ib0 Link encap:InfiniBand HWaddr 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
UP BROADCAST RUNNING SLAVE MULTICAST MTU:65520 Metric:1
RX packets:118938033 errors:0 dropped:0 overruns:0 frame:0
TX packets:118938027 errors:0 dropped:41 overruns:0 carrier:0
collisions:0 txqueuelen:256
RX bytes:9990790704 (9527.9 Mb) TX bytes:10466543096 (9981.6 Mb)
ib1 Link encap:InfiniBand HWaddr 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
UP BROADCAST RUNNING SLAVE MULTICAST MTU:65520 Metric:1
RX packets:1681243 errors:0 dropped:0 overruns:0 frame:0
TX packets:1681250 errors:0 dropped:96 overruns:0 carrier:0
collisions:0 txqueuelen:256
RX bytes:141223648 (134.6 Mb) TX bytes:147950000 (141.0 Mb)