Cornelis Technical Documentation

3.5.4. Configure IPoIB Network Interface

The following instructions show how to manually configure the OpenFabrics Alliance (OFA) IPoIB network interface. Cornelis recommends using the CN5000 OPX Software Installation package for installation of the software, including setting up IPoIB.

Note

IPoIB child interfaces (for example, ib0.8001) must be placed on a different IP subnet from their parent (for example, ib0). By default, a child interface is on a different fabric partition from its parent, so if the two share an IP subnet, packets sent on either interface fail to be forwarded because of PKey mismatches.
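
As an illustration of the subnet rule in the note above, the following sketch checks whether two addresses fall in the same subnet for a given prefix length. The helper names ip_to_int and same_subnet are hypothetical and not part of any Omni-Path tool, and the child address 10.1.18.3 is an assumed example value.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of any Omni-Path tool.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if both addresses fall in the same subnet
# for the given prefix length.
same_subnet() {
    addr_a=$(ip_to_int "$1")
    addr_b=$(ip_to_int "$2")
    mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    [ $(( addr_a & mask )) -eq $(( addr_b & mask )) ]
}

# Assumed example: parent ib0 at 10.1.17.3/24, child ib0.8001 at 10.1.18.3/24.
if same_subnet 10.1.17.3 10.1.18.3 24; then
    echo "conflict: parent and child share a subnet"
else
    echo "ok: parent and child are on different subnets"
fi
```

A check like this could be run before assigning an address to a child interface. With a wider prefix such as /22, the same two addresses would share a subnet and the check would report a conflict.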

For larger clusters, Omni-Path tools can be used to automate the installation and configuration of many nodes. These tools automate the configuration of the IPoIB network interface.

This example assumes the following:

  • Shell is either sh or bash.

  • All required Omni-Path and OFA RPMs are installed.

  • Startup scripts have been run, either manually or at system boot.

  • The IPoIB network is 10.1.17.0, which is one of the networks reserved for private use and is therefore not routable on the Internet. The network has an 8-bit host portion (a /24 prefix). In this case, the netmask must be specified.

  • The host to be configured has the IP address 10.1.17.3, no host files exist, and DHCP is not used.

    Note

    Instructions are only for this static IP address case.
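
As a worked example of the netmask assumption above: an 8-bit host portion leaves 32 - 8 = 24 network bits, which corresponds to the netmask 255.255.255.0. The sketch below performs that conversion; prefix_to_netmask is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Omni-Path tool): convert an IPv4
# prefix length (0-32) into a dotted-decimal netmask.
prefix_to_netmask() {
    prefix=$1
    mask=""
    for _octet in 1 2 3 4; do
        if [ "$prefix" -ge 8 ]; then
            bits=8
        else
            bits=$prefix
        fi
        # 'bits' leading one-bits in this octet: 256 - 2^(8 - bits)
        value=$(( 256 - (1 << (8 - bits)) ))
        mask="${mask}${mask:+.}${value}"
        prefix=$(( prefix - bits ))
    done
    echo "$mask"
}

# An 8-bit host portion leaves 32 - 8 = 24 network bits.
host_bits=8
prefix_to_netmask $(( 32 - host_bits ))   # prints 255.255.255.0
```

The same helper maps a /22 prefix to 255.255.252.0.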

Perform the following steps:

  1. Add an IP address (as a root user):

    ip addr add <ipaddress>/255.255.255.0 dev ib0
  2. Bring up the link:

    ip link set ib0 up
  3. Verify the configuration:

    ip addr show ib0

    The output should be similar to:

    ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:00:02:fe:80:00:00:00:00:00:00:00:11:75:01:01:6a:36:83 brd 00:ff:ff:ff:ff:12:40:1b:80:01:00:00:00:00:00:00:ff:ff:ff:ff
    inet <ipaddress>/24 brd <ipaddress> scope global ib0
    valid_lft forever preferred_lft forever
    inet6 fe80::211:7501:16a:3683/64 scope link
    valid_lft forever preferred_lft forever
  4. Ping the broadcast address of the ib0 network:

    ping -c 2 -b <ipaddress>

    The output should be similar to the following, with a line for each host already configured and connected:

    WARNING: pinging broadcast address
    PING <ipaddress> (<ipaddress>) 56(84) bytes of data.
    64 bytes from <ipaddress>: icmp_seq=0 ttl=64 time=0.022 ms
    64 bytes from <ipaddress>: icmp_seq=0 ttl=64 time=0.070 ms
    64 bytes from <ipaddress>: icmp_seq=0 ttl=64 time=0.073 ms

    The IPoIB network interface is now configured.
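
To make the placeholders in steps 1 through 3 concrete, the following sketch prints the commands for the example host assumed earlier (10.1.17.3 on a /24 network). It only prints the commands and does not run them. The function name ipoib_static_cmds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for illustration: print (do not execute) the
# commands from steps 1-3 for one interface, address, and prefix length.
ipoib_static_cmds() {
    iface=$1
    addr=$2
    prefix=$3
    echo "ip addr add $addr/$prefix dev $iface"
    echo "ip link set $iface up"
    echo "ip addr show $iface"
}

# Example values taken from the assumptions listed above.
ipoib_static_cmds ib0 10.1.17.3 24
```

Reviewing the printed commands before running them as root is one way to catch a wrong interface name or prefix length.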

3.5.4.1. Configuring IPoIB Driver

To configure the IPoIB driver using the command line, perform the following steps.

  1. For each IP Link Layer interface, create an interface configuration file, for example, /etc/NetworkManager/system-connections/NAME.nmconnection, where NAME is the network interface name. Example configuration files follow:

    For RHEL:

    [root@Node1 system-connections]# cat ib0.nmconnection
    [connection]
    id=ib0
    uuid=c5ca41b5-becb-4c43-86a2-1fdd64676989
    type=infiniband
    autoconnect=false
    interface-name=ib0
    
    [infiniband]
    transport-mode=datagram
    
    [ipv4]
    address= 
    method=auto
    
    [ipv6]
    addr-gen-mode=eui64
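    method=auto

    For RHEL versions that use legacy ifcfg files (for example, /etc/sysconfig/network-scripts/ifcfg-ib0):

    DEVICE=ib0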
    TYPE=InfiniBand
    BOOTPROTO=static
    IPADDR=<ipaddress>
    BROADCAST=<ipaddress>
    NETWORK=<ipaddress>
    NETMASK=255.255.252.0
    ONBOOT=yes
    CONNECTED_MODE=yes

    For SLES:

    DEVICE=ib0
    TYPE=InfiniBand
    BOOTPROTO=static
    IPADDR=192.168.0.1
    BROADCAST=192.168.0.255
    NETWORK=192.168.0.0
    NETMASK=255.255.255.0
    STARTMODE=auto
    IPOIB_MODE='connected'
    MTU=65520
    

    For Ubuntu:

    Create or edit the file /etc/netplan/ib1-infiniband.yaml:

    network:
      version: 2
      renderer: networkd
      ethernets:
        ib1:
          infiniband-mode: connected
          addresses: [192.168.101.122/24]
          mtu: 65520
          dhcp4: false
          dhcp6: false  

    In the configuration file, the following options are listed by default:

    • ONBOOT=yes or STARTMODE=auto: This is a standard network option that tells the system to activate the device at boot time.

    • CONNECTED_MODE=yes, IPOIB_MODE='connected' or infiniband-mode: connected: This option controls IPoIB interface transport mode.

      Setting the option to connected selects Reliable Connection (RC) mode. Otherwise, the interface uses Unreliable Datagram (UD) mode.

  2. Bring up the IPoIB interface with the following command:

    ip link set <interface name> up

    For example:

    ip link set ib0 up

    For Ubuntu:

    netplan apply
    1. Verify the configuration:

      netplan status
    2. Netplan does not recognize Omni-Path as an InfiniBand device, so the infiniband-mode setting is ignored. As a workaround, set the mode manually:

       ip link set ib1 down
       echo connected > /sys/class/net/ib1/mode
       ip link set ib1 up
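
The transport mode that is actually in effect can be read back from the same sysfs file that the workaround writes. The following is a minimal sketch; ipoib_mode is a hypothetical helper name, and the optional second argument exists only so the function can be tried on a machine without fabric hardware.

```shell
#!/bin/sh
# Hypothetical helper for illustration. Prints the IPoIB transport mode
# ("connected" or "datagram") of an interface by reading its sysfs mode file.
# The second argument overrides the sysfs root (default: /sys/class/net).
ipoib_mode() {
    iface=$1
    sysfs_root=${2:-/sys/class/net}
    mode_file="$sysfs_root/$iface/mode"
    if [ ! -r "$mode_file" ]; then
        echo "cannot read $mode_file" >&2
        return 1
    fi
    cat "$mode_file"
}

# Usage on the configured host:
#   ipoib_mode ib1
```

If the function reports datagram after the interface was configured for connected mode, the setting was probably not applied, as can happen with the Netplan limitation described above.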

3.5.4.2. Setting Router Advertisement for IPv6

In IPv6, Router Advertisements (RAs) provide the primary mechanism for hosts on a local network to automatically learn essential configuration parameters, including the default router, the on-link IPv6 prefix, and whether addressing should be performed via Stateless Address Autoconfiguration (SLAAC) and/or DHCPv6. Because IPv6 integrates this discovery process into its core networking model, RAs are typically required to establish functional end-to-end connectivity—most notably by enabling hosts to determine the correct next hop for traffic destined beyond the local link.

To set up RA for IPv6, perform the following:

  1. Select the Fabric Manager node to act as the IPv6 router. This node sends RAs to all IPv6 hosts (client nodes) and responds to their Router Solicitations.

  2. Enable IPv6 forwarding:

    sysctl -w net.ipv6.conf.all.forwarding=1

    For the setting to be persistent across reboots, add the following line to /etc/sysctl.conf:

    net.ipv6.conf.all.forwarding=1 
  3. Install the Router Advertisement Daemon:

    The following command uses dnf for RHEL. Use zypper for SLES or apt for Ubuntu.

    sudo dnf install radvd
  4. Configure /etc/radvd.conf with your IPoIB interface and IPv6 router settings.

    For example:

    interface ibp196s0d1
    {
        AdvSendAdvert on;
        MaxRtrAdvInterval 30;
        prefix 2001:db8:100:5001::/64
        {
            AdvOnLink on;
            AdvAutonomous on;
        };
        AdvDefaultLifetime 1800;
    };
  5. Enable and start the daemon:

    systemctl enable --now radvd
  6. Verify the service is running:

    systemctl status radvd
  7. Restart the Fabric Manager:

    systemctl stop opafm
    systemctl start opafm
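
Before relying on the router, it can help to confirm that the forwarding setting from step 2 is in effect. The sketch below reads the corresponding kernel setting; ipv6_forwarding_state is a hypothetical helper name, and the optional argument only overrides the file path for testing.

```shell
#!/bin/sh
# Hypothetical helper for illustration. Reports whether IPv6 forwarding is
# enabled by reading the kernel setting that step 2 changes.
# The optional argument overrides the file path for testing.
ipv6_forwarding_state() {
    setting_file=${1:-/proc/sys/net/ipv6/conf/all/forwarding}
    if [ "$(cat "$setting_file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Usage on the router node:
#   ipv6_forwarding_state
```

On a client node, running ip -6 route show default is one way to check whether a default route has been learned from the router.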

3.5.4.3. IPoIB Bonding

IB bonding is a high-availability solution for IPoIB interfaces. It is based on the Linux Ethernet Bonding Driver and was adapted to work with IPoIB. Only the active-backup mode is supported for IPoIB interfaces.

Parallel file systems such as GPFS and Lustre often use the IPoIB address to perform initial connection, maintenance, and management tasks, even though they use RDMA/Verbs for the bulk data transfers. To increase the reliability of a dual rail storage server, consider enabling IPoIB bonding on the two interfaces and use the bonded address when adding the server to the GPFS or Lustre configuration.

Note

DO NOT bond IPoIB interfaces on the same port where the SM is running. If you do so, failover will not occur because disabling the port disables the Fabric Manager, not just the network interface.

3.5.4.3.1. Interface Configuration Scripts

When using ib-bond to configure interfaces, the configuration is not saved. Therefore, whenever the primary interface or one of the standby interfaces is destroyed, the configuration needs to be restored by running ib-bond again (for example, after system reboot).

To avoid having to restore the configuration each time, create an interface configuration script for the ibX and bondX interfaces. This section demonstrates how to use the standard syntax to create the bonding configuration script for your OS.

3.5.4.3.2. Create RHEL IB Bonding Interface Configuration Script

Refer to the CN5000 OPX Software Release Notes for versions of RHEL that are supported by the OPX Software.

First, add the following line to the RHEL file /etc/modprobe.d/hfi.conf:

alias bond0 bonding

To create the interface configuration script for IB Bonding, perform the following steps using NetworkManager:

Note

When bringing interfaces up or down, allow 5 seconds for the status to change.

  1. Set the IPoIB mode of both network devices to connected.

    nmcli con modify ib0 infiniband.transport-mode connected
    nmcli con modify ib1 infiniband.transport-mode connected
    nmcli connection down ib0
    nmcli connection down ib1
    nmcli connection up ib0
    nmcli connection up ib1
    
  2. Add a new bond interface with active-backup.

    nmcli connection add type bond con-name bond0 ifname bond0 mode active-backup
  3. Modify both ib interfaces to be slave connections of the new bond.

    nmcli connection modify ib0 connection.master bond0 connection.slave-type bond
    nmcli connection modify ib1 connection.master bond0 connection.slave-type bond
    nmcli connection down ib0
    nmcli connection down ib1
    nmcli connection up ib0
    nmcli connection up ib1
    
  4. Configure the IP address and gateway on the bond interface, using the IP address of the ib interface. Specify the subnet mask as a prefix length (for example, 24).

    nmcli con modify bond0 +ipv4.addresses <ib interface ipaddress>/<subnet mask>
    nmcli connection modify bond0 ipv4.gateway <ib interface ipaddress>
    nmcli connection modify bond0 ipv4.method manual
  5. Bring up all interfaces.

    nmcli connection up bond0
    nmcli connection up ib0
    nmcli connection up ib1
  6. Check the bond interface file.

    cat /proc/net/bonding/bond0

Note

Refer to the CN5000 Performance Tuning Guide for details on how to enable 10K MTU for the best throughput with IPoFabric.

3.5.4.3.3. Create SLES IB Bonding Interface Configuration Script

Refer to the CN5000 OPX Software Release Notes for versions of SLES that are supported by OPX Software.

To create the interface configuration script for IB Bonding, perform the following steps:

  1. Create a bond0 interface.

    vi /etc/sysconfig/network/ifcfg-bond0
  2. In the primary (bond0) interface script, add the following lines:

    STARTMODE='auto'
    BOOTPROTO='static'
    USERCONTROL='no'
    IPADDR='<IPADDRESS>'
    NETMASK='255.255.252.0'
    NETWORK='<IPADDRESS>'
    BROADCAST='<IPADDRESS>'
    BONDING_MASTER='yes'
    BONDING_MODULE_OPTS='mode=active-backup miimon=100 primary=ib0 updelay=0 downdelay=0'
    BONDING_SLAVE0='ib0'
    BONDING_SLAVE1='ib1'
    MTU=65520
    
  3. Create a standby ib0 interface.

    vi /etc/sysconfig/network/ifcfg-ib0
  4. Add the following lines:

    BOOTPROTO='none'
    STARTMODE='off'
    
  5. Create a standby ib1 interface.

    vi /etc/sysconfig/network/ifcfg-ib1
  6. Add the following lines:

    BOOTPROTO='none'
    STARTMODE='off'
    
  7. Restart the network.

    systemctl restart network.service

Note

Refer to the CN5000 Performance Tuning Guide for details on how to enable 10K MTU for the best throughput with IPoFabric.

3.5.4.3.4. Verifying IB Bonding Configuration

After the configuration scripts are updated and the network service is restarted or the server is rebooted, use the following CLI commands to verify that IB bonding is configured.

cat /proc/net/bonding/bond0
ip addr show

Example of cat /proc/net/bonding/bond0 output:

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: vX.X.X (mm dd, yyyy)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac)
Primary Slave: ib0
Currently Active Slave: ib0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: ib0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:04:04:fe:80

Slave Interface: ib1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 80:00:04:05:fe:80
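
The fields in this output can also be checked from a script, for example in a monitoring job. The following is a minimal sketch that assumes the file layout shown above. The function names bond_active_slave and bond_slaves_down are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they parse the bonding status file
# in the layout shown above. The optional argument overrides the file path.

# Print the interface named on the "Currently Active Slave" line.
bond_active_slave() {
    bond_file=${1:-/proc/net/bonding/bond0}
    awk -F': ' '/^Currently Active Slave:/ { print $2 }' "$bond_file"
}

# Print each slave interface whose MII Status is not "up".
bond_slaves_down() {
    bond_file=${1:-/proc/net/bonding/bond0}
    awk -F': ' '
        /^Slave Interface:/ { iface = $2 }
        # The first MII Status line belongs to the bond itself; skip it.
        /^MII Status:/ && iface != "" && $2 != "up" { print iface }
    ' "$bond_file"
}

# Usage on the configured host:
#   bond_active_slave
#   bond_slaves_down
```

For the example output above, bond_active_slave would print ib0 and bond_slaves_down would print nothing, because both slaves report an MII Status of up.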

Example of interface output (shown in the legacy ifconfig format):

st2169:/etc/sysconfig ifconfig
bond0     Link encap:InfiniBand  HWaddr 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
       inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
       inet6 addr: fe80::211:7500:ff:909b/64 Scope:Link
       UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
       RX packets:120619276 errors:0 dropped:0 overruns:0 frame:0
       TX packets:120619277 errors:0 dropped:137 overruns:0 carrier:0
       collisions:0 txqueuelen:0
       RX bytes:10132014352 (9662.6 Mb)  TX bytes:10614493096 (10122.7 Mb)

ib0     Link encap:InfiniBand  HWaddr 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
       UP BROADCAST RUNNING SLAVE MULTICAST  MTU:65520  Metric:1
       RX packets:118938033 errors:0 dropped:0 overruns:0 frame:0
       TX packets:118938027 errors:0 dropped:41 overruns:0 carrier:0
       collisions:0 txqueuelen:256
       RX bytes:9990790704 (9527.9 Mb)  TX bytes:10466543096 (9981.6 Mb)

ib1     Link encap:InfiniBand  HWaddr 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
       UP BROADCAST RUNNING SLAVE MULTICAST  MTU:65520  Metric:1
       RX packets:1681243 errors:0 dropped:0 overruns:0 frame:0
       TX packets:1681250 errors:0 dropped:96 overruns:0 carrier:0
       collisions:0 txqueuelen:256
       RX bytes:141223648 (134.6 Mb)  TX bytes:147950000 (141.0 Mb)