Cornelis Technical Documentation

4.4.3. Setting Up Dual Rails for a Single Subnet

A single rail or dual rails in a single subnet is the default scenario expected during installation. The driver detects whether one or two SuperNICs are connected to the host on the same fabric. In most scenarios, this configuration requires no changes to opafm.xml.

If opafm.xml is not modified, some APIs (such as older versions of Open MPI) may not function correctly or may issue a warning like the one shown below; however, both rails still participate correctly in the MPI application.

WARNING: There are more than one active ports on host 'a', but the default subnet GID prefix was detected on more than one of these ports. If these ports are connected to different physical IB networks, this configuration will fail in Open MPI. This version of Open MPI requires that every physically separate IB subnet that is used between connected MPI processes must have different subnet ID values.

To prevent this warning, change the SubnetPrefix for fm0 as described in the steps below.
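Before changing anything, it can help to confirm which SubnetPrefix fm0 currently uses. The following is a minimal sketch, not part of the Cornelis tooling: the function name `get_subnet_prefix` and the trimmed `SAMPLE` document are hypothetical, and the sketch assumes only that the file is well-formed XML in which each instance has a `<Shared>` block containing `<Name>` and `<SubnetPrefix>`, as in the excerpts in this section.

```python
import xml.etree.ElementTree as ET

# Default fm0 SubnetPrefix as shown in this section.
DEFAULT_PREFIX = 0xFE80000000000000


def get_subnet_prefix(xml_text, instance="fm0"):
    """Return the SubnetPrefix (as an int) of the named FM instance, or None."""
    root = ET.fromstring(xml_text)
    # Look at every <Shared> block and pick the one whose <Name> matches.
    for shared in root.iter("Shared"):
        name = shared.findtext("Name")
        if name is not None and name.strip() == instance:
            prefix = shared.findtext("SubnetPrefix")
            return int(prefix.strip(), 16) if prefix else None
    return None


# Trimmed, hypothetical stand-in for /etc/opa-fm/opafm.xml.
SAMPLE = """<Config>
  <Fm>
    <Shared>
      <Name>fm0</Name>
      <SubnetPrefix>0xfe80000000000000</SubnetPrefix>
    </Shared>
  </Fm>
</Config>
"""

prefix = get_subnet_prefix(SAMPLE)
uses_default = prefix == DEFAULT_PREFIX
print(f"fm0 SubnetPrefix: {prefix:#018x}, default: {uses_default}")
```

To run the check against a real file, read its contents and pass them to `get_subnet_prefix`. If the result equals the default value, the warning above is a likely symptom on hosts with two active ports.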

4.4.3.1. Assumptions

  • SuperNICs have been installed in the host servers.

  • SuperNICs have been cabled as shown below:

    Figure 70. SuperNIC Cabling Example


4.4.3.2. Procedures

Perform the following steps to set the SubnetPrefix:

  1. Stop all standby Fabric Managers using systemctl stop opafm.

  2. Stop the primary Fabric Manager using systemctl stop opafm.

  3. Open the /etc/opa-fm/opafm.xml file on the primary Fabric Manager for editing.

  4. Search for "<Shared>" and locate the block whose <Name> is fm0 to review the fm0 settings.

    • An example of the shared instance for fm0 is shown below. This is the default configuration for a single rail in a single subnet.

      <!-- Shared Instance config, applies to all components: SM, PM and FE -->
      <Shared>
        <!-- Fm.Shared.Start controls overall startup of the Instance. -->
        <!-- If 0, none of the components in the Instance are started. -->
        <!-- If 1, instance is enabled and Fm.Sm.Start, Fm.Pm.Start, etc -->
        <!-- control startup of each manager.  The default for each manager -->
        <!-- is defined by Common.Sm.Start, Common.Pm.Start, etc -->
        <!-- ESM does not support Start via XML configuration. Use CLI commands -->
        <Start>1</Start>
        <!-- <StartupRetries>5</StartupRetries> -->
        <!-- <StartupStableWait>10</StartupStableWait> -->
      
        <!-- Name, Hfi, Port, and PortGUID are ignored for ESM since they -->
        <!-- are automatically set -->
        <Name>fm0</Name> <!-- also for logging with _sm, _fe, _pm appended -->
        <Hfi>1</Hfi> <!-- local HFI to use for FM instance, 1=1st HFI -->
        <Port>1</Port> <!-- local HFI port to use for FM instance, 1=1st Port -->
        <PortGUID>0x0000000000000000</PortGUID> <!-- local port to use for FM -->
        <SubnetPrefix>0xfe80000000000000</SubnetPrefix> <!-- should be unique -->
      
        <!-- Overrides of the Common.Shared parameters if desired -->
        <!-- ESM does not support LogFile -->
        <!-- <LogFile>/var/log/fm0_log</LogFile> --> <!-- log for this instance -->
      </Shared>

  5. Change the <SubnetPrefix> for fm0 to a unique value other than 0xfe80000000000000. A recommended value is 0xfe80000000001000.

    • An example of the change is shown below.

      <!-- Shared Instance config, applies to all components: SM, PM and FE -->
      <Shared>
        <!-- Fm.Shared.Start controls overall startup of the Instance. -->
        <!-- If 0, none of the components in the Instance are started. -->
        <!-- If 1, instance is enabled and Fm.Sm.Start, Fm.Pm.Start, etc -->
        <!-- control startup of each manager.  The default for each manager -->
        <!-- is defined by Common.Sm.Start, Common.Pm.Start, etc -->
        <!-- ESM does not support Start via XML configuration. Use CLI commands -->
        <Start>1</Start>
        <!-- <StartupRetries>5</StartupRetries> -->
        <!-- <StartupStableWait>10</StartupStableWait> -->
      
        <!-- Name, Hfi, Port, and PortGUID are ignored for ESM since they -->
        <!-- are automatically set -->
        <Name>fm0</Name> <!-- also for logging with _sm, _fe, _pm appended -->
        <Hfi>1</Hfi> <!-- local HFI to use for FM instance, 1=1st HFI -->
        <Port>1</Port> <!-- local HFI port to use for FM instance, 1=1st Port -->
        <PortGUID>0x0000000000000000</PortGUID> <!-- local port to use for FM -->
        <SubnetPrefix>0xfe80000000001000</SubnetPrefix> <!-- should be unique -->
      
        <!-- Overrides of the Common.Shared parameters if desired -->
        <!-- ESM does not support LogFile -->
        <!-- <LogFile>/var/log/fm0_log</LogFile> --> <!-- log for this instance -->
      </Shared>

  6. Save the opafm.xml file.

  7. Copy the opafm.xml file to all standby Fabric Managers.

  8. Restart the primary Fabric Manager using systemctl restart opafm.

  9. Restart the standby Fabric Managers using systemctl restart opafm.

  10. Run systemctl status opafm to verify that the Fabric Managers are running.
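The edit in steps 4 and 5 can also be scripted. The sketch below is one possible approach under stated assumptions, not a Cornelis-provided tool: `set_subnet_prefix` and the trimmed `SAMPLE` text are hypothetical, and the fm1 prefix in the sample is illustrative only. It works on the raw text rather than a parsed tree so that the XML comments in the file are preserved, and it assumes that no comment inside the block contains a literal `<Shared>`, `</Shared>`, or `<SubnetPrefix>` tag.

```python
import re


def set_subnet_prefix(xml_text, new_prefix, instance="fm0"):
    """Return xml_text with the SubnetPrefix of one FM instance replaced."""
    if not re.fullmatch(r"0x[0-9a-fA-F]{16}", new_prefix):
        raise ValueError("expected a 64-bit hex value such as 0xfe80000000001000")

    shared_block = re.compile(r"<Shared>.*?</Shared>", re.DOTALL)
    prefix_tag = re.compile(r"<SubnetPrefix>\s*0x[0-9a-fA-F]+\s*</SubnetPrefix>")
    name_tag = f"<Name>{instance}</Name>"
    changed = 0

    def rewrite(match):
        nonlocal changed
        block = match.group(0)
        if name_tag not in block:
            return block  # a different instance: leave it untouched
        block, count = prefix_tag.subn(
            f"<SubnetPrefix>{new_prefix}</SubnetPrefix>", block, count=1
        )
        changed += count
        return block

    result = shared_block.sub(rewrite, xml_text)
    if changed != 1:
        raise ValueError(f"expected exactly one SubnetPrefix for {instance}, found {changed}")
    return result


# Trimmed, hypothetical stand-in for /etc/opa-fm/opafm.xml.
SAMPLE = """<Config>
  <Fm>
    <Shared>
      <Name>fm0</Name> <!-- also for logging -->
      <SubnetPrefix>0xfe80000000000000</SubnetPrefix> <!-- should be unique -->
    </Shared>
  </Fm>
  <Fm>
    <Shared>
      <Name>fm1</Name>
      <SubnetPrefix>0xfe80000000001001</SubnetPrefix>
    </Shared>
  </Fm>
</Config>
"""

updated = set_subnet_prefix(SAMPLE, "0xfe80000000001000")
print(updated)
```

In practice, the script would read /etc/opa-fm/opafm.xml on the primary Fabric Manager after the services are stopped, write the result back, and leave the copy and restart steps to be carried out as described above. Keeping a backup of the original file before overwriting it is a sensible precaution.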