Cornelis Technical Documentation

2.2.4. Fabric Manager 

The CN5000 Fabric Manager (FM) is a set of components that perform various functions for OPA fabric management, consisting of:

  • Subnet Manager (SM): Discovers, configures, and manages the fabric topology, including routing, LID assignment, switch programming, and high-availability failover.

  • Subnet Administration (SA): Query interface to the SM that lets fabric nodes discover paths, services, topology, and events without direct SM interaction.

  • Performance Manager (PM): Actively collects performance and error statistics from every node's Performance Management Agent (PMA) across the fabric.

  • Performance Administration (PA): Query interface to the PM that stores and serves aggregated performance data to management tools without generating additional fabric traffic.

These components interact with the SuperNICs and Switches in the fabric using in-band communications to SM agents and PM agents in each Omni-Path device. Any host with OPX Software installed can start a Fabric Manager service. Refer to Starting the Fabric Manager.

2.2.4.1. Terminology - Fabric Manager vs. Subnet Manager

The terms Fabric Manager and Subnet Manager can overlap in usage. The following guidelines clarify their meaning:

  • Fabric Manager refers to the full suite of fabric management software, which includes the SM/SA and PM/PA.

  • Subnet Manager refers specifically to the Subnet Manager component within the FM.

  • In practice, because all FM components are bundled and operated together, SM and FM are often used interchangeably when referring to general fabric management operations. For example, "Start the SM" typically means starting all FM processes.

As a rule of thumb: use FM when referring to the complete software suite or its specific components, and SM when referring to the Subnet Manager specifically — though in general discussion, SM is commonly understood to encompass all FM processes.

2.2.4.1. Subnet Manager

The Fabric Manager implements a complete Omni-Path Architecture-compliant Subnet Manager. The SM is responsible for monitoring, initializing, and configuring the fabric.

The SM component performs all of the necessary subnet management functions. Primarily, the SM is responsible for initializing the fabric and managing its topology. Some of its tasks include:

  • Link and port initialization

  • Link width downgrade policies

  • Route and path assignments

  • Local Identifier (LID) address assignments

  • Switch forwarding table programming

  • Programming Virtual Fabrics (vFabrics)

    • QoS

    • Security

  • Sweeping the fabric to discover topology changes, then managing those changes when nodes are added and/or deleted

  • Enforcing topology and security

  • Configuring congestion handling

  • Negotiating and synchronizing redundant SMs

  • Arbitration for primary and secondary roles between multiple SMs in the fabric

The SM performs all its subnet management and monitoring functions via in-band packets sent over the fabric being managed.

One of the critical roles of the SM is the initialization and configuration of routing tables in all the switches. The SM supports a variety of routing algorithms, which are discussed in detail in the CN5000 Topologies and Routing Guide. Among the capabilities are:

  • Support for assigning multiple LIDs to end nodes and the carefully balanced programming of alternate routes through the fabric for use by dispersive routing, load balancing, and failover techniques by various upper-level protocols (ULP).

  • Support for Omni-Path Architecture-compliant multicast, including MLID sharing, pre-creating groups, and other fine-tuning of multicast performance.

  • Advanced monitoring and logging capabilities to quickly react to fabric changes and produce historical logs of changes and problems in the fabric.

  • Support for vFabrics.

  • Support for configuring and enabling adaptive routing in CN5000 Switches.

2.2.4.2. Subnet Administration

The Subnet Administration function acts in tight coordination with the SM to perform data storage and retrieval of fabric information. The SM/SA is a single unified entity.

Through the use of SA messages, nodes on the fabric can gain access to fabric information such as:

  • Node-to-node path information

  • Fabric topology and configuration

  • Event notification

  • Application service information

  • Join/Leave multicast groups

  • vFabric information

  • Fabric security configuration and lists of quarantined nodes

Fundamental information required for address resolution and name services is available to all nodes. The more advanced information about fabric topology, status, and configuration is available only to management applications on management nodes.

2.2.4.3. Performance Manager

The Performance Manager component collects performance and error statistics by communicating with the PMA on each node in the fabric using PM (GSI) packets.

Examples of the type of statistics collected by the PM include:

  • Link utilization bandwidth

  • Link packet rates

  • Link congestion

  • Error statistics, such as packet discards, attempted security violations, and packet routing errors

2.2.4.4. Performance Administration

The Performance Administration function works with the PM to perform data storage and retrieval of fabric performance information. The PM/PA is a single unified entity.

Through the use of PA messages, management nodes on the fabric gain access to fabric information such as:

  • Fabric overall health

  • Overall Fabric utilization, congestion, and packet rates

  • Fabric error rates

  • Traffic and congestion per vFabric

  • Traffic and congestion over a PortGroup defined by the sysadmin

  • Recent historical fabric performance data

  • Counters and status for a specific port

  • Sorted lists of ports with the highest utilization, packet rates, congestion, and error rates
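
The sorted-list queries in the last item amount to a top-N ranking over per-port counters. The following is a minimal sketch of that idea, not the PA interface itself: the `PortSample` record, its field names, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class PortSample:
    """Hypothetical per-port record; field names are illustrative only."""
    node: str
    port: int
    utilization_pct: float
    error_count: int


def top_ports(samples, key, n=3):
    """Return the n samples with the largest value of attribute `key`, highest first."""
    return heapq.nlargest(n, samples, key=lambda s: getattr(s, key))


# Made-up sample data, not output from a real fabric.
samples = [
    PortSample("sw1", 1, 72.5, 0),
    PortSample("sw1", 2, 91.0, 4),
    PortSample("host3", 1, 15.2, 12),
    PortSample("host7", 1, 88.4, 1),
]

busiest = top_ports(samples, "utilization_pct", n=2)
noisiest = top_ports(samples, "error_count", n=1)
```

Ranking with `heapq.nlargest` avoids sorting the full port list when only a handful of entries are wanted, which matters more as the number of ports grows.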

2.2.4.5. Host Fabric Manager

The host Fabric Manager deploys all Fabric Manager components on a Linux server. The Fabric Manager components run in the user space and access the CN5000 SuperNIC using the management datagram (MAD) interface provided by the OFA stack.

The host Fabric Manager manages both small and large fabrics. It is well suited to large fabrics because it can use the large memory resources and high-speed processors of standard servers.

The software installation places the host Fabric Manager on a Linux system. The utilities are installed into the /usr/sbin directory. The configuration file is installed as /etc/opa-fm/opafm.xml. The program is started, restarted, stopped, or reloaded through the opafm service (which standard Linux commands such as systemctl can control) with the start, restart, stop, or reload parameters, respectively.
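
As an illustration of the service control described above, the following sketch builds the corresponding systemctl invocation from Python. The function names and the dry-run default are assumptions made for this example; only the opafm service name and the four actions come from the text.

```python
import subprocess

OPAFM_SERVICE = "opafm"
ALLOWED_ACTIONS = ("start", "restart", "stop", "reload")


def opafm_command(action):
    """Build the systemctl argument list for the opafm service."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["systemctl", action, OPAFM_SERVICE]


def control_opafm(action, dry_run=True):
    """Return the command; execute it only when dry_run is False."""
    cmd = opafm_command(action)
    if not dry_run:
        # Controlling a system service typically requires root privileges.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `control_opafm("restart")` returns the argument list without executing it, while `control_opafm("restart", dry_run=False)` runs it on the local host.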

The FastFabric application configures and controls the host Fabric Manager, queries the SA and PA, and analyzes fabric configuration and status. In addition, a few host Fabric Manager control applications are installed in /usr/lib/opa-fm/bin; they are discussed later in this guide.

2.2.4.6. Fabric Sweeping

The SM periodically sweeps the fabric, during which the fabric is analyzed, routes are computed, and the Switches and SuperNICs are configured. The SM sweep algorithms attempt to:

  • Balance responsiveness to fabric changes against the fabric overhead imposed by the SM

  • Efficiently analyze and configure the fabric

  • Detect and handle potential hardware limitations

  • Handle the possibility that the fabric is changing while being analyzed or configured

The SM performs sweeps at fixed intervals and also immediately performs a sweep when a switch reports a port state change trap. Such traps indicate a link has come up or down. Generally, traps trigger rapid sweeps to respond to fabric changes. By default, SM sweeps are performed every five minutes. This interval may be changed by the user; however, short sweep intervals may increase fabric overhead.
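
The interplay between periodic and trap-triggered sweeps can be sketched as a small timeline simulation. This is an illustrative model only, not the SM's algorithm. In particular, it assumes that the periodic timer restarts after every sweep, including trap-triggered ones, which the text above does not state.

```python
DEFAULT_SWEEP_INTERVAL_S = 300  # five minutes, the default described above


def sweep_times(trap_times, horizon_s, interval_s=DEFAULT_SWEEP_INTERVAL_S):
    """Return (time, reason) pairs for each sweep up to horizon_s seconds.

    Assumption: the periodic timer restarts after every sweep, including
    trap-triggered ones. The real SM may behave differently.
    """
    sweeps = []
    next_periodic = interval_s
    for t in sorted(trap_times):
        if t > horizon_s:
            break
        # Emit any periodic sweeps that fall due before this trap arrives.
        while next_periodic < t:
            sweeps.append((next_periodic, "interval"))
            next_periodic += interval_s
        # A port state change trap triggers an immediate sweep.
        sweeps.append((t, "trap"))
        next_periodic = t + interval_s
    # Fill the remainder of the window with periodic sweeps.
    while next_periodic <= horizon_s:
        sweeps.append((next_periodic, "interval"))
        next_periodic += interval_s
    return sweeps
```

With traps at 100 s and 450 s over a 1000 s window, the model yields sweeps at 100 s (trap), 400 s (interval), 450 s (trap), and 750 s (interval). Shortening `interval_s` increases the number of sweeps, which corresponds to the added fabric overhead noted above.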

The SM protocol performs sweep operations over VL15. By default, VL15 operates without flow control, but users can enable flow control if needed. Since the fabric may change while the SM sweeps, and VL15 often lacks flow control, some packets may be lost during the sweep.

To optimize sweep performance, the SM issues multiple concurrent packets to a given device or the fabric. It carefully balances the number of packets sent at once between the hardware's capabilities and the goal of accelerating the sweep.

An administrator may also choose to force a sweep. This is generally not necessary on fabrics built with Cornelis-supplied switches. Sweeps can be forced via /usr/lib/opa-fm/bin/opafmcmd smForceSweep.

2.2.4.7. Redundant Fabric Managers in a Fabric

A fabric can include more than one Fabric Manager instance for redundancy. If the primary FM or its host platform fails, a secondary FM can take over fabric management. For instructions on setting up additional FM instances, refer to the CN5000 Fabric Installation Guide.

2.2.4.8. Multiple Subnet Support in Host Fabric Manager

The host FM can manage more than one subnet. These subnets are often called fabric planes. Each plane is a distinct, isolated fabric: ports in one plane have no interaction with ports in another plane (refer to the following figure). This isolation provides independent failure domains and enables workload or traffic separation across planes.

Figure 3. Multiple Subnet Support in Host FM

On a host server equipped with a SuperNIC, the FM automatically configures a separate instance for each available SuperNIC port. Each port connects to a separate fabric plane, and the FM configuration specifies which port each instance uses to access its respective plane.

The host Fabric Manager supports up to four planes, each accessed through a separate SuperNIC port on the server. Each Fabric Manager instance runs as a separate process, with no interdependency among instances.

To start or stop a specific FM instance, use the following command:

/usr/lib/opa-fm/bin/opafmctrl.sh [start|stop] -i #
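
The per-instance command above can be assembled programmatically, as in the following sketch. The helper name is hypothetical, and the zero-based instance range 0 through 3 is an assumption derived from the four-plane limit; the text does not state how instances are numbered.

```python
OPAFMCTRL = "/usr/lib/opa-fm/bin/opafmctrl.sh"
MAX_INSTANCES = 4  # four planes, per the text above


def instance_command(action, instance):
    """Build the opafmctrl.sh argument list for one FM instance.

    Assumption: instances are numbered 0..MAX_INSTANCES-1. Check the
    installed script for the numbering it actually expects.
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    if not 0 <= instance < MAX_INSTANCES:
        raise ValueError(f"instance out of range: {instance}")
    return [OPAFMCTRL, action, "-i", str(instance)]
```

Returning an argument list rather than a single string lets the caller pass it to `subprocess.run` without shell quoting concerns.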

For more information on setting up multiple subnets, refer to the CN5000 Topologies and Routing Guide.

2.2.4.9. Fabric Manager Logging

The FM logs operational and diagnostic information to aid fabric monitoring, troubleshooting, and debugging. In the host environment, the FM sends log output to the Linux system log (syslog) by default. Logs can also be directed to a centralized syslog server.

Log messages follow a hierarchical severity convention controlled by the LogLevel parameter in the FM configuration file (opafm.xml), including severities such as FATAL, ERROR, WARNING, and INFO.
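
A hierarchical severity convention means that selecting a given level also enables every more severe level. The sketch below models that behavior. The numeric ordering and the helper names are assumptions for illustration, and they are not the actual LogLevel values accepted in opafm.xml.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Illustrative ordering only; lower value means more severe."""
    FATAL = 0
    ERROR = 1
    WARNING = 2
    INFO = 3


def is_logged(message_severity, threshold):
    """A message is emitted when it is at least as severe as the threshold."""
    return message_severity <= threshold


def filter_log(records, threshold):
    """Keep only the (severity, text) records that pass the threshold."""
    return [(sev, text) for sev, text in records if is_logged(sev, threshold)]


# Made-up records, not real FM log output.
records = [
    (Severity.INFO, "sweep complete"),
    (Severity.WARNING, "node not responding"),
    (Severity.ERROR, "sweep failed"),
]
```

With a WARNING threshold, the INFO record is dropped and the WARNING and ERROR records are kept.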

The SM logs many important fabric events, including:

  • Ports going up or down

  • Nodes not responding to management queries

  • Nodes appearing in or disappearing from the fabric

  • SMs joining or leaving the fabric

  • SM state transitions (for example, Standby to Primary failover)

  • Fabric sweep and synchronization errors

  • Configuration consistency check results between Primary and Standby SMs

These event logs are essential for diagnosing fabric health issues and tracking topology changes over time.

For more information on logging, refer to Troubleshooting the Fabric Manager in the CN5000 Maintenance and Troubleshooting Guide.