5.4.2. In-Band Security
In-band security uses the same system to request user authentication and check their identity. In networking, in-band management uses the same interface for user traffic and management traffic.
5.4.2.1. Fabric Security Quarantine
To enforce security in-band, the Fabric Manager can quarantine nodes that are detected attempting to circumvent fabric security. The Fabric Manager enforces this quarantine through link state: by leaving a port in an inactivated state, it renders the port unable to communicate with the fabric through the link.
Quarantined Node List
In addition to every quarantine event being logged, the Fabric Manager also maintains a list of quarantined nodes that is available through an SA query. Refer to the CN5000 Commands Guide, opasaquery for details. opareport also provides a list of quarantined nodes.
5.4.2.2. Anti-Spoofing Protection
Using information exchanged at Link Negotiation and Initialization (LNI), the Fabric Manager is able to verify the identity of a node against what is reported by the node in attribute responses. If a node is detected attempting to spoof one of the key fields the Fabric Manager checks, it will be quarantined using the method described in Fabric Security Quarantine.
The Fabric Manager also configures a field on the switch neighbor of an end node that prevents said node from sending fabric packets with source LIDs outside the range assigned to the node by the Fabric Manager. If a node attempts to send a packet outside this configured LID range, the neighbor switch immediately drops said packet.
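The source LID check performed by the neighbor switch can be pictured as a simple range test. The following Python sketch is illustrative only: the function name, the base-plus-count representation of the range, and the example LID values are assumptions made for this example and are not part of any CN5000 interface.

```python
def accept_source_lid(slid: int, base_lid: int, lid_count: int) -> bool:
    """Return True if slid lies inside the range assigned to the end node.

    base_lid and lid_count are hypothetical stand-ins for the range the
    Fabric Manager programs on the neighbor switch port.
    """
    return base_lid <= slid < base_lid + lid_count


# Hypothetical end node that was assigned LIDs 0x10 through 0x13.
packet_slids = [0x10, 0x13, 0x14, 0x0F]

# Packets whose source LID is outside the range are dropped.
forwarded = [s for s in packet_slids if accept_source_lid(s, 0x10, 4)]
```

In this sketch only the packets with source LIDs 0x10 and 0x13 are forwarded; the other two fall outside the assigned range.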
For packets sent to the Fabric Manager, the switch port that the Fabric Manager is connected to performs the LID checks described above. In addition, the Fabric Manager performs per-packet validation: when anti-spoofing protection is enabled, it verifies that each response comes from the proper origin and uses the correct PKey for the request that was issued.
The anti-spoofing detection of the Fabric Manager is enabled in the following section of the configuration file:
<!-- SmaSpoofingCheck enables support for port level SMA security-->
<!-- checking related features. -->
<SmaSpoofingCheck>0</SmaSpoofingCheck>
5.4.2.3. Management Traffic Denial of Service (DoS) Protection
CN5000 Omni-Path Fabric provides a mechanism for preventing Denial of Service (DoS) attacks against management traffic that could slow fabric configuration or even prevent the full configuration of a large fabric. The mechanism limits the rate at which an end node can send management traffic, and the Fabric Manager is responsible for configuring it on all external switch ports. This limit is configurable in the Fabric Manager configuration file and is defined in the following example:
<!-- *************** Security Features ****************************** -->
<!-- VL15CreditRate: Rate at which to return credits to a non-mgmt HFI. -->
<!-- Helps avoid VL15 Denial of Service attacks against the fabric -->
<!-- The field is defined as 1/(2^X) of the normal return rate. For -->
<!-- instance, if the field is set to 3, the credit return rate will be -->
<!-- 1/(2^3) = 1/8 (one eighth) the normal credit return rate. -->
<!-- Valid values are between 0 and 21, with 0 disabling the feature. -->
<VL15CreditRate>18</VL15CreditRate>
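To make the 1/(2^X) definition concrete, the following Python sketch converts a VL15CreditRate value into the resulting fraction of the normal credit return rate. The helper name is hypothetical; only the formula, the valid range of 0 to 21, and the meaning of 0 come from the configuration comments above.

```python
from fractions import Fraction
from typing import Optional


def vl15_credit_fraction(rate: int) -> Optional[Fraction]:
    """Fraction of the normal credit return rate for a VL15CreditRate value.

    Returns None when the value is 0, which disables the feature.
    """
    if not 0 <= rate <= 21:
        raise ValueError("VL15CreditRate must be between 0 and 21")
    if rate == 0:
        return None
    return Fraction(1, 2 ** rate)


print(vl15_credit_fraction(3))   # 1/8, the case used in the comment above
print(vl15_credit_fraction(18))  # 1/262144, the value shown in the example
```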
5.4.2.4. Predefined Topology Verification
The ability to validate a fabric layout and configuration against a predefined or expected configuration is an important part of the fabric bring-up process on large-scale fabrics. Predefined Topology Verification provides a mechanism within the FM to verify, during a sweep, a predefined topology input against the real fabric topology.
Predefined Topology Verification relies on a topology input file that describes nodes and links using a combination of Node GUIDs, Node Descriptions, and Port numbers. See CN5000 Commands Guide (Topology Files and opareport) for details on creating a topology input file.
If a node's actual position and values do not match the predefined topology, the node will be quarantined in the same manner as described in Fabric Security Quarantine.
If a link is discovered that is not defined in the input topology file, it will also be quarantined.
This predefined topology verification is done on every sweep so that any dynamic changes that occur within the fabric while the FM is running will be noted in the logs. For more information on sweeps, refer to the CN5000 Product Family Description Guide, Fabric Sweeping.
Which checks are performed and what the SM does when a check fails can be configured per-field. There are three levels of field enforcement:
Disabled - Disables checking for that field. Mismatches are not reported.
Warn - Causes the FM to report if part of the real topology fails a check. This warning is sent to the FM log output.
Enabled - Causes the FM to both report mismatches between the expected and actual topology (same as Warn) as well as quarantine the offending node.
NodeGUID, PortGUID, and NodeDesc enforcement settings apply only when UndefinedLink enforcement is set to Warn or Enabled.
A log message limiter is provided to limit the number of log messages output by this feature per sweep when a mismatch is detected.
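The interaction between the three enforcement levels and the log message limiter can be sketched as follows. This Python model is an illustration under stated assumptions, not the FM implementation: the class, its attribute names, and the choice to still quarantine a node after logging has been suppressed are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SweepChecker:
    """Hypothetical per-sweep model of mismatch handling."""
    log_threshold: int = 100          # 0 disables the limit
    logged: list = field(default_factory=list)
    suppressed: int = 0               # warnings withheld after the threshold
    quarantined: set = field(default_factory=set)

    def mismatch(self, node: str, field_name: str, level: str) -> None:
        if level == "Disabled":
            return                    # mismatches are not reported
        if level not in ("Warn", "Enabled"):
            raise ValueError(f"unknown enforcement level: {level}")
        # Warn and Enabled both report, subject to the per-sweep limit.
        if self.log_threshold == 0 or len(self.logged) < self.log_threshold:
            self.logged.append(f"{field_name} mismatch on {node}")
        else:
            self.suppressed += 1
        # Only Enabled also quarantines the offending node.
        if level == "Enabled":
            self.quarantined.add(node)


checker = SweepChecker(log_threshold=2)
checker.mismatch("node1", "NodeDesc", "Disabled")
checker.mismatch("node1", "NodeGUID", "Warn")
checker.mismatch("node2", "PortGUID", "Enabled")
checker.mismatch("node3", "NodeDesc", "Enabled")
```

With a threshold of 2, the first two reportable mismatches are logged, the third is counted as suppressed, and both nodes checked at the Enabled level are quarantined.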
All options are configurable through the Fabric Manager configuration file as shown in the following examples:
Common Sm (Section) Example
<!-- **************** Pre-defined Topology Verification **************** -->
<!-- The PreDefinedTopology section is used for verifying the layout of -->
<!-- the fabric against a topology input file of the expected layout. -->
<!-- There are three modes of handling mismatches: Disabled, Warn, and -->
<!-- Enabled. Disabled ignores any mismatches on that field, Warn prints -->
<!-- a warning to the log file, and Enabled prints a warning to the log -->
<!-- file and quarantines the node from the fabric. -->
<!-- Field Definitions: -->
<!-- Enabled: -->
<!-- Whether or not this feature is enabled. -->
<!-- TopologyFilename: -->
<!-- Fully qualified filename of pre-defined input topology. -->
<!-- LogMessageThreshold: Number of warnings to output to log -->
<!-- Number of warnings to output to log file before suppressing -->
<!-- further warnings and only printing a summary at the end of -->
<!-- a sweep. Entering 0 disables this threshold. -->
<!-- FieldEnforcement: -->
<!-- Per-field enforcement levels for mismatch handling. -->
<!-- Field comparison is done on a validation link. Validation links are -->
<!-- found using (NodeGUID,PortNum) unless NodeGUID enforcement is -->
<!-- DISABLED, in which case (NodeDesc,PortNum) is used to find the -->
<!-- link. If there is more than one link matches by (NodeDesc,PortNum) -->
<!-- a log warning will be printed and false matches may occur. -->
<PreDefinedTopology>
<LogMessageThreshold>100</LogMessageThreshold>
<FieldEnforcement>
<NodeDesc>Warn</NodeDesc>
<NodeGUID>Warn</NodeGUID>
<PortGUID>Warn</PortGUID>
<UndefinedLink>Warn</UndefinedLink>
</FieldEnforcement>
</PreDefinedTopology>
Instance-specific Sm Example
<PreDefinedTopology>
<Enabled>0</Enabled>
<TopologyFilename>/etc/opa/topology.0:0.xml</TopologyFilename>
</PreDefinedTopology>
The following table shows the parameters, default values, and descriptions for PreDefinedTopology.
| Field | Default Value | Description |
|---|---|---|
| Enabled | 0 | Whether or not this feature is enabled. 0=Disabled, 1=Enabled |
| TopologyFilename | None | Fully qualified filename of predefined input topology. |
| LogMessageThreshold | 0 | Number of warnings to output to the log file before suppressing further warnings and only printing a summary at the end of a sweep. Entering 0 disables this threshold. |
| FieldEnforcement | Disabled | Field enforcement levels for mismatch handling. Parent element for the NodeGUID, NodeDesc, and PortGUID field enforcement elements. Enforcement levels are the same for all enforcement elements: Disabled, Warn, or Enabled. |
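As a quick way to inspect these settings, the following Python sketch reads a PreDefinedTopology element with the standard library XML parser and applies the defaults from the table. It is a minimal sketch that assumes the element has been extracted into a standalone string; the function name and the returned dictionary layout are hypothetical, and a real Fabric Manager configuration file nests this element inside other sections.

```python
import xml.etree.ElementTree as ET

# Values combined from the two configuration examples above.
SNIPPET = """
<PreDefinedTopology>
  <Enabled>0</Enabled>
  <TopologyFilename>/etc/opa/topology.0:0.xml</TopologyFilename>
  <LogMessageThreshold>100</LogMessageThreshold>
  <FieldEnforcement>
    <NodeDesc>Warn</NodeDesc>
    <NodeGUID>Warn</NodeGUID>
    <PortGUID>Warn</PortGUID>
    <UndefinedLink>Warn</UndefinedLink>
  </FieldEnforcement>
</PreDefinedTopology>
"""


def read_predefined_topology(xml_text: str) -> dict:
    """Parse a standalone PreDefinedTopology element into a dictionary."""
    root = ET.fromstring(xml_text)
    # Enforcement elements default to Disabled when they are absent.
    names = ("NodeDesc", "NodeGUID", "PortGUID", "UndefinedLink")
    enforcement = {name: "Disabled" for name in names}
    section = root.find("FieldEnforcement")
    if section is not None:
        for child in section:
            enforcement[child.tag] = (child.text or "").strip()
    return {
        "enabled": root.findtext("Enabled", default="0").strip() == "1",
        "topology_filename": root.findtext("TopologyFilename"),
        "log_message_threshold": int(
            root.findtext("LogMessageThreshold", default="0")
        ),
        "field_enforcement": enforcement,
    }


settings = read_predefined_topology(SNIPPET)
```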
5.4.2.4.1. Predefined Topology Verification Security
This section describes how to set the Predefined Topology Verification security feature, and the two basic operating modes.
In general, Cornelis recommends that you verify a cluster before enabling the Predefined Topology Verification feature. Refer to the CN5000 Fabric Installation Guide, Verify Topology.
After the cluster connectivity has been verified, you can create the predefined topology input configuration file. Several commands are available to assist in creating a topology input file. See CN5000 Commands Guide (Topology Files and opareport).
There are two types of input configuration files:
Based on node GUIDs and port numbers
Based on node Descriptions and port numbers
Based on Node GUIDs and Port Numbers
Predefined Topology Verification based on node GUIDs and port numbers is considered more secure because node GUIDs cannot be modified by software. For network bootable nodes, this topology configuration is highly recommended. The main issue with a network boot is that the node description may change as the OS is booted. Unless the node description remains constant through the booting process, the constant node GUID must be used so that nodes are not incorrectly quarantined.
Node replacement may be more tedious because the node GUID must be updated in the topology file and the FM restarted before new nodes are permitted to join the cluster.
Based on Node Descriptions and Port Numbers
Predefined Topology Verification based on node descriptions is considered less secure because node descriptions can be modified by software. However, using node descriptions in topology definitions makes node replacement easier: the new node only needs to have its node description updated, and no FM restart is required.
Node/link lookup and validation are performed with Node Description and Port Number only if the NodeGUID field enforcement level is Disabled. Node/link lookup by Node Description, however, is independent of the NodeDesc field enforcement level.
Cornelis recommends unique node descriptions so that the link resolves exactly to one pair. Otherwise, it is possible that a link may be resolved to a similarly named link if node descriptions are replicated.
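The lookup rule and the risk of replicated node descriptions can be illustrated with a short Python sketch. It assumes expected links are held as dictionaries with hypothetical keys node_guid, node_desc, and port_num; the function name and the example values are likewise invented for illustration.

```python
from collections import defaultdict


def index_validation_links(links, node_guid_enforcement):
    """Index expected links by the key used to find a validation link.

    (NodeGUID, PortNum) is used unless NodeGUID enforcement is Disabled,
    in which case (NodeDesc, PortNum) is used. Keys that match more than
    one link are returned as ambiguous.
    """
    use_guid = node_guid_enforcement.lower() != "disabled"
    id_key = "node_guid" if use_guid else "node_desc"
    index = defaultdict(list)
    for link in links:
        index[(link[id_key], link["port_num"])].append(link)
    ambiguous = [key for key, matches in index.items() if len(matches) > 1]
    return dict(index), ambiguous


expected = [
    {"node_guid": "0x0001", "node_desc": "node01 hfi1_0", "port_num": 1},
    # Replicated description: the same NodeDesc as the entry above.
    {"node_guid": "0x0002", "node_desc": "node01 hfi1_0", "port_num": 1},
    {"node_guid": "0x0003", "node_desc": "node03 hfi1_0", "port_num": 1},
]

_, ambiguous_by_guid = index_validation_links(expected, "Warn")
_, ambiguous_by_desc = index_validation_links(expected, "Disabled")
```

Here the lookup by node GUID resolves every link to exactly one entry, while the lookup by node description reports ("node01 hfi1_0", 1) as ambiguous, which is the situation unique node descriptions avoid.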
5.4.2.4.2. User Query Permissions
During installation, the default user queries setting permits non-root users and tools to query the fabric. This setting enables read/write permissions on Userspace Management Datagrams (UMADs). The UMAD interface allows Subnet Administration (SA) queries, Performance Administration (PA) queries, Subnet Management Agent (SMA) queries, and Performance Management Agent (PMA) queries from user space applications.
Note
Allowing non-root users read/write access to issue queries can be a security risk as non-root users can potentially send MADs to change the fabric state. Also, given the nature of the UMAD interface, sending MADs and receiving the corresponding responses require read/write permission, both for queries and state changes. Setting the interface to read-only for non-root users disables the ability to issue queries.
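Because a query needs both read and write access, a read-only permission is not enough for a non-root user. The following Python sketch tests a file mode for that combination. It is a simplified illustration: the function name is hypothetical, it looks only at the "other" permission bits and ignores group membership and ACLs, and the device path mentioned in the comment is an assumption based on the usual Linux location of UMAD device nodes.

```python
import stat


def non_root_can_query(mode: int) -> bool:
    """True if 'other' users have both read and write permission."""
    needed = stat.S_IROTH | stat.S_IWOTH
    return (mode & needed) == needed


# On a live system the mode could be taken from a UMAD device node, for
# example os.stat("/dev/infiniband/umad0").st_mode (path is an assumption).
for mode in (0o666, 0o644, 0o600):
    print(oct(mode), non_root_can_query(mode))
```

Only the first mode allows queries by arbitrary non-root users; the read-only mode 0o644 does not, which matches the behavior described in this note.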
Use the MgmtAllowed configuration setting in the Omni-Path switches to prevent a compute node from accessing or altering the fabric configuration. A compute node without MgmtAllowed cannot issue SMA and PMA transactions to other nodes, which protects the fabric from malicious or buggy application code. Such a node is also restricted in its access to SA information and is not permitted to access PA information. However, when UMAD is enabled for local use, a malicious application could potentially alter its own SMA or PMA settings, for example, by taking the node's port down or by observing or clearing PMA counters.
Many of the Omni-Path basic tools make use of UMAD. Other tools are typically run on a management node and frequently issue SA, PA, SMA, and PMA operations; however, they are often run as root.
Note
The user query setting you choose should depend on whether there are other tools or applications run as non-root that need access to UMAD and whether non-root users are trusted. For example, some non-root applications may use UMAD to issue SA queries for name services style queries.