Name
opafabricanalysis — Performs analysis of the fabric.
Syntax
opafabricanalysis [-b|-e] [-s] [-d dir] [-c file] [-t portsfile] [-p ports] [-T topology_input]
Options
--help
    Produces full help text.
-b
    Specifies the baseline mode. Default is compare/check mode.
-e
    Evaluates health only. Default is compare/check mode.
-s
    Saves history of failures (errors/differences).
-d dir
    Specifies the top-level directory for saving baseline and history of failed checks. Default is /var/usr/lib/opa/analysis.
-c file
    Specifies the error thresholds config file. Default is /etc/opa/opamon.conf.
-t portsfile
    Specifies the file with list of local SuperNIC ports used to access fabric(s) for analysis. Default is /etc/opa/ports.
-p ports
    Specifies the list of local SuperNIC ports used to access fabric(s) for analysis. Default is the first active port. Specified as HFI:port as follows:
    0:0
        First active port in the system.
    0:y
        Port y within the system.
    x:0
        First active port on SuperNIC x.
    x:y
        SuperNIC x, port y.
    Note
    The first port on a SuperNIC is 1.
-T topology_input
    Specifies the name of the topology input file to use. Any %P markers in this filename are replaced with the HFI:port being operated on (such as 0:0 or 1:2). Default is /etc/opa/topology.%P.xml. If -T NONE is specified, no topology input file is used. See and for more information.
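The HFI:port selector forms accepted by -p can be summarized in a small shell helper. This is only an illustrative sketch: describe_port is a hypothetical function, not part of the toolset, and it does no validation beyond checking for the colon.

```shell
# describe_port SELECTOR
# Hypothetical helper: print the meaning of an HFI:port selector,
# following the four forms listed for the -p option.
describe_port() {
    spec=$1
    case "$spec" in
        *:*) ;;
        *) echo "invalid selector: $spec" >&2; return 1 ;;
    esac
    hfi=${spec%%:*}    # text before the first colon
    port=${spec#*:}    # text after the first colon
    if [ "$hfi" = 0 ] && [ "$port" = 0 ]; then
        echo "first active port in the system"
    elif [ "$hfi" = 0 ]; then
        echo "port $port within the system"
    elif [ "$port" = 0 ]; then
        echo "first active port on SuperNIC $hfi"
    else
        echo "SuperNIC $hfi, port $port"
    fi
}
```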
Examples
opafabricanalysis
opafabricanalysis -p '1:1 1:2 2:1 2:2'
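A common pattern is to record a baseline once and then run periodic checks that keep a failure history. The sketch below wraps the two invocations using only the -b, -s, and -p options described above. The run helper, the DRY_RUN variable, and the function names are hypothetical and exist only so the commands can be previewed without touching the fabric.

```shell
# run CMD ARGS...
# Hypothetical helper: with DRY_RUN=1, print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Record a baseline of the fabric(s) reached through the given ports.
fabric_baseline() {
    run opafabricanalysis -b -p "$1"
}

# Compare against the baseline and save a history of any failures.
fabric_check() {
    run opafabricanalysis -s -p "$1"
}
```

For example, DRY_RUN=1 fabric_check '1:1 1:2' prints the command line that would be executed.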
The fabric analysis tool checks the following:
Note
The comparison includes components on the fabric. Therefore, operations such as shutting down a server cause the server to no longer appear on the fabric and are flagged as a fabric change or failure by opafabricanalysis.
Environment Variables
The following environment variables are also used by this command:
PORTS
    List of ports, used in absence of -t and -p
PORTS_FILE
    File containing the list of ports, used in absence of -t and -p
FF_TOPOLOGY_FILE
    File containing topology_input (may have %P marker in filename), used in absence of -T
FF_ANALYSIS_DIR
    Top-level directory for baselines and failed health checks
Details
For simple fabrics, the FastFabric Toolset host is connected to a single fabric. By default, the first active port on the FastFabric Toolset host is used to analyze the fabric. However, in more complex fabrics, the FastFabric Toolset host may be connected to more than one fabric or subnet. In this case, you can specify the ports or SuperNICs to use with one of the following methods:
On the command line using the -p option
In a file specified using the -t option
Through the environment variables PORTS or PORTS_FILE
Using the PORTS_FILE configuration option in opafastfabric.conf
If the specified port does not exist or is empty, the first active port on the local system is used. In more complex configurations, you must specify the exact ports to use for all fabrics to be analyzed.
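The port selection methods above can be pictured as a fallback chain. The sketch below is not the tool's implementation. It assumes that -p is consulted before -t and PORTS before PORTS_FILE (this section does not state the order among them), and that a ports file holds whitespace-separated selectors with '#' starting a comment. The function names are hypothetical.

```shell
# read_ports_file FILE
# Print the selectors in FILE on one line.
# Assumed format: whitespace-separated selectors, '#' starts a comment.
read_ports_file() {
    sed 's/#.*//' "$1" | xargs
}

# resolve_ports CLI_PORTS CLI_PORTSFILE
# Print the port list that would be used. The precedence order is an assumption.
resolve_ports() {
    cli_ports=$1
    cli_file=$2
    if [ -n "$cli_ports" ]; then
        result=$cli_ports
    elif [ -n "$cli_file" ] && [ -r "$cli_file" ]; then
        result=$(read_ports_file "$cli_file")
    elif [ -n "${PORTS:-}" ]; then
        result=$PORTS
    elif [ -n "${PORTS_FILE:-}" ] && [ -r "$PORTS_FILE" ]; then
        result=$(read_ports_file "$PORTS_FILE")
    else
        result=
    fi
    # An empty selection falls back to the first active port (0:0).
    printf '%s\n' "${result:-0:0}"
}
```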
You can specify the topology_input file to be used with one of the following methods:
On the command line using the -T option
In a file specified through the environment variable FF_TOPOLOGY_FILE
Using the ff_topology_file configuration option in opafastfabric.conf
If the specified file does not exist, no topology_input file is used. Alternatively, the filename can be specified as NONE to prevent the use of an input file.
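The %P substitution and the two "no input file" cases can be illustrated with a short sketch. The function names are hypothetical, and the code only mirrors the behavior described here; it is not taken from the tool itself.

```shell
# expand_topology_name TEMPLATE PORT
# Replace every %P marker in TEMPLATE with the HFI:port selector.
expand_topology_name() {
    printf '%s\n' "$1" | sed "s|%P|$2|g"
}

# topology_file_for TEMPLATE PORT
# Print the topology input file to use, or nothing when none applies.
topology_file_for() {
    if [ "$1" = NONE ]; then
        return 0    # NONE explicitly disables the input file
    fi
    file=$(expand_topology_name "$1" "$2")
    if [ -e "$file" ]; then
        printf '%s\n' "$file"
    fi
    return 0        # a missing file simply means no input file is used
}
```

With the default template, expand_topology_name '/etc/opa/topology.%P.xml' 1:2 yields /etc/opa/topology.1:2.xml.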
For more information on topology_input, refer to .
By default, the error analysis includes PMA counters and slow links (that is, links running below enabled speeds). You can change this using the FF_FABRIC_HEALTH configuration parameter in opafastfabric.conf. This parameter specifies the opareport options and reports to be used for the health analysis. It also can specify the PMA counter clearing behavior (-I seconds, -C, or none at all).
When a topology_input file is used, it can also be useful to extend FF_FABRIC_HEALTH to include fabric topology verification options such as -o verifylinks.
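As an illustration, a configuration fragment along these lines would add link verification to the health analysis. It assumes that opafastfabric.conf accepts shell-style variable assignments, and it starts from the opareport options named in the health check description of this page (-s -C -o errors -o slowlinks); check the shipped configuration file for the actual default before copying it.

```shell
# Fragment for opafastfabric.conf (assumed to use shell variable syntax).
# Health options named on this page, extended with link verification.
export FF_FABRIC_HEALTH="-s -C -o errors -o slowlinks -o verifylinks"
```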
The thresholds for PMA counter analysis default to /etc/opa/opamon.conf. However, you can specify an alternate configuration file for thresholds using the -c option. The opamon.si.conf file can also be used to check for any non-zero values for signal integrity (SI) counters.
All files generated by opafabricanalysis start with fabric in their file name. This is followed by the port selection option identifying the port used for the analysis. Default is 0:0.
The opafabricanalysis tool generates files such as the following within FF_ANALYSIS_DIR:
Health Check
latest/fabric.0:0.errors
    stdout of opareport for errors encountered during fabric error analysis
latest/fabric.0:0.errors.stderr
    stderr of opareport during fabric error analysis
Baseline
During a baseline run, the following files are also created in FF_ANALYSIS_DIR/latest.
baseline/fabric.0:0.snapshot.xml
    opareport snapshot of complete fabric components and SMA configuration
baseline/fabric.0:0.comps
    opareport summary of fabric components and basic SMA configuration
baseline/fabric.0:0.links
    opareport summary of internal and external links
Full Analysis
latest/fabric.0:0.snapshot.xml
    opareport snapshot of complete fabric components and SMA configuration
latest/fabric.0:0.snapshot.stderr
    stderr of opareport during snapshot
latest/fabric.0:0.errors
    stdout of opareport for errors encountered during fabric error analysis
latest/fabric.0:0.errors.stderr
    stderr of opareport during fabric error analysis
latest/fabric.0:0.comps
    stdout of opareport for fabric components and SMA configuration
latest/fabric.0:0.comps.stderr
    stderr of opareport for fabric components
latest/fabric.0:0.comps.diff
    diff of baseline and latest fabric components
latest/fabric.0:0.links
    stdout of opareport summary of internal and external links
latest/fabric.0:0.links.stderr
    stderr of opareport summary of internal and external links
latest/fabric.0:0.links.diff
    diff of baseline and latest fabric internal and external links
latest/fabric.0:0.links.changes.stderr
    stderr of opareport comparison of links
latest/fabric.0:0.links.changes
    opareport comparison of links against baseline. This is typically easier to read than the links.diff file and contains the same information.
latest/fabric.0:0.comps.changes.stderr
    stderr of opareport comparison of components
latest/fabric.0:0.comps.changes
    opareport comparison of components against baseline. This is typically easier to read than the comps.diff file and contains the same information.
The .diff and .changes files are only created if differences are detected.
If the -s option is used and failures are detected, files related to the checks that failed are also copied to a time-stamped directory under FF_ANALYSIS_DIR.
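Because the .diff and .changes files exist only when differences are detected, a wrapper script can look for them after a run. The following sketch assumes the file naming shown above (fabric.PORT.links.diff and so on under FF_ANALYSIS_DIR/latest). fabric_differences is a hypothetical helper, not part of the toolset.

```shell
# fabric_differences ANALYSIS_DIR [PORT]
# Hypothetical helper: list the .diff/.changes files present for PORT
# under ANALYSIS_DIR/latest. Prints nothing when no differences were recorded.
fabric_differences() {
    dir=$1
    port=${2:-0:0}    # default port selector is 0:0
    for kind in links comps; do
        for ext in diff changes; do
            f="$dir/latest/fabric.$port.$kind.$ext"
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
}
```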
Fabric Items Checked Against the Baseline
Based on opareport -o links:
Unconnected/down/missing cables
Added/moved cables
Changes in link width and speed
Changes to Node GUIDs in fabric (replacement of SuperNIC or Switch hardware)
Adding/Removing Nodes [FI, Virtual FIs, Virtual Switches, Physical Switches, Physical Switch internal switching cards (leaf/spine)]
Changes to server or switch names
Based on opareport -o comps:
Overlap with items from links report
Changes in port speed/width enabled or supported
Changes in SuperNIC or switch device IDs/revisions/VendorID (for example, ASIC hardware changes)
Changes in port Capability mask (which features/agents run on port/server)
Changes to ErrorLimits and PKey enforcement per port
Changes to IOUs/IOCs/IOC Services provided
Note
Only applicable if IOUs are in the fabric (such as Virtual IO cards, native storage, and others).
Location (port, node) and number of SMs in fabric. Includes:
Primary and backups
Configured priority for SM
Fabric Items Also Checked During Health Check
Based on opareport -s -C -o errors -o slowlinks:
PMA error counters on all Omni-Path Fabric ports (SuperNIC, switch external and switch internal) checked against configurable thresholds.
Counters are cleared each time a health check is run. Each health check reflects a counter delta since the last health check.
Typically identifies potential fabric errors, such as symbol errors.
May also identify transient congestion, depending on the counters that are monitored.
Link active speed/width as compared to Enabled speed.
Identifies links whose active speed/width is < min (enabled speed/width on each side of link).
This typically reflects bad cables or bad ports or poor connections.
Side effect is the verification of SA health.
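The slow-link rule above (active value below the minimum of the enabled values on the two sides of the link) can be written out as a short sketch. It is a simplification and not the opareport implementation: each value is assumed to be a single integer, such as a lane count for width or a rate in Gb/s for speed, and the function names are hypothetical.

```shell
# min_of A B: print the smaller of two integers.
min_of() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# link_status ACTIVE ENABLED_SIDE1 ENABLED_SIDE2
# Print "slow" when ACTIVE is below the smaller enabled value of the
# two link ends, otherwise "ok".
link_status() {
    limit=$(min_of "$2" "$3")
    if [ "$1" -lt "$limit" ]; then
        echo slow
    else
        echo ok
    fi
}
```

For example, a link running at width 2 with both ends enabled for 4 is reported as slow, while a link at width 4 between ends enabled for 8 and 4 is not.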