3.6.2. Verify the Fabric
This section covers the use of the Fabric Manager Sweep. For more details, refer to the CN5000 Product Family Description Guide, Fabric Sweeping.
3.6.2.1. Verifying Fabric Manager Sweep
By default, the Fabric Manager sweeps every five minutes as defined in the /etc/opa-fm/opafm.xml file. Sweeps are triggered sooner if there are fabric changes such as hosts, switches, or links going up or down. To verify the sweeps, open /var/log/messages and search for CYCLE START. Each CYCLE START entry has a matching CYCLE END entry. Any links with errors are noted during the sweep cycle.
An example of a clean FM sweep follows:
Feb 16 16:12:08 hds1fnb8261 fm0_sm[3946]: PROGR[topology]: SM: topology_main: TT: DISCOVERY CYCLE START - REASON: Scheduled sweep interval
Feb 16 16:12:08 hds1fnb8261 fm0_sm[3946]: PROGR[topology]: SM: topology_main: DISCOVERY CYCLE END. 9 SWs, 131 HFIs, 131 end ports, 523 total ports, 1 SM(s), 1902 packets, 0 retries, 0.350 sec sweep
Compare the sweep results with opafabricinfo and the fabric topology.
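The log check above can also be scripted. The following is a minimal Python sketch, not part of the product tooling, that pairs each DISCOVERY CYCLE START entry with the next DISCOVERY CYCLE END entry and extracts the counts for comparison against opafabricinfo. The regular expressions are assumptions derived only from the two sample log lines in this section, and the names parse_sweeps and SAMPLE are hypothetical; other FM versions may format these messages differently.

```python
import re

# Patterns assumed from the sample log lines in this section only.
START_RE = re.compile(r"DISCOVERY CYCLE START - REASON: (?P<reason>.+)$")
END_RE = re.compile(
    r"DISCOVERY CYCLE END\. (?P<sws>\d+) SWs, (?P<hfis>\d+) HFIs, "
    r"(?P<end_ports>\d+) end ports, (?P<total_ports>\d+) total ports, "
    r"(?P<sms>\d+) SM\(s\), (?P<packets>\d+) packets, "
    r"(?P<retries>\d+) retries, (?P<seconds>[\d.]+) sec sweep"
)


def parse_sweeps(lines):
    """Pair each CYCLE START with the next CYCLE END; return one dict per sweep."""
    sweeps = []
    reason = None
    for line in lines:
        start = START_RE.search(line)
        if start:
            # Remember why this sweep started until its CYCLE END is seen.
            reason = start.group("reason").strip()
            continue
        end = END_RE.search(line)
        if end:
            sweep = {
                key: int(value)
                for key, value in end.groupdict().items()
                if key != "seconds"
            }
            sweep["seconds"] = float(end.group("seconds"))
            sweep["reason"] = reason
            sweeps.append(sweep)
            reason = None
    return sweeps


# The clean sweep shown in this section, used as sample input.
SAMPLE = [
    "Feb 16 16:12:08 hds1fnb8261 fm0_sm[3946]: PROGR[topology]: SM: "
    "topology_main: TT: DISCOVERY CYCLE START - REASON: Scheduled sweep interval",
    "Feb 16 16:12:08 hds1fnb8261 fm0_sm[3946]: PROGR[topology]: SM: "
    "topology_main: DISCOVERY CYCLE END. 9 SWs, 131 HFIs, 131 end ports, "
    "523 total ports, 1 SM(s), 1902 packets, 0 retries, 0.350 sec sweep",
]

for sweep in parse_sweeps(SAMPLE):
    print(sweep)
```

On a live system, the same function could be given the open /var/log/messages file object instead of SAMPLE. A nonzero retries value or a switch or HFI count that differs from the expected topology would be the items to investigate.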
3.6.2.2. Modifying the Fabric Manager Routing Algorithm
If long Fabric Manager (FM) sweep times are observed or FM sweeps do not finish when a large number of nodes are bounced, consider changing the FM routing algorithm to fattree from the default shortestpath. Do this by updating the /etc/opa-fm/opafm.xml file as shown in the following example:
<!-- **************** Fabric Routing **************************** -->
<!-- The following Routing Algorithms are supported -->
<!-- shortestpath - pick shortest path and balance lids on ISLs -->
<!-- dgshortestpath - A variation of shortestpath that uses the -->
<!-- RoutingOrder parameter to control the order in which -->
<!-- switch egress ports are assigned to LIDs being routed -->
<!-- through the fabric. This can provide a better balance -->
<!-- of traffic through fabrics with multiple types of end -->
<!-- nodes. -->
<!-- See the <DGShortestPathTopology> section, below, for -->
<!-- more information. -->
<!-- fattree - A variation of shortestpath with better balancing -->
<!-- and improved SM performance on fat tree-like fabrics. -->
<RoutingAlgorithm>fattree</RoutingAlgorithm>
3.6.2.2.1. LogFile
Log message output is controlled by the log_level parameter in /etc/rdma/ibacm_opts.cfg. When this parameter is given, the I/O path events are redirected to the specified log file. Error events in the I/O path library, oibutils/libibumad, are printed to the screen by default.
LogFile=/var/log/ibacm.log
The LogFile and Dbg parameters are used primarily for debugging purposes.
3.6.2.3. Verifying PM Sweep Duration
To show the PM sweep duration, perform the following steps:
Open opatop, then select i.
opatop: Img:Tue Feb 16 01:54:43 2016, Hist Now:Tue Feb 16 09:53:26 2016
Image Info:
Sweep Start: Tue Feb 16 01:54:43 2016
Sweep Duration: 0.001 Seconds
Num SW-Ports: 3 HFI-Ports: 2
Num SWs: 1 Num Links: 2 Num SMs: 2
Num Fail Nodes: 0 Ports: 0 Unexpected Clear Ports: 0
Num Skip Nodes: 0 Ports: 0
Select r to traverse the previous sweep duration time from history files. By default, PM sweeps every ten seconds. The latest ten image files (100 sec) are stored in RAM and up to 24 hours of history is stored in /var/usr/lib/opa-fm.
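If the Image Info screen text is saved to a file or captured from a terminal, the sweep duration can be extracted with a short script. The following Python sketch is illustrative only: the function names are hypothetical, the pattern is assumed from the sample output in this section, and the two constants restate the defaults given above (ten-second PM sweep interval, ten images kept in RAM).

```python
import re

PM_SWEEP_INTERVAL_SEC = 10  # default PM sweep interval stated in this section
IMAGES_IN_RAM = 10          # latest image files kept in RAM, per this section


def sweep_duration_seconds(image_info_text):
    """Return the 'Sweep Duration' value in seconds, or None if it is absent."""
    match = re.search(r"Sweep Duration:\s*([\d.]+)\s*Seconds", image_info_text)
    return float(match.group(1)) if match else None


def ram_history_seconds(interval=PM_SWEEP_INTERVAL_SEC, images=IMAGES_IN_RAM):
    """Time span covered by the in-RAM images: interval multiplied by image count."""
    return interval * images


# Sample Image Info text taken from this section.
SAMPLE_IMAGE_INFO = """\
Image Info:
Sweep Start: Tue Feb 16 01:54:43 2016
Sweep Duration: 0.001 Seconds
Num SW-Ports: 3 HFI-Ports: 2
"""

print(sweep_duration_seconds(SAMPLE_IMAGE_INFO))
print(ram_history_seconds())
```

The second function only reproduces the arithmetic behind the 100-second figure: ten images at one image per ten-second sweep.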
3.6.2.4. Verifying Credit Loop Operation
For details on credit loops, refer to the CN5000 Topologies and Routing Guide, Credit Loops.
To verify that a fabric does not have a credit loop issue, use:
# opareport -o validatecreditloops
The output should be similar to the following, where no credit loops are detected:
Fabric summary: 135 devices, 126 HFIs, 9 switches, 504 connections, 16880 routing decisions, 15750 analyzed routes, 0 incomplete routes
Done Building Graphical Layout of All Routes
Routes are deadlock free (No credit loops detected)
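For automated health checks, the report text can be tested for the two indicators shown above: the "No credit loops detected" message and the incomplete route count. The following Python sketch assumes the output wording matches the sample in this section; the function name check_credit_loops is hypothetical and not part of the opareport tooling.

```python
import re


def check_credit_loops(report_text):
    """Return (deadlock_free, incomplete_routes) parsed from the report text.

    deadlock_free is True only if the 'No credit loops detected' message is
    present. incomplete_routes is None if the summary line is not found.
    """
    deadlock_free = "No credit loops detected" in report_text
    match = re.search(r"(\d+) incomplete routes", report_text)
    incomplete_routes = int(match.group(1)) if match else None
    return deadlock_free, incomplete_routes


# Sample report text taken from this section.
SAMPLE_REPORT = """\
Fabric summary: 135 devices, 126 HFIs, 9 switches, 504 connections, 16880 routing decisions, 15750 analyzed routes, 0 incomplete routes
Done Building Graphical Layout of All Routes
Routes are deadlock free (No credit loops detected)
"""

deadlock_free, incomplete_routes = check_credit_loops(SAMPLE_REPORT)
print(deadlock_free, incomplete_routes)
```

The report text could be captured by running the opareport command shown above through subprocess.run with capture_output=True and text=True, then passing its stdout to this function.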