Cornelis Technical Documentation

3.6.2. Verify the Fabric

This section covers the use of the Fabric Manager Sweep. For more details, refer to the CN5000 Product Family Description Guide, Fabric Sweeping.

3.6.2.1. Verifying Fabric Manager Sweep

By default, the Fabric Manager sweeps every five minutes, as defined in the /etc/opa-fm/opafm.xml file. A sweep is triggered sooner if the fabric changes, for example when hosts, switches, or links go up or down. To verify the sweeps, open /var/log/messages and search for CYCLE START. Each CYCLE START entry has a matching CYCLE END entry. Any links with errors found during the sweep cycle are noted in the log.

An example of a clean FM sweep follows:

Feb 16 16:12:08 hds1fnb8261 fm0_sm[3946]: PROGR[topology]: SM: topology_main: TT: DISCOVERY CYCLE START - REASON: Scheduled sweep interval
Feb 16 16:12:08 hds1fnb8261 fm0_sm[3946]: PROGR[topology]: SM: topology_main: DISCOVERY CYCLE END. 9 SWs, 131 HFIs, 131 end ports, 523 total ports, 1 SM(s), 1902 packets, 0 retries, 0.350 sec sweep

Compare the counts reported in the sweep results (switches, HFIs, ports, and SMs) with the output of opafabricinfo and with the expected fabric topology.
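The log check described above can be scripted. The following is a minimal Python sketch, not part of the product tooling, that pairs each CYCLE START entry with its CYCLE END entry and extracts the reported counts. The function name summarize_sweeps and the regular expression are illustrative assumptions based only on the two example log lines shown in this section; other FM versions may format these lines differently.

```python
import re

# Pattern derived from the example "DISCOVERY CYCLE END" line in this section.
# It is an assumption that other FM releases use the same wording.
END_RE = re.compile(
    r"DISCOVERY CYCLE END\. "
    r"(?P<switches>\d+) SWs, (?P<hfis>\d+) HFIs, (?P<end_ports>\d+) end ports, "
    r"(?P<total_ports>\d+) total ports, (?P<sms>\d+) SM\(s\), "
    r"(?P<packets>\d+) packets, (?P<retries>\d+) retries, "
    r"(?P<seconds>[\d.]+) sec sweep"
)


def summarize_sweeps(lines):
    """Return one dict per completed sweep found in an iterable of log lines."""
    sweeps = []
    reason = None
    for line in lines:
        if "DISCOVERY CYCLE START" in line:
            # Keep the text after "REASON: ", if the entry carries one.
            _, _, tail = line.partition("REASON: ")
            reason = tail.strip() or None
            continue
        match = END_RE.search(line)
        if match:
            sweep = {k: int(v) for k, v in match.groupdict().items() if k != "seconds"}
            sweep["seconds"] = float(match.group("seconds"))
            sweep["reason"] = reason
            sweeps.append(sweep)
            reason = None
    return sweeps


# The two example log lines from this section.
SAMPLE = (
    "Feb 16 16:12:08 hds1fnb8261 fm0_sm[3946]: PROGR[topology]: SM: topology_main: "
    "TT: DISCOVERY CYCLE START - REASON: Scheduled sweep interval\n"
    "Feb 16 16:12:08 hds1fnb8261 fm0_sm[3946]: PROGR[topology]: SM: topology_main: "
    "DISCOVERY CYCLE END. 9 SWs, 131 HFIs, 131 end ports, 523 total ports, "
    "1 SM(s), 1902 packets, 0 retries, 0.350 sec sweep\n"
)

sweeps = summarize_sweeps(SAMPLE.splitlines())
for s in sweeps:
    print(s["reason"], s["switches"], s["hfis"], s["retries"], s["seconds"])
```

To run it against a live system, pass an open file instead of the sample, for example summarize_sweeps(open("/var/log/messages")). The switch and HFI counts can then be compared by hand with the opafabricinfo output.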

3.6.2.2. Modifying the Fabric Manager Routing Algorithm

If long Fabric Manager (FM) sweep times are observed or FM sweeps do not finish when a large number of nodes are bounced, consider changing the FM routing algorithm to fattree from the default shortestpath. Do this by updating the /etc/opa-fm/opafm.xml file as shown in the following example:

<!-- **************** Fabric Routing **************************** -->
<!-- The following Routing Algorithms are supported -->
<!-- shortestpath - pick shortest path and balance lids on ISLs -->
<!-- dgshortestpath - A variation of shortestpath that uses the        -->
<!--            RoutingOrder parameter to control the order in which   -->
<!--            switch egress ports are assigned to LIDs being routed  -->
<!--            through the fabric. This can provide a better balance  -->
<!--            of traffic through fabrics with multiple types of end  -->
<!--            nodes.                                                 -->
<!--            See the <DGShortestPathTopology> section, below, for   -->
<!--            more information.                                      -->
<!-- fattree -  A variation of shortestpath with better balancing      -->
<!--            and improved SM performance on fat tree-like fabrics.  -->
<RoutingAlgorithm>fattree</RoutingAlgorithm>
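
Before and after editing the file, it can be useful to confirm which routing algorithm is actually configured. The following Python sketch reads every RoutingAlgorithm element from an XML document using the standard library. The function name routing_algorithms and the enclosing Config/Common/Sm elements in the sample are hypothetical and serve only to make the example self-contained; the real opafm.xml layout may differ.

```python
import xml.etree.ElementTree as ET


def routing_algorithms(xml_text):
    """Return the text of every <RoutingAlgorithm> element in the document."""
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree, so no assumption is made about
    # where the element sits inside the file.
    return [el.text.strip() for el in root.iter("RoutingAlgorithm") if el.text]


# Hypothetical minimal document; only the RoutingAlgorithm line comes
# from the example in this section.
SAMPLE = """<Config>
  <Common>
    <Sm>
      <!-- fattree -  A variation of shortestpath with better balancing -->
      <RoutingAlgorithm>fattree</RoutingAlgorithm>
    </Sm>
  </Common>
</Config>"""

algorithms = routing_algorithms(SAMPLE)
print(algorithms)
```

For the installed file, the same lookup can be done with ET.parse("/etc/opa-fm/opafm.xml").getroot().iter("RoutingAlgorithm"). XML comments are ignored by the parser, so commented-out settings are not reported.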

3.6.2.2.1. LogFile

Log message output is controlled by the log_level parameter in /etc/rdma/ibacm_opts.cfg. When the LogFile parameter is given, I/O path events are redirected to the specified log file. By default, error events in the I/O path library, oibutils/libibumad, are printed to the screen.

LogFile=/var/log/ibacm.log

The LogFile and Dbg parameters are used primarily for debugging purposes.

3.6.2.3. Verifying PM Sweep Duration

To show the PM sweep duration, perform the following steps:

  1. Open opatop, then select i.

    opatop: Img:Tue Feb 16 01:54:43 2016, Hist  Now:Tue Feb 16 09:53:26 2016
    Image Info:
     Sweep Start: Tue Feb 16 01:54:43 2016
     Sweep Duration: 0.001 Seconds
    
     Num SW-Ports:       3  HFI-Ports:       2
     Num SWs:            1  Num Links:       2  Num SMs:         2
    
     Num Fail Nodes:       0  Ports:       0  Unexpected Clear Ports: 0
     Num Skip Nodes:       0  Ports:       0
  2. Select r to step back through earlier images in the history files and view their sweep durations. By default, the PM sweeps every ten seconds. The latest ten image files (100 seconds) are stored in RAM, and up to 24 hours of history is stored in /var/usr/lib/opa-fm.
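
The figures above can be checked with a short calculation. The following Python sketch extracts the Sweep Duration value from Image Info text such as the opatop screen shown in step 1, and works out the time window covered by the images held in RAM. The function name sweep_duration_seconds and the comparison of duration against the sweep interval are illustrative assumptions, not a documented pass/fail criterion.

```python
import re

PM_SWEEP_INTERVAL_SEC = 10  # default PM sweep interval stated above
IMAGES_IN_RAM = 10          # number of latest image files kept in RAM, as stated above


def sweep_duration_seconds(image_info):
    """Return the Sweep Duration in seconds, or None if the line is absent."""
    match = re.search(r"Sweep Duration:\s*([\d.]+)\s*Seconds", image_info)
    return float(match.group(1)) if match else None


# Excerpt of the Image Info screen from step 1.
SAMPLE = """Image Info:
 Sweep Start: Tue Feb 16 01:54:43 2016
 Sweep Duration: 0.001 Seconds
"""

duration = sweep_duration_seconds(SAMPLE)

# Ten images at one image every ten seconds cover 100 seconds.
ram_window_sec = PM_SWEEP_INTERVAL_SEC * IMAGES_IN_RAM

# Assumption: a sweep should finish well within one sweep interval.
fits_in_interval = duration is not None and duration < PM_SWEEP_INTERVAL_SEC

print(duration, ram_window_sec, fits_in_interval)
```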

3.6.2.4. Verifying Credit Loop Operation

For details on credit loops, refer to the CN5000 Topologies and Routing Guide, Credit Loops.

To verify that a fabric does not have a credit loop issue, use:

# opareport -o validatecreditloops

When no credit loops are detected, the output is similar to the following:

Fabric summary: 135 devices, 126 HFIs, 9 switches,
504 connections, 16880 routing decisions,
15750 analyzed routes, 0 incomplete routes
Done Building Graphical Layout of All Routes
Routes are deadlock free (No credit loops detected)
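
For repeated checks, the result can be evaluated automatically. The following Python sketch inspects captured report text for the "No credit loops detected" message and for the incomplete route count shown in the example above. The function names are hypothetical, and the matching relies only on the wording of the sample output in this section, which may vary between releases.

```python
import re


def credit_loop_free(report):
    """True if the report contains the 'no credit loops' message shown above."""
    return "No credit loops detected" in report


def incomplete_routes(report):
    """Return the number of incomplete routes, or None if not reported."""
    match = re.search(r"(\d+) incomplete routes", report)
    return int(match.group(1)) if match else None


# Example output from this section.
SAMPLE = """Fabric summary: 135 devices, 126 HFIs, 9 switches,
504 connections, 16880 routing decisions,
15750 analyzed routes, 0 incomplete routes
Done Building Graphical Layout of All Routes
Routes are deadlock free (No credit loops detected)
"""

ok = credit_loop_free(SAMPLE) and incomplete_routes(SAMPLE) == 0
print("fabric routes clean:", ok)
```

The report text can be captured by running the opareport command shown above and saving its output to a file, or from Python with subprocess.run. Which lines appear on standard output and which on standard error is not stated here, so capturing both streams is the safer assumption.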