7.4.2.2. Link Down Reason
In order for two link partners to communicate reliably, a set of link states is defined to identify the ability of the link to move management or data traffic. Two indicators provide information about the reason a link went down:
LinkDownReason: The reason the local port initiated a LinkDown from the LinkInit, LinkArmed, or LinkActive state. If more than one reason occurs before the indicator is cleared, only the first is captured.
NeighborLinkDownReason: The value received from the neighbor.
Note
The SM is in charge of clearing both values to permit subsequent reasons to be recorded. The SM clears both these values as part of bringing the link to Armed.
The opasaquery tool can be used to show both the LinkDownReason and the NeighborLinkDownReason: opasaquery -o portinfo.
You can also look at the LinkDownErrorLog, which stores the last eight historical reasons for why the port went down, using: opasaquery -o portinfo -vvv.
Other tools showing the LinkDownReason are opaportinfo and opasmaquery.
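The first-reason-only behavior of LinkDownReason and the eight-entry LinkDownErrorLog can be pictured with a small model. This is only an illustrative sketch, not driver or firmware code: the class and method names are hypothetical, and it assumes the log records every locally initiated reason and is left untouched when the SM clears the indicators, neither of which this section states.

```python
from collections import deque


class LinkDownIndicators:
    """Toy model of one port's link-down indicators (hypothetical names)."""

    LOG_DEPTH = 8  # LinkDownErrorLog keeps the last eight reasons

    def __init__(self):
        self.link_down_reason = 0            # 0 means None
        self.neighbor_link_down_reason = 0   # value received from the neighbor
        self.error_log = deque(maxlen=self.LOG_DEPTH)

    def record_local(self, reason):
        """Record a locally initiated LinkDown."""
        # Assumption: every reason goes into the history log.
        self.error_log.append(reason)
        # Only the first reason is captured until the indicator is cleared.
        if self.link_down_reason == 0:
            self.link_down_reason = reason

    def clear(self):
        """What the SM does as part of bringing the link to Armed."""
        self.link_down_reason = 0
        self.neighbor_link_down_reason = 0
        # Assumption: the history log is not cleared here.
```

For example, after `record_local(2)` followed by `record_local(8)`, `link_down_reason` stays at 2 while the log holds both values; once `clear()` runs, the next reason is captured again.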
The table below describes the LinkDownReason values:
Value | Description |
|---|---|
0: None | |
Corresponding to locally initiated link bounce due to PortErrorAction | |
2: Bad Packet Length | Illegal packet length in the header |
3: Packet Too Long | Packet longer than length |
4: Packet Too Short | Packet shorter than length with normal tail |
5: Bad source LID | Illegal SLID (0, or a multicast LID used as SLID). Does not include security validation of the SLID |
6: Bad destination LID | Illegal DLID (0, does not match SuperNIC, multicast DLID on SC15) |
7: Bad L2 | Illegal L2 opcode |
8: Bad SC | Unconfigured SC |
10: Bad Mid Tail | Body/Tail received without a corresponding Head flit |
12: Preempt Error | Preempting with same VL |
13: Preempt VL15 | Preempting a VL15 packet |
14: Bad VL Marker | |
17: Bad Head Distance | Distance violation between two head flits |
18: Bad Tail Distance | Distance violation between two tail flits |
19: Bad Control Distance | Distance violation between two credit LF command flits |
20: Bad Credit Ack | Credit return for an unsupported VL |
21: Unsupported VL Marker | |
22: Bad Preempt | Exceeding the interleaving level |
23: Bad Control Flit | Unknown or reserved control flit received—deprecated |
24: Exceed Multicast Limit | |
32: Excessive Buffer Overrun | |
Corresponding to locally initiated intentional link down | |
33: Unknown | |
35: Reboot | Reboot or service reset |
36: Neighbor Unknown | Link down was not locally initiated but no LinkGoingDown idle flit was received |
39: FM Bounce | FM-initiated bounce by transitioning from LinkUp to Polling |
40: Speed Policy | Link outside link policy |
41: Width Policy | Link downgrade outside policy |
Corresponding to locally initiated intentional link down via transition to Offline or Disabled | |
49: Disconnected | Link can never reach LinkUp |
50: No Local Media Installed | Module is not installed in local port connector |
51: Not Installed | Internal link not installed, due to absence of link partner FRU or backplane |
52: Chassis Config | Chassis management forcing port Offline due to incompatible or absent link partner FRU or backplane |
54: End to End not Installed | Silicon photonics mid-board module installed, but unable to detect link partner silicon photonics, due to absence of some part of the optical interconnect or absence of the remote module |
56: Power Policy | Unable to enable port without exceeding power policy |
57: Link Speed Policy | Link Speed Enabled policy is not able to be met due to a persistent cause |
58: Link Width Policy | Link Width Enabled policy is not able to be met due to a persistent cause such as board design having insufficient lanes. Does not include dynamic reasons such as failed link negotiation or LinkWidthDowngrade below policy |
60: Switch Management | User disabled via switch management interface (CLI, SNMP, Config file, etc.) |
61: SMA Disabled | User disabled via SMA packet changing Physical Port State to Disabled |
63: Transient | Port recently entered Offline and is waiting for a Timeout to ensure synchronization with link partner Physical Port State machine |
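When post-processing tool output, it can help to map the numeric values back to the names in the table above. The following is a minimal sketch of such a lookup; the dictionary and the `describe` helper are hypothetical names introduced here, and the names are copied from the table rather than taken from any tool's source.

```python
# Names copied from the LinkDownReason table; values that the table
# does not list (1, 9, 11, ...) are deliberately left out.
LINK_DOWN_REASON_NAMES = {
    0: "None",
    2: "Bad Packet Length",
    3: "Packet Too Long",
    4: "Packet Too Short",
    5: "Bad source LID",
    6: "Bad destination LID",
    7: "Bad L2",
    8: "Bad SC",
    10: "Bad Mid Tail",
    12: "Preempt Error",
    13: "Preempt VL15",
    14: "Bad VL Marker",
    17: "Bad Head Distance",
    18: "Bad Tail Distance",
    19: "Bad Control Distance",
    20: "Bad Credit Ack",
    21: "Unsupported VL Marker",
    22: "Bad Preempt",
    23: "Bad Control Flit",
    24: "Exceed Multicast Limit",
    32: "Excessive Buffer Overrun",
    33: "Unknown",
    35: "Reboot",
    36: "Neighbor Unknown",
    39: "FM Bounce",
    40: "Speed Policy",
    41: "Width Policy",
    49: "Disconnected",
    50: "No Local Media Installed",
    51: "Not Installed",
    52: "Chassis Config",
    54: "End to End not Installed",
    56: "Power Policy",
    57: "Link Speed Policy",
    58: "Link Width Policy",
    60: "Switch Management",
    61: "SMA Disabled",
    63: "Transient",
}


def describe(code):
    """Return 'value: name', or mark the value as not listed in the table."""
    name = LINK_DOWN_REASON_NAMES.get(code)
    if name is None:
        return f"{code}: (not listed)"
    return f"{code}: {name}"
```

Calling `describe(36)` yields the same "value: name" form used in the table, and a value absent from the table is flagged instead of guessed.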
If the link is currently down, the LinkDownReason and LinkDownErrorLog are not available. They are populated when the link comes back up.
In many cases, the LinkDownReason is the same as the neighbor port's value of NeighborLinkDownReason and vice versa. However, there are exceptions.
The following table shows the values that apply only to LinkDownReason and are never used for NeighborLinkDownReason.
Value | Description |
|---|---|
36 | Neighbor Unknown |
49 | Disconnected |
50 | No Local Media Installed |
51 | Not Installed |
54 | End to End not Installed |
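The restriction in the table above can be expressed as a simple sanity check on collected data. This is a sketch only; `LOCAL_ONLY_REASONS` and `is_plausible_neighbor_reason` are hypothetical names, and the check encodes nothing beyond the five values listed.

```python
# Values the table lists as applicable to LinkDownReason only.
LOCAL_ONLY_REASONS = frozenset({36, 49, 50, 51, 54})


def is_plausible_neighbor_reason(code):
    """Return False for values that should never appear as NeighborLinkDownReason."""
    return code not in LOCAL_ONLY_REASONS
```

A NeighborLinkDownReason of 36 or 49 in gathered data would then point to a parsing mistake or swapped fields rather than a real report from the neighbor.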
The following are some sample combinations of values for a configuration with Device A connected to Device B:
Link down initiated by device A, where device A is able to send the reason:
A.LinkDownReason=X, A.NeighborLinkDownReason=0; B.LinkDownReason=0, B.NeighborLinkDownReason=X
Link down initiated by devices A and B concurrently, where one of the devices is able to send the reason to the other device:
A.LinkDownReason=X, A.NeighborLinkDownReason=Y; B.LinkDownReason=Y, B.NeighborLinkDownReason=X
Link down initiated by device A, where device A is unable to send the reason to device B:
A.LinkDownReason=X, A.NeighborLinkDownReason=0; B.LinkDownReason=36 (Neighbor Unknown), B.NeighborLinkDownReason=0
Unexplained link down, where neither device A nor device B is able to send a reason code for the link going down (for example, cable failure):
A.LinkDownReason=36 (Neighbor Unknown), A.NeighborLinkDownReason=0; B.LinkDownReason=36 (Neighbor Unknown), B.NeighborLinkDownReason=0
Link down initiated by device A due to a hard failure (for example, power loss, hard reset, ASIC fault, or FW/driver crash), where device A is unable to send the reason to device B:
A's reason codes are inaccessible, and A powers back up with LinkDownReason=0, NeighborLinkDownReason=0; B.LinkDownReason=36 (Neighbor Unknown), B.NeighborLinkDownReason=0
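The sample combinations can be read from a single port's point of view. The sketch below is one possible way to turn a port's (LinkDownReason, NeighborLinkDownReason) pair into a short interpretation; the function name and the returned strings are hypothetical, and any pair not covered by the samples is reported as such rather than guessed.

```python
NEIGHBOR_UNKNOWN = 36  # link down not locally initiated, no reason received


def interpret(link_down_reason, neighbor_link_down_reason):
    """Interpret one port's pair of indicators using the sample combinations."""
    ldr, nldr = link_down_reason, neighbor_link_down_reason
    if ldr == 0 and nldr == 0:
        # Also what a device reports after powering back up from a hard failure.
        return "no link down recorded"
    if ldr == NEIGHBOR_UNKNOWN:
        if nldr == 0:
            return "not locally initiated; neighbor sent no reason"
        return "combination not covered by the samples"
    if ldr != 0 and nldr != 0:
        return "initiated by both ports concurrently"
    if ldr != 0:
        return "locally initiated"
    return "initiated by neighbor, which sent its reason"
```

For the first sample, device A's pair (X, 0) reads as locally initiated, while device B's pair (0, X) reads as initiated by its neighbor.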