Fault Detection and Switchover
The CSP initiates a switchover when any of the following occurs:
• The active EXNET-ONE card detects its own failure while the ring is in service.
• The standby EXNET-ONE card no longer detects its node’s packet information.
• The active EXNET-ONE card becomes isolated due to a link revalidation.
When this occurs, the host is notified with a Ring Status Report (0x72) and an Alarm (0xB9) indicating that the active EXNET-ONE card is out of service and that the condition was not host-initiated. The possible failures, as indicated by the More Status field, are as follows:
• (0xFF) Unknown Failure (Bad Ring)
• (0xFC) Internal Error (Bad Receive Matrix)
• (0xFB) Isolated From Ring
• (0xFA) Internal Diagnostics Failed
• (0xF9) Time-out During Ring Initialization
• (0xF7) Ring Mastership Configuration Change
• (0xF6) Unrecoverable Failure (Looped Out)
• (0xF5) Unsupported Ring Mode
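For host applications that log these reports, the More Status values listed above can be kept in a lookup table. The following is a minimal sketch in Python; the names `MORE_STATUS` and `describe_more_status` are hypothetical and are not part of any CSP API, and how the More Status byte is extracted from the report is not shown because the message layout is not described in this section.

```python
# More Status values listed above for a non-host-initiated switchover.
# The dictionary and function names are illustrative assumptions.
MORE_STATUS = {
    0xFF: "Unknown Failure (Bad Ring)",
    0xFC: "Internal Error (Bad Receive Matrix)",
    0xFB: "Isolated From Ring",
    0xFA: "Internal Diagnostics Failed",
    0xF9: "Time-out During Ring Initialization",
    0xF7: "Ring Mastership Configuration Change",
    0xF6: "Unrecoverable Failure (Looped Out)",
    0xF5: "Unsupported Ring Mode",
}


def describe_more_status(code: int) -> str:
    """Return the listed description, or a placeholder for unlisted values."""
    return MORE_STATUS.get(code, f"Unlisted More Status value (0x{code:02X})")
```

Values that do not appear in the list, such as 0xF8, fall through to the placeholder text rather than being guessed at.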
When any of the above failures occurs, the CSP performs a switchover as described below:
• The active EXNET-ONE card is looped off of the ring, disabling write access to the ring and the local PCM bus. The standby EXNET-ONE card takes over the active role, maintaining the node’s presence on the ring, including all established calls. Internal diagnostics are run on the "Looped Out" EXNET-ONE card. If the diagnostics fail, the EXNET-ONE card stays in the "Looped Out" state. If the diagnostics pass, the EXNET-ONE card goes to the standby state. Because the EXNET-ONE card is looped off of the ring, it can be replaced with another EXNET-ONE card, removed from the system, or, if it is determined to be still operational, passively added without affecting the ring state. This is also true for the standby EXNET-ONE card if it is unexpectedly removed from the ring.
• The switchover logic for an unexpected failure of the active EXNET-ONE card differs between the master node and a slave node. When the active card on a slave node fails, the standby card takes over 50 milliseconds after it detects that its node’s packets are no longer being transmitted on the ring. When the active card on the master node fails, the ring loses Frame 125 Sync, resulting in the loss of all data on the ring. The standby card on the master node takes over as master if the ring has not come back into service within 50 milliseconds. Once the switchover has occurred, the new master EXNET-ONE card revalidates each node, which takes about 20 milliseconds per node. As each node is added back to the ring, data between those nodes is allowed to pass again.
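The behavior described above can be summarized in a short Python sketch. It models the outcome of the diagnostics run on a looped-out card and gives a rough estimate of how long a master switchover takes to revalidate the whole ring. All names here are hypothetical. The estimate assumes that the 50-millisecond wait and the roughly 20 milliseconds per node simply add up, and that `node_count` is the number of nodes the new master revalidates; the text does not say whether that count includes the master node itself.

```python
from enum import Enum

# Figures taken from the text; both are approximate in practice.
TAKEOVER_DELAY_MS = 50          # wait before the standby card takes over
REVALIDATION_MS_PER_NODE = 20   # "about 20 milliseconds per node"


class CardState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    LOOPED_OUT = "looped out"


def state_after_diagnostics(passed: bool) -> CardState:
    """A looped-out card goes to standby only if its diagnostics pass."""
    return CardState.STANDBY if passed else CardState.LOOPED_OUT


def estimated_master_recovery_ms(node_count: int) -> int:
    """Rough time until every node is revalidated after a master failure.

    Assumption: takeover delay plus a fixed per-node revalidation cost.
    """
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    return TAKEOVER_DELAY_MS + REVALIDATION_MS_PER_NODE * node_count


if __name__ == "__main__":
    # 50 + 20 * 8 = 210 milliseconds for an eight-node ring
    print(estimated_master_recovery_ms(8))
```

Because nodes are added back one at a time, traffic between already revalidated nodes resumes before the full estimate has elapsed.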
Host-Initiated Switchovers
The host can initiate a switchover in the following ways:
• The host sends a Line Card Switchover message (0x24) specifying the originating slot as the active EXNET-ONE card and the destination slot as the standby EXNET-ONE card. If both EXNET-ONE cards are assigned to the same Ring ID, a switchover is performed in which the active EXNET-ONE card goes to the standby state and the standby EXNET-ONE card goes to the active state. The switchover is synchronized between the two EXNET-ONE cards to minimize downtime. The standby EXNET-ONE card takes over as the active EXNET-ONE card within 10 milliseconds, usually in around 2 milliseconds.
• The host de-assigns the active EXNET-ONE card by removing it, resetting it, or taking it out of service. When this occurs, the standby EXNET-ONE card automatically switches to the active state, maintaining its node’s presence on the ring.
Important! Removing the active slave EXNET-ONE card in this manner is treated as an unexpected failure, resulting in a switchover after 50 milliseconds or more.
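A host application may want to check the conditions for a Line Card Switchover before sending the message. The following Python sketch encodes only the conditions stated above: the originating slot holds the active card, the destination slot holds the standby card, and both cards share a Ring ID. The `ExnetCard` class and the function name are hypothetical, and the layout of the 0x24 message itself is not shown because it is not described in this section.

```python
from dataclasses import dataclass

LINE_CARD_SWITCHOVER = 0x24  # message type named in the text


@dataclass(frozen=True)
class ExnetCard:
    """Illustrative host-side record of an EXNET-ONE card (an assumption)."""
    slot: int
    ring_id: int
    state: str  # "active" or "standby"


def switchover_request_is_valid(origin: ExnetCard, destination: ExnetCard) -> bool:
    """Check the stated preconditions for a host-initiated switchover."""
    return (
        origin.slot != destination.slot
        and origin.state == "active"
        and destination.state == "standby"
        and origin.ring_id == destination.ring_id
    )


if __name__ == "__main__":
    active = ExnetCard(slot=1, ring_id=0, state="active")
    standby = ExnetCard(slot=2, ring_id=0, state="standby")
    print(switchover_request_is_valid(active, standby))
```

Swapping the two arguments, or pairing cards from different rings, fails the check.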