CSP Host Communication Redundancy

Overview

You can set up host communication redundancy in a CSP system for applications that require a redundant local area network (LAN) segment to protect against LAN failure. To configure redundancy, connect switch nodes that have two CSP Matrix Controller Series 3 cards to redundant hubs, which are in turn connected to one or more hosts.

Redundant links enable the host to maintain communications to the CSP and to keep running if a single component in the system fails. If a faulty primary link is detected, the host application can switch over its requests to the secondary link.
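The link switchover described above might look like the following on the host side. This is a minimal sketch, not the product's host API: the function name connect_with_fallback and the idea of passing two (address, port) pairs are assumptions made for illustration. It uses only the standard Python socket module.

```python
import socket


def connect_with_fallback(primary, secondary, timeout=2.0):
    """Try the primary (host, port) address first, then the secondary.

    Returns a (label, socket) pair, where label is "primary" or
    "secondary". Both addresses are supplied by the caller; nothing
    here is specific to the CSP.
    """
    last_error = None
    for label, address in (("primary", primary), ("secondary", secondary)):
        try:
            sock = socket.create_connection(address, timeout=timeout)
            return label, sock
        except OSError as exc:
            # Refused, unreachable, or timed out: remember and try the next link.
            last_error = exc
    raise ConnectionError("both links failed") from last_error
```

A host application would call this with the address reachable over the primary link and the address reachable over the secondary link, and would send its requests on whichever socket is returned.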

Sample Topology

Figure 5-18, Ethernet Host Communication Redundancy, shows a topology that supports host communication redundancy. In this example, two hubs and four nodes are connected to one or more hosts. All the nodes have dual CSP Matrix Controller Series 3 cards, and all the nodes are on the same physical network. The letters X, Y, Z, and Z’ designate points of failure:

 

• X - Ethernet link failure between the host and the CSP

The cable or host is defective, or the port on either the host or the hub is defective.

• Y - Hub failure

The hub is defective, and all ports and devices directly connected to it are affected. No communication can take place between any of its ports. The standby components (B side) are not affected, and host ports on that side are still connected to each other.

• Z - Ethernet link failure between the hub and the CSP Matrix Controller Series 3 card

The cable is defective, or the port on either the hub or the CSP Matrix Controller Series 3 card is defective.

• Z’ - Switch CSP Matrix Controller Series 3 card failure

The CSP Matrix Controller Series 3 card is defective.

Figure 5-18 Ethernet Host Communication Redundancy

Failure Detection and Recovery

In the redundant system shown in Ethernet Host Communication Redundancy, failure detection and recovery occur as follows:

• If the host detects a failure in one of the Ethernet links between the host and the CSP (designated by X in Ethernet Host Communication Redundancy), it opens new socket connections to all four A CSP Matrix Controller Series 3 cards through Ethernet link B. If successful, the recovery is complete and the A CSP Matrix Controller Series 3 cards are still in control of the switch. If unsuccessful for any node, the host connects to the standby B CSP Matrix Controller Series 3 card of each node on which the connection to the A CSP Matrix Controller Series 3 card has failed and causes a switchover on that node only.

• If the host and the A CSP Matrix Controller Series 3 cards detect a hub failure (designated by Y in Ethernet Host Communication Redundancy), the host tries to reconnect to the A CSP Matrix Controller Series 3 cards through Hub B. If unsuccessful, the host connects to the B CSP Matrix Controller Series 3 cards and requests all four of them to become active. You can reduce the scope of the recovery by connecting half of the CSP Matrix Controller Series 3 cards to Hub A and half to Hub B. In this configuration, a hub failure affects only half of the CSP nodes.

• If the host detects a failure in the Ethernet link between the hub and CSP Matrix Controller Series 3 card A in Switch Node 4 (designated by Z in Ethernet Host Communication Redundancy), it opens a new socket to CSP Matrix Controller Series 3 card A through Ethernet link B. If unsuccessful, the host connects to CSP Matrix Controller Series 3 card B on Switch Node 4 and causes it to become active.

• If the host detects a failure in CSP Matrix Controller Series 3 card A in Switch Node 4 (designated by Z’ in Ethernet Host Communication Redundancy), the host tries to connect to CSP Matrix Controller Series 3 card B in Switch Node 4 through Ethernet link B. In this case, CSP Matrix Controller Series 3 card B detects the failure and becomes active. The new state is indicated by the Poll message sent to the host connected to CSP Matrix Controller Series 3 card B.
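The benefit of splitting connections across the two hubs, mentioned in the hub failure case above, can be illustrated with a small model. This is only a sketch under stated assumptions: the node numbers, the hub_in_use mapping, and the nodes_affected function are hypothetical, and the model assumes that each node's in-use host connection passes through exactly one hub.

```python
def nodes_affected(hub_in_use, failed_hub):
    """Return the nodes whose in-use host connection passes through failed_hub.

    hub_in_use maps a node number to the hub ("A" or "B") that carries
    that node's current host connection (a hypothetical model).
    """
    return sorted(node for node, hub in hub_in_use.items() if hub == failed_hub)


# All four nodes reached through Hub A: a Hub A failure touches every node.
all_on_a = {1: "A", 2: "A", 3: "A", 4: "A"}

# Half of the nodes on each hub: a Hub A failure touches only two nodes.
split = {1: "A", 2: "A", 3: "B", 4: "B"}

print(nodes_affected(all_on_a, "A"))  # [1, 2, 3, 4]
print(nodes_affected(split, "A"))     # [1, 2]
```

In the split layout, the host needs to run the reconnect and switchover steps for only the two affected nodes.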

General Strategy

In all scenarios, use the following general strategy for the host:

• If a TCP/IP connection to an active switch CSP Matrix Controller Series 3 card is broken, the host should try to reconnect to the original active CSP Matrix Controller Series 3 card (A) through the redundant Ethernet link. If that fails, the host should then connect to the standby CSP Matrix Controller Series 3 card (B) and cause a switchover.

• If a connection to a standby CSP Matrix Controller Series 3 card is broken, the only recovery needed is to alert Systems Support to restore redundancy on the B side.

• A CSP Matrix Controller Series 3 card switchover is caused only by a severe hardware or software fault, or when initiated by the host using the Become Active message. The system software never assumes that a switchover is needed based on any problem conditions related to the state of host connections.
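The host-side decisions in the strategy above can be sketched as a single function. This is an illustrative outline, not the product's host API: the Outcome names, the recover_connection function, and its three callable parameters are all hypothetical. The actual socket handling and the sending of the Become Active message are left to the caller.

```python
from enum import Enum


class Outcome(Enum):
    RECONNECTED = "reconnected to the active card over the redundant link"
    SWITCHED_OVER = "connected to the standby card and requested switchover"
    ALERT_SUPPORT = "standby connection lost; alert Systems Support"
    UNREACHABLE = "neither card could be reached"


def recover_connection(lost_role, reconnect_active, connect_standby,
                       send_become_active):
    """Decide how to recover after a broken TCP/IP connection to one node.

    lost_role          -- "active" or "standby": role of the card whose
                          connection broke
    reconnect_active   -- hypothetical callable; tries the active card (A)
                          over the redundant Ethernet link, returns bool
    connect_standby    -- hypothetical callable; tries the standby card (B),
                          returns bool
    send_become_active -- hypothetical callable; sends the Become Active
                          message to the standby card
    """
    if lost_role == "standby":
        # No switchover is needed; redundancy has to be restored.
        return Outcome.ALERT_SUPPORT
    if reconnect_active():
        return Outcome.RECONNECTED
    if connect_standby():
        send_become_active()
        return Outcome.SWITCHED_OVER
    return Outcome.UNREACHABLE
```

Passing the connection steps in as callables keeps the decision order separate from any particular transport code, so the same function can be exercised with simple stand-ins before it is wired to real sockets.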

You can minimize the disruption of a complete hub failure by using hubs with some built-in level of redundancy. The most useful type to consider is power supply redundancy.