ClusterXL and State Synchronization
The health of ClusterXL can be examined using a number of different commands:
cphaprob -a if
cphaprob state
cphaprob list
cpstat ha -f all | more
fw ctl pstat
Use the ‘cphaprob -a if’ command on each cluster member to check which interfaces have been
configured for state synchronization, and verify that the sync mode is consistent across the members:
Example output:
[Expert@Zulu]# cphaprob -a if
eth1c0 non sync(non secured)
eth2c0 non sync(non secured)
eth3c0 non sync(non secured)
eth4c0 sync(secured), multicast
Virtual cluster interfaces: 3
eth1c0 192.168.1.1
eth2c0 192.168.2.1
eth3c0 10.1.1.1
[Expert@Zulu]#
[Expert@Shaka]# cphaprob -a if
eth1c0 non sync(non secured)
eth2c0 non sync(non secured)
eth3c0 non sync(non secured)
eth4c0 sync(secured), broadcast
Virtual cluster interfaces: 3
eth1c0 192.168.1.1
eth2c0 192.168.2.1
eth3c0 10.1.1.1
[Expert@Shaka]#
In the above example, interface eth4c0 has been configured on both cluster members for state sync,
but the sync mode is inconsistent: Zulu is using multicast and Shaka is using broadcast. Ensure the
cluster members use the same mode. (The default mode is multicast.)
The following document explains how to change between broadcast and multicast mode:
sk20576: How to set ClusterXL Control Protocol (CCP) in broadcast mode in ClusterXL
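The comparison described above can be sketched in Python. This is only an illustration: it assumes the ‘cphaprob -a if’ output of each member has already been captured as text, and the function names (sync_modes, mode_mismatches) are hypothetical helpers, not part of any Check Point tool.

```python
import re

# Matches sync interface lines such as "eth4c0 sync(secured), multicast".
# Lines reading "non sync(non secured)" do not match and are skipped.
SYNC_LINE = re.compile(r"^(\S+)\s+sync\(secured\),\s*(\w+)$")


def sync_modes(output):
    """Return {interface: mode} for the sync interfaces in one member's output."""
    modes = {}
    for line in output.splitlines():
        match = SYNC_LINE.match(line.strip())
        if match:
            modes[match.group(1)] = match.group(2)
    return modes


def mode_mismatches(outputs_by_member):
    """Return {interface: {member: mode}} for interfaces whose mode differs."""
    per_interface = {}
    for member, output in outputs_by_member.items():
        for interface, mode in sync_modes(output).items():
            per_interface.setdefault(interface, {})[member] = mode
    return {
        interface: modes
        for interface, modes in per_interface.items()
        if len(set(modes.values())) > 1
    }


# Interface lines copied from the example output above (shortened).
ZULU = """\
eth1c0 non sync(non secured)
eth4c0 sync(secured), multicast
"""
SHAKA = """\
eth1c0 non sync(non secured)
eth4c0 sync(secured), broadcast
"""

print(mode_mismatches({"Zulu": ZULU, "Shaka": SHAKA}))
```

An empty result means every sync interface reports the same mode on all members that were compared.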
Use the ‘cphaprob state’ command to check if state sync is up and running. The local and remote
state synchronization IP addresses should be displayed, and their state should be shown as ‘Active’ on
the HA Master and ‘Standby’ on the HA Backup. In a load-sharing cluster the state should be shown as
‘Active’ on both the local and remote firewalls:
Example output - HA:
[Expert@Zulu]# cphaprob state
Cluster Mode: New High Availability (Active Up)
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
2 1.1.1.2 0% Standby
[Expert@Zulu]#
In an HA cluster configuration (above), one member should be Active and the other Standby.
Example output – Load-Sharing:
[Expert@Dingaan]# cphaprob state
Cluster Mode: New High Availability (Active Up)
Number Unique Address Assigned Load State
1 (local) 1.1.1.3 50% Active
2 1.1.1.4 50% Active
[Expert@Dingaan]#
In a load-sharing cluster configuration (above), both members should be shown as Active.
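The expected states for both cluster types can be checked with a short Python sketch. It assumes the ‘cphaprob state’ output is available as text in the layout shown above. The names parse_state and states_as_expected are hypothetical, and the caller states the cluster type explicitly rather than the sketch inferring it from the Cluster Mode line.

```python
import re

# Matches member rows such as "1 (local) 1.1.1.1 100% Active"
# or "2 1.1.1.2 0% Standby". Header lines do not match.
ROW = re.compile(
    r"^(\d+)\s+(\(local\)\s+)?(\d+\.\d+\.\d+\.\d+)\s+(\d+)%\s+(\S.*)$"
)


def parse_state(output):
    """Return one dict per cluster member listed in the output."""
    members = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            members.append({
                "id": int(match.group(1)),
                "local": match.group(2) is not None,
                "address": match.group(3),
                "load": int(match.group(4)),
                "state": match.group(5).strip(),
            })
    return members


def states_as_expected(members, load_sharing):
    """HA: exactly one Active, the rest Standby. Load sharing: all Active."""
    if len(members) < 2:
        return False  # the remote partner is missing
    states = [member["state"] for member in members]
    if load_sharing:
        return all(state == "Active" for state in states)
    return (states.count("Active") == 1
            and all(state in ("Active", "Standby") for state in states))


HA = """\
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
2 1.1.1.2 0% Standby
"""
LS = """\
Number Unique Address Assigned Load State
1 (local) 1.1.1.3 50% Active
2 1.1.1.4 50% Active
"""

print(states_as_expected(parse_state(HA), load_sharing=False))
print(states_as_expected(parse_state(LS), load_sharing=True))
```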
Example output – HA or Load-Sharing:
[Expert@Zulu]# cphaprob state
Cluster Mode: New High Availability (Active Up)
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
[Expert@Zulu]#
Remote cluster partner is missing!
If the remote partner is not shown, it is usually due to one of the following:
There is no network connectivity between the members of the cluster on the state sync network
The partner does not have state synchronization enabled
One partner is using broadcast mode and the other is using multicast mode
One of the monitored processes has an issue, such as no policy loaded
The partner firewall is down
Example output - HA or Load-Sharing:
[Expert@Zulu]# cphaprob state
Cluster Mode: New High Availability (Active Up)
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
2 1.1.1.2 0% Ready
[Expert@Zulu]#
Partner is in the ‘Ready’ state. If one of the partners is in the ‘Ready’ state it indicates that
there is an issue with state synchronization.
The ‘Ready’ state is normally caused by another member of the cluster running
a higher version of code or HFA, for example, as would happen during
an upgrade. This state is also seen when CoreXL has been configured to use a
different number of cores on the individual cluster members. For further information see:
sk42096: Cluster member with CoreXL is in 'Ready' state
The ‘Ready’ state can also occur if a cluster member receives state synchronization traffic from a
different cluster that is using the same MAC magic number and the other cluster is running a
higher version of code. For further information see:
sk36913: Connecting several clusters on the same network
Example output - HA or Load-Sharing:
[Expert@Zulu]# cphaprob state
Cluster Mode: New High Availability (Active Up)
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
2 1.1.1.2 0% Down
[Expert@Zulu]#
A remote cluster member in the ‘Down’ state indicates that there is either a problem on the remote
member or that the state synchronization network between the cluster members is broken.
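The three abnormal cases described above (missing partner, ‘Ready’, ‘Down’) can be turned into a simple lookup. The Python sketch below is illustrative only: it assumes the ‘cphaprob state’ output has been captured as text, the helper name peer_hints is hypothetical, and the hint strings merely summarize the causes listed in this section.

```python
import re

# Matches member rows; group 1 is set only for the "(local)" row,
# group 2 is the state column.
ROW = re.compile(
    r"^\d+\s+(\(local\)\s+)?\d+\.\d+\.\d+\.\d+\s+\d+%\s+(\S.*)$"
)

MISSING = ("partner missing: check sync network connectivity, whether sync is "
           "enabled, broadcast/multicast mode, monitored processes, partner power")
HINTS = {
    "Ready": ("sync issue: check for a version/HFA mismatch, a differing CoreXL "
              "core count, or another cluster with the same MAC magic number"),
    "Down": ("problem on the remote member or a broken sync network: "
             "check the Problem Notification table on that member"),
}


def peer_hints(output):
    """Return troubleshooting hints for the remote members in the output."""
    remote_states = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match and match.group(1) is None:  # skip the (local) row
            remote_states.append(match.group(2).strip())
    if not remote_states:
        return [MISSING]
    return [HINTS[state] for state in remote_states if state in HINTS]


READY = """\
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
2 1.1.1.2 0% Ready
"""

for hint in peer_hints(READY):
    print(hint)
```

A healthy partner (‘Active’ or ‘Standby’) produces no hints, so an empty list means nothing needs following up from this command.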
To investigate why a member shows itself to be locally ‘Down’, use the ‘cpstat ha -f all | more’
command on the firewall that shows ‘Down’. This command displays the Problem Notification table and
the state of health of the monitored processes:
Example output (truncated):
[Expert@Zulu]# cpstat ha -f all | more
Problem Notification table
-------------------------------------------------
|Name |Status |Priority|Verified|Descr|
-------------------------------------------------
|Synchronization|OK | 0| 3383| |
|Filter |OK | 0| 3383| |
|cphad |OK | 0| 0| |
|fwd |OK | 0| 0| |
-------------------------------------------------
All monitored processes have the ‘OK’ status.
Example output (truncated):
[Expert@Shaka]# cpstat ha -f all | more
Problem Notification table
-------------------------------------------------
|Name |Status |Priority|Verified|Descr|
-------------------------------------------------
|Synchronization|problem| 0| 3383| |
|Filter |problem| 0| 3383| |
|cphad |OK | 0| 0| |
|fwd |OK | 0| 0| |
-------------------------------------------------
State synchronization is in a problem state because the policy is unloaded on this cluster member.
Installing the policy will fix this issue.
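Picking the failing entries out of the Problem Notification table can be sketched as follows. This Python example assumes the table has been captured as text in the pipe-delimited layout shown above; the function name problem_devices is hypothetical.

```python
def problem_devices(output):
    """Return the names of table rows whose Status column is not OK."""
    problems = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # separator lines and titles
        fields = [field.strip() for field in line.strip("|").split("|")]
        if len(fields) < 2 or fields[0] == "Name":
            continue  # header row
        name, status = fields[0], fields[1]
        if status.lower() != "ok":
            problems.append(name)
    return problems


# Table copied from the second example output above.
SHAKA = """\
Problem Notification table
-------------------------------------------------
|Name |Status |Priority|Verified|Descr|
-------------------------------------------------
|Synchronization|problem| 0| 3383| |
|Filter |problem| 0| 3383| |
|cphad |OK | 0| 0| |
|fwd |OK | 0| 0| |
-------------------------------------------------
"""

print(problem_devices(SHAKA))
```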
Alternatively, the ‘cphaprob list’ command displays
the same information plus some additional details:
Example output:
[Expert@Zulu]# cphaprob list
Registered Devices:
Device Name: Synchronization
Registration number: 0
Timeout: none
Current state: OK
Time since last report: 12139.6 sec
Device Name: Filter
Registration number: 1
Timeout: none
Current state: OK
Time since last report: 12124.5 sec
Device Name: cphad
Registration number: 2
Timeout: 5 sec
Current state: OK
Time since last report: 0.6 sec
Device Name: fwd
Registration number: 3
Timeout: 5 sec
Current state: OK
Time since last report: 0.6 sec
All monitored processes are shown as ‘OK’.
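The ‘cphaprob list’ output can be parsed in a similar way. The Python sketch below assumes the "Key: value" layout shown above, with hypothetical helper names. Flagging a device whose time since last report exceeds its timeout is an assumption about how the Timeout field is meant to be read, not something this section states.

```python
def parse_devices(output):
    """Return one dict per 'Device Name' block in the output."""
    devices = []
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Device Name":
            devices.append({"name": value})
        elif devices:
            devices[-1][key] = value
    return devices


def seconds(value):
    """Convert '5 sec' to 5.0; return None for 'none' or empty values."""
    parts = value.split()
    if not parts:
        return None
    try:
        return float(parts[0])
    except ValueError:
        return None


def needs_attention(devices):
    """Names of devices that are not OK or (assumption) overdue to report."""
    flagged = []
    for device in devices:
        timeout = seconds(device.get("Timeout", "none"))
        last = seconds(device.get("Time since last report", ""))
        not_ok = device.get("Current state") != "OK"
        overdue = timeout is not None and last is not None and last > timeout
        if not_ok or overdue:
            flagged.append(device["name"])
    return flagged


# Two device blocks copied from the example output above.
SAMPLE = """\
Registered Devices:
Device Name: Synchronization
Registration number: 0
Timeout: none
Current state: OK
Time since last report: 12139.6 sec
Device Name: cphad
Registration number: 2
Timeout: 5 sec
Current state: OK
Time since last report: 0.6 sec
"""

print(needs_attention(parse_devices(SAMPLE)))
```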
Assuming that state synchronization on the cluster is healthy, use the following command to check if
the state tables are synchronized:
fw tab -t connections -s
Execute the command on both cluster members at the same time and compare the values of #VALS. The
values on both firewalls should be similar if the state synchronization mechanism is working, unless
a lot of delayed notification is in use.
Example output:
[Expert@Zulu]# fw tab -t connections -s
HOST NAME ID #VALS #PEAK #SLINKS
localhost connections 8158 3222 38026 9820
[Expert@Zulu]#
[Expert@Shaka]# fw tab -t connections -s
HOST NAME ID #VALS #PEAK #SLINKS
localhost connections 8158 3187 38026 9808
[Expert@Shaka]#
The #PEAK may be different depending on the uptime and when the last peak number of connections
occurred.
The #VALS on an HA pair should always be similar.
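What counts as "similar" is not defined here, so the following Python sketch picks a threshold purely for illustration. It assumes the ‘fw tab -t connections -s’ output of each member has been captured as text. The helper names and the 5% tolerance are assumptions to be tuned for the environment.

```python
def vals(output):
    """Return the #VALS figure from 'fw tab -t connections -s' output."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    for header, row in zip(rows, rows[1:]):
        if "#VALS" in header:
            # The data row follows the header and has the same columns.
            return int(row[header.index("#VALS")])
    raise ValueError("no #VALS column found")


def tables_similar(vals_a, vals_b, tolerance=0.05):
    """True if the two counts differ by at most `tolerance` (assumed 5%)."""
    biggest = max(vals_a, vals_b)
    if biggest == 0:
        return True
    return abs(vals_a - vals_b) / biggest <= tolerance


# Header and data rows copied from the example output above.
ZULU = """\
HOST NAME ID #VALS #PEAK #SLINKS
localhost connections 8158 3222 38026 9820
"""
SHAKA = """\
HOST NAME ID #VALS #PEAK #SLINKS
localhost connections 8158 3187 38026 9808
"""

print(vals(ZULU), vals(SHAKA), tables_similar(vals(ZULU), vals(SHAKA)))
```

With the example figures the two tables differ by about 1%, which falls inside the assumed tolerance.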
Examine the output of the sync section of ‘fw ctl pstat’.
Example output:
Sync: Version: new
Status: Able to Send/Receive sync packets
Sync packets sent:
total : 13880231, retransmitted : 5, retrans reqs : 524, acks : 70
Sync packets received:
total : 692409645, were queued : 720, dropped by net : 517
retrans reqs : 5, received 43019 acks retrans reqs for illegal seq : 0
dropped updates as a result of sync overload: 0
Callback statistics: handled 42940 cb, average delay : 1, max delay : 4
If the dropped by net counter has incremented then some sync packets have been lost and the
problem needs to be investigated to find the cause.
For further information please refer to:
sk34476: Explanation of Sync section in the output of fw ctl pstat command
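To see whether the dropped by net counter has incremented, two samples of the sync section taken some time apart can be compared. The Python sketch below assumes both samples have been captured as text; the helper names are hypothetical, and it does not account for the counters restarting (for example after a reboot), which would produce a negative difference.

```python
import re

# Matches the counter in a line such as
# "total : 692409645, were queued : 720, dropped by net : 517".
DROPPED = re.compile(r"dropped by net\s*:\s*(\d+)")


def dropped_by_net(output):
    """Return the 'dropped by net' counter from 'fw ctl pstat' output."""
    match = DROPPED.search(output)
    if match is None:
        raise ValueError("'dropped by net' counter not found")
    return int(match.group(1))


def drops_between(earlier, later):
    """Return how many sync packets were dropped between two samples."""
    return dropped_by_net(later) - dropped_by_net(earlier)


# Lines copied from the example output above.
SAMPLE = """\
Sync packets received:
total : 692409645, were queued : 720, dropped by net : 517
"""

print(dropped_by_net(SAMPLE))
```

A result greater than zero from drops_between means sync packets were lost during the sampling interval and the cause should be investigated.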