Fixing PowerHA Error: Cluster IPC error: The cluster manager on node is in ST_INIT or NOT_CONFIGURED state

In IBM PowerHA SystemMirror (formerly HACMP), you may encounter an error message like this during cluster operations, synchronization, or while running clmgr commands:

Cluster IPC error: The cluster manager on node note1 is in ST_INIT or NOT_CONFIGURED state and cannot process the IPC request.

This error prevents cluster commands from completing and indicates that the cluster manager on the named node could not service an inter-process communication (IPC) request.

What This Error Means:

PowerHA nodes communicate through IPC (Inter-Process Communication) between the clstrmgrES daemons on each node.

This message tells you that the target node’s cluster manager process (clstrmgrES) is in one of two states:

  • ST_INIT: The cluster manager is initializing; cluster services haven’t fully started on the node yet.
  • NOT_CONFIGURED: The node’s local PowerHA configuration (ODM repository) isn’t set up or is corrupted.

So when you try to synchronize, verify, or query the cluster, that node cannot respond.
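When this message appears in command output or a log, the affected node name can be pulled out mechanically. The following is a minimal sketch, assuming the message wording matches the example above exactly; `ipc_error_node` is a hypothetical helper name, not a PowerHA command.

```shell
# Print the node name from each PowerHA IPC error line read on stdin.
# Assumes the exact message wording shown in the example above.
ipc_error_node() {
    sed -n 's/.*The cluster manager on node \([^ ]*\) is in ST_INIT or NOT_CONFIGURED state.*/\1/p'
}
```

For example, the output of a failing command could be piped through it: `clmgr query cluster 2>&1 | ipc_error_node`.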

Root Causes:

This error can happen for several reasons:

  • Cluster manager service not running (clstrmgrES daemon stopped or failed).
  • ODM or repository corruption in /etc/es/objrepos/HACMP* files.
  • Network communication issue — PowerHA communication interfaces down or hostname mismatch.
  • Cluster configuration mismatch between nodes.
  • Partial node configuration (node added but not yet synchronized).


Step-by-Step Fix:


Step 1: Check the Cluster Manager Daemon

On the affected node (note1):

# ps -ef | grep clstrmgrES | grep -v grep

If not running, start it:

# startsrc -s clstrmgrES

Confirm it’s active:

# lssrc -s clstrmgrES

Expected output:

Subsystem         Group            PID          Status
 clstrmgrES       cluster          123456       active

If the status shows “inoperative,” check /var/hacmp/log/hacmp.out (/tmp/hacmp.out on older HACMP releases) and /var/hacmp/log/clstrmgr.debug for startup errors.
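The check-then-start sequence in this step can be scripted. This is a rough sketch rather than a supported tool: `clstrmgr_src_status` and `start_clstrmgr_if_down` are hypothetical helper names, and the parsing assumes the `lssrc -s` column layout shown above.

```shell
# Print the Status column for clstrmgrES from `lssrc -s clstrmgrES` output
# read on stdin. The PID column is blank when the subsystem is inoperative,
# so the status is taken as the last field rather than the fourth.
clstrmgr_src_status() {
    awk '$1 == "clstrmgrES" { print $NF }'
}

# Start the daemon only if the SRC reports it as inoperative;
# otherwise just report the status that was found.
start_clstrmgr_if_down() {
    status=$(lssrc -s clstrmgrES | clstrmgr_src_status)
    if [ "$status" = "inoperative" ]; then
        startsrc -s clstrmgrES
    else
        echo "clstrmgrES status: ${status:-unknown}"
    fi
}
```

Run `start_clstrmgr_if_down` as root on the affected node. It leaves an already active daemon untouched.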


Step 2: Verify Network and Hostname Resolution

PowerHA depends on stable communication between cluster nodes.

Check interface status:

# netstat -in

# lsdev -Cc if

Verify that all cluster nodes resolve correctly:

# host note1

# host <other_node>

If any hostnames fail, fix /etc/hosts or DNS before continuing.
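With more than a couple of nodes, the resolution check is easier as a loop. A minimal sketch, assuming the `host` command used above is the lookup tool; `unresolved_nodes` and the `RESOLVER` variable are hypothetical names introduced for illustration.

```shell
# Print every node name (given as arguments) that fails to resolve.
# RESOLVER defaults to the `host` command used above; it is a variable
# only so the check can be pointed at a different lookup tool.
unresolved_nodes() {
    for node in "$@"; do
        "${RESOLVER:-host}" "$node" >/dev/null 2>&1 || echo "$node"
    done
}
```

Calling `unresolved_nodes note1 note2` prints nothing when every name resolves. Any name it prints needs fixing in /etc/hosts or DNS.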


Step 3: Validate ODM (Object Repository) Integrity

Check that the cluster repository files exist and are consistent:

# ls -l /etc/es/objrepos/HACMP*

If they are missing, empty, or corrupted, restore them from a healthy node:

# scp root@healthy_node:/etc/es/objrepos/HACMP* /etc/es/objrepos/

Then verify:

# clmgr verify cluster
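Copying files into /etc/es/objrepos overwrites whatever configuration the node currently holds, so it may be worth saving the existing files first. The sketch below is one way to do that; `backup_hacmp_odm` and the default backup location under /tmp are hypothetical choices, not part of PowerHA.

```shell
# Copy the HACMP* files from an object repository directory into a
# timestamped backup directory and print that directory's path.
# Arguments: [source dir] [backup dir]; the backup default is an assumption.
backup_hacmp_odm() {
    src=${1:-/etc/es/objrepos}
    dest=${2:-/tmp/hacmp_odm_backup.$(date +%Y%m%d%H%M%S)}
    mkdir -p "$dest" || return 1
    cp -p "$src"/HACMP* "$dest"/ || return 1
    echo "$dest"
}
```

Run `backup_hacmp_odm` before the scp step. If the restored files make things worse, the originals can be copied back from the printed directory.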


Step 4: Restart PowerHA Cluster Services

Once network and configuration are healthy, restart PowerHA services on the affected node:

# stopsrc -s clstrmgrES

# sleep 5

# startsrc -s clstrmgrES

Monitor startup logs:

# tail -f /var/hacmp/log/hacmp.out

# tail -f /var/hacmp/log/clstrmgr.debug

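You should see messages showing the node moving from ST_INIT to a stable state (reported as ST_STABLE by lssrc -ls clstrmgrES). If the daemon restarts but stays in ST_INIT, cluster services have not been started on the node; start them with smitty clstart or clmgr online node note1.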
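Instead of watching the logs, the state can be polled. This sketch assumes that `lssrc -ls clstrmgrES` prints a `Current state:` line and that the steady state is named ST_STABLE, neither of which the steps above show directly; `clstrmgr_state` and `wait_for_stable` are hypothetical helper names.

```shell
# Print the value of the "Current state:" line from `lssrc -ls clstrmgrES`
# output read on stdin (assumed format: "Current state: ST_INIT").
clstrmgr_state() {
    awk -F': *' '/Current state:/ { print $2; exit }'
}

# Poll until the cluster manager reports ST_STABLE (assumed steady state).
# $1 = number of attempts (default 30), $2 = seconds between attempts (default 10).
# Returns 0 once stable, 1 if the attempts run out.
wait_for_stable() {
    tries=${1:-30}
    delay=${2:-10}
    while [ "$tries" -gt 0 ]; do
        state=$(lssrc -ls clstrmgrES 2>/dev/null | clstrmgr_state)
        echo "clstrmgrES state: ${state:-unknown}"
        [ "$state" = "ST_STABLE" ] && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}
```

A call such as `wait_for_stable 30 10` gives the node about five minutes to settle before giving up.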


Step 5: Synchronize Cluster Configuration

Once all nodes are up and communicating:

# clmgr sync cluster <cluster_name>

If successful, you’ll see:

Synchronization completed successfully.
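Verification and synchronization can be chained so that a failed verification never leads to a sync. A minimal sketch, assuming `clmgr` exits non-zero when verification fails; `verify_then_sync` is a hypothetical wrapper name.

```shell
# Run verification first and synchronize only if it succeeds.
# Extra arguments are passed through to `clmgr sync cluster`.
# Assumes clmgr exits non-zero when verification fails.
verify_then_sync() {
    if ! clmgr verify cluster; then
        echo "Verification failed; skipping synchronization." >&2
        return 1
    fi
    clmgr sync cluster "$@"
}
```

Relying on the exit status rather than matching the success message keeps the wrapper independent of the exact output wording.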


Step 6: Confirm Cluster Health

Finally, verify cluster and node status:

# clmgr query cluster

# clmgr status cluster

Expected healthy output:

Cluster: mycluster

  State: UP

  Node note1: UP

  Node note2: UP
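For a quick pass/fail check, the status text can be filtered for nodes that are not UP. This sketch assumes the "Node <name>: <STATE>" layout shown above, which may differ between releases; `nodes_not_up` is a hypothetical helper name.

```shell
# Print the name of every node whose state is not UP, given status text
# on stdin in the "Node <name>: <STATE>" layout shown above.
nodes_not_up() {
    awk '$1 == "Node" && $NF != "UP" { sub(/:$/, "", $2); print $2 }'
}
```

Empty output means every listed node reported UP.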
