PowerHA SystemMirror:
PowerHA SystemMirror provides a robust framework for high availability and disaster recovery in AIX environments. It integrates deeply with Cluster Aware AIX (CAA) and Reliable Scalable Cluster Technology (RSCT), creating an intelligent cluster that can detect failures, reassign resources, and recover services automatically.
With PowerHA, applications continue running seamlessly even during hardware, network, or node failures. By automating failure detection, failover, and recovery, it ensures near-continuous application uptime, making it a cornerstone of mission-critical AIX deployments.
Core Architecture and Components:
At its foundation, PowerHA clusters consist of nodes, networks, shared storage, and resources coordinated by a suite of daemons and management utilities.
Cluster Nodes:
- Each node runs AIX and participates in the cluster.
- Supports up to 32 nodes for large-scale environments.
Cluster Networks:
- Internal (heartbeat) network for node-to-node communication.
- Service network for client access using floating service IPs.
Node1_Boot: 192.168.10.101 Node2_Boot: 192.168.10.102 Service IP: 192.168.10.201
Shared Storage:
- Repository Disk: Stores cluster configuration and locking data.
- Application/NFS Disks: Shared via Enhanced Concurrent Volume Groups (ECVGs).
Repository Disk: hdisk2 Application VG: nfs_vg Logical Volume: nfs_lv
PowerHA Cluster Daemons and Services
| Daemon | Function |
|---|---|
| clstrmgrES | Cluster manager; maintains heartbeat, manages events, failover logic. |
| clcomdES | Handles node-to-node communication. |
| cllockd | Provides distributed resource locking. |
| gsclvmd | Manages Enhanced Concurrent Volume Groups (ECVGs). |
| clsmuxpd | Delivers cluster status monitoring services. |
These daemons ensure that the cluster maintains consistency, communicates efficiently, and executes failovers seamlessly.
PowerHA Failover Process (Logical Flow)
- Node1 hosts the active resources — applications, service IPs, and shared disks.
- A heartbeat failure or node crash is detected by clstrmgrES via CAA.
- Cluster daemons automatically relocate resources to Node2.
- The Service IP and associated applications start on Node2.
- Clients continue to connect without service interruption.
This automated recovery process ensures minimal downtime and data consistency.
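The same flow can be exercised manually for a planned failover test. A sketch using clmgr, where the resource group name app_rg and node name node2 are example values, not taken from this cluster:

```shell
# Show the current state and location of all resource groups
clRGinfo

# Move a resource group to another node (planned failover);
# substitute your own resource group and node names
clmgr move resource_group app_rg NODE=node2
```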
PowerHA Startup & Failover Policies
Start cluster on home node only
# clmgr start cluster -o home
Failover to next node
# clmgr set failover -g app_rg -m next
Never fallback
# clmgr set fallback -g app_rg -m never
Stop cluster gracefully
# clmgr stop cluster -m graceful
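These policies are also commonly set as resource-group attributes. A hedged sketch using the usual PowerHA abbreviations (OHN = Online on Home Node only, FNPN = Fallover to Next Priority Node, NFB = Never Fall Back); app_rg is an example name:

```shell
# Set startup, fallover, and fallback policy on a resource group
clmgr modify resource_group app_rg \
    STARTUP=OHN FALLOVER=FNPN FALLBACK=NFB

# Confirm the configured policies
clmgr query resource_group app_rg
```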
Key Features of PowerHA 7.2
- Supports up to 32-node clusters.
- Full integration with CAA and RSCT frameworks.
- Enhanced Concurrent Volume Groups (ECVGs) for shared disk access.
- Flexible Startup, Failover, and Fallback policies.
- Dynamic Automatic Reconfiguration (DARE) snapshots for live configuration capture.
- Simplified management via C-SPOC (Cluster Single Point of Control).
Required Filesets
Ensure the following filesets are installed on all nodes:
- cluster.es.client – Client components
- cluster.es.server – Server components
- cluster.es.cspoc – Cluster Single Point of Control
- bos.clvm – Required for enhanced concurrent volume groups
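Installation can be confirmed on each node with lslpp. The .rte names below are the usual runtime filesets; exact fileset names and levels vary by PowerHA release:

```shell
# Confirm the required PowerHA filesets are present on every node
lslpp -l cluster.es.client.rte cluster.es.server.rte \
        cluster.es.cspoc.rte bos.clvm.enh
```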
Cluster Awareness and Communication: CAA Integration
Cluster Aware AIX (CAA) is the kernel-level clustering infrastructure beneath PowerHA.
It handles:
- Heartbeat monitoring
- Repository disk access
- Network and node failure detection
- Cluster configuration synchronization
Key CAA and related daemons:
- clcomd – Communication handler
- clconfd – Synchronizes configuration changes (~ every 10 minutes)
- ctrmc – Monitors resources (part of RSCT)
- clstrmgrES – PowerHA’s cluster manager daemon
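CAA state can be inspected directly with the lscluster command, for example:

```shell
# Nodes and their state as seen by CAA
lscluster -m

# Repository disk details
lscluster -d

# Cluster network interfaces
lscluster -i
```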
The CAA Repository Disk
The repository disk is the central coordination point for PowerHA clusters.
Key Facts:
- Dedicated use only — cannot store application data.
- Typical size: 512 MB – 10 GB
- Managed exclusively by CAA (not standard LVM).
- Ensures consistency across all cluster nodes.
- Recommended: RAID and multipathing for redundancy.
Deadman Switch (DMS): Cluster Safety Mechanism
The Deadman Switch (DMS) protects cluster integrity by detecting hung or isolated nodes.
Modes:
- Mode "a" (assert): Forces node crash to prevent split-brain.
- Mode "e" (event): Triggers an AHAFS event for manual intervention.
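The DMS mode is exposed as a CAA tunable (deadman_mode); a sketch assuming the standard clctrl tuning interface:

```shell
# Show the current deadman switch mode
clctrl -tune -L deadman_mode

# Set assert mode ("a"): crash the node rather than risk split-brain
clctrl -tune -o deadman_mode=a
```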
RSCT – Reliable Scalable Cluster Technology
RSCT is the backbone of PowerHA, providing monitoring, event handling, and system coordination.
Components:
- RMC (Resource Monitoring and Control): Tracks cluster resources.
- HAGS (Group Services): Handles cluster messaging and coordination.
- HATS (Topology Services): Monitors heartbeat and detects failures.
- SRC (System Resource Controller): Manages daemon processes.
RSCT organizes nodes into:
- Peer Domains (operational clusters)
- Management Domains (administrative supervision)
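RSCT domain membership can be checked with its own commands (in PowerHA 7.x the cluster itself is CAA-based, so the output depends on how the domains are configured):

```shell
# List RSCT peer domains and their operational state
lsrpdomain

# List the nodes in the online peer domain
lsrpnode
```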
PowerHA Cluster Services
PowerHA relies on tightly integrated services to ensure continuous operation:
- clstrmgrES: Main cluster manager
- clevmgrdES: Manages shared LVM coordination
- clinfoES: Provides monitoring and status info
- RSCT and CAA daemons: Enable communication, health checks, and configuration sync
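These services run under the System Resource Controller (SRC), so their state can be checked with lssrc:

```shell
# Status of the PowerHA subsystems in the cluster group
lssrc -g cluster

# Detailed internal state of the cluster manager
lssrc -ls clstrmgrES
```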
Cluster Verification: clverify
Before deployment or after any configuration change, PowerHA uses clverify to check cluster consistency.
It detects:
- Network misconfigurations
- Volume group mismatches
- Missing resources
# cat /var/hacmp/clverify/clverify.log
Verification can be run via CLI or SMIT, ensuring a healthy cluster before going live.
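With clmgr, verification and synchronization look like this:

```shell
# Verify the cluster configuration on all nodes
clmgr verify cluster

# Verify and then propagate the configuration
clmgr sync cluster

# Review the results
cat /var/hacmp/clverify/clverify.log
```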
C-SPOC (Cluster Single Point of Control)
C-SPOC simplifies administration by letting you manage the entire cluster from a single node.
Functions include:
- Synchronizing configuration changes across all nodes
- Managing volume groups and user accounts
- Propagating commands securely via clcomd
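Most C-SPOC operations are driven from the dedicated SMIT menu:

```shell
# Open the C-SPOC cluster administration menus
smitty cl_admin
```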
Application Server & Monitor
Application Server: Hosts the clustered application.
Application Monitor: Ensures service health via two methods:
- Process Monitoring: Tracks app processes through RSCT.
- Custom Monitoring: Uses scripts to validate service functionality.
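A custom monitor is simply a script whose exit status PowerHA interprets: 0 means "application healthy", non-zero means "failed" (triggering recovery). A minimal sketch; the process name nfsd is an example, not taken from this cluster:

```shell
# Hypothetical custom application monitor for PowerHA.
# Exit status 0 = healthy, non-zero = failed (triggers recovery).

monitor_app() {
    # Healthy (return 0) if a process with the given command name is running
    ps -e -o comm= | grep -qw "$1"
}

if monitor_app "nfsd"; then
    echo "application healthy"
else
    echo "application down"
fi
```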
DARE (Dynamic Automatic Reconfiguration) Snapshot
DARE snapshots capture complete cluster configurations live, allowing rollback or restoration.
- Stored in /usr/es/sbin/cluster/snapshots
- Used for troubleshooting, change rollback, or migration
- Works without stopping the cluster
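A snapshot can be captured live; a sketch assuming clmgr's snapshot object, where pre_change is an example name:

```shell
# Capture the current cluster configuration without stopping the cluster
clmgr add snapshot pre_change DESCRIPTION="before policy change"
```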
Network Topology, Persistent IPs, and Service IPs
Persistent Node IPs:
Static IPs for administrative access; remain on the same node.
Service IPs:
Floating IPs tied to resource groups; move automatically during failover.
Redundant networks, multiple NICs, and heartbeat links provide fault tolerance and seamless failover.
Logs and Diagnostics
| Log File | Description |
|---|---|
| /var/hacmp/log/clstrmgr.debug | Cluster manager debug logs |
| /var/hacmp/adm/cluster.log | General cluster events |
| /var/hacmp/clverify/clverify.log | Verification logs |
| /var/hacmp/log/cspoc.log | C-SPOC operations |
| /var/hacmp/adm/history/ | Daily cluster activity |