PowerHA Cluster Installation & Configuration

Setting up a PowerHA SystemMirror (formerly HACMP) cluster requires careful planning and execution to ensure high availability for critical applications. This guide walks through the steps to configure a two-node AIX cluster using shared storage and CAA (Cluster Aware AIX).

Step 1: Install PowerHA

Mount the NFS share on both nodes:

# mount nimserver1:/export/powerHA /mnt

Change to the PowerHA installation directory:

# cd /mnt/HA7.x/

Install all packages:

# installp -acgXYd . all

Add the PowerHA directories to root's PATH:

# vi ~/.profile

export PATH=$PATH:/usr/es/sbin/cluster/utilities:/usr/es/sbin/cluster/bin:/usr/es/sbin/cluster

# . ~/.profile
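If the profile is sourced more than once, the export line above appends the same directories again each time. A minimal sketch of a guard that could go in ~/.profile instead; the path_add helper is a name made up for this example, not something PowerHA ships:

```shell
# Append a directory to PATH only if it is not already present.
# path_add is a hypothetical helper name used for this sketch.
path_add() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already in PATH, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

for dir in /usr/es/sbin/cluster/utilities \
           /usr/es/sbin/cluster/bin \
           /usr/es/sbin/cluster; do
    path_add "$dir"
done
export PATH
```

Sourcing the profile a second time then leaves PATH unchanged.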

Step 2: Generate SSH Key Pair

On Node1 (repeat for Node2):

# su - root

# ssh-keygen -t rsa -b 4096

Press Enter for default file location.

Leave passphrase empty (important for passwordless access).

Step 3: Copy Public Key to Other Node

On node1, view public key:

# cat /root/.ssh/id_rsa.pub

On node2, append it to /root/.ssh/authorized_keys:

# vi /root/.ssh/authorized_keys

# (paste the key from node1)

# chmod 600 /root/.ssh/authorized_keys
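Pasting the key by hand in vi can wrap or duplicate the line. As a rough alternative, the append can be scripted so that the key is added only once. The add_key_once function and the /tmp/node1.pub path are assumptions made for this sketch:

```shell
# Append a public key line to an authorized_keys file unless it is already there.
# add_key_once is a hypothetical helper; usage: add_key_once "<key line>" <file>
add_key_once() {
    key=$1
    file=$2
    touch "$file" && chmod 600 "$file" || return 1
    # -F fixed string, -x whole-line match, -q quiet
    grep -qxF -e "$key" "$file" || printf '%s\n' "$key" >> "$file"
}

# Example on node2, assuming node1's key was copied to /tmp/node1.pub:
# add_key_once "$(cat /tmp/node1.pub)" /root/.ssh/authorized_keys
```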

Step 4: Set Correct Permissions

On all nodes:

# chmod 700 /root/.ssh

# chmod 600 /root/.ssh/authorized_keys

# chmod 600 /root/.ssh/id_rsa

# chmod 644 /root/.ssh/id_rsa.pub

Step 5: Test Node-to-Node Communication

Test remote shell communication between the nodes using ssh:

# ssh node1 hostname # Output: node1

# ssh node2 hostname # Output: node2

If tests fail, the cluster cannot be created until the communication issue is resolved.
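The two checks above can be wrapped in a small loop. This is only a sketch: check_nodes is a made-up name, and it assumes that hostname on each node returns exactly the name used on the ssh command line, as in the expected output shown above. BatchMode=yes makes ssh fail instead of prompting when key authentication is not working.

```shell
# Run "hostname" on each node over ssh and compare the answer with the node name.
# check_nodes is a hypothetical helper name.
check_nodes() {
    rc=0
    for node in "$@"; do
        got=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" hostname 2>/dev/null)
        if [ "$got" = "$node" ]; then
            echo "$node: OK"
        else
            echo "$node: FAILED (got '$got')"
            rc=1
        fi
    done
    return $rc
}

# Example, run from each node in turn:
# check_nodes node1 node2
```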

Step 6: Verify Shared Storage

Before creating a cluster, ensure all shared disks are visible on both nodes:

# lspv

Check all shared datavg hdisks and the hdisk intended for the CAA repository.

Ensure all shared hdisks have reserve_policy = no_reserve:

# lsattr -El hdiskXX

Example:

for i in hdisk4 hdisk5 hdisk6 hdisk7 hdisk8 hdisk9

do

lsattr -El $i -a reserve_policy

done

reserve_policy no_reserve Reserve Policy True

If a disk reports a different policy, change it:

# chdev -l hdiskXX -a reserve_policy=no_reserve

The hdisk used for the CAA repository should be new and never part of a VG.
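With many shared disks it is easy to miss one that still has a reservation. The sketch below prints only the disks that need attention. check_reserve_policy is a name invented here; the lsattr form with -a and -F value prints just the attribute value.

```shell
# Print every disk whose reserve_policy is not no_reserve; return 1 if any is found.
# check_reserve_policy is a hypothetical helper name.
check_reserve_policy() {
    bad=0
    for disk in "$@"; do
        policy=$(lsattr -El "$disk" -a reserve_policy -F value)
        if [ "$policy" != "no_reserve" ]; then
            echo "$disk: reserve_policy=$policy"
            bad=1
        fi
    done
    return $bad
}

# Example, run on both nodes:
# check_reserve_policy hdisk4 hdisk5 hdisk6 hdisk7 hdisk8 hdisk9
```

No output and a zero return code mean every listed disk is already set to no_reserve.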

Step 7: Verify Network Interfaces

Run ifconfig -a on both nodes to identify boot IP addresses for all adapters.

At this stage, there should be no alias IPs configured.

Step 8: Configure rhosts for Cluster Communication

Add all boot IPs to /etc/cluster/rhosts on both nodes (one IP per line, no comments).

Ensure /etc/hosts resolves all IPs locally.

Configure /etc/netsvc.conf to prioritize local hostname resolution:

hosts=local4,bind4
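A quick cross-check of the two files can catch a boot IP that was added to rhosts but never to /etc/hosts. The following is a sketch only; check_rhosts is a made-up helper, and it assumes one address per line in the rhosts file, as described above.

```shell
# Report every address in the rhosts file that has no entry in the hosts file.
# check_rhosts is a hypothetical helper; usage: check_rhosts [rhosts_file] [hosts_file]
check_rhosts() {
    rhosts=${1:-/etc/cluster/rhosts}
    hosts=${2:-/etc/hosts}
    missing=0
    while read -r ip; do
        [ -z "$ip" ] && continue
        # the address must be the first field of a hosts line
        if ! awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$hosts"; then
            echo "$ip: not found in $hosts"
            missing=1
        fi
    done < "$rhosts"
    return $missing
}

# Example, run on both nodes:
# check_rhosts
```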

Addressing the netmon.cf Warning

# vi /usr/es/sbin/cluster/netmon.cf

!REQD <owner_interface> <gateway_IP>

Example:

!REQD en1 192.168.10.1

Step 9: Restart Cluster Communication Daemon

# stopsrc -s clcomd; sleep 5; startsrc -s clcomd

Step 10: Create the Cluster

On node1, add the cluster and nodes:

# /usr/es/sbin/cluster/utilities/clmgr add cluster <cluster_name> NODES="node1 node2"

Example:

# /usr/es/sbin/cluster/utilities/clmgr add cluster dcp_cluster NODES="indcppha01 indcppha02"

Add a repository disk (disable validation for a new disk):

# /usr/es/sbin/cluster/utilities/clmgr add repository <Repository_Disk_PVID> DISABLE_VALIDATION=true

Example:

# chdev -l hdisk4 -a pv=yes # run on both nodes

Verify the PVID on both nodes:

# lspv | grep 00659fa7e0f7be1f

# /usr/es/sbin/cluster/utilities/clmgr add repository 00659fa7e0f7be1f DISABLE_VALIDATION=true

Successfully added a primary repository disk.

To view the complete configuration of repository disks use:

"clmgr query repository" or "clmgr view report repository"

# clmgr query repository

hdisk4 (00659fa7e0f7be1f)

# clmgr view report repository

dcp_cluster :

00659fa7e0f7be1f hdisk4(indcppha01) active

No backup repository

Synchronize the cluster configuration:

# /usr/es/sbin/cluster/utilities/clmgr sync cluster

Alternatively, use SMIT: smitty hacmp --> Cluster Applications and Resources --> Verify and Synchronize Cluster Configuration

Verify cluster status on both nodes:

# lscluster -m # Should show each node as UP

# lssrc -a | grep cthags # Should show cthags active
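The subsystem check can be turned into a yes/no test for use in scripts. This sketch assumes the usual lssrc layout, where the subsystem name is the first column and the status is the last; subsystem_active is a name made up for the example.

```shell
# Return 0 if lssrc reports the named subsystem as active.
# subsystem_active is a hypothetical helper name.
subsystem_active() {
    lssrc -s "$1" 2>/dev/null |
        awk -v s="$1" '$1 == s && $NF == "active" { ok = 1 } END { exit !ok }'
}

# Example, run on both nodes:
# for sub in clcomd cthags; do
#     subsystem_active "$sub" && echo "$sub: active" || echo "$sub: NOT active"
# done
```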

Step 11: Create Resource Group and Service IP

Create a resource group:

# /usr/es/sbin/cluster/utilities/clmgr add resource_group <RG_Name> NODES=node1,node2

Example:

# /usr/es/sbin/cluster/utilities/clmgr add resource_group dcp_rg NODES=indcppha01,indcppha02

Create a service IP:

# /usr/es/sbin/cluster/utilities/clmgr add service_ip <Service_IP_Label> NETWORK=<Network_Name>

Example:

Verify the network configuration and confirm that the service label resolves:

# /usr/es/sbin/cluster/utilities/clmgr query network

# grep dcpapps /etc/hosts

192.168.10.18 dcpapps.ppc.com dcpapps

# /usr/es/sbin/cluster/utilities/clmgr add service_ip dcpapps NETWORK=net_ether_02
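Since the service label has to resolve from /etc/hosts on both nodes, a scripted lookup may be handier than reading grep output by eye. A minimal sketch, where host_ip is a hypothetical helper that prints the address mapped to a name or alias:

```shell
# Print the address that a hosts file maps to the given label (name or alias).
# host_ip is a hypothetical helper; usage: host_ip <label> [hosts_file]
host_ip() {
    awk -v name="$1" '
        $1 ~ /^#/ { next }                      # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break            # stop at a trailing comment
                if ($i == name) { print $1; exit }
            }
        }
    ' "${2:-/etc/hosts}"
}

# Example, run on both nodes; both should print the same address:
# host_ip dcpapps
```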

Step 12: Create Shared Volume Group

# /usr/es/sbin/cluster/sbin/cl_mkvg -f -n -cspoc -n 'node1,node2' -r '<RG_Name>' -y '<VG_Name>' -V '<VG_Major_Number>' -E <hdiskx_PVID>

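Example (the number passed to -V must be a major number that is free on both nodes; lvlstmajor lists the free numbers on a node):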

# ls -l /dev/hdisk5

brw------- 1 root system 14, 7 Mar 11 09:48 /dev/hdisk5

# lspv | grep 00f33172e0f7ffaa

hdisk5 00f33172e0f7ffaa None

# /usr/es/sbin/cluster/sbin/cl_mkvg -f -n -cspoc -n 'indcppha01,indcppha02' -r 'dcp_rg' -y 'dcpvg01' -V '14' -E 00f33172e0f7ffaa

Alternatively, use SMIT:

smitty hacmp --> System Management (C-SPOC) --> Storage --> Volume Groups --> Create a Volume Group

Select indcppha01 and indcppha02 --> select Big --> select the PVID

Create a Big Volume Group

Type or select values in entry fields.

Press Enter AFTER making all desired changes.

[Entry Fields]

Node Names indcppha01,indcppha02

Resource Group Name [dcp_rg] +

PVID 00f33172e0f7ffaa

VOLUME GROUP name [dcpvg01]

Physical partition SIZE in megabytes 4 +

Volume group MAJOR NUMBER [42] #

Enable Fast Disk Takeover or Concurrent Access Fast Disk Takeover +

Volume Group Type Big

CRITICAL volume group? no +

Enable LVM Encryption yes +

Enable PV Encryption no +

Auth Method +

Auth Method Name []

Warning:

Changing the volume group major number may result

in the command being unable to execute

successfully on a node that does not have the

major number currently available. Please check

for a commonly available major number on all nodes

before changing this setting.

F1=Help F2=Refresh F3=Cancel F4=List

Esc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=Image

Esc+9=Shell Esc+0=Exit Enter=Do

Step 13: Add Resources to Resource Group

# /usr/es/sbin/cluster/utilities/clmgr modify resource_group '<RG_Name>' SERVICE_LABEL='<Service_IP_Label>'

Example:

# /usr/es/sbin/cluster/utilities/clmgr modify resource_group 'dcp_rg' SERVICE_LABEL='dcpapps'

Step 14: Synchronize and Start Cluster

Synchronize the cluster:

# /usr/es/sbin/cluster/utilities/clmgr sync cluster

Alternatively, use SMIT: smitty hacmp --> Cluster Applications and Resources --> Verify and Synchronize Cluster Configuration

Start PowerHA and bring the cluster online:

# /usr/es/sbin/cluster/utilities/clmgr online cluster WHEN=now START_CAA=yes
# /usr/es/sbin/cluster/utilities/clmgr online rg dcp_rg

Check the resource group state:

# clRGinfo

-----------------------------------------------------------------------------

Group Name State Node

-----------------------------------------------------------------------------

dcp_rg ONLINE indcppha01

OFFLINE indcppha02

Create the filesystem on the shared volume group:

# mklv -t jfs2 -y dcp_lv dcpvg01 128

# crfs -v jfs2 -d dcp_lv -m /dcpapps -A no -p rw -a logname=INLINE

# mkdir -p /dcpapps

# mount /dcpapps

# df -g | grep /dcpapps

/dev/dcp_lv 4.00 3.98 1% 4 1% /dcpapps
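The last argument to mklv is a number of logical partitions, not a size. The arithmetic can be sketched as below, assuming a 32 MB partition size, which is what 128 partitions giving 4.00 GB implies; the real value is shown as PP SIZE by lsvg dcpvg01. The lps_for_size function is a name invented for this example.

```shell
# Number of logical partitions needed for a given size, rounded up.
# lps_for_size is a hypothetical helper; usage: lps_for_size <size_mb> <pp_size_mb>
lps_for_size() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

lps_for_size 4096 32    # 4 GB with 32 MB partitions, prints 128
```

With a 4 MB partition size the same 4 GB would need 1024 partitions.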

Fail over the resource group to the other node:

# /usr/es/sbin/cluster/utilities/clmgr move resource_group <RG_Name> NODE=<Node_Name>

Example:

# /usr/es/sbin/cluster/utilities/clmgr move resource_group dcp_rg NODE=indcppha02

Attempting to move resource group dcp_rg to node indcppha02.

Waiting for the cluster to stabilize.............................

# clRGinfo

-----------------------------------------------------------------------------

Group Name State Node

-----------------------------------------------------------------------------

dcp_rg OFFLINE indcppha01

ACQUIRING indcppha02

# clRGinfo

-----------------------------------------------------------------------------

Group Name State Node

-----------------------------------------------------------------------------

dcp_rg OFFLINE indcppha01

ONLINE indcppha02
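Rather than rerunning clRGinfo by hand until the state changes from ACQUIRING to ONLINE, the output can be polled. This is a sketch that assumes the three-column layout shown above (group, state, node, with the group name omitted on continuation lines); rg_online_node and wait_rg_online are names made up for the example.

```shell
# Read clRGinfo output on stdin and print the node where the given
# resource group is ONLINE (prints nothing if it is online nowhere).
rg_online_node() {
    awk -v rg="$1" '
        NF < 2  { next }                    # separator and blank lines
        NF >= 3 { group = $1 }              # first line of a group carries its name
        group == rg && $(NF-1) == "ONLINE" { print $NF }
    '
}

# Poll until the group is online on the target node, or give up.
# usage: wait_rg_online <rg> <node> [tries]   (10 seconds between tries)
wait_rg_online() {
    rg=$1 target=$2 tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        [ "$(clRGinfo | rg_online_node "$rg")" = "$target" ] && return 0
        sleep 10
        tries=$((tries - 1))
    done
    return 1
}

# Example:
# wait_rg_online dcp_rg indcppha02 && echo "dcp_rg is online on indcppha02"
```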

Example output on indcppha02 after the move:

indcppha02 / # df -g

Filesystem GB blocks Free %Used Iused %Iused Mounted on

/dev/hd4 1.00 0.93 8% 3089 2% /

/dev/hd2 3.00 0.55 82% 52947 29% /usr

/dev/hd9var 1.00 0.92 8% 837 1% /var

/dev/hd3 1.00 1.00 1% 27 1% /tmp

/dev/hd1 1.00 1.00 1% 7 1% /home

/dev/hd11admin 0.12 0.12 1% 5 1% /admin

/proc - - - - - /proc

/dev/hd10opt 1.00 0.93 7% 375 1% /opt

/dev/livedump 0.25 0.25 1% 4 1% /var/adm/ras/livedump

/dev/fslv00 1.00 1.00 1% 5 1% /usr/local

/ahafs - - - 51 1% /aha

/dev/dcp_lv 4.00 3.98 1% 4 1% /dcpapps

indcppha02 / # ifconfig -a

en0: flags=e084863,14c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>

inet 192.168.20.14 netmask 0xffffff00 broadcast 192.168.20.255

tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1

en1: flags=e084863,114c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>

inet 192.168.10.18 netmask 0xffffff00 broadcast 192.168.10.255

inet 192.168.10.14 netmask 0xffffff00 broadcast 192.168.10.255

tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1

lo0: flags=e08084b,c0<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,LARGESEND,CHAIN>

inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255

inet6 ::1%1/64

tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

indcppha02 / # cltopinfo

Cluster Name: dcp_cluster

Cluster Type: Standard

Heartbeat Type: Unicast

Repository Disk: hdisk4 (00659fa7e0f7be1f)

There are 2 node(s) and 1 network(s) defined

NODE indcppha01:

Network net_ether_02

dcpapps 192.168.10.18

indcppha01 192.168.10.13

NODE indcppha02:

Network net_ether_02

dcpapps 192.168.10.18

indcppha02 192.168.10.14

Resource Group dcp_rg

Startup Policy Online On Home Node Only

Fallover Policy Fallover To Next Priority Node In The List

Fallback Policy Fallback To Higher Priority Node In The List

Participating Nodes indcppha01 indcppha02

Service IP Label dcpapps

indcppha02 / # clshowres

Resource Group Name dcp_rg

Participating Node Name(s) indcppha01 indcppha02

Startup Policy Online On Home Node Only

Fallover Policy Fallover To Next Priority Node In The List

Fallback Policy Fallback To Higher Priority Node In The List

Site Relationship ignore

Dynamic Node Priority

Service IP Label dcpapps

Filesystems ALL

Filesystems Consistency Check fsck

Filesystems Recovery Method sequential

Filesystems/Directories to be exported (NFSv2/NFSv3)

Filesystems/Directories to be exported (NFSv4)

Filesystems to be NFS mounted

Network For NFS Mount

Filesystem/Directory for NFSv4 Stable Storage

Volume Groups dcpvg01

Concurrent Volume Groups

Use forced varyon for volume groups, if necessary false

Disks

Raw Disks

Disk Error Management?

GMVG Replicated Resources

GMD Replicated Resources

PPRC Replicated Resources

SVC PPRC Replicated Resources

SVC PBR Replicated Resources

EMC SRDF(R) Replicated Resources

TRUECOPY Replicated Resources

GENERIC XD Replicated Resources

Connections Services

Fast Connect Services

Shared Tape Resources

Application Servers

Highly Available Communication Links

Primary Workload Manager Class

Secondary Workload Manager Class

Delayed Fallback Timer

Miscellaneous Data

Automatically Import Volume Groups false

Inactive Takeover

SSA Disk Fencing false

Filesystems mounted before IP configured false

WPAR Name

Run Time Parameters:

Node Name indcppha01

Debug Level high

Format for hacmp.out Standard

Node Name indcppha02

Debug Level high

Format for hacmp.out Standard

indcppha02 / #
