Oracle 19c Pacemaker Cluster

RHEL/CentOS 8 | iSCSI Shared Storage | Automated Oracle Failover

Building bulletproof Oracle HA doesn't need expensive hardware. Pacemaker + Corosync on Linux gives you production-grade failover for ~$0. This guide walks through a 2-node cluster with iSCSI, LVM, VIP, and full Oracle stack.

Your cluster:
indrxlora01  192.168.10.91  ← Node 1
indrxlora02  192.168.10.92  ← Node 2  
oradb-vip    192.168.10.93  ← Oracle VIP (floats between nodes)
storage      192.168.10.20  ← iSCSI server
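
All four names must resolve identically on both nodes; a minimal /etc/hosts addition (same hostnames and addresses as the table above):

192.168.10.91  indrxlora01
192.168.10.92  indrxlora02
192.168.10.93  oradb-vip
192.168.10.20  storage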

1. iSCSI Shared Storage
On Storage Server (192.168.10.20)
sudo dnf install -y targetcli lvm2
sudo vgcreate datavg /dev/sdb
sudo lvcreate -L 50G -n oradisk01 datavg
sudo lvcreate -L 50G -n oradisk02 datavg

# Expose via iSCSI
sudo targetcli
/backstores/block create ora_LUN01 /dev/datavg/oradisk01
/backstores/block create ora_LUN02 /dev/datavg/oradisk02
/iscsi create iqn.2026-02.ppc.com:oraservers
/iscsi/iqn.2026-02.ppc.com:oraservers/tpg1/luns create /backstores/block/ora_LUN01
/iscsi/iqn.2026-02.ppc.com:oraservers/tpg1/luns create /backstores/block/ora_LUN02
/iscsi/iqn.2026-02.ppc.com:oraservers/tpg1/acls create iqn.2026-02.ppc.com:indrxlora01
/iscsi/iqn.2026-02.ppc.com:oraservers/tpg1/acls create iqn.2026-02.ppc.com:indrxlora02
saveconfig

On Both Cluster Nodes
sudo dnf install -y iscsi-initiator-utils
echo "InitiatorName=iqn.2026-02.ppc.com:indrxlora01" | sudo tee /etc/iscsi/initiatorname.iscsi
sudo systemctl enable --now iscsid
sudo iscsiadm --mode discoverydb --type sendtargets --portal 192.168.10.20 --discover
sudo iscsiadm -m node --login
lsblk  # The two LUNs appear as new disks (e.g. /dev/sdb and /dev/sdc)

Pro tip: Node1 uses indrxlora01 IQN, Node2 uses indrxlora02.
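Each node must present its own IQN or the target's ACLs will reject it. A tiny helper (a sketch, not part of any package) that derives the IQN from the short hostname avoids the classic copy-paste mistake:

```shell
# build_iqn: derive a node's initiator IQN from its short hostname,
# matching the ACLs created on the target (indrxlora01 / indrxlora02)
build_iqn() {
  printf 'iqn.2026-02.ppc.com:%s\n' "$1"
}

# On each node:
#   echo "InitiatorName=$(build_iqn "$(hostname -s)")" | sudo tee /etc/iscsi/initiatorname.iscsi
```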

2. Pacemaker Cluster Setup
On BOTH nodes:
sudo dnf install -y pcs pacemaker corosync
sudo systemctl enable --now pcsd
echo "secure_password" | sudo passwd --stdin hacluster

On Node1 only:
# Authenticate nodes (RHEL/CentOS 8 syntax; RHEL 7 used `pcs cluster auth`)
sudo pcs host auth indrxlora01 indrxlora02 -u hacluster -p secure_password

# Create cluster (RHEL 7 used `pcs cluster setup --start --name oracleha ...`)
sudo pcs cluster setup oracleha indrxlora01 indrxlora02 --start

# Enable & start
sudo pcs cluster enable --all
sudo pcs property set stonith-enabled=false  # LAB ONLY
sudo pcs status

You should see: Cluster name: oracleha, 2 nodes online.
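
For a 2-node cluster, pcs should generate a votequorum section with two_node enabled, which lets the surviving node keep quorum after a failover. Worth verifying in /etc/corosync/corosync.conf:

quorum {
    provider: corosync_votequorum
    two_node: 1
}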

3. LVM for HA Storage
On BOTH nodes - Edit /etc/lvm/lvm.conf. These settings live inside their
sections, so edit the file by hand rather than appending with sed. (use_lvmetad
is gone in RHEL 8, and lvmlockd is only for active-active shared VGs - neither
belongs in a failover setup.)

# In the activation { } section, list ONLY local VGs (root VG, swap,
# @hostname tag) so the shared VG is never activated outside Pacemaker.
# Adjust the names to your system:
#   volume_list = [ "rhel", "@indrxlora01" ]
sudo vi /etc/lvm/lvm.conf

# Rebuild the initramfs so early boot honors the new lvm.conf, then reboot
sudo dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)

# (Skip the tempting "filter only /dev/sdb" trick - rejecting every other
# device also hides the OS disks from LVM.)

Create HA VG/LV (Node1):
sudo vgcreate vg01 /dev/sdb --addtag pacemaker
sudo lvcreate -L 15G -n lvol01 vg01 --addtag pacemaker
sudo mkfs.xfs /dev/vg01/lvol01
sudo mkdir /u01
sudo vgchange -an vg01  # Pacemaker activates this

Create LVM resource:
# RHEL 8 ships LVM-activate (the old "LVM" agent was removed);
# tagging mode defaults to the "pacemaker" tag set above
sudo pcs resource create vg01 LVM-activate vgname=vg01 \
  vg_access_mode=tagging activation_mode=exclusive \
  op monitor interval=30s timeout=60s

4. VIP + Filesystem
# Virtual IP (Oracle client connects here)
sudo pcs resource create oradb-vip IPaddr2 ip=192.168.10.93 cidr_netmask=24 nic=eth0 \
  op monitor interval=10s

# Oracle filesystem  
sudo pcs resource create u01-filesys Filesystem device="/dev/vg01/lvol01" directory="/u01" \
  fstype="xfs" op monitor interval=20s

# Group them (failover together)
sudo pcs resource group add oracleha oradb-vip vg01 u01-filesys
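
After grouping, `pcs status` should list the members in order under the group, roughly like this (agent columns omitted):

Resource Group: oracleha
    oradb-vip    Started indrxlora01
    vg01         Started indrxlora01
    u01-filesys  Started indrxlora01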

5. Oracle Resources (Listener + Database)
Database prep (both nodes, as oracle user):
sqlplus / as sysdba
CREATE USER cocfmon IDENTIFIED BY "secure_password";
GRANT CONNECT, RESOURCE TO cocfmon;
GRANT SELECT ANY TRANSACTION TO cocfmon;
ALTER PLUGGABLE DATABASE ALL OPEN;
ALTER PLUGGABLE DATABASE ALL SAVE STATE;

Pacemaker Oracle resources:
# Listener
sudo pcs resource create listener_orcl oralsnr sid="orcl" listener="LISTENER_ORCL" \
  --group oracleha

# Database (600s start timeout = handles PDB startup; timeouts belong
# on the start/stop operations, not on the resource itself)
sudo pcs resource create orcl oracle sid="orcl" \
  monuser="cocfmon" monpassword="secure_password" \
  op start timeout=600s op stop timeout=300s \
  op monitor interval=30s timeout=90s \
  --group oracleha
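
Clients should always connect through the VIP, never a node address. A minimal tnsnames.ora entry (assuming the default listener port 1521 and service name orcl):

ORCL_HA =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.10.93)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orcl))
  )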

6. Startup Order
A resource group already colocates its members and starts them in the order
they were added: oradb-vip → vg01 → u01-filesys → listener_orcl → orcl.
No extra constraints are required, but if you want the order spelled out
explicitly:

sudo pcs constraint order oradb-vip then vg01
sudo pcs constraint order vg01 then u01-filesys
sudo pcs constraint order u01-filesys then listener_orcl
sudo pcs constraint order listener_orcl then orcl

Flow: VIP → LVM → FS → Listener → Database

7. Test Failover
# Move to Node2 (this leaves a location constraint behind - remove it
# afterwards with `pcs resource clear oracleha`)
sudo pcs resource move oracleha indrxlora02

# Simulate Node1 failure
sudo pcs node standby indrxlora01

# Bring Node1 back
sudo pcs node unstandby indrxlora01

# Cleanup stuck resources
sudo pcs resource cleanup oracleha

Watch it work:
sudo pcs status
tail -f /var/log/pacemaker/pacemaker.log

VIP 192.168.10.93 should ping continuously during failover.

8. Quick Health Dashboard
# One-liner status
sudo pcs status | grep -E "(oradb|orcl|FAILED)"

# Test Oracle monitoring (cocfmon is an ordinary user, not SYSDBA;
# pipe the query in rather than passing it as an argument)
su - oracle -c "echo 'select 1 from dual;' | sqlplus -s cocfmon/secure_password@192.168.10.93/orcl"
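
For cron or scripting, a small helper (a sketch, not part of pcs) that flags trouble in `pcs status` output:

```shell
# check_cluster: read `pcs status` text on stdin and flag degraded
# resources; exits non-zero so it can gate a cron alert
check_cluster() {
  if grep -qE 'FAILED|Stopped' -; then
    echo "ATTENTION: degraded resources"
    return 1
  fi
  echo "OK: all resources running"
}

# Usage: sudo pcs status | check_cluster
```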

9. Troubleshooting
Problem                  Fix
VG won't activate        vgchange --addtag pacemaker vg01
iSCSI fails              iscsiadm -m session -P 3 + check targetcli ACLs
DB monitor fails         Verify cocfmon user + password
No quorum (2 nodes)      sudo pcs property set no-quorum-policy=ignore
Resources stuck          sudo pcs resource cleanup oracleha

10. Production Checklist
  • STONITH enabled (fencing)
  • Multipath iSCSI (not single path)
  • qdevice for quorum
  • Passwordless SSH between nodes
  • Firewall: 2224/tcp (pcsd), 3121/tcp (pacemaker_remote), 5404-5412/udp (corosync), 3260/tcp (iSCSI)
  • /u01 NOT in /etc/fstab (Pacemaker mounts it)
  • Weekly failover test scheduled
Final Validation
# Client connectivity test
sqlplus scott/tiger@192.168.10.93/orcl  # Should always work

# Failover time: standby Node1, then watch how long until the group is
# Started on indrxlora02 before bringing Node1 back
sudo pcs node standby indrxlora01
sudo pcs status    # repeat until oracleha shows Started on indrxlora02
sudo pcs node unstandby indrxlora01
