RHEL/CentOS 8 | iSCSI Shared Storage | Automated Oracle Failover
Building bulletproof Oracle HA doesn't need expensive hardware. Pacemaker + Corosync on Linux gives you production-grade failover for ~$0. This guide walks through a 2-node cluster with iSCSI, LVM, VIP, and full Oracle stack.
Your cluster:
indrxlora01 192.168.10.91 ← Node 1
indrxlora02 192.168.10.92 ← Node 2
oradb-vip 192.168.10.93 ← Oracle VIP (floats between nodes)
storage 192.168.10.20 ← iSCSI server
1. iSCSI Shared Storage
On Storage Server (192.168.10.20)
sudo dnf install -y targetcli lvm2
sudo vgcreate datavg /dev/sdb
sudo lvcreate -L 50G -n oradisk01 datavg
sudo lvcreate -L 50G -n oradisk02 datavg
# Expose via iSCSI
sudo targetcli
/backstores/block create ora_LUN01 /dev/datavg/oradisk01
/backstores/block create ora_LUN02 /dev/datavg/oradisk02
/iscsi create iqn.2026-02.ppc.com:oraservers
/iscsi/iqn.2026-02.ppc.com:oraservers/tpg1/luns create /backstores/block/ora_LUN01
/iscsi/iqn.2026-02.ppc.com:oraservers/tpg1/luns create /backstores/block/ora_LUN02
/iscsi/iqn.2026-02.ppc.com:oraservers/tpg1/acls create iqn.2026-02.ppc.com:indrxlora01
/iscsi/iqn.2026-02.ppc.com:oraservers/tpg1/acls create iqn.2026-02.ppc.com:indrxlora02
saveconfig
On Both Cluster Nodes
sudo dnf install -y iscsi-initiator-utils
echo "InitiatorName=iqn.2026-02.ppc.com:indrxlora01" | sudo tee /etc/iscsi/initiatorname.iscsi
sudo systemctl enable --now iscsid
sudo iscsiadm --mode discoverydb --type sendtargets --portal 192.168.10.20 --discover
sudo iscsiadm -m node --login
lsblk # See /dev/sdb appear!
Pro tip: Node1 uses indrxlora01 IQN, Node2 uses indrxlora02.
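Since each node needs its own IQN, a small helper can generate the initiator name from the hostname. This is a sketch that assumes your short hostnames (indrxlora01/indrxlora02) match the IQN suffixes in the targetcli ACLs above:

```shell
# Sketch: derive each node's InitiatorName from its short hostname.
# Assumes hostnames match the ACL suffixes configured on the iSCSI target.
make_iqn() {
    printf 'InitiatorName=iqn.2026-02.ppc.com:%s\n' "$1"
}

# On each node (run as root):
#   make_iqn "$(hostname -s)" > /etc/iscsi/initiatorname.iscsi
#   systemctl restart iscsid
```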
2. Pacemaker Cluster Setup
On BOTH nodes:
sudo dnf install -y pcs pacemaker corosync
sudo systemctl enable --now pcsd
echo "secure_password" | sudo passwd --stdin hacluster
On Node1 only:
# Authenticate nodes (RHEL/CentOS 8 syntax; on 7.x this was "pcs cluster auth")
sudo pcs host auth indrxlora01 indrxlora02 -u hacluster -p secure_password
# Create and start the cluster (on 7.x: pcs cluster setup --start --name oracleha ...)
sudo pcs cluster setup oracleha indrxlora01 indrxlora02 --start
# Enable & start
sudo pcs cluster enable --all
sudo pcs property set stonith-enabled=false # LAB ONLY: never run production without fencing
sudo pcs status
You should see: Cluster name: oracleha, 2 nodes online.
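If you want to script the "2 nodes online" check, you can parse the `pcs status` summary. A sketch (the exact output format varies slightly between pcs versions; this matches the classic `Online: [ node1 node2 ]` line):

```shell
# Sketch: count cluster nodes reported online by `pcs status` (read on stdin).
count_online() {
    grep -oE 'Online: \[[^]]*\]' | tr ' ' '\n' | grep -c 'indrxlora'
}

# Usage on a live cluster:
#   sudo pcs status | count_online    # expect 2 when both nodes are up
```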
3. LVM for HA Storage
On BOTH nodes - Edit /etc/lvm/lvm.conf so the shared VG is never auto-activated at boot (Pacemaker must own activation). Note that lvmetad and the use_lvmetad option were removed in RHEL 8, and lvmlockd is only for shared-VG clusters, not this active/passive design:
# Disable auto-activation of the shared VG. An empty list is fine when the OS
# is not on LVM; otherwise list your local root VG here, and make sure vg01
# does NOT match.
sudo sed -i 's/# *auto_activation_volume_list.*$/auto_activation_volume_list = []/' /etc/lvm/lvm.conf
# Filter ONLY shared storage. WARNING: if your root filesystem sits on LVM,
# its PV must also be accepted here or the node will not boot.
sudo sed -i 's/# *filter =.*$/filter = [ "a|\/dev\/sdb|", "r|.*|" ]/' /etc/lvm/lvm.conf
# Rebuild the initramfs so early boot uses the same settings, then reboot
sudo dracut -f
Create HA VG/LV on Node1 only (but create the /u01 mount point on BOTH nodes):
sudo vgcreate vg01 /dev/sdb --addtag pacemaker   # the iSCSI disk; with two LUNs logged in, check lsblk (the second appears as /dev/sdc)
sudo lvcreate -L 15G -n lvol01 vg01 --addtag pacemaker
sudo mkfs.xfs /dev/vg01/lvol01
sudo mkdir /u01   # on BOTH nodes
sudo vgchange -an vg01 # deactivate now; Pacemaker activates it from here on
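Before handing vg01 to Pacemaker, a quick sanity check that the tags landed and the VG is deactivated (run on Node1; expected strings are typical lvm2 output on RHEL 8):

```
sudo vgs -o vg_name,vg_tags vg01        # should show the "pacemaker" tag
sudo lvs -o lv_name,lv_tags vg01
sudo lvdisplay /dev/vg01/lvol01 | grep "LV Status"   # should read "NOT available"
```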
Create the LVM resource (RHEL 8 ships the LVM-activate agent; the old "LVM" agent it replaces is gone):
sudo pcs resource create vg01 ocf:heartbeat:LVM-activate vgname=vg01 \
    vg_access_mode=tagging tag=pacemaker activation_mode=exclusive \
    op monitor interval=30s timeout=60s
4. VIP + Filesystem
# Virtual IP (Oracle client connects here)
sudo pcs resource create oradb-vip IPaddr2 ip=192.168.10.93 cidr_netmask=24 nic=eth0 \
op monitor interval=10s
# Oracle filesystem
sudo pcs resource create u01-filesys Filesystem device="/dev/vg01/lvol01" directory="/u01" \
fstype="xfs" op monitor interval=20s
# Group them (failover together)
sudo pcs resource group add oracleha oradb-vip vg01 u01-filesys
5. Oracle Resources (Listener + Database)
Database prep (run once, as the oracle user, on whichever node currently hosts the database):
sqlplus / as sysdba
CREATE USER cocfmon IDENTIFIED BY "secure_password";
GRANT CONNECT, RESOURCE TO cocfmon;
GRANT SELECT ANY TRANSACTION TO cocfmon;
ALTER PLUGGABLE DATABASE ALL OPEN;
ALTER PLUGGABLE DATABASE ALL SAVE STATE;
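Clients should only ever connect through the VIP, never a node IP. A hypothetical client-side tnsnames.ora entry (the ORCL_HA alias is illustrative, not part of the cluster config):

```
ORCL_HA =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.10.93)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl)
    )
  )
```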
Pacemaker Oracle resources:
# Listener
sudo pcs resource create listener_orcl ocf:heartbeat:oralsnr sid="orcl" listener="LISTENER_ORCL" \
    --group oracleha
# Database: the long start/stop timeouts belong on the operations, not as
# agent parameters (the 600s start timeout covers PDB startup)
sudo pcs resource create orcl ocf:heartbeat:oracle sid="orcl" \
    monuser="cocfmon" monpassword="secure_password" \
    op start timeout=600s op stop timeout=300s \
    op monitor interval=30s timeout=90s \
    --group oracleha
6. Startup Order
# Resources in a group are colocated and ordered automatically, in the order
# they were added (VIP, then storage, filesystem, listener, DB), so the
# oracleha group above needs no extra constraints. If you run the resources
# ungrouped, add them explicitly:
sudo pcs constraint colocation add vg01 with oradb-vip INFINITY
sudo pcs constraint colocation add u01-filesys with vg01 INFINITY
sudo pcs constraint order oradb-vip then vg01
sudo pcs constraint order vg01 then u01-filesys
sudo pcs constraint order u01-filesys then listener_orcl
sudo pcs constraint order listener_orcl then orcl
Flow: VIP → LVM → FS → Listener → Database
7. Test Failover
# Move to Node2 (move leaves a location constraint behind; clear it afterwards)
sudo pcs resource move oracleha indrxlora02
sudo pcs resource clear oracleha
# Simulate Node1 failure
sudo pcs node standby indrxlora01
# Bring Node1 back
sudo pcs node unstandby indrxlora01
# Clean up stuck resources
sudo pcs resource cleanup oracleha
Watch it work:
sudo pcs status
tail -f /var/log/pacemaker/pacemaker.log
VIP 192.168.10.93 should ping continuously during failover.
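A minimal probe for watching VIP availability from a client during the failover test (the helper name and one-second cadence are assumptions, not from the cluster config):

```shell
# Sketch: report UP/DOWN for a host based on a single 1-second ping probe.
probe() {
    ping -c1 -W1 "$1" >/dev/null 2>&1 && echo UP || echo DOWN
}

# During a failover test, from a client machine:
#   while true; do printf '%s %s\n' "$(date +%T)" "$(probe 192.168.10.93)"; sleep 1; done
```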
8. Quick Health Dashboard
# One-liner status
sudo pcs status | grep -E "(oradb|orcl|FAILED)"
# Test Oracle monitoring
# Test the Oracle monitoring account through the VIP
su - oracle -c "echo 'select 1 from dual;' | sqlplus -s cocfmon/secure_password@//192.168.10.93/orcl"
9. Troubleshooting
| Problem | Fix |
|---|---|
| VG won't activate | vgchange --addtag pacemaker vg01 |
| iSCSI fails | iscsiadm -m session -P 3 + check targetcli ACLs |
| DB monitor fails | Verify cocfmon user + password |
| 2-node quorum | sudo pcs property set no-quorum-policy=ignore |
| Resources stuck | sudo pcs resource cleanup oracleha |
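When chasing the failures above, a small helper can pull just the failed actions out of `pcs status`. A sketch that matches the "Failed Resource Actions:" section pcs prints (format may vary slightly between versions):

```shell
# Sketch: extract the "Failed Resource Actions" section from `pcs status`
# output read on stdin (prints nothing when there are no failures).
failed_actions() {
    sed -n '/Failed Resource Actions:/,/^$/p' | grep -v '^$'
}

# Usage: sudo pcs status | failed_actions
```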
10. Production Checklist
- STONITH enabled (fencing)
- Multipath iSCSI (not single path)
- qdevice for quorum
- Passwordless SSH between nodes
- Firewall: 3121/tcp, 5404-5412/udp, 3260/tcp
- /u01 NOT in /etc/fstab (Pacemaker's Filesystem resource mounts it)
- Weekly failover test scheduled
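The firewall item above maps to firewalld's built-in high-availability service (which covers pcsd, corosync, and the Pacemaker ports) plus the iSCSI port. A sketch for the cluster nodes, assuming firewalld is in use:

```
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --permanent --add-port=3260/tcp   # iSCSI (also open this on the storage server)
sudo firewall-cmd --reload
```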
Final Validation
# Client connectivity test
sqlplus scott/tiger@192.168.10.93/orcl # Should always work
# Failover time: standby Node1, then time how long until the VIP answers again
sudo pcs node standby indrxlora01
time sh -c 'until ping -c1 -W1 192.168.10.93 >/dev/null 2>&1; do sleep 1; done'
sudo pcs node unstandby indrxlora01