RHEL Linux Pacemaker Cluster

Building a Production-Grade Apache High Availability Cluster on RHEL

High availability on Linux is not about “keeping a service running” — it’s about coordinated failure handling, data integrity protection, deterministic recovery, and predictable behavior under stress.

On RHEL, Pacemaker + Corosync form Red Hat’s supported HA stack. This article goes beyond basic setup and explains how the cluster actually works, how to tune it, and how to avoid the classic mistakes that cause split-brain or endless failover loops.
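Before digging into internals, it helps to see the typical bootstrap. A sketch for RHEL 8/9 with the HighAvailability repository enabled; the hostnames node1/node2 and the hacluster password are placeholders:

```shell
# Run on every node: install the HA stack and start the pcs daemon.
dnf install -y pcs pacemaker fence-agents-all
systemctl enable --now pcsd
passwd hacluster                               # same password on all nodes

# Run once, from any node (pcs 0.10+ syntax):
pcs host auth node1 node2 -u hacluster         # authenticate nodes
pcs cluster setup apache-cluster node1 node2   # generates corosync.conf
pcs cluster start --all
```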

Architecture Overview
┌─────────────┐
│   Corosync  │  ← Cluster messaging, quorum, membership
│  (Totem)    │
└──────┬──────┘
       │
┌──────▼──────┐
│  Pacemaker  │  ← Resource orchestration & decision engine
│  (CRMd)     │
└──────┬──────┘
       │
┌──────▼──────┐
│  Resource   │  ← OCF / systemd agents
│   Agents    │
└─────────────┘

Responsibility
Corosync
  • Reliable multicast messaging
  • Node membership
  • Quorum calculation
  • Split-brain prevention
Pacemaker
  • Resource placement decisions
  • Failure scoring
  • Recovery orchestration
  • Fencing enforcement
Pacemaker does nothing without Corosync. Corosync does not manage resources.

Key Pacemaker Internals
  • Designated Coordinator (DC)
  • Exactly one DC per cluster
  • Elected dynamically
  • Owns the authoritative CIB
  • All cluster decisions flow through the DC
  • DC changes are normal — frequent DC flapping is not.
Cluster Information Base (CIB)
XML-based configuration and runtime state
Stored in memory, synced across nodes
Sections:
Configuration (resources, constraints)
Status (failures, node state)
Options (timeouts, quorum policy)

View raw CIB:
# pcs cluster cib
Edit safely (dump to a file, edit the copy, then push it back in one step):
# pcs cluster cib /tmp/cib.xml
# vi /tmp/cib.xml
# pcs cluster cib-push /tmp/cib.xml

Local Resource Manager (LRMd)
Runs on every node
Executes resource agent actions:
  • start
  • stop
  • monitor
  • promote/demote (for promotable clone resources)
LRMd reports results back to CRMd → DC.

Corosync: Totem Protocol
  • UDP multicast or unicast
  • Token-based membership
  • Ordered, reliable delivery
  • Heartbeat + failure detection
Ports (default):
UDP 5404–5405
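On RHEL, firewalld ships a predefined high-availability service that opens these Corosync ports plus pcsd (TCP 2224) and related traffic, so there is no need to open ports one by one:

```shell
# Run on every node: open all HA-stack ports via the predefined service.
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload
```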

Corosync Configuration File
The /etc/corosync/corosync.conf file defines the cluster's communication layer for Pacemaker. It uses the Totem protocol for reliable multicast messaging across nodes, with redundancy via multiple rings.

Complete Sample Configuration
For a 2-node Apache HA cluster (apache-cluster), use this tuned configuration. Save it identically on all nodes before starting the cluster.

totem {
    version: 2
    cluster_name: apache-cluster
    config_version: 2  # Increment on config changes
    secauth: on        # Enables authentication (recommended)
    # token: 1000       # Heartbeat timeout (ms); default 1000, increase for slow networks
    # consensus: 1200   # Quorum agreement timeout (ms); default 1200
    # join: 60          # Time to wait for new nodes (s)
    # max_messages: 20  # Messages per period
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0  # Network subnet (e.g., your cluster net)
        mcastaddr: 226.94.1.100   # Multicast address (unique per cluster)
        mcastport: 5405
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

nodelist {
    node {
        ring0_addr: node1.example.com  # FQDN or IP of node1
        nodeid: 1
        quorum_votes: 1
    }
    node {
        ring0_addr: node2.example.com  # FQDN or IP of node2
        nodeid: 2
        quorum_votes: 1
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1  # Allows quorum with a single vote (implies wait_for_all)
    # wait_for_all: 1  # Wait for all nodes before granting first quorum
}

amf {
    mode: disabled  # Application Management Framework (not needed for basic HA)
}

Key Parameters Explained
Section          Parameter     Purpose                                  Recommended Value
totem            secauth       Message signing/authentication           on (security)
totem            cluster_name  Unique cluster identifier                Matches pcs cluster name
totem.interface  bindnetaddr   Cluster network subnet                   e.g., 192.168.1.0
totem.interface  mcastaddr     Multicast group (unique per cluster)     226.94.1.100
nodelist.node    ring0_addr    Node's cluster IP/hostname               Resolvable via /etc/hosts
nodelist.node    nodeid        Unique numeric ID (1-N)                  Sequential integers
nodelist.node    quorum_votes  Votes for quorum calculation             1 per node
quorum           two_node      Allows 2-node clusters                   1 (critical for pairs)

Tuning note:
Lower token values = faster failover, but higher false-positive risk on noisy networks.

Quorum: The Non-Negotiable Rule
No quorum = no resources
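Quorum is a strict majority of the expected votes. The arithmetic is simple enough to sketch:

```shell
# Quorum = floor(expected_votes / 2) + 1 (strict majority).
expected_votes=3
quorum=$(( expected_votes / 2 + 1 ))
echo "$quorum"   # a 3-node cluster stays quorate with 2 votes
```

This is why an even node count is awkward: four nodes still need three votes, so a 2/2 split loses quorum on both sides.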

Default behavior:
# pcs property show no-quorum-policy

Values:
stop (default)
freeze
ignore (only for 2-node with fencing or qdevice)
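Setting the policy explicitly documents your intent in the CIB; for example:

```shell
# Keep the safe default: stop all resources when quorum is lost.
# (freeze keeps running resources but blocks new actions.)
pcs property set no-quorum-policy=stop
```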

Why 2-Node Clusters Are Dangerous
50/50 split possible
Both nodes think the other is dead
Data corruption without fencing

Correct 2-Node Setup
Enable STONITH
Add qdevice (tie-breaker)
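A hedged sketch of adding a qdevice tie-breaker on a third host outside the cluster; the arbiter hostname qnode is a placeholder:

```shell
# On the arbiter host (qnode, NOT a cluster member):
dnf install -y pcs corosync-qnetd
systemctl enable --now pcsd
pcs qdevice setup model net --enable --start

# On every cluster node:
dnf install -y corosync-qdevice

# From one cluster node: register the arbiter as a tie-breaker.
pcs quorum device add model net host=qnode algorithm=ffsplit
pcs quorum status
```

The ffsplit algorithm gives the vote to exactly one partition in a 50/50 split, which is precisely the 2-node failure mode described above.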

Fencing (STONITH): Why It’s Mandatory
If the cluster can’t power off a failed node, it cannot guarantee data integrity
Pacemaker will refuse to start resources without fencing in production-grade configs.
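That enforcement is governed by the stonith-enabled cluster property; leaving it at the default is the production stance:

```shell
# Keep fencing enforcement on (the default); never disable in production.
pcs property set stonith-enabled=true
pcs property show stonith-enabled
```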

Fence Agent Categories
  • Power-based: IPMI, iDRAC, iLO
  • Hypervisor-based: fence_vmware, fence_rhevm
  • Network-based: fence_switch
Example (IPMI):
# pcs stonith create fence_node1 fence_ipmilan \
  pcmk_host_list=node1 \
  ip=192.168.1.100 \
  username=admin password=secret \
  lanplus=1

Verify:
# pcs stonith show
# pcs stonith fence node1

Apache HA Design
Clients
   |
[ Virtual IP ]
   |
ApacheGroup
 ├── IPaddr2
 └── httpd

Why Grouping Matters
  • Ensures start/stop order
  • Guarantees co-location
  • Simplifies constraints
Resource Agent Types
Type           Use Case
ocf:heartbeat  Portable, HA-aware agents
systemd:       Systemd units
stonith:       Fencing devices
Prefer OCF agents where possible — they expose richer monitoring.

Resource Creation (With Timeouts)
# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 \
  ip=192.168.1.50 cidr_netmask=24 \
  op monitor interval=20s timeout=30s

# pcs resource create WebServer systemd:httpd \
  op start timeout=90s \
  op stop timeout=90s \
  op monitor interval=30s timeout=60s
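With both resources defined, the ApacheGroup from the design diagram ties them together. A group gives you implicit colocation plus ordered start (listed order) and reverse-ordered stop:

```shell
# VirtualIP starts first, WebServer second; stop happens in reverse.
# Both are kept on the same node automatically.
pcs resource group add ApacheGroup VirtualIP WebServer
pcs status resources
```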

Failure Handling & Scoring
Pacemaker tracks a per-resource, per-node failcount:
Resource fails → failcount increments on that node
failcount reaches migration-threshold → node is banned for that resource → resource moved
View scores:
# pcs resource failcount show WebServer
Clear after fixing issue:
# pcs resource cleanup WebServer
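The threshold and the automatic expiry of old failures are per-resource meta-attributes; a sketch:

```shell
# Move WebServer off a node after 3 failures; forget recorded
# failures after 2 minutes so the node becomes eligible again.
pcs resource meta WebServer migration-threshold=3 failure-timeout=120s
```

Without failure-timeout, failcounts persist until a manual cleanup, so a node can stay banned long after the underlying fault is fixed.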

Constraints
Colocation
# pcs constraint colocation add WebServer with VirtualIP INFINITY
Ordering
# pcs constraint order VirtualIP then WebServer
Location Bias
# pcs constraint location WebServer prefers node1=100
Negative scoring:
# pcs constraint location WebServer avoids node3=INFINITY

Monitoring & Debugging
Live Cluster View
# crm_mon -Arf

Pacemaker Logs
# journalctl -u pacemaker -f
# journalctl -u corosync -f

Common Debug Commands
# pcs status --full
# pcs resource debug-start WebServer
# crm_verify -L -V

Failover Testing
Service Failure
# systemctl stop httpd
Node Eviction
# pcs node standby node1
Fence Test
# pcs stonith fence node1
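While running these tests, polling the virtual IP from a client machine makes the failover window visible. This assumes the VIP 192.168.1.50 from the earlier resource example:

```shell
# From a client: poll the VIP once a second; gaps or DOWN lines
# show the failover window during each test.
while true; do
  curl -s -o /dev/null -m 2 -w '%{http_code}\n' http://192.168.1.50/ \
    || echo 'DOWN'
  sleep 1
done
```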

Performance & Failover Tuning
Setting            Effect
monitor interval   Detection speed
token              Corosync sensitivity
op timeout         Avoid false failures
failure-timeout    Auto recovery

Example tuning:
# pcs resource op defaults timeout=90s
# pcs resource defaults failure-timeout=120s

Scaling the Architecture

Shared Content Options
NFS (simple, SPOF unless HA)
GFS2 (clustered FS)
DRBD (block replication)
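For the NFS option, the shared docroot becomes one more cluster resource that must be mounted before httpd starts. A sketch; the export nfs-server:/export/www is a placeholder:

```shell
# Mount the shared docroot on whichever node runs Apache.
pcs resource create WebFS ocf:heartbeat:Filesystem \
  device=nfs-server:/export/www directory=/var/www/html fstype=nfs \
  op monitor interval=20s timeout=40s

# Keep the mount with Apache and start it first.
pcs constraint colocation add WebFS with WebServer INFINITY
pcs constraint order WebFS then WebServer
```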

Multi-Site
Anti-colocation rules
geo-clusters (advanced)
Application-level sync

Common Production Issues (And Root Causes)
  • Corosync config out of sync: redistribute corosync.conf with pcs cluster sync, then restart cluster services on the stale node.
  • Resource stuck in a failed state: fix the root cause, then pcs resource cleanup WebServer.
  • DC election loop: verify the ring0_addr entries in corosync.conf and confirm time sync (chrony/ntp) on all nodes.
  • SELinux denials: ausearch -m avc -ts recent | audit2allow to identify the denial and build a policy module.
  • Slow starts flagged as failures: raise operation timeouts, e.g. pcs resource op defaults timeout=90s.
Best Practices Checklist
  • Use 3+ nodes or qdevice
  • Always enable STONITH
  • Tune timeouts conservatively
  • Test fencing quarterly
  • Monitor failcounts
  • Document constraints
  • Never ignore quorum casually
Final Thoughts
Pacemaker and Corosync are not just HA tools — they’re distributed systems with strong opinions about safety.
If you:
  • Respect quorum
  • Configure fencing correctly
  • Tune failure detection
  • Test realistically
You’ll get predictable, fast, and safe failover.
