Building Enterprise-Grade Block-Level Replication for High Availability on RHEL
DRBD (Distributed Replicated Block Device) is a kernel-level block replication technology that mirrors disk writes between Linux servers in near real time. When paired with Pacemaker and Corosync, DRBD enables stateful service failover with minimal data loss and predictable recovery behavior.
This guide goes far beyond “how to configure DRBD” and explains how it behaves under failure, how to tune it, and how to design it safely in production.
Where DRBD Fits in the HA Stack
Application (Apache / DB / App)
│
Filesystem (ext4 / xfs / GFS2)
│
DRBD (/dev/drbd0)
│
Local Disk (/dev/sdb1)
│
Network Replication
│
Remote Disk (/dev/sdb1)
DRBD works below the filesystem, making it:
- Application-agnostic
- Fast and efficient
- Extremely sensitive to split-brain mistakes
Kernel Module + Userspace Tools
- Kernel module: Handles I/O interception and replication
- drbdadm / drbdsetup: Configuration and control
- Metadata: Tracks sync state, UUIDs, and history
Metadata Placement
- Internal metadata (common): Stored at disk end
- External metadata: Stored on separate device
Internal metadata reduces complexity but slightly reduces usable disk space.
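For external metadata, the per-node block points meta-disk at a separate device instead of internal. A minimal sketch (device names and the metadata index are placeholders):

```
on node1 {
  device    /dev/drbd0;
  disk      /dev/sdb1;
  meta-disk /dev/sdc1[0];   # indexed external metadata on a dedicated device
  address   10.0.0.1:7789;
}
```

External metadata keeps the full backing device usable for data, at the cost of one more device to manage and keep in sync with the resource definition.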
Primary / Secondary Model
| Role | Capability |
|---|---|
| Primary | Read + Write |
| Secondary | No direct access (cannot be mounted or read in standard setups) |
Only one Primary is allowed in standard setups.
Dual-Primary Mode
Requires:
- Clustered filesystem (GFS2 / OCFS2)
- allow-two-primaries
Used for active/active designs, and much harder to operate safely.
Never enable dual-primary with ext4 or XFS.
Replication Protocols
| Protocol | Acknowledgement Point | Data Safety |
|---|---|---|
| A | Local disk write | Risky |
| B | Network send | Small window |
| C | Remote disk commit | Strong |
Why Protocol C Is Default
A write is acknowledged only after it is committed on the remote disk, so acknowledged writes survive the loss of either node.
Required for:
- Databases
- Financial systems
- Filesystems without journaling guarantees
Latency cost is real — but predictable.
Network
Recommended
- Dedicated replication NIC
- 10Gbps+ for heavy I/O
- Jumbo frames if supported
- Separate from client traffic
Ports
TCP 7788–7790
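On RHEL these ports are closed by default. A firewalld sketch for both nodes (adjust the range to the ports your resources actually use):

```
# firewall-cmd --permanent --add-port=7788-7790/tcp
# firewall-cmd --reload
```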
MTU Mismatch = Silent Pain
A mismatched MTU between the replication NICs causes stalled or painfully slow resyncs with no obvious error. Check both nodes:
# ip link show
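A quick way to prove jumbo frames work end-to-end is a don't-fragment ping sized to the MTU. A sketch (the peer address is a placeholder):

```shell
MTU=9000
# ICMP payload = MTU - 20 bytes IP header - 8 bytes ICMP header
PAYLOAD=$((MTU - 28))
echo "$PAYLOAD"
# ping -M do -s "$PAYLOAD" -c 3 10.0.0.2   # fails immediately if any hop lacks jumbo frames
```

If the ping fails while ip link shows 9000 on both ends, a switch in the path is dropping the frames.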
DRBD Installation on RHEL
DRBD installation on RHEL (8/9) requires third-party repositories since it's not in base repos. Use ELRepo for stable kernel modules and utils on both nodes.
Prerequisites
- Matching kernel versions across nodes (check with uname -r)
- Identical backing disks (e.g., /dev/sdb1) on both nodes
- Secure Boot disabled, or modules signed manually
Add ELRepo Repo
On both nodes:
# dnf install -y https://www.elrepo.org/elrepo-release-9.el9.elrepo.noarch.rpm
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
For RHEL 8, use elrepo-release-8.el8.elrepo.noarch.rpm.
Install Packages
# dnf install -y drbd-utils kmod-drbd-9.0.91
drbd-utils: Userspace tools (drbdadm).
kmod-drbd: Kernel module for DRBD 9.x.
Verify: modinfo drbd shows version 9.0.x.
Secure Boot Handling
With Secure Boot enabled, RHEL refuses to load unsigned third-party modules:
- Install build dependencies: dnf install kernel-devel-$(uname -r) kernel-headers gcc make bc openssl
- Generate a Machine Owner Key (MOK) and sign the module with /usr/src/kernels/$(uname -r)/scripts/sign-file
- Enroll the key with mokutil --import, then reboot and complete enrollment in the shim MOK interface
- Build and sign the DRBD module from source if the ELRepo build does not match your kernel
Load Module
# modprobe drbd
# lsmod | grep drbd
Auto-load: echo drbd > /etc/modules-load.d/drbd.conf.
Verification
# drbdadm --version # 9.x confirmed
# cat /proc/drbd # Empty until resource created
DRBD Configuration
File: /etc/drbd.d/r0.res
resource r0 {
  protocol C;

  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7789;
    meta-disk internal;
  }

  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }

  net {
    cram-hmac-alg sha1;
    shared-secret "supersecret";
    allow-two-primaries no;
  }

  disk {
    on-io-error detach;
    fencing resource-only;
  }
}
Security
- cram-hmac-alg prevents MITM attacks
- Always use shared secrets on untrusted networks
Initialization Sequence
- Create metadata
- Bring resource up
- Promote one node only
- Create filesystem
- Mount
# drbdadm create-md r0
# drbdadm up r0
# drbdadm primary --force r0
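The forced promotion kicks off the full initial sync. Progress can be read from /proc/drbd; a parsing sketch (a sample resync line is embedded so it runs without a live cluster):

```shell
# Sample resync line as it appears in /proc/drbd during the initial sync
line="  [====>...............] sync'ed: 21.4% (78524/1048576)K"
pct=$(echo "$line" | sed -n "s/.*sync'ed: *\([0-9.]*\)%.*/\1/p")
echo "sync progress: ${pct}%"
# Live equivalent: watch -n2 cat /proc/drbd
```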
The device is usable on the Primary while the initial sync runs in the background. Promoting or mounting on the peer node at this stage = instant split-brain.
Monitoring & State Interpretation
/proc/drbd
cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate
| Field | Meaning |
|---|---|
| cs | Connection state |
| ro | Role |
| ds | Disk state |
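Monitoring scripts usually need these fields split out. A sketch that parses the sample status line above (embedded here so it runs standalone):

```shell
status="cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate"
# Split each field at its colon into name and value
out=$(for field in $status; do echo "${field%%:*} = ${field#*:}"; done)
echo "$out"
```

Alerting on anything other than cs = Connected and ds = UpToDate/UpToDate catches most degraded states early.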
Detailed Status
# drbdadm status
# drbdsetup status --verbose
Split-Brain
Split-brain occurs when both nodes accept writes independently.
Common Causes
- No fencing
- Manual promotion
- Network partition + poor policies
DRBD uses:
- UUID history
- Generation counters
- Checksums
Recovery Policies (Configure Explicitly)
net {
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
Never rely on defaults in production.
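When the automatic policies cannot resolve it, recovery is manual. A hedged sketch using DRBD 9 command syntax; run the first three commands on the node whose local changes you are willing to discard:

```
# drbdadm disconnect r0
# drbdadm secondary r0
# drbdadm connect --discard-my-data r0
```

On the surviving node, if it still shows cs:StandAlone, reconnect with drbdadm connect r0.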
Resync Performance Tuning
Key Parameters
net {
  max-buffers 36k;
  sndbuf-size 0;
  rcvbuf-size 0;
}
disk {
  resync-rate  800M;
  c-plan-ahead 0;
  al-extents   6007;
}
Note: the syncer section is DRBD 8.3 syntax. With DRBD 9, resync-rate and al-extents live in the disk section, and c-plan-ahead 0 disables the dynamic resync controller so the fixed rate takes effect.
Live Throttling
# drbdadm disk-options --c-plan-ahead=0 --resync-rate=500M r0
Filesystem Strategy
| FS | Use Case |
|---|---|
| ext4 | Simple active/passive |
| xfs | Large files, journaling |
| GFS2 | Active/active |
| OCFS2 | Legacy clustered |
Never mount DRBD on two nodes without clustered FS.
Pacemaker Integration
DRBD Promotable (Master/Slave) Resource
# pcs resource create drbd_r0 ocf:linbit:drbd \
    drbd_resource=r0 \
    op monitor interval=30s role=Master \
    op monitor interval=60s role=Slave \
    promotable notify=true
The promotable keyword creates a clone named drbd_r0-clone; notify=true is required by the DRBD resource agent.
Promote/Demote Logic
Pacemaker:
- Promotes DRBD
- Mounts filesystem
- Starts application
- Assigns VIP
Reverse on failover.
Grouping and Constraints
# pcs resource group add storage fs vip
# pcs constraint colocation add storage with master drbd_r0-clone INFINITY
# pcs constraint order promote drbd_r0-clone then start storage
Constraints must reference the promoted role of the clone (assumed here to be named drbd_r0-clone), not the base resource, or the group can land on the Secondary.
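The fs and vip members of the group above must exist before grouping. A hedged sketch using the standard resource agents (mount point, filesystem type, and IP are placeholders):

```
# pcs resource create fs ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/mnt/data fstype=xfs
# pcs resource create vip ocf:heartbeat:IPaddr2 \
    ip=10.0.0.100 cidr_netmask=24
```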
Fencing Is Not Optional (Especially with DRBD)
Why?
- DRBD assumes there is only ever one writer
- Pacemaker enforces this via STONITH
Without fencing:
- Pacemaker may refuse promotion
- Or worse: two writers and silent data corruption
Performance Characteristics
| Workload | DRBD Impact |
|---|---|
| Small writes | Higher latency |
| Sequential writes | Near-native |
| Databases | Predictable |
| VM disks | Acceptable with tuning |
Backup Strategy
DRBD ≠ Backup.
Best approach:
- LVM snapshot on Primary
- Mount snapshot
- Backup snapshot
- Release snapshot
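The steps above, sketched for the case where the DRBD backing device is an LVM volume vg0/data (all names are placeholders); the nouuid option is needed when the snapshot carries an XFS filesystem already mounted on the same host:

```
# lvcreate --snapshot --size 5G --name data_snap /dev/vg0/data
# mount -o ro,nouuid /dev/vg0/data_snap /mnt/snap
# tar -C /mnt/snap -czf /backup/data-$(date +%F).tar.gz .
# umount /mnt/snap
# lvremove -f /dev/vg0/data_snap
```

Snapshotting the backing device on the Primary gives a crash-consistent image without unmounting the live filesystem.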
Replication ensures availability, not recovery from deletion.
Failure Scenarios
| Scenario | DRBD Behavior | Pacemaker Action | Recovery Time |
|---|---|---|---|
| Primary crash | IO pause → Secondary UpToDate | Promote secondary | Seconds |
| Network partition | cs:StandAlone | Fencing resolves | Minutes |
| Disk failure | Detaches (on-io-error detach), runs diskless from peer | None needed; service continues | Resync after disk replacement |
| Split-brain | Disconnect | Manual: drbdadm connect --discard-my-data r0 | Manual |
Force failover
# pcs resource move storage node2
Kill replication
# iptables -A INPUT -p tcp --dport 7789 -j DROP
Fence test
# pcs stonith fence node1
If you haven’t tested it — it’s not HA.
Best Practices Checklist
- Protocol C for production
- Dedicated replication NIC
- STONITH always enabled
- Never dual-primary without clustered FS
- Explicit split-brain policies
- Monitor /proc/drbd
- Test failover quarterly
Final Thoughts
DRBD is extremely powerful — and extremely unforgiving.
Used correctly:
- Near-zero downtime
- Deterministic recovery
- Simple architecture
Used incorrectly:
- Silent corruption
- Split-brain nightmares
- Long recovery windows
Pair it with Pacemaker, fencing, discipline, and testing, and it becomes one of the most reliable HA building blocks on Linux.