RHEL Linux Distributed Replicated Block Device (DRBD) Cluster

Building Enterprise-Grade Block-Level Replication for High Availability on RHEL

DRBD (Distributed Replicated Block Device) is a kernel-level block replication technology that mirrors disk writes between Linux servers in near real time. When paired with Pacemaker and Corosync, DRBD enables stateful service failover with minimal data loss and predictable recovery behavior.

This guide goes far beyond “how to configure DRBD” and explains how it behaves under failure, how to tune it, and how to design it safely in production.

Where DRBD Fits in the HA Stack

Application (Apache / DB / App)
        │
Filesystem (ext4 / xfs / GFS2)
        │
DRBD (/dev/drbd0)
        │
Local Disk (/dev/sdb1)
        │
Network Replication
        │
Remote Disk (/dev/sdb1)

DRBD works below the filesystem, making it:
  • Application-agnostic
  • Fast and efficient
  • Extremely sensitive to split-brain mistakes

Kernel Module + Userspace Tools
  • Kernel module: Handles I/O interception and replication
  • drbdadm / drbdsetup: Configuration and control
  • Metadata: Tracks sync state, UUIDs, and history

Metadata Placement
  • Internal metadata (common): Stored at the end of the backing disk
  • External metadata: Stored on a separate device
Internal metadata reduces complexity but slightly reduces usable disk space.
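A sketch of how each placement looks in a resource definition (the external device /dev/sdc1 is an assumption):

on node1 {
  device     /dev/drbd0;
  disk       /dev/sdb1;
  meta-disk  internal;        # metadata at the end of /dev/sdb1
  # meta-disk /dev/sdc1[0];   # external: metadata on a separate device
  address    10.0.0.1:7789;
}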

Primary / Secondary Model
Role        Capability
Primary     Read + Write
Secondary   No access (the device cannot be opened)

Only one Primary is allowed in standard setups.

Dual-Primary Mode
Requires:
  • Clustered filesystem (GFS2 / OCFS2)
  • allow-two-primaries in the net section
Used for active/active designs.
Much harder to operate safely.

Never enable dual-primary with ext4 or XFS; single-writer filesystems will corrupt immediately.
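For reference, the switch lives in the resource's net section; a sketch, safe only with a clustered filesystem and working fencing:

net {
  allow-two-primaries yes;   # requires GFS2/OCFS2 and fencing
}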

Replication Protocols
Protocol    Acknowledgement Point    Data Safety
A           Local disk write         Risky (async)
B           Network send to peer     Small loss window
C           Remote disk commit       Strong

Why Protocol C Is the Default
Guarantees zero data loss for acknowledged writes.
Required for:
  • Databases
  • Financial systems
  • Filesystems without journaling guarantees
The latency cost is real, but predictable.

Network
Recommended
  • Dedicated replication NIC
  • 10Gbps+ for heavy I/O
  • Jumbo frames if supported
  • Separate from client traffic
Ports
TCP 7788–7790 (one port per resource; this guide uses 7789)

MTU Mismatch = Silent Pain
Check:
# ip link show
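A quick way to verify jumbo frames end to end (the interface name ens224 is an assumption):

# ip link show dev ens224             # confirm MTU matches on both nodes
# ip link set dev ens224 mtu 9000     # only if switches support jumbo frames
# ping -M do -s 8972 10.0.0.2         # 9000 minus 28 bytes of headers; must not fragment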

DRBD Installation on RHEL
DRBD installation on RHEL (8/9) requires third-party repositories since it's not in base repos. Use ELRepo for stable kernel modules and utils on both nodes.

Prerequisites
  • Matching kernel versions across nodes (check with uname -r)
  • Identical backing disks (e.g., /dev/sdb1)
  • Secure Boot disabled, or modules signed manually

Add ELRepo Repo
On both nodes:
# dnf install -y https://www.elrepo.org/elrepo-release-9.el9.elrepo.noarch.rpm
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
For RHEL 8, use elrepo-release-8.el8.elrepo.noarch.rpm.

Install Packages
# dnf install -y drbd-utils kmod-drbd
drbd-utils: Userspace tools (drbdadm, drbdsetup).
kmod-drbd: Kernel module for DRBD 9.x.
Exact package names vary by ELRepo release (e.g., drbd9x-utils / kmod-drbd9x); check with dnf search drbd.
Verify: modinfo drbd shows a 9.x version.

Secure Boot Handling
RHEL enforces kernel module signing when Secure Boot is enabled:
  • Install build deps: dnf install kernel-devel-$(uname -r) kernel-headers gcc make bc openssl mokutil
  • Generate a Machine Owner Key (MOK) pair and sign the module with the kernel's sign-file script
  • Enroll the key with mokutil, reboot, and confirm enrollment in the shim MOK manager
  • Build and sign the DRBD module from source if the ELRepo build is incompatible
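A sketch of MOK enrollment and signing; the module path is an assumption and varies by package:

# openssl req -new -x509 -newkey rsa:2048 -nodes -days 36500 \
    -keyout MOK.priv -outform DER -out MOK.der \
    -subj "/CN=DRBD module signing/"
# mokutil --import MOK.der    # sets a one-time enrollment password
# reboot                      # complete enrollment in the shim MOK manager
# /usr/src/kernels/$(uname -r)/scripts/sign-file sha256 \
    MOK.priv MOK.der /lib/modules/$(uname -r)/extra/drbd90/drbd.ko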

Load Module
# modprobe drbd
# lsmod | grep drbd
Auto-load: echo drbd > /etc/modules-load.d/drbd.conf.
Verification
# drbdadm --version  # 9.x confirmed
# cat /proc/drbd     # Empty until resource created

DRBD Configuration File: /etc/drbd.d/r0.res
resource r0 {
  protocol C;

  on node1 {
    device     /dev/drbd0;
    disk       /dev/sdb1;
    address    10.0.0.1:7789;
    meta-disk  internal;
  }

  on node2 {
    device     /dev/drbd0;
    disk       /dev/sdb1;
    address    10.0.0.2:7789;
    meta-disk  internal;
  }

  net {
    cram-hmac-alg sha1;
    shared-secret "supersecret";
    allow-two-primaries no;
  }

  disk {
    on-io-error detach;
    fencing resource-only;
  }
}

Security
  • cram-hmac-alg enables challenge-response peer authentication, blocking rogue peers
  • Always use shared secrets on untrusted networks
Initialization Sequence
  • Create metadata
  • Bring resource up
  • Promote one node only
  • Create filesystem
  • Mount
# drbdadm create-md r0          # both nodes
# drbdadm up r0                 # both nodes
# drbdadm primary --force r0    # ONE node only
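The remaining steps from the sequence above, assuming xfs and /mnt/data as the mount point:

# watch drbdadm status r0    # wait until both disks report UpToDate
# mkfs.xfs /dev/drbd0        # on the Primary only
# mkdir -p /mnt/data
# mount /dev/drbd0 /mnt/data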

Create the filesystem only on /dev/drbd0 and only on the Primary; running mkfs on the backing disk, or on both nodes independently, = instant divergence and split-brain recovery work.

Monitoring & State Interpretation
/proc/drbd (classic 8.4-style output; on DRBD 9 it shows only version info, so prefer drbdadm status)
cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate

Field    Meaning
cs       Connection state
ro       Roles (local/peer)
ds       Disk states (local/peer)

Detailed Status
# drbdadm status
# drbdsetup status --verbose
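A crude health check against DRBD 9 status output (resource name r0 assumed; parse drbdsetup status --json for real monitoring):

# drbdadm status r0 | grep -q 'peer-disk:UpToDate' \
    || logger -t drbd-check "DRBD r0 degraded"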

Split-Brain
Split-brain occurs when both nodes accept writes independently.

Common Causes
  • No fencing
  • Manual promotion
  • Network partition + poor policies
DRBD detects divergence using:
  • UUID history
  • Generation counters
  • Checksums
Recovery Policies (Configure Explicitly)
net {
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
}

Never rely on defaults in production.
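When the automatic policies cannot resolve it, recovery is manual. On the node whose changes will be discarded:
# drbdadm disconnect r0
# drbdadm secondary r0
# drbdadm connect --discard-my-data r0
On the surviving node (if it also dropped the connection):
# drbdadm connect r0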

Resync Performance Tuning
Key Parameters

net {
  max-buffers 36k;
  sndbuf-size 0;    # 0 = auto-tune
  rcvbuf-size 0;
}

disk {
  resync-rate 800M;
  al-extents 6007;
}

Note: the syncer section is DRBD 8.3 syntax; on 8.4 and 9 these options live in the disk section.

Live Throttling
# drbdadm disk-options --resync-rate=500M r0
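On 8.4/9 the dynamic resync controller usually beats a fixed rate; a sketch (values are assumptions to tune for your link):

disk {
  c-plan-ahead 20;     # enable the dynamic controller (tenths of a second)
  c-fill-target 1M;    # target for in-flight resync data
  c-min-rate 50M;      # floor so resync is never starved
  c-max-rate 800M;     # ceiling; keep below replication link capacity
}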

Filesystem Strategy
FS       Use Case
ext4     Simple active/passive
xfs      Large files, journaling
GFS2     Active/active
OCFS2    Legacy clustered

Never mount DRBD on two nodes without a clustered FS.

Pacemaker Integration
DRBD Promotable (Master/Slave) Resource
# pcs resource create drbd_r0 ocf:linbit:drbd \
  drbd_resource=r0 \
  op monitor interval=30s role=Promoted \
  op monitor interval=60s role=Unpromoted \
  promotable promoted-max=1 promoted-node-max=1 \
  clone-max=2 clone-node-max=1 notify=true
(Older Pacemaker releases name the roles Master/Slave.)

Promote/Demote Logic

Pacemaker:
  • Promotes DRBD
  • Mounts filesystem
  • Starts application
  • Assigns VIP
Reverse on failover.
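The fs and vip members of the group below can be created like this (device, mount point, and address are assumptions):
# pcs resource create fs ocf:heartbeat:Filesystem \
  device=/dev/drbd0 directory=/mnt/data fstype=xfs
# pcs resource create vip ocf:heartbeat:IPaddr2 \
  ip=10.0.0.100 cidr_netmask=24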

Grouping and Constraints
# pcs resource group add storage fs vip
# pcs constraint colocation add storage with Promoted drbd_r0-clone INFINITY
# pcs constraint order promote drbd_r0-clone then start storage

Fencing Is Not Optional (Especially with DRBD)

Why?
DRBD assumes only one writer.
Pacemaker enforces this via STONITH.

Without fencing:
Pacemaker may refuse promotion after a failure
Or worse: data corruption
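A minimal STONITH sketch, assuming IPMI-capable BMCs (addresses and credentials are placeholders):

# pcs stonith create fence-node1 fence_ipmilan \
  pcmk_host_list=node1 ip=10.0.0.11 \
  username=admin password=changeme lanplus=1
# pcs stonith create fence-node2 fence_ipmilan \
  pcmk_host_list=node2 ip=10.0.0.12 \
  username=admin password=changeme lanplus=1
# pcs property set stonith-enabled=true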

Performance Characteristics
Workload            DRBD Impact
Small writes        Higher latency
Sequential writes   Near-native
Databases           Predictable
VM disks            Acceptable with tuning

Backup Strategy
DRBD ≠ Backup.
Best approach:
  • LVM snapshot on Primary
  • Mount snapshot
  • Backup snapshot
  • Release snapshot
Replication ensures availability, not recovery from deletion.
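A sketch of that flow, assuming the DRBD backing device is an LVM volume /dev/vg0/data with free extents in its VG (names are placeholders):

# lvcreate -s -L 10G -n data_snap /dev/vg0/data
# mount -o ro,nouuid /dev/vg0/data_snap /mnt/snap    # nouuid needed for xfs snapshots
# tar -czf /backup/data-$(date +%F).tar.gz -C /mnt/snap .
# umount /mnt/snap
# lvremove -y /dev/vg0/data_snap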

Failure Scenarios
Scenario            DRBD Behavior                   Pacemaker Action     Recovery Time
Primary crash       I/O pause; Secondary UpToDate   Promote Secondary    Seconds
Network partition   cs:StandAlone                   Fencing resolves     Minutes
Disk failure        Detach, resync from peer        Auto-detach          Resync duration
Split-brain         Disconnect                      Manual recovery      Manual

Manual recovery: drbdadm connect --discard-my-data r0 on the node whose changes are discarded.

Force failover
# pcs resource move storage node2

Kill replication
# iptables -A INPUT -p tcp --dport 7789 -j DROP

Fence test
# pcs stonith fence node1

If you haven’t tested it — it’s not HA.

Best Practices Checklist
  • Protocol C for production
  • Dedicated replication NIC
  • STONITH always enabled
  • Never dual-primary without clustered FS
  • Explicit split-brain policies
  • Monitor DRBD state (drbdadm status / drbdsetup events2)
  • Test failover quarterly

Final Thoughts
DRBD is extremely powerful — and extremely unforgiving.

Used correctly:
  • Near-zero downtime
  • Deterministic recovery
  • Simple architecture
Used incorrectly:
  • Silent corruption
  • Split-brain nightmares
  • Long recovery windows
Pair it with Pacemaker, fencing, discipline, and testing, and it becomes one of the most reliable HA building blocks on Linux.
