
RHEL Linux Filesystem Management

Filesystems define Linux storage at the block level: they organize inodes, directories, extents, and superblocks while handling permissions, timestamps, and crash recovery via journaling. Kernel integration via the VFS (Virtual File System) layer ensures portability. This post dives into ext4, XFS, and the legacy ext3/ext2, covering inode limits, mount options, benchmarks, LVM workflows, and production troubleshooting.

Filesystem Comparison: Technical Specs
Filesystem | Journaling    | Max FS Size | Max File Size | Inode Size   | Block Size   | Max Inodes    | IOPS (4K rand RW)
ext4       | Yes (3 modes) | 1 EiB       | 16 TiB        | 256 B        | 1-4 KiB      | ~4.29B (2^32) | ~50k
XFS        | Metadata-only | 8 EiB       | 8 EiB         | 256 B-2 KiB  | 512 B-64 KiB | Dynamic*      | ~120k
ext3       | Yes           | 16 TiB      | 2 TiB         | 256 B        | 1-4 KiB      | 2^32          | ~20k
ext2       | No            | 16 TiB      | 2 TiB         | 256 B        | 1-4 KiB      | 2^32          | ~30k

*XFS allocates inodes dynamically as needed instead of reserving a fixed inode table at mkfs time.

EiB = Exbibyte (≈1.15×10¹⁸ bytes), TiB = Tebibyte (≈1.1×10¹² bytes)

Key Takeaway:
  • ext4: Reliable, versatile, backward-compatible, great default for most Linux workloads.
  • XFS: Designed for enterprise workloads with massive datasets, parallel I/O, and high-speed storage.
  • ext3/ext2: Legacy options, useful only for old hardware or minimal systems.
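Before tuning anything, confirm what each mount actually uses. A minimal sketch that checks the root filesystem (findmnt is part of util-linux on RHEL; stat -f is the fallback):

```shell
#!/bin/sh
# Report the filesystem type of the root mount. findmnt reads /proc/self/mountinfo;
# stat -f queries the filesystem directly if findmnt is unavailable.
if command -v findmnt >/dev/null 2>&1; then
    fstype=$(findmnt -n -o FSTYPE /)
else
    fstype=$(stat -f -c %T /)
fi
echo "root filesystem: $fstype"
```

Run the same check against /data or any other mountpoint before applying the mount options discussed below.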
ext4: Technical Deep Dive
Inode Structure: 256-byte inodes hold 15 block pointers (12 direct plus single-, double-, and triple-indirect), or, with the extent feature, an inline extent tree (files up to 16 TiB).

Journaling Modes (mount -o data=):
ordered (default): Only metadata is journaled, but data blocks are flushed to disk before the related metadata commits.
writeback: Fastest; metadata only, with no data/metadata ordering, so recently written files may contain stale data after a crash.
journal: Slowest; both data and metadata are journaled (everything is written twice), giving the strongest crash consistency.

Advanced Features:
mkfs.ext4 -O extent,uninit_bg,dir_index -E lazy_itable_init=0 /dev/sdb1
extent: B-tree extents (vs. indirect blocks).
dir_index: Hashed B-tree directories (scales to millions of files).
lazy_itable_init=0: Initialize inode tables at mkfs time; slower mkfs, but no background initialization I/O after first mount (set =1 to speed up mkfs instead).

Mount Tuning:
/dev/sdb1 /data ext4 defaults,noatime,nodiratime,commit=30,usrquota 0 2
commit=30: Sync every 30s (default 5s).
usrquota: Enable user quotas (noatime already implies nodiratime, but listing both is harmless).
Benchmark: Sequential write 10GB: ext4 ~1.2 GB/s (NVMe); fragmentation <5% after 1M files.
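A sequential-write figure like the one above can be reproduced with fio. The job file below is a sketch: the path under /data and the queue depth are illustrative, and the test file it creates should be deleted afterwards.

```ini
; seq-write.fio -- run as: fio seq-write.fio
[seq-write]
filename=/data/fio.test   ; test file on the ext4 mount (illustrative path)
rw=write                  ; sequential write
bs=1m                     ; 1 MiB blocks
size=10g                  ; 10 GiB total
ioengine=libaio
iodepth=32
direct=1                  ; bypass the page cache
```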

XFS: Enterprise Scalability
Inode Design: Inodes are allocated dynamically in chunks; the size is selectable at mkfs time (256 B-2 KiB, 512 B default on recent RHEL). Allocation groups (AGs) parallelize metadata ops across CPUs.
Journaling: Metadata-only (directories, inodes, free-space maps); file data is never journaled, so behavior resembles ext4's writeback mode combined with delayed allocation. Sub-block writes fall back to buffered I/O.

Advanced mkfs:
mkfs.xfs -f -i size=512 -n size=8192 -l size=128m -d sunit=64,swidth=128 /dev/sdb1
-i size=512: Inode size in bytes (larger inodes keep more extended attributes inline).
-n size=8192: Directory block size (speeds up very large directories).
sunit=64: Stripe unit (RAID/SSD alignment, 64=32KiB).
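sunit and swidth are given in 512-byte sectors, so a rule of thumb is sunit = chunk size in KiB x 2 and swidth = sunit x number of data disks. A quick sketch (the chunk size and disk count below are illustrative, e.g. a 5-disk RAID5 with a 32 KiB chunk):

```shell
#!/bin/sh
# Compute mkfs.xfs sunit/swidth (in 512-byte sectors) from RAID geometry.
chunk_kib=32     # RAID chunk (stripe unit) in KiB -- illustrative
data_disks=4     # data-bearing disks, excluding parity -- illustrative

sunit=$((chunk_kib * 2))         # KiB -> 512-byte sectors
swidth=$((sunit * data_disks))   # full stripe across all data disks

echo "mkfs.xfs -d sunit=${sunit},swidth=${swidth} ..."
```

With these inputs the script prints sunit=64,swidth=256; the values in the mkfs.xfs example above correspond to two data disks.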

Mount Options:
/dev/sdb1 /data xfs defaults,noatime,allocsize=1m,discard 0 0
allocsize=1m: Preallocate 1MB for large writes.
discard: TRIM for SSDs.
Reflink Performance: cp --reflink=always creates zero-copy duplicates (CoW), far faster than a traditional copy for large files. Requires a filesystem created with -m reflink=1 (the default since RHEL 8).
Benchmark: 4K random write: XFS 120k IOPS vs ext4 50k (16-thread fio).
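Reflink behavior is easy to try safely with --reflink=auto, which attempts a CoW clone and silently falls back to an ordinary copy on filesystems without reflink support, so the sketch below runs anywhere:

```shell
#!/bin/sh
# Clone a file with cp --reflink=auto and verify the copy is byte-identical.
dir=$(mktemp -d)
dd if=/dev/zero of="$dir/original" bs=1M count=8 status=none
cp --reflink=auto "$dir/original" "$dir/clone"
if cmp -s "$dir/original" "$dir/clone"; then result=ok; else result=fail; fi
echo "clone result: $result"
rm -rf "$dir"
```

On a reflink-enabled XFS mount the clone completes without copying data blocks; on other filesystems the same command degrades to a normal copy.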

Legacy: ext3/ext2 Block Limits
ext3: 32-bit block numbers cap volumes at 16 TiB and files at 2 TiB (with 4 KiB blocks). No extent support means slow block mapping and heavy fragmentation on large files.
ext2: Identical limits, no journal = fsck after every crash.

Non-Destructive Upgrade:
# tune2fs -j /dev/sda1  # ext2 → ext3 (adds a journal)
# tune2fs -O extent,uninit_bg,dir_index /dev/sda1  # enable ext4 features
# fsck.ext4 -fD /dev/sda1  # required after -O; -D re-indexes directories
Note: existing files keep their indirect block maps; only newly written files use extents.

LVM + Filesystem

Physical → LVM → Filesystem:
# pvcreate /dev/sdb
# vgcreate data-vg /dev/sdb
# lvcreate -L 100G -n data-lv data-vg
# mkfs.xfs /dev/data-vg/data-lv
# mount /dev/data-vg/data-lv /mnt/data
# lvresize -L +50G /dev/data-vg/data-lv  # Online resize
# xfs_growfs /mnt/data  # Grows the mounted filesystem
Snapshot: lvcreate -s -L 10G -n snap /dev/data-vg/data-lv; mount -o nouuid /dev/data-vg/snap /snap (nouuid is required for XFS because the snapshot carries the origin's UUID; size the snapshot to hold all changes made while it exists).

Inspection Commands (Detailed Output)
# lsblk -f -o NAME,FSTYPE,UUID,LABEL,MOUNTPOINT,SIZE,FSAVAIL,FSUSE%
# dumpe2fs -h /dev/sda1 | grep -E 'Filesystem|Inode|Block size'  # ext4 details
# xfs_info /mnt/data  # XFS geometry
# tune2fs -l /dev/sda1 | grep -i feature  # Features enabled

Sample dumpe2fs:
Filesystem volume name:   mydata
Block size:               4096
Inode size:               256
Filesystem created:       Tue Feb 10 14:20:00 2026
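Single values are easy to pull out of dumpe2fs output with awk for use in scripts. The snippet below parses the sample output above instead of a real device, so it runs anywhere; in production you would pipe in `dumpe2fs -h /dev/sda1` instead:

```shell
#!/bin/sh
# Extract the block size from dumpe2fs-style "key:   value" output.
sample='Filesystem volume name:   mydata
Block size:               4096
Inode size:               256'

block_size=$(printf '%s\n' "$sample" \
    | awk -F: '/^Block size/ {gsub(/ /, "", $2); print $2}')
echo "block size: $block_size"
```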

Creation with Alignment

SSD/RAID Alignment (critical for perf):
Check alignment
# blockdev --getpbsz /dev/sdb  # Physical block size: 4096 on 4K-sector drives
(Note: blockdev --getss reports the logical sector size, which is often 512 even on 4K drives.)

Aligned mkfs
# parted -a optimal /dev/sdb mklabel gpt
# parted /dev/sdb mkpart primary 1MiB 100%
# mkfs.xfs -f -b size=4k /dev/sdb1
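To verify an existing partition, compare its start sector (readable from /sys/block/sdb/sdb1/start, in 512-byte sectors) against the 1 MiB boundary, i.e. 2048 sectors. A minimal sketch with illustrative values:

```shell
#!/bin/sh
# A partition is 1 MiB aligned when its start sector is a multiple of 2048
# (2048 sectors x 512 B = 1 MiB).
is_aligned() {
    start_sector=$1
    if [ $((start_sector % 2048)) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

is_aligned 2048   # -> aligned (typical parted -a optimal start)
is_aligned 63     # -> misaligned (legacy DOS layout)
```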

Resizing Filesystems:

ext4 Shrink (offline only):
umount /data
e2fsck -f /dev/sdb1
resize2fs /dev/sdb1 50G  # Shrink the filesystem first
lvreduce -L 50G /dev/data-vg/data-lv  # Then the LV; never reduce below the filesystem size

XFS Online Grow (LVM):
lvextend -L +20G /dev/data-vg/data-lv
xfs_growfs /data  # Uses all space instantly

fsck and Repair: Error Codes

ext4 e2fsck Exit Codes (a bitmask, so values can combine):
0: No errors
1: Errors corrected
2: Errors corrected, system should be rebooted
4: Errors left uncorrected
8: Operational error
16: Usage or syntax error
32: Canceled by user request
128: Shared-library error
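Because the status is a bitmask, scripts should test individual bits rather than compare whole numbers (e.g. 3 means "corrected" and "reboot recommended" at once). A minimal decoder sketch; the function name is ours:

```shell
#!/bin/sh
# Decode an e2fsck exit status bitmask into human-readable lines.
decode_fsck() {
    code=$1
    if [ "$code" -eq 0 ];          then echo "clean"; fi
    if [ $((code & 1)) -ne 0 ];    then echo "errors corrected"; fi
    if [ $((code & 2)) -ne 0 ];    then echo "reboot recommended"; fi
    if [ $((code & 4)) -ne 0 ];    then echo "errors left uncorrected"; fi
    if [ $((code & 8)) -ne 0 ];    then echo "operational error"; fi
}

decode_fsck 0   # -> clean
decode_fsck 3   # -> errors corrected, then reboot recommended (two lines)
```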

XFS Repair Stages:
# xfs_repair -n /dev/sdb1  # Dry run (no modify); walks phases 1-7
# Phase 5: Rebuild AG headers and free-space trees
# Phase 7: Verify and correct link counts

Force Rebuild (data loss risk):
# xfs_repair -L /dev/sdb1  # Zero log
# mkfs.xfs -f /dev/sdb1    # Last resort

Common errors:

Error                                         | Cause                      | Fix
EXT4-fs error (device sda1): ext4_lookup:1303 | Corrupt directory inode    | fsck.ext4 -y
XFS: metadata I/O error                       | Bad sector                 | xfs_repair -v + SMART check
"No space left" but df shows free space       | Inode exhaustion           | df -i; recreate with more inodes
mount: unknown filesystem type                | Wrong type                 | blkid; specify -t xfs
resize2fs: "Filesystem has 0 blocks"          | Corrupt primary superblock | mke2fs -n to locate backups
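The inode-exhaustion case above is easy to catch in monitoring before writes start failing. A sketch with a hypothetical 90% threshold; in production, feed it real numbers, e.g. from `df -i /data | awk 'NR==2 {print $3, $2}'`:

```shell
#!/bin/sh
# Warn when inode usage crosses a threshold (90% here, an arbitrary choice).
check_inodes() {
    used=$1
    total=$2
    pct=$((used * 100 / total))
    if [ "$pct" -ge 90 ]; then
        echo "WARNING: inode usage ${pct}%"
    else
        echo "inode usage ${pct}% ok"
    fi
}

check_inodes 950000 1000000   # -> WARNING: inode usage 95%
check_inodes 100000 1000000   # -> inode usage 10% ok
```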

Modern Alternatives:
  • Btrfs: Snapshots, RAID, compression
  • ZFS: Checksums, deduplication, snapshots
Migration Tips and Real-World Scenarios
  • ext4 → XFS migration: Backup, create new XFS filesystem, restore with rsync -aHAX.
  • VM storage: Use XFS or ext4 with noatime and tuned inode size.
  • Databases: XFS preferred for large sequential writes; ext4 can work with smaller datasets.
Conclusion
Filesystems evolve, but ext4 remains the default for general-purpose Linux, while XFS dominates large-scale, high-performance workloads. Legacy ext3/ext2 still serve niche roles. Understanding filesystem features, tuning, and maintenance is key to reliable, high-performance Linux storage.

Tip: Test changes in a VM or lab before production, especially for resizing or migration.
