Filesystems organize Linux storage on top of the block layer: they lay out inodes, directories, extents, and superblocks, and handle permissions, timestamps, and crash recovery via journaling. The kernel's VFS (Virtual File System) layer gives applications a single interface to all of them. This post dives deep into ext4, XFS, and the legacy ext3/ext2, covering inode limits, mount options, benchmarks, LVM workflows, and production troubleshooting.
Filesystem Comparison: Technical Specs
| Filesystem | Journaling | Max FS Size | Max File Size | Inode Size | Block Size | Max Inodes | IOPS (4K Rand RW) |
|---|---|---|---|---|---|---|---|
| ext4 | Yes (3 modes) | 1 EiB | 16 TiB | 256 bytes (default) | 1-64 KiB | ~4.29B (2^32) | ~50k |
| XFS | Metadata | 8 EiB | 8 EiB | 256 B-2 KiB | 512 B-64 KiB | Unlimited* | ~120k |
| ext3 | Yes | 16 TiB | 2 TiB | 128/256 bytes | 1-4 KiB | 2^32 | ~20k |
| ext2 | No | 16 TiB | 2 TiB | 128 bytes | 1-4 KiB | 2^32 | ~30k |
*XFS allocates inodes dynamically, so the practical limit is free space rather than a fixed count.
EiB = Exbibyte (≈1.15×10¹⁸ bytes), TiB = Tebibyte (≈1.1×10¹² bytes). IOPS figures are indicative numbers for NVMe and vary widely with hardware and workload.
Key Takeaway:
- ext4: Reliable, versatile, backward-compatible, great default for most Linux workloads.
- XFS: Designed for enterprise workloads with massive datasets, parallel I/O, and high-speed storage.
- ext3/ext2: Legacy options, useful only for old hardware or minimal systems.
ext4: Technical Deep Dive
Inode Structure: 256-byte inodes (default) whose block map holds either the classic 15 block pointers (12 direct, 3 indirect) or, with the extent feature, an extent tree supporting files up to 16 TiB.
Journaling Modes (mount -o data=):
ordered (default): Only metadata is journaled, but data blocks are flushed to disk before the metadata that references them commits.
writeback: Fastest; metadata-only with no data ordering, so stale or garbage data can appear in files after a crash.
journal: Slowest; both data and metadata pass through the journal, giving the strongest crash guarantees.
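To see which mode a mounted ext4 filesystem is actually using, the mount options in /proc/self/mounts can be inspected; a minimal sketch (if no data= option appears, the filesystem is running the compiled-in default, ordered):

```shell
# List the active data= journaling mode for each mounted ext4 filesystem.
while read -r dev mnt fstype opts _; do
    if [ "$fstype" = "ext4" ]; then
        mode=$(echo "$opts" | tr ',' '\n' | grep '^data=' || echo "data=ordered (default)")
        echo "$mnt: $mode"
    fi
done < /proc/self/mounts
```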
Advanced Features:
mkfs.ext4 -O extent,uninit_bg,dir_index -E lazy_itable_init=0 /dev/sdb1
extent: B-tree extents (vs. indirect blocks).
dir_index: Hashed B-tree directories (scales to millions of files).
lazy_itable_init=0: Initialize inode tables during mkfs. The default (=1) defers initialization to first mount, making mkfs roughly 10x faster at the cost of background I/O after mounting.
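The inode count itself is fixed at mkfs time by the bytes-per-inode ratio (-i, default 16384) and cannot be raised later without reformatting. A quick sketch of the arithmetic, using an example 100 GiB partition:

```shell
# Estimate how many inodes mkfs.ext4 will create: one inode per
# "bytes-per-inode" ratio bytes of filesystem size (-i flag, default 16384).
fs_bytes=$((100 * 1024 * 1024 * 1024))   # example: 100 GiB partition
bytes_per_inode=16384                    # mkfs.ext4 default ratio
inodes=$((fs_bytes / bytes_per_inode))
echo "Approximate inode count: $inodes"  # 6553600 at the default ratio
```

For filesystems expected to hold millions of small files, pass a lower ratio (e.g. mkfs.ext4 -i 8192) to double the inode count.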
Mount Tuning:
/dev/sdb1 /data ext4 defaults,noatime,commit=30,quota 0 2
noatime: Skip access-time updates (implies nodiratime, so listing both is redundant).
commit=30: Commit the journal every 30 s (default 5 s); fewer syncs, larger crash window.
quota: Enable user quotas.
Benchmark: Sequential write 10GB: ext4 ~1.2 GB/s (NVMe); fragmentation <5% after 1M files.
XFS: Enterprise Scalability
Inode Design: Configurable 256 B-2 KiB (default 512 B on v5); allocation groups (AGs) let metadata operations run in parallel across CPUs.
Journaling: Metadata only (directories, inodes, free-space maps); file data is never journaled, roughly analogous to ext4's writeback behavior. Sub-block writes are buffered in the page cache.
Advanced mkfs:
mkfs.xfs -f -i size=512 -n size=8192 -d sunit=64,swidth=128 -l size=128m /dev/sdb1
-i size=512: Inode size in bytes (larger inodes keep more extended attributes inline).
-n size=8192: Directory block size (helps very large directories).
-d sunit=64,swidth=128: Stripe unit/width in 512-byte sectors (64 = 32 KiB) for RAID/SSD alignment.
-l size=128m: Internal log (journal) size.
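The sunit/swidth values follow mechanically from the RAID geometry: stripe unit is the per-disk chunk size in 512-byte sectors, and stripe width is that times the number of data disks. A sketch, assuming a 32 KiB chunk and 2 data disks (which reproduces the sunit=64,swidth=128 example above):

```shell
# Derive mkfs.xfs sunit/swidth (in 512-byte sectors) from RAID geometry.
chunk_kib=32        # assumed RAID chunk (stripe unit) per disk, in KiB
data_disks=2        # assumed disks carrying data (exclude parity disks)
sunit=$((chunk_kib * 1024 / 512))
swidth=$((sunit * data_disks))
echo "mkfs.xfs -d sunit=${sunit},swidth=${swidth}"
```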
Mount Options:
/dev/sdb1 /data xfs defaults,noatime,allocsize=1m,discard 0 2
allocsize=1m: Bias speculative preallocation to 1 MiB for streaming writes.
discard: Inline TRIM on SSDs; periodic fstrim (fstrim.timer) is often preferred to avoid write-latency spikes.
Reflink Performance: cp --reflink=always creates zero-copy duplicates (CoW) on XFS filesystems formatted with reflink support (-m reflink=1, the default in recent xfsprogs); the "copy" completes almost instantly regardless of file size because only metadata is written.
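A safe way to experiment is --reflink=auto, which attempts a CoW clone and silently falls back to a normal copy on filesystems without reflink support; a minimal sketch using a throwaway directory:

```shell
# Reflink copy with fallback: on reflink-capable XFS/Btrfs this is a CoW
# clone; elsewhere cp transparently performs an ordinary copy.
tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/original" bs=1M count=8 status=none
cp --reflink=auto "$tmpdir/original" "$tmpdir/clone"
cmp -s "$tmpdir/original" "$tmpdir/clone" && echo "copies are identical"
rm -rf "$tmpdir"
```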
Benchmark: 4K random write: XFS 120k IOPS vs ext4 50k (16-thread fio).
Legacy: ext3/ext2 Block Limits
ext3: 32-bit block numbers cap volumes at 16 TiB (with 4 KiB blocks) and files at 2 TiB. No extents means heavy fragmentation beyond ~10M files.
ext2: Identical limits, but no journal, so every unclean shutdown forces a full fsck.
Non-Destructive Upgrade:
# tune2fs -j /dev/sda1                              # ext2 → ext3 (adds a journal)
# tune2fs -O extent,uninit_bg,dir_index /dev/sda1   # ext3 → ext4 features
# e2fsck -fD /dev/sda1                              # required after the feature change
Note: existing files keep their indirect-block maps; only newly written files use extents.
LVM + Filesystem
Physical → LVM → Filesystem:
# pvcreate /dev/sdb
# vgcreate data-vg /dev/sdb
# lvcreate -L 100G -n data-lv data-vg
# mkfs.xfs /dev/data-vg/data-lv
# mount /dev/data-vg/data-lv /mnt/data
# lvresize -L +50G /dev/data-vg/data-lv   # Online resize
# xfs_growfs /mnt/data                    # Grow the mounted filesystem
(lvresize -r -L +50G does both steps in one command.)
Snapshot: lvcreate -s -L 10G -n snap /dev/data-vg/data-lv; mount -o nouuid /dev/data-vg/snap /snap. The -L size is the space reserved for copy-on-write changes; nouuid is required because XFS refuses to mount a duplicate UUID.
Inspection Commands (Detailed Output)
# lsblk -f -o NAME,FSTYPE,UUID,LABEL,MOUNTPOINT,SIZE,FSAVAIL,FSUSE%
# dumpe2fs -h /dev/sda1 | grep -E 'Filesystem|Inode|Block size' # ext4 details
# xfs_info /mnt/data # XFS geometry
# tune2fs -l /dev/sda1 | grep -i feature # Features enabled
Sample dumpe2fs:
Filesystem volume name: mydata
Block size: 4096
Inode size: 256
Filesystem created: Tue Feb 10 14:20:00 2026
Creation with Alignment
SSD/RAID Alignment (critical for perf):
Check alignment
# blockdev --getpbsz /dev/sdb # Physical block size; 4096 on 4Kn/AF drives
(--getss reports the logical sector size, which is often 512 even on drives with 4 KiB physical sectors.)
Aligned mkfs
# parted -a optimal /dev/sdb mklabel gpt
# parted /dev/sdb mkpart primary 1MiB 100%
# mkfs.xfs -f -b size=4k /dev/sdb1
Resizing Filesystem:
ext4 Shrink (offline only):
umount /data
e2fsck -f /dev/data-vg/data-lv
resize2fs /dev/data-vg/data-lv 50G     # Shrink the filesystem first
lvreduce -L 50G /dev/data-vg/data-lv   # Then the LV, never below the filesystem size
(lvreduce -r -L 50G performs both steps in the right order and is harder to get wrong.)
XFS Online Grow (LVM):
lvextend -L +20G /dev/data-vg/data-lv
xfs_growfs /data # Uses all space instantly
fsck and Repair: Error Codes
ext4 e2fsck Exit Codes (a bitmask, so values can combine):
0: No errors
1: Filesystem errors corrected
2: Errors corrected, reboot recommended
4: Errors left uncorrected
8: Operational error
16: Usage or syntax error
32: Cancelled by user request
128: Shared-library error
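Because the status is a bitmask (per the e2fsck man page), a value like 3 means "errors corrected" plus "reboot recommended". A small helper to decode it, useful in backup or maintenance scripts:

```shell
# Decode an e2fsck exit status into its component bitmask flags.
decode_e2fsck() {
    local code=$1
    [ "$code" -eq 0 ] && { echo "no errors"; return; }
    [ $((code & 1)) -ne 0 ]   && echo "filesystem errors corrected"
    [ $((code & 2)) -ne 0 ]   && echo "reboot recommended"
    [ $((code & 4)) -ne 0 ]   && echo "errors left uncorrected"
    [ $((code & 8)) -ne 0 ]   && echo "operational error"
    [ $((code & 16)) -ne 0 ]  && echo "usage or syntax error"
    [ $((code & 32)) -ne 0 ]  && echo "cancelled by user request"
    [ $((code & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}
decode_e2fsck 3   # prints both "filesystem errors corrected" and "reboot recommended"
```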
XFS Repair Stages:
# xfs_repair -n /dev/sdb1 # Dry run through all 7 phases
# Phase 1: Find and verify the superblock
# Phase 5: Rebuild AG headers and free-space trees
# Phase 7: Verify and correct link counts
Force Rebuild (data loss risk):
# xfs_repair -L /dev/sdb1 # Zero log
# mkfs.xfs -f /dev/sdb1 # Last resort
Common errors:
| Error | Cause | Fix |
|---|---|---|
| EXT4-fs error (device sda1): ext4_lookup:1303 | Corrupt directory inode | fsck.ext4 -y |
| XFS: metadata I/O error | Bad sector | xfs_repair -v plus a SMART check |
| No space left (but df shows space) | Inode exhaustion | df -i; recreate with a lower bytes-per-inode ratio (mkfs.ext4 -i) |
| mount: unknown filesystem type | Wrong/missing type | blkid to identify; mount -t xfs |
| resize2fs: "Filesystem has 0 blocks" | Damaged superblock | mke2fs -n to list backup superblocks, then e2fsck -b <backup> |
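Inode exhaustion in particular is easy to monitor before it bites; a minimal sketch that flags filesystems past an example 90% inode-usage threshold (the threshold is an arbitrary value to tune):

```shell
# Flag filesystems whose inode table is nearly full, the cause of
# "No space left on device" while df -h still shows free blocks.
threshold=90   # example alert threshold, percent
df -Pi | awk -v t="$threshold" 'NR > 1 && $5+0 >= t { print $6 " inode usage " $5 }'
echo "inode check complete"
```

Filesystems that report "-" for inode usage (Btrfs, overlayfs) are harmlessly treated as 0% by the numeric comparison.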
Modern Alternatives:
- Btrfs: Snapshots, RAID, compression
- ZFS: Checksums, deduplication, snapshots
Migration Tips and Real-World Scenarios
- ext4 → XFS migration: Back up, create the new XFS filesystem, restore with rsync -aHAX (preserves hard links, ACLs, and extended attributes).
- VM storage: Use XFS or ext4 with noatime and tuned inode size.
- Databases: XFS preferred for large sequential writes; ext4 can work with smaller datasets.
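The migration copy step can be rehearsed safely with throwaway directories standing in for the real source and target mounts; a minimal sketch:

```shell
# Rehearse the rsync migration copy: -a archive mode, -H hard links,
# -A ACLs, -X extended attributes. Paths here are throwaway stand-ins.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/app"
echo "config" > "$src/app/settings.conf"
rsync -aHAX "$src"/ "$dst"/
cat "$dst/app/settings.conf"   # prints: config
rm -rf "$src" "$dst"
```

The trailing slashes matter: "$src"/ copies the directory's contents rather than the directory itself.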
Conclusion
Filesystems evolve, but ext4 remains the default for general-purpose Linux, while XFS dominates large-scale, high-performance workloads. Legacy ext3/ext2 still serve niche roles. Understanding filesystem features, tuning, and maintenance is key to reliable, high-performance Linux storage.
Tip: Test changes in a VM or lab before production, especially for resizing or migration.