Rebuilding an AIX LPAR in an Oracle RAC environment isn't just an OS reinstall; it's a coordinated sequence across Clusterware, ASM, storage multipathing, volume groups, and the HMC. One wrong move can corrupt the OCR, break ASM disk discovery, or leave you with an unbootable node.
This guide delivers a safe, repeatable, production-grade process for Oracle RAC on Pure Storage FlashArray (FCP, MPIO). We cover everything from clean Clusterware shutdown to Grid Infrastructure validation on the new LPAR.
Goal: Minimize risk, preserve RAC integrity, and keep a rollback path.
Environment Overview
| Component | Details |
|---|---|
| OS | IBM AIX |
| Database | Oracle RAC |
| Storage | Pure Storage FlashArray (FCP, MPIO) |
| Disk Management | ASM |
| Backup | mksysb |
| High Availability | Mirrored Oracle VG + Alternate rootvg |
1: Cluster Shutdown & System Backup
Stop Oracle Clusterware cleanly and back up the system before any storage changes.
# cd /opt/oracle/grid/home/bin
# ./crsctl stop crs
# /usr/local/bin/backup/mksysb.sh
Why it matters:
- Prevents OCR/Voting disk corruption
- Creates a known-good rollback point
- Essential before destructive storage ops
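Before starting the backup, it can help to confirm that no Clusterware daemons survived the stop. The following is a minimal sketch, assuming the usual Grid Infrastructure daemon names (ohasd.bin, crsd.bin, ocssd.bin, evmd.bin). The function name is hypothetical, and it reads a process listing on stdin so it can also be tried against saved output.

```shell
# Hypothetical helper: prints any Clusterware daemon lines found in a
# process listing read from stdin. Returns 0 if none remain, 1 otherwise.
crs_daemons_left() {
    if grep -E '(ohasd|crsd|ocssd|evmd)\.bin'; then
        return 1
    fi
    return 0
}

# Typical use on the node (as root):
#   ps -ef | crs_daemons_left && echo "Clusterware is down"
```

If the function prints anything, the stack is not fully down and the backup should wait.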
mksysb Backup Script
This script validates rootvg, checks space, creates/verifies the image, and enforces retention.
#!/bin/ksh
########################################################################
# Script Name: mksysb.sh
# Purpose: Create filesystem-based mksysb backup for AIX
# Author: adminCtrlX
# Location: /usr/local/bin/backup
########################################################################
BACKUP_DIR="/backup/mksysb"
LOG_DIR="/var/log/mksysb"
RETENTION_DAYS=7
DATE=$(date +%Y%m%d_%H%M)
HOSTNAME=$(hostname)
MKSYSB_FILE="${BACKUP_DIR}/${HOSTNAME}_rootvg_${DATE}.mksysb"
LOG_FILE="${LOG_DIR}/mksysb_${DATE}.log"
mkdir -p ${BACKUP_DIR} ${LOG_DIR}
exec > >(tee -a ${LOG_FILE}) 2>&1
lsvg rootvg || exit 1
REQUIRED_MB=8192
# AIX df -m reports fractional MB (e.g. 20480.53); truncate so the integer test works
AVAILABLE_MB=$(df -m ${BACKUP_DIR} | tail -1 | awk '{print int($3)}')
if [ "${AVAILABLE_MB}" -lt "${REQUIRED_MB}" ]; then
echo "Insufficient space for mksysb"
exit 1
fi
/usr/bin/mksysb -i -m -X ${MKSYSB_FILE} || exit 1
/usr/bin/lsmksysb -l -f ${MKSYSB_FILE} || exit 1
find ${BACKUP_DIR} -name "*.mksysb" -mtime +${RETENTION_DAYS} -exec rm -f {} \;
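Because the retention step deletes files, it can be worth previewing what it would remove before trusting it in cron. This is a small sketch that uses the same find expression with -print instead of -exec rm. The function name and its arguments are illustrative, not part of the original script.

```shell
# Hypothetical helper: print the images the retention rule would delete,
# without deleting anything. $1 = backup directory, $2 = retention in days.
list_expired_mksysb() {
    find "$1" -name "*.mksysb" -mtime +"$2" -print
}

# Example: list_expired_mksysb /backup/mksysb 7
```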
2: Pure Storage FCP Driver Installation
Mount the NIM image and install the MPIO driver for proper pathing.
# mount ibm-nim-001:/images/aix /mnt
# cd /mnt/purestorage
# installp -acXgd /mnt/purestorage devices.fcp.disk.pure.flasharray.mpio.rte
# cd /
# umount /mnt
# shutdown -Fr
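Why it matters: Without the Pure ODM fileset, AIX typically configures the LUNs as generic FC disks with default path attributes, which leads to inconsistent failover and poor performance.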
3: Service Startup & Disk Discovery
Post-reboot:
# startsrc -s collectd
# startsrc -s xntpd
# df -gt
Capture disk inventory:
# for i in `lspv | awk '{print $1}'`; do
SIZE=`bootinfo -s $i`
SERIAL=`lscfg -vl $i | grep Z1 | awk '{print $2 " " $3}'`
echo $i $SIZE $SERIAL
done
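An inventory saved before the rebuild is most useful when it can be compared with one taken afterwards. The sketch below assumes both inventories were saved to files with one disk per line and the serial as a single token in the third column. The function name and file layout are assumptions for illustration.

```shell
# Hypothetical helper: print serials listed in the "before" inventory ($1)
# that do not appear in the "after" inventory ($2).
# Assumed line layout: <disk_name> <size_mb> <serial>
missing_serials() {
    awk -v after="$2" '
        BEGIN {
            # Load every serial from the "after" file into a lookup table
            while ((getline line < after) > 0) {
                split(line, f, " ")
                seen[f[3]] = 1
            }
        }
        !($3 in seen) { print $3 }
    ' "$1"
}

# Example: missing_serials /tmp/inventory.before /tmp/inventory.after
```

Any serial it prints is a LUN that was visible before and is not visible now, which is worth resolving before ASM discovery.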
Share the WWPNs of the FC adapters (e.g., fcs0 and fcs1; lscfg -vl fcs0 shows the WWPN as "Network Address") with the storage team for LUN allocation.
4: Pure Storage MPIO Configuration
Set failover algorithm on Pure disks:
# for i in `lsdev -Cc disk | grep PURE | awk '{print $1}'`; do
chdev -l $i -a algorithm=fail_over
lsattr -El $i -a algorithm
sleep 2
done
5: ASM Disk Renaming & Ownership
Rename the disks and set ownership per your ASM naming standard (e.g., ASMDATA_Disk, ASMOCR_Disk, ASMVOT_Disk):
# rendev -l <disk_name> -n <asm_disk_name>
# for i in `cat /tmp/asm-disk`; do    # one renamed device name per line, e.g. ASMDATA_Disk
chown grid:asmadmin /dev/r$i
lkdev -l $i -a
ls -l /dev/r$i
done
DB team then discovers and resyncs in ASM.
6: Oracle VG Operations (Source LPAR)
Clone rootvg to an alternate disk, mirror the Oracle VG onto a new disk, and wait until lsvg reports STALE PPs: 0 before going any further.
# alt_disk_copy -B -d <disk_name>
# extendvg oraclevg <disk_name>
# mirrorvg -S -m oraclevg <disk_name>
# lsvg oraclevg
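Waiting for the mirror to sync is easier to script than to watch. The sketch below pulls the STALE PPs value out of lsvg output read from stdin. The function name is hypothetical, and it assumes the usual two-column lsvg layout where "STALE PPs:" is followed by the count.

```shell
# Hypothetical helper: read `lsvg <vg>` output on stdin and print the
# number that follows the "STALE PPs:" label.
stale_pps() {
    awk '{
        for (i = 1; i < NF - 1; i++)
            if ($i == "STALE" && $(i + 1) == "PPs:")
                print $(i + 2)
    }'
}

# Example polling loop on the source LPAR:
#   while [ "$(lsvg oraclevg | stale_pps)" != "0" ]; do sleep 60; done
```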
7: Create Target LPAR (HMC)
In HMC GUI/CLI:
- Launch Create Partition wizard.
- Map Virtual Ethernet (VIOS network, VLAN if needed).
- Map Virtual FC (NPIV): Pair client/server adapters, auto-generate WWPNs.
- Activate to Pending state.
- Extract WWPNs: lshwres -r virtualio --rsubtype fc --level lpar -m <sys> --filter "lpar_names=<lpar>".
Share WWPNs with storage for zoning.
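Storage teams often want WWPNs one per line in colon-separated form. The sketch below assumes the WWPN pair arrives as a comma-separated string of 16 hex digits each, possibly wrapped in double quotes, as a wwpns field might be printed. The function name and that input shape are assumptions.

```shell
# Hypothetical helper: read comma-separated WWPNs on stdin and print them
# one per line, lowercased, with a colon between each byte.
fmt_wwpns() {
    tr -d '"' | tr ',' '\n' | tr 'A-F' 'a-f' |
        sed -e '/^$/d' -e 's/../&:/g' -e 's/:$//'
}

# Example: echo "c05076030aba0002,c05076030aba0003" | fmt_wwpns
```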
8: Migration Window Operations
Split the mirror copy off into its own VG, then vary it off and export it so its disks can move to the target LPAR:
# cp -p /etc/filesystems /etc/filesystems_Backup_$(date +'%d_%m_%Y')
# splitvg -y <new_vg> -i <old_vg>
# varyoffvg <new_vg>
# exportvg <new_vg>
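varyoffvg will refuse to run while any logical volume in the VG is still open, so a quick pre-check saves time in the window. This sketch reads lsvg -l output on stdin and lists LVs whose state starts with "open". The function name is hypothetical, and it assumes the standard lsvg -l column order (LV STATE in column 6, MOUNT POINT in column 7).

```shell
# Hypothetical helper: read `lsvg -l <vg>` output on stdin and print the
# name and mount point of every logical volume that is still open.
open_lvs() {
    # Skip the "<vg>:" line and the column header (first two lines)
    awk 'NR > 2 && $6 ~ /^open/ { print $1, $7 }'
}

# Example: lsvg -l <vg_name> | open_lvs
```

An empty result means nothing in the VG is mounted or otherwise in use.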
9: Boot Target LPAR
Boot the target LPAR into SMS, select the rootvg disk as the boot device, then configure networking once AIX is up (mktcpip, /etc/hosts).
10: Storage Cleanup (Target LPAR)
# for i in `lsdev -Cc disk | grep Defined | awk '{print $1}'`; do
rmdev -Rdl $i
done
11: Oracle Filesystem Recovery
# importvg -y <vg_name> <disk_name>
# chlv -n <new_lv> <old_lv>
# chfs -m <new_fs> <old_fs>
# mount <fs_name>
12: Clusterware Startup & Validation
# cd /opt/oracle/grid/home/bin
# ./crsctl start crs
# ./crsctl check crs
# ./crsctl stat res -t
# bootlist -m normal <rootvg_disk>
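The crsctl check crs result can be tested in a script rather than read by eye. The sketch below assumes the four component names that command normally reports (Oracle High Availability Services, Cluster Ready Services, Cluster Synchronization Services, Event Manager). The function name is hypothetical.

```shell
# Hypothetical helper: read `crsctl check crs` output on stdin and succeed
# only if each of the four stack components reports "is online".
crs_all_online() {
    out=$(cat)
    for svc in "Oracle High Availability Services" \
               "Cluster Ready Services" \
               "Cluster Synchronization Services" \
               "Event Manager"; do
        printf '%s\n' "$out" | grep -q "$svc is online" || return 1
    done
    return 0
}

# Example: ./crsctl check crs | crs_all_online && echo "stack is up"
```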
Final Checklist:
# lspv
# df -gt
# ifconfig -a
# netstat -nr
# lssrc -s sshd
# lssrc -s rubrik_backup
Key Takeaways
- Pure Storage MPIO is mandatory.
- Standardize ASM disk names.
- Mirror the Oracle VG so the binaries can be split off and migrated.
- Use alternate rootvg for rollback/DR.
- Always shut down and start Clusterware cleanly.
Pro Tip: Document serials in /tmp/asm-disk. Recovery becomes a checklist, not guesswork.