Pre & Post Reboot Validation Automation for AIX Systems

collect_pre_reboot_snapshot.ksh → connects to a remote host, validates reachability (ping + SSH), and runs a sequence of system checks, capturing critical configuration and status data in a single snapshot file.

Features:
  • Written for AIX and VIOS; some checks fall back to Linux commands (ip, swapon, free) when the AIX tools are missing
  • Safe remote execution using SSH heredoc (no temp files)
Captures:
  • Host identity
  • Screen saver / timeout config
  • NTP / TZ info
  • Default route and network interfaces
  • HACMP resource group
  • Paging space, PowerPath, MPIO
  • Cluster services, IPSec status
  • NFS/GPFS mounts
  • errpt, lppchk, exportfs, showmount, etc.
Saves results locally as /tmp/pre_reboot_snapshots/<hostname>_pre_snap_<YYYYMMDD_HHMMSS>.txt
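
The quoted-heredoc technique listed above can be tried without a remote host. The following is a minimal sketch, assuming a local `sh -s` stands in for the real `ssh "$SSH_USER@$HOST" 'ksh -s'` call; the `run_remote` helper and the `SNAP_SIDE` variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Stand-in for: ssh "$SSH_USER@$HOST" 'ksh -s'
# (hypothetical helper; a local shell plays the role of the remote one)
run_remote() {
  sh -s
}

# Set locally but NOT exported, so the child shell cannot see it.
SNAP_SIDE="local"

# Quoted delimiter ('REMOTE'): the body is sent verbatim, so the variable
# is expanded by the receiving shell, where it is unset.
quoted_out=$(run_remote <<'REMOTE'
echo "SNAP_SIDE=${SNAP_SIDE:-unset}"
REMOTE
)

# Unquoted delimiter: the local shell expands the variable before sending.
unquoted_out=$(run_remote <<REMOTE
echo "SNAP_SIDE=${SNAP_SIDE:-unset}"
REMOTE
)

echo "quoted:   $quoted_out"
echo "unquoted: $unquoted_out"
```

The quoted form reports `unset` and the unquoted form reports `local`. This is why both scripts quote the delimiter: every `$` in the snapshot commands is evaluated on the target host, not on the admin host.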

verify_post_reboot.ksh → compares post-reboot state against the baseline and highlights differences

Purpose:
After reboot, this script collects a new snapshot and compares it with the pre-reboot one.
It highlights configuration changes, missing devices, or service failures.

Features:
  • Locates the pre-reboot snapshot by hostname (/tmp/pre_reboot_snapshots/<hostname>_pre_snap.txt)
  • Collects post-reboot snapshot using same logic as pre-reboot
  • Performs line-by-line diff
  • Provides summary of differences
  • Logs the diff to /tmp/pre_reboot_snapshots/<hostname>_verify_diff.txt
How It Works:
  • Validates SSH connection to target.
  • Executes the same snapshot checks as collect_pre_reboot_snapshot.ksh (the remote command block is duplicated inline, not sourced).
  • Stores results in /tmp/pre_reboot_snapshots/<hostname>_post_snap.txt.
  • Runs diff between pre and post files.
  • Marks [ OK ] if no differences, [ WARN ] if differences exist.
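
The last two steps rely on the exit status of diff: 0 when the files match, 1 when they differ, and greater than 1 when diff itself fails. A minimal sketch of that mapping follows; the `compare_snapshots` name, the demo paths, and the sample lines are assumptions for illustration, not part of the scripts below.

```shell
#!/bin/sh
# Hypothetical helper: compare two snapshot files, write the diff to a
# report file, and print a status tag.
# Return codes follow the scripts' convention: 0 = OK, 2 = WARN, 1 = ERR.
compare_snapshots() {
  if diff "$1" "$2" >"$3" 2>/dev/null; then
    rc=0
  else
    rc=$?          # exit status of diff: 1 = differ, >1 = trouble
  fi
  case $rc in
    0) echo "[ OK ]";   return 0 ;;
    1) echo "[ WARN ]"; return 2 ;;
    *) echo "[ FAIL ]"; return 1 ;;
  esac
}

# Demo with throwaway files (paths are illustrative only).
work="/tmp/compare_demo.$$"
mkdir -p "$work" || exit 1
printf 'en0 192.168.10.15\n' >"$work/pre"
printf 'en0 192.168.10.15\n' >"$work/same"
printf 'en0 192.168.10.25\n' >"$work/changed"

compare_snapshots "$work/pre" "$work/same"    "$work/r1" || echo "  status: $?"
compare_snapshots "$work/pre" "$work/changed" "$work/r2" || echo "  status: $?"
compare_snapshots "$work/pre" "$work/missing" "$work/r3" || echo "  status: $?"
rm -rf "$work"
```

Treating a diff status above 1 as a failure, not a warning, keeps a missing or unreadable snapshot from being mistaken for a configuration change.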
Sample Folder Layout
/usr/local/syscheck/
├── collect_pre_reboot_snapshot.ksh
└── verify_post_reboot.ksh

/tmp/pre_reboot_snapshots/
├── aixlpar01_pre_snap_20251108_143212.txt
├── aixlpar01_pre_snap.txt
├── aixlpar01_post_snap.txt
└── aixlpar01_verify_diff.txt

Both scripts use ksh for maximum portability across enterprise Unix platforms.

Script 1 — collect_pre_reboot_snapshot.ksh
==========================================================================
#!/usr/bin/ksh
###############################################################################
# collect_pre_reboot_snapshot.ksh <hostname>
# Author: adminCtrlX
#
# Purpose:
#   Collects a comprehensive pre-reboot system snapshot from an AIX or VIOS host.
#   Runs remote commands over SSH and stores the output locally in a timestamped file.
#
# Features:
#   - Network and SSH connectivity tests
#   - Time sync, route, HACMP, paging, and storage validation
#   - Collects system identity and configuration details before reboot
#
# Compatible with: AIX / VIOS servers
###############################################################################

EXIT_OK=0
EXIT_ERR=1
EXIT_WARN=2
SSH_USER="root"

if [[ $# -ne 1 ]]; then
  echo "Usage: $0 <hostname>"
  exit $EXIT_ERR
fi

HOST="$1"
DATESTAMP=$(date +%Y%m%d_%H%M%S)
SNAPDIR="/tmp/pre_reboot_snapshots"
OUTFILE="${SNAPDIR}/${HOST}_pre_snap_${DATESTAMP}.txt"

mkdir -p "$SNAPDIR" || {
  echo "Failed to create snapshot directory: $SNAPDIR"
  exit $EXIT_ERR
}

print_line() { printf '%s\n' "-------------------------------------------------------------"; }

print_status() {
  case "$1" in
    ok)   printf '[ OK ]\n' ;;
    warn) printf '[ WARN ]\n' ;;
    fail) printf '[ FAIL ]\n' ;;
  esac
}

section_header() {
  print_line
  printf '%s\n' "$1"
  print_line
}

# --- Function: Ping detection (portable) ---
ping_host() {
  if ping -c1 -W1 "$HOST" >/dev/null 2>&1; then
    return 0
  elif ping -c1 "$HOST" >/dev/null 2>&1; then
    return 0
  elif ping -n 1 "$HOST" >/dev/null 2>&1; then
    return 0
  else
    return 1
  fi
}

###############################################################################
# Connectivity Checks
###############################################################################

section_header "Testing ping to $HOST"
if ping_host; then
  print_status ok
else
  print_status fail
  echo "Host $HOST is not responding to ping."
  exit $EXIT_ERR
fi

section_header "Testing SSH to $HOST"
if ssh -o BatchMode=yes -o ConnectTimeout=10 -q "$SSH_USER@$HOST" "echo ok" >/dev/null 2>&1; then
  print_status ok
else
  print_status fail
  echo "Unable to SSH to $SSH_USER@$HOST"
  exit $EXIT_ERR
fi

###############################################################################
# Collect Snapshot
###############################################################################

section_header "Collecting pre-reboot snapshot from $HOST"

ssh -o BatchMode=yes -o ConnectTimeout=30 "$SSH_USER@$HOST" 'ksh -s' <<'REMOTE' > "$OUTFILE" 2>&1

printf '--- Host identity ---\n'
uname -a || echo "Unable to get uname output"

printf '\n--- dtsession saverTimeout/lockTimeout ---\n'
if [ -d /etc/dt/config ]; then
  grep -H -E 'saverTimeout|lockTimeout' /etc/dt/config/* 2>/dev/null || echo "No saver/lockTimeout settings found"
else
  echo "/etc/dt/config not present"
fi

printf '\n--- TZ / NTP check ---\n'
if command -v lssrc >/dev/null 2>&1; then
  lssrc -s xntpd 2>/dev/null
fi
if command -v ntpq >/dev/null 2>&1; then
  ntpq -p 2>/dev/null
fi

printf '\n--- Default route ---\n'
if command -v netstat >/dev/null 2>&1; then
  netstat -rn 2>/dev/null | grep -E 'default|^0.0.0.0' || echo "No default route found"
else
  ip route show 2>/dev/null | grep default || echo "No default route command available"
fi

printf '\n--- HACMP resource group ---\n'
clRGinfo 2>/dev/null || echo "clRGinfo not available"

printf '\n--- Paging space ---\n'
if command -v lsps >/dev/null 2>&1; then
  lsps -s 2>/dev/null
else
  swapon -s 2>/dev/null || free -h 2>/dev/null || echo "Paging/Swap info not available"
fi

printf '\n--- Network interfaces ---\n'
ifconfig -a 2>/dev/null || ip addr show 2>/dev/null || echo "ifconfig/ip not present"

printf '\n--- NFS/GPFS mounts ---\n'
mount 2>/dev/null | grep -E 'nfs|gpfs' || echo "No NFS/GPFS mounts found"

printf '\n--- Error report (errpt) ---\n'
if command -v errpt >/dev/null 2>&1; then
  # head always exits 0, so a "|| echo" fallback here would never run
  errpt -a 2>/dev/null | head -n 20
else
  echo "errpt not present"
fi

printf '\n--- Package validation (lppchk) ---\n'
if command -v lppchk >/dev/null 2>&1; then
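  # lppchk -v is silent on success and reports problems on stderr,
  # so keep stderr and flag only a non-zero exit
  lppchk -v 2>&1 || echo "lppchk reported inconsistencies (see output above)"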
else
  echo "lppchk not present"
fi

printf '\n--- Disk multipathing ---\n'
lsdev -Cc disk 2>/dev/null || echo "lsdev not available or no disks listed"

printf '\n--- PowerPath status ---\n'
if [ -x /usr/sbin/powermt ]; then
  /usr/sbin/powermt display dev=all 2>/dev/null || echo "powermt returned no data"
else
  echo "PowerPath not present"
fi

printf '\n--- exportfs ---\n'
exportfs 2>/dev/null || echo "No exports or exportfs not available"

printf '\n--- showmount ---\n'
showmount -a 2>/dev/null || echo "No NFS clients or showmount not available"

printf '\n--- PowerPath reserve_policy ---\n'
if command -v lsattr >/dev/null 2>&1; then
  lsattr -El powerpath0 -a reserve_policy 2>/dev/null || echo "powerpath0 not present"
else
  echo "lsattr not available"
fi

printf '\n--- MPIO Other detection ---\n'
lsdev -Cc disk 2>/dev/null | grep 'Other' || echo "No MPIO 'Other' devices found"

printf '\n--- Cluster service status ---\n'
if command -v lssrc >/dev/null 2>&1; then
  lssrc -ls clstrmgrES 2>/dev/null || echo "Cluster service not active"
else
  echo "lssrc not available"
fi

printf '\n--- IPSec check ---\n'
if command -v lssrc >/dev/null 2>&1; then
  lssrc -a 2>/dev/null | grep -i ipsec || echo "No IPSec service found"
else
  echo "lssrc not present"
fi

REMOTE

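SSH_EXIT=$?
if [ $SSH_EXIT -eq 0 ]; then
  # verify_post_reboot.ksh reads <hostname>_pre_snap.txt, so keep an
  # untimestamped copy of the latest snapshot as the comparison baseline.
  cp "$OUTFILE" "${SNAPDIR}/${HOST}_pre_snap.txt" || {
    print_status fail
    echo "Snapshot saved as $OUTFILE, but the baseline copy failed."
    exit $EXIT_ERR
  }
  print_status ok
  echo "Snapshot saved: $OUTFILE"
  exit $EXIT_OK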
else
  print_status fail
  echo "SSH or remote command failed (exit code: $SSH_EXIT). See $OUTFILE for details."
  exit $EXIT_ERR
fi

==========================================================================
Example Console Output
Command:
# ./collect_pre_reboot_snapshot.ksh aixlpar01
Example Output:
-------------------------------------------------------------
Testing ping to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Testing SSH to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Collecting pre-reboot snapshot from aixlpar01
-------------------------------------------------------------
[ OK ]
Snapshot saved: /tmp/pre_reboot_snapshots/aixlpar01_pre_snap_20251108_143212.txt

If the host is unreachable or SSH fails:
-------------------------------------------------------------
Testing SSH to aixlpar01
-------------------------------------------------------------
[ FAIL ]
Unable to SSH to root@aixlpar01
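
The collector writes a timestamped file name, while verify_post_reboot.ksh reads <hostname>_pre_snap.txt. If several timestamped snapshots accumulate for one host, something has to pick the newest. A minimal sketch follows, assuming the YYYYMMDD_HHMMSS suffix shown above; the `latest_pre_snapshot` helper and the demo directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the newest timestamped pre-reboot snapshot
# for a host, or return 1 if none exists.
# Relies on the YYYYMMDD_HHMMSS suffix sorting chronologically as text.
latest_pre_snapshot() {
  dir="$1"
  host="$2"
  newest=$(ls "$dir"/"$host"_pre_snap_*.txt 2>/dev/null | sort | tail -n 1)
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}

# Demo in a throwaway directory (file names are illustrative only).
demo="/tmp/latest_snap_demo.$$"
mkdir -p "$demo" || exit 1
: >"$demo/aixlpar01_pre_snap_20251107_090000.txt"
: >"$demo/aixlpar01_pre_snap_20251108_143212.txt"
: >"$demo/aixlpar02_pre_snap_20251109_010101.txt"

latest_pre_snapshot "$demo" aixlpar01
latest_pre_snapshot "$demo" aixlpar99 || echo "no snapshot for aixlpar99"
rm -rf "$demo"
```

Sorting by name instead of modification time means a copied or touched file cannot be mistaken for the most recent snapshot.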

Example Snapshot File Output

File: /tmp/pre_reboot_snapshots/aixlpar01_pre_snap_20251108_143212.txt
--- Host identity ---
AIX aixlpar01 7 7100-05-02-1810 powerpc
--- dtsession saverTimeout/lockTimeout ---
/etc/dt/config/Xconfig: saverTimeout: 600
/etc/dt/config/Xconfig: lockTimeout: 900
--- TZ / NTP check ---
Subsystem Group PID Status
xntpd tcpip 12345 active
remote refid st t when poll reach delay offset jitter
==========================================================
*time1.ntp.ibm.co 192.168.1.1 2 u 256 1024 377 0.54 0.03 0.05
--- Default route ---
default 192.168.10.1 UG 0 36 en0 1500
--- HACMP resource group ---
Resource Group Name: rg_oracle01
State: Online
Node: aixlpar01
--- Paging space ---
Size %Used Physical Volume
2048MB 12% hd6
--- Network interfaces ---
en0: flags=4e080863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,GROUPRT>
inet 192.168.10.15 netmask 0xffffff00 broadcast 192.168.10.255
ether 0a:1b:2c:3d:4e:5f
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST>
inet 127.0.0.1 netmask 0xff000000
--- NFS/GPFS mounts ---
server01:/exports/data on /data type nfs (rw,soft,intr,proto=tcp)
--- Error report (errpt) ---
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
BFE4C025 1108225315 P H hdisk1 DISK OPERATION ERROR
BFE4C025 1108224915 P H hdisk0 DISK OPERATION ERROR
--- Package validation (lppchk) ---
lppchk: The following filesets are ok:
bos.rte, bos.mp64, devices.pci.14103302.rte
--- Disk multipathing ---
hdisk0 Available 00-00-01 MPIO Other FC Disk
hdisk1 Available 00-00-02 MPIO Other FC Disk
--- PowerPath status ---
Pseudo name=hdiskpower0
Symmetrix ID=000192600218
Logical device ID=1A00
state=alive; policy=SymmOpt; priority=0; queued-IOs=0
--- exportfs ---
/exports/data
--- showmount ---
All mount points on aixlpar01:
server01:/exports/data
--- PowerPath reserve_policy ---
reserve_policy no_reserve
--- MPIO Other detection ---
hdisk0 Available 00-00-01 MPIO Other FC Disk
hdisk1 Available 00-00-02 MPIO Other FC Disk
--- Cluster service status ---
Subsystem Group PID Status
clstrmgrES cluster 43120 active
--- IPSec check ---
ipsecconf tcpip 5319 inoperative
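
Because every check is introduced by a `--- <title> ---` header, a single section can be pulled out of a snapshot for a quick look. A minimal sketch follows; the `snapshot_section` helper and the demo file are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the body of one "--- <title> ---" section
# from a snapshot file, stopping at the next section header.
snapshot_section() {
  awk -v want="--- $2 ---" '
    $0 == want               { inside = 1; next }  # start after the header
    inside && /^--- .* ---$/ { exit }              # next header ends it
    inside && NF             { print }             # skip blank lines
  ' "$1"
}

# Demo with a two-section sample (content copied from the example above).
snap="/tmp/section_demo.$$"
cat >"$snap" <<'EOF'
--- Default route ---
default 192.168.10.1 UG 0 36 en0 1500

--- Paging space ---
Size %Used Physical Volume
2048MB 12% hd6
EOF

snapshot_section "$snap" "Default route"
rm -f "$snap"
```

The title must match the header text exactly, for example `snapshot_section <file> "Paging space"`.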


Script 2 — verify_post_reboot.ksh
==========================================================================
#!/usr/bin/ksh
###############################################################################
# verify_post_reboot.ksh <hostname>
# Author: adminCtrlX
#
# Comprehensive system validation script for AIX & VIOS servers.
# Collects a post-reboot snapshot from the target and compares it against the
# pre-reboot snapshot saved under:
#   /tmp/pre_reboot_snapshots/<hostname>_pre_snap.txt
###############################################################################

EXIT_OK=0
EXIT_ERR=1
EXIT_WARN=2

VERBOSE=1
TMPDIR="/tmp/verify_snap.$$"
SSH_USER="root"
SNAP_BASE="/tmp/pre_reboot_snapshots"

usage() {
  echo "Usage: $0 <hostname>"
  exit $EXIT_ERR
}

# -------------------- Argument validation --------------------
if [[ $# -ne 1 ]]; then
  usage
fi

HOST="$1"
PRE_SNAP="${SNAP_BASE}/${HOST}_pre_snap.txt"
POST_SNAP="${SNAP_BASE}/${HOST}_post_snap.txt"
DIFF_OUT="${SNAP_BASE}/${HOST}_verify_diff.txt"

# -------------------- Directory setup --------------------
mkdir -p "$SNAP_BASE" || {
  echo "Failed to create $SNAP_BASE"
  exit $EXIT_ERR
}
mkdir -p "$TMPDIR" || {
  echo "Failed to create $TMPDIR"
  exit $EXIT_ERR
}

trap 'rm -rf "$TMPDIR"' EXIT INT TERM

# -------------------- Helper functions --------------------
print_line() { printf '%s\n' "-------------------------------------------------------------"; }

print_status() {
  case "$1" in
    ok)   printf '[ OK ]\n' ;;
    warn) printf '[ WARN ]\n' ;;
    fail) printf '[ FAIL ]\n' ;;
  esac
}

section_header() {
  print_line
  printf '%s\n' "$1"
  print_line
}

# Portable ping helper (tries different syntax options)
ping_host() {
  if ping -c 1 -W 1 "$HOST" >/dev/null 2>&1; then return 0
  elif ping -c 1 "$HOST" >/dev/null 2>&1; then return 0
  elif ping -n 1 "$HOST" >/dev/null 2>&1; then return 0
  else return 1
  fi
}

# -------------------- Connectivity tests --------------------
section_header "Testing ping to $HOST"
if ping_host; then
  print_status ok
else
  print_status fail
  echo "Host $HOST not responding to ping."
  exit $EXIT_ERR
fi

section_header "Testing SSH to $HOST"
# BatchMode avoids interactive password prompts; ConnectTimeout limits wait time
if ssh -o BatchMode=yes -o ConnectTimeout=10 -q "$SSH_USER@$HOST" "echo ok" >/dev/null 2>&1; then
  print_status ok
else
  print_status fail
  echo "Unable to SSH to $SSH_USER@$HOST"
  exit $EXIT_ERR
fi

# -------------------- Snapshot collection --------------------
section_header "Collecting post-reboot snapshot from $HOST"

# Run remote commands via SSH heredoc (quoted to avoid local variable expansion)
ssh -o BatchMode=yes -o ConnectTimeout=30 "$SSH_USER@$HOST" 'ksh -s' <<'REMOTE' >"$POST_SNAP" 2>&1
# NOTE: keep the commands, headers, and fallback messages in this block
# identical to the remote block in collect_pre_reboot_snapshot.ksh; any
# wording difference shows up as a false change in the diff.
printf '--- Host identity ---\n'
uname -a || echo "Unable to get uname output"

printf '\n--- dtsession saverTimeout/lockTimeout ---\n'
if [ -d /etc/dt/config ]; then
  grep -H -E 'saverTimeout|lockTimeout' /etc/dt/config/* 2>/dev/null || echo "No saver/lockTimeout settings found"
else
  echo "/etc/dt/config not present"
fi

printf '\n--- TZ / NTP check ---\n'
if command -v lssrc >/dev/null 2>&1; then
  lssrc -s xntpd 2>/dev/null
fi
if command -v ntpq >/dev/null 2>&1; then
  ntpq -p 2>/dev/null
fi

printf '\n--- Default route ---\n'
if command -v netstat >/dev/null 2>&1; then
  netstat -rn 2>/dev/null | grep -E 'default|^0.0.0.0' || echo "No default route found"
else
  ip route show 2>/dev/null | grep default || echo "No default route command available"
fi

printf '\n--- HACMP resource group ---\n'
clRGinfo 2>/dev/null || echo "clRGinfo not available"

printf '\n--- Paging space ---\n'
if command -v lsps >/dev/null 2>&1; then
  lsps -s 2>/dev/null
else
  swapon -s 2>/dev/null || free -h 2>/dev/null || echo "Paging/Swap info not available"
fi

printf '\n--- Network interfaces ---\n'
ifconfig -a 2>/dev/null || ip addr show 2>/dev/null || echo "ifconfig/ip not present"

printf '\n--- NFS/GPFS mounts ---\n'
mount 2>/dev/null | grep -E 'nfs|gpfs' || echo "No NFS/GPFS mounts found"

printf '\n--- Error report (errpt) ---\n'
if command -v errpt >/dev/null 2>&1; then
  errpt -a 2>/dev/null | head -n 20
else
  echo "errpt not present"
fi

printf '\n--- Package validation (lppchk) ---\n'
if command -v lppchk >/dev/null 2>&1; then
  # lppchk -v is silent on success and reports problems on stderr
  lppchk -v 2>&1 || echo "lppchk reported inconsistencies (see output above)"
else
  echo "lppchk not present"
fi

printf '\n--- Disk multipathing ---\n'
lsdev -Cc disk 2>/dev/null || echo "lsdev not available or no disks listed"

printf '\n--- PowerPath status ---\n'
if [ -x /usr/sbin/powermt ]; then
  /usr/sbin/powermt display dev=all 2>/dev/null || echo "powermt returned no data"
else
  echo "PowerPath not present"
fi

printf '\n--- exportfs ---\n'
exportfs 2>/dev/null || echo "No exports or exportfs not available"

printf '\n--- showmount ---\n'
showmount -a 2>/dev/null || echo "No NFS clients or showmount not available"

printf '\n--- PowerPath reserve_policy ---\n'
if command -v lsattr >/dev/null 2>&1; then
  lsattr -El powerpath0 -a reserve_policy 2>/dev/null || echo "powerpath0 not present"
else
  echo "lsattr not available"
fi

printf '\n--- MPIO Other detection ---\n'
lsdev -Cc disk 2>/dev/null | grep 'Other' || echo "No MPIO 'Other' devices found"

printf '\n--- Cluster service status ---\n'
if command -v lssrc >/dev/null 2>&1; then
  lssrc -ls clstrmgrES 2>/dev/null || echo "Cluster service not active"
else
  echo "lssrc not available"
fi

printf '\n--- IPSec check ---\n'
if command -v lssrc >/dev/null 2>&1; then
  lssrc -a 2>/dev/null | grep -i ipsec || echo "No IPSec service found"
else
  echo "lssrc not present"
fi

REMOTE

SSH_EXIT=$?
if [ $SSH_EXIT -ne 0 ]; then
  print_status fail
  echo "Remote collection failed (SSH exit code $SSH_EXIT). See $POST_SNAP for details."
  exit $EXIT_ERR
fi
print_status ok

# -------------------- Diff comparison --------------------
section_header "Comparing pre and post snapshots"

if [ ! -f "$PRE_SNAP" ]; then
  echo "Pre-reboot snapshot not found: $PRE_SNAP"
  print_status fail
  exit $EXIT_ERR
fi

# Normalize transient lines before diff (remove timestamps, uptime, etc.)
grep -v -E '^(--- HOST ---|^Date:|^Uptime:|^uptime:|^Last login:|^login:)' "$PRE_SNAP"  >"$TMPDIR/pre_norm"
grep -v -E '^(--- HOST ---|^Date:|^Uptime:|^uptime:|^Last login:|^login:)' "$POST_SNAP" >"$TMPDIR/post_norm"

diff -u "$TMPDIR/pre_norm" "$TMPDIR/post_norm" >"$DIFF_OUT" 2>/dev/null
DIFF_RC=$?

case $DIFF_RC in
  0)
    print_status ok
    echo "No differences detected."
    exit $EXIT_OK
    ;;
  1)
    echo "Differences detected (excerpt):"
    # Show added/removed lines only; NR > 2 skips the ---/+++ file headers
    awk 'NR > 2 && /^[+-]/' "$DIFF_OUT" | head -n 50
    print_status warn
    echo "Full diff saved: $DIFF_OUT"
    exit $EXIT_WARN
    ;;
  *)
    print_status fail
    echo "Diff failed (rc=$DIFF_RC). See $DIFF_OUT and $POST_SNAP for details."
    exit $EXIT_ERR
    ;;
esac

==========================================================================
Example 1 — Successful run (No differences found)
Command:
# ./verify_post_reboot.ksh aixlpar01
Console Output:
-------------------------------------------------------------
Testing ping to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Testing SSH to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Collecting post-reboot snapshot from aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Comparing pre and post snapshots
-------------------------------------------------------------
[ OK ]
No differences detected.

Files generated:
/tmp/pre_reboot_snapshots/aixlpar01_pre_snap.txt
/tmp/pre_reboot_snapshots/aixlpar01_post_snap.txt
/tmp/pre_reboot_snapshots/aixlpar01_verify_diff.txt

Contents of aixlpar01_verify_diff.txt:

# Empty file – no differences detected

Example 2 — Differences detected (e.g. IP or NFS change)
Command:
# ./verify_post_reboot.ksh aixlpar01
Console Output:
-------------------------------------------------------------
Testing ping to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Testing SSH to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Collecting post-reboot snapshot from aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Comparing pre and post snapshots
-------------------------------------------------------------
Differences detected (excerpt):
- inet 192.168.10.15 netmask 0xffffff00 broadcast 192.168.10.255
+ inet 192.168.10.25 netmask 0xffffff00 broadcast 192.168.10.255
-server01:/exports/data on /data type nfs (rw,soft,intr,proto=tcp)
+server02:/exports/data on /data type nfs (rw,soft,intr,proto=tcp)
[ WARN ]
Full diff saved: /tmp/pre_reboot_snapshots/aixlpar01_verify_diff.txt
Contents of aixlpar01_verify_diff.txt (first few lines):
--- /tmp/verify_snap.10231/pre_norm 2025-11-08 14:32:12.000000000 +0530
+++ /tmp/verify_snap.10231/post_norm 2025-11-08 14:32:56.000000000 +0530
@@ -145,7 +145,7 @@
 --- Network interfaces ---
 en0: flags=4e080863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,GROUPRT>
- inet 192.168.10.15 netmask 0xffffff00 broadcast 192.168.10.255
+ inet 192.168.10.25 netmask 0xffffff00 broadcast 192.168.10.255
@@ -198,7 +198,7 @@
 --- NFS/GPFS mounts ---
-server01:/exports/data on /data type nfs (rw,soft,intr,proto=tcp)
+server02:/exports/data on /data type nfs (rw,soft,intr,proto=tcp)
Interpretation:
Network IP changed from .15 → .25
NFS mount source changed from server01 → server02
These are flagged as [ WARN ], but not fatal errors.
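
For a quick sense of how much changed, the saved unified diff can be reduced to a count of added and removed lines. A minimal sketch follows, assuming the report was produced by `diff -u` as in the script; the `diff_summary` helper and the demo files are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count added and removed lines in a unified diff.
# The first two lines (the ---/+++ file headers) are skipped by position.
diff_summary() {
  awk 'NR > 2 && /^\+/ { added++ }
       NR > 2 && /^-/  { removed++ }
       END { printf "added=%d removed=%d\n", added, removed }' "$1"
}

# Demo: two tiny stand-in snapshots that differ in one line.
w="/tmp/diff_summary_demo.$$"
mkdir -p "$w" || exit 1
printf 'inet 192.168.10.15\nserver01:/exports/data\n' >"$w/pre"
printf 'inet 192.168.10.25\nserver01:/exports/data\n' >"$w/post"

diff -u "$w/pre" "$w/post" >"$w/diff" || echo "snapshots differ"
diff_summary "$w/diff"
rm -rf "$w"
```

A changed line counts once as removed and once as added, so the IP change in Example 2 would contribute one to each total.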

Example 3 — SSH or ping failure
Command:
# ./verify_post_reboot.ksh aixlpar01
Console Output:
-------------------------------------------------------------
Testing ping to aixlpar01
-------------------------------------------------------------
[ FAIL ]
Host aixlpar01 not responding to ping.
or
-------------------------------------------------------------
Testing SSH to aixlpar01
-------------------------------------------------------------
[ FAIL ]
Unable to SSH to root@aixlpar01

In these cases, the script exits immediately with:
Exit code: 1 (EXIT_ERR)
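
Since the scripts signal their result through the exit code (0 = EXIT_OK, 1 = EXIT_ERR, 2 = EXIT_WARN), a wrapper can run the verification across several hosts and print one verdict per host. A minimal sketch follows; `describe_rc`, `check_hosts`, and the `VERIFY_CMD` variable are hypothetical names, and the default script path is an assumption.

```shell
#!/bin/sh
# Hypothetical wrapper: map the documented exit codes to a verdict.
# 0 = EXIT_OK, 2 = EXIT_WARN, anything else is treated as a failure.
describe_rc() {
  case "$1" in
    0) echo "clean" ;;
    2) echo "review diff" ;;
    *) echo "failed" ;;
  esac
}

# VERIFY_CMD is an assumption: point it at the real script on your system.
VERIFY_CMD="${VERIFY_CMD:-./verify_post_reboot.ksh}"

# Run the verification once per host and print "<host>: <verdict>".
check_hosts() {
  for h in "$@"; do
    if "$VERIFY_CMD" "$h" >/dev/null 2>&1; then
      rc=0
    else
      rc=$?
    fi
    printf '%s: %s\n' "$h" "$(describe_rc "$rc")"
  done
}

# Hosts are taken from the command line, if any were given.
if [ $# -gt 0 ]; then
  check_hosts "$@"
fi
```

Saved under a name of your choosing, it would be invoked with the host names as arguments, for example `aixlpar01 aixlpar02`. The per-host console output is discarded here; the snapshot and diff files are still written by the verification script.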
