The first script, collect_pre_reboot_snapshot.ksh, connects to a remote host, validates reachability (ping + SSH), and runs a sequence of system checks to capture critical configuration and status data into a single snapshot file.
Features:
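- Written for AIX and VIOS; many checks also run on Linux or Solaris, where missing commands are reported as not present or fall back to equivalents (ip, swapon, free)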
- Safe remote execution using SSH heredoc (no temp files)
Captures:
- Host identity
- Screen saver / timeout config
- NTP / TZ info
- Default route and network interfaces
- HACMP resource group
- Paging space, PowerPath, MPIO
- Cluster services, IPSec status
- NFS/GPFS mounts
- errpt, lppchk, exportfs, showmount, etc.
Saves results locally under /tmp/pre_reboot_snapshots/<hostname>_pre_snap_<YYYYMMDD_HHMMSS>.txt
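The heredoc technique listed under Features can be tried locally without a remote host. The following is a minimal sketch, assuming `sh -s` as a stand-in for `ssh user@host 'ksh -s'`; the names `DEMO_LABEL` and `run_block` are illustrative and do not appear in the scripts.

```shell
#!/bin/sh
# Sketch only: 'sh -s' stands in for  ssh user@host 'ksh -s'.
# DEMO_LABEL and run_block are illustrative names, not part of the scripts.

DEMO_LABEL="set-in-caller"   # plain shell variable, deliberately not exported

run_block() {
    # Quoting the delimiter ('REMOTE') disables expansion in the calling shell,
    # so "$DEMO_LABEL" is sent as literal text and evaluated by the child shell.
    sh -s <<'REMOTE'
printf 'child sees: [%s]\n' "$DEMO_LABEL"
REMOTE
}

run_block
```

The child shell prints an empty value because the variable exists only in the caller. This is the behaviour the scripts rely on: everything between the delimiters is evaluated on the target host, and nothing is written to a temporary file on either side.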
Purpose (verify_post_reboot.ksh):
After reboot, the second script collects a new snapshot and compares it with the pre-reboot one.
It highlights configuration changes, missing devices, or service failures.
Features:
- Automatically detects pre-snapshot file
- Collects post-reboot snapshot using same logic as pre-reboot
- Performs line-by-line diff
- Provides summary of differences
- Writes the post-reboot snapshot and the diff report to /tmp/pre_reboot_snapshots (the SNAP_BASE directory in the script)
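Detecting the pre-reboot snapshot can be done by picking the most recently modified file for the host. This is a rough sketch under that assumption; `latest_pre_snapshot` is a hypothetical helper, and parsing `ls` output is acceptable here only because the file names are generated by the script and contain no spaces.

```shell
#!/bin/sh
# Sketch only: latest_pre_snapshot is a hypothetical helper.

latest_pre_snapshot() {
    # $1 = snapshot directory, $2 = hostname
    # ls -t sorts by modification time, newest first; head keeps the first name.
    ls -t "$1/$2"_pre_snap*.txt 2>/dev/null | head -n 1
}

# Demo in a scratch directory with two snapshots of different ages
demo_dir="/tmp/latest_snap_demo.$$"
mkdir -p "$demo_dir" || exit 1
touch -t 202511071000 "$demo_dir/aixlpar01_pre_snap_20251107_100000.txt"
touch -t 202511081432 "$demo_dir/aixlpar01_pre_snap_20251108_143212.txt"
latest_pre_snapshot "$demo_dir" aixlpar01
rm -rf "$demo_dir"
```

The function prints nothing when no snapshot exists for the host, so a caller can test for an empty result before attempting a comparison.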
How It Works:
- Validates SSH connection to target.
- Executes the same set of checks as collect_pre_reboot_snapshot.ksh (the remote command block is repeated inline rather than sourced from that script).
- Stores results in /tmp/pre_reboot_snapshots/<hostname>_post_snap.txt.
- Runs diff between pre and post files.
- Marks [ OK ] if no differences, [ WARN ] if differences exist.
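The last two steps depend on the exit status of diff: 0 for identical files, 1 for differences, anything higher for trouble such as a missing file. A minimal sketch of that mapping follows; `compare_files` is a hypothetical helper written for illustration, not a function from the scripts.

```shell
#!/bin/sh
# Sketch only: compare_files is a hypothetical helper, not taken from the scripts.
# It prints the status marker and returns the matching exit code.

compare_files() {
    rc=0
    diff "$1" "$2" >/dev/null 2>&1 || rc=$?
    case $rc in
        0) echo "[ OK ]";   return 0 ;;   # identical
        1) echo "[ WARN ]"; return 2 ;;   # files differ
        *) echo "[ FAIL ]"; return 1 ;;   # diff could not run (e.g. missing file)
    esac
}

# Demo with throwaway files
demo_dir="/tmp/compare_demo.$$"
mkdir -p "$demo_dir" || exit 1
printf 'default 192.168.10.1\n'   > "$demo_dir/pre"
printf 'default 192.168.10.1\n'   > "$demo_dir/same"
printf 'default 192.168.10.254\n' > "$demo_dir/changed"

for other in same changed missing; do
    status=0
    compare_files "$demo_dir/pre" "$demo_dir/$other" || status=$?
    echo "  (exit code $status)"
done
rm -rf "$demo_dir"
```

Returning 2 for differences and 1 for failures mirrors the EXIT_WARN and EXIT_ERR values defined at the top of both scripts.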
Sample Folder Layout
/usr/local/syscheck/
├── collect_pre_reboot_snapshot.ksh
└── verify_post_reboot.ksh
/tmp/pre_reboot_snapshots/
├── aixlpar01_pre_snap_20251108_143212.txt
├── aixlpar01_post_snap.txt
└── aixlpar01_verify_diff.txt
Both scripts use ksh for maximum portability across enterprise Unix platforms.
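Portability in both scripts comes from probing for a command with `command -v` before using it and falling back to an alternative. The pattern can be sketched as a small helper; `first_available` is a hypothetical name introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: first_available is a hypothetical helper.
# It prints the first command name from its arguments that exists on this host.

first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1   # none of the candidates is installed
}

# lsps exists on AIX; swapon and free are the Linux fallbacks used by the scripts
tool=$(first_available lsps swapon free) || tool="none"
echo "paging/swap tool on this host: $tool"
```

The scripts inline this logic per check instead of using a helper, which keeps the remote command block free of function definitions.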
Script 1 — collect_pre_reboot_snapshot.ksh
==========================================================================
#!/usr/bin/ksh
###############################################################################
# collect_pre_reboot_snapshot.ksh <hostname>
# Author: adminCtrlX
#
# Purpose:
# Collects a comprehensive pre-reboot system snapshot from an AIX or VIOS host.
# Runs remote commands over SSH and stores the output locally in a timestamped file.
#
# Features:
# - Network and SSH connectivity tests
# - Time sync, route, HACMP, paging, and storage validation
# - Collects system identity and configuration details before reboot
#
# Compatible with: AIX / VIOS servers
###############################################################################
EXIT_OK=0
EXIT_ERR=1
EXIT_WARN=2
SSH_USER="root"
if [[ $# -ne 1 ]]; then
echo "Usage: $0 <hostname>"
exit $EXIT_ERR
fi
HOST="$1"
DATESTAMP=$(date +%Y%m%d_%H%M%S)
SNAPDIR="/tmp/pre_reboot_snapshots"
OUTFILE="${SNAPDIR}/${HOST}_pre_snap_${DATESTAMP}.txt"
mkdir -p "$SNAPDIR" || {
echo "Failed to create snapshot directory: $SNAPDIR"
exit $EXIT_ERR
}
print_line() { printf '%s\n' "-------------------------------------------------------------"; }
print_status() {
case "$1" in
ok) printf '[ OK ]\n' ;;
warn) printf '[ WARN ]\n' ;;
fail) printf '[ FAIL ]\n' ;;
esac
}
section_header() {
print_line
printf '%s\n' "$1"
print_line
}
# --- Function: Ping detection (portable) ---
ping_host() {
if ping -c1 -W1 "$HOST" >/dev/null 2>&1; then
return 0
elif ping -c1 "$HOST" >/dev/null 2>&1; then
return 0
elif ping -n 1 "$HOST" >/dev/null 2>&1; then
return 0
else
return 1
fi
}
###############################################################################
# Connectivity Checks
###############################################################################
section_header "Testing ping to $HOST"
if ping_host; then
print_status ok
else
print_status fail
echo "Host $HOST is not responding to ping."
exit $EXIT_ERR
fi
section_header "Testing SSH to $HOST"
if ssh -o BatchMode=yes -o ConnectTimeout=10 -q "$SSH_USER@$HOST" "echo ok" >/dev/null 2>&1; then
print_status ok
else
print_status fail
echo "Unable to SSH to $SSH_USER@$HOST"
exit $EXIT_ERR
fi
###############################################################################
# Collect Snapshot
###############################################################################
section_header "Collecting pre-reboot snapshot from $HOST"
ssh -o BatchMode=yes -o ConnectTimeout=30 "$SSH_USER@$HOST" 'ksh -s' <<'REMOTE' > "$OUTFILE" 2>&1
printf '--- Host identity ---\n'
uname -a || echo "Unable to get uname output"
printf '\n--- dtsession saverTimeout/lockTimeout ---\n'
if [ -d /etc/dt/config ]; then
grep -H -E 'saverTimeout|lockTimeout' /etc/dt/config/* 2>/dev/null || echo "No saver/lockTimeout settings found"
else
echo "/etc/dt/config not present"
fi
printf '\n--- TZ / NTP check ---\n'
if command -v lssrc >/dev/null 2>&1; then
lssrc -s xntpd 2>/dev/null
fi
if command -v ntpq >/dev/null 2>&1; then
ntpq -p 2>/dev/null
fi
printf '\n--- Default route ---\n'
if command -v netstat >/dev/null 2>&1; then
netstat -rn 2>/dev/null | grep -E 'default|^0.0.0.0' || echo "No default route found"
else
ip route show 2>/dev/null | grep default || echo "No default route command available"
fi
printf '\n--- HACMP resource group ---\n'
clRGinfo 2>/dev/null || echo "clRGinfo not available"
printf '\n--- Paging space ---\n'
if command -v lsps >/dev/null 2>&1; then
lsps -s 2>/dev/null
else
swapon -s 2>/dev/null || free -h 2>/dev/null || echo "Paging/Swap info not available"
fi
printf '\n--- Network interfaces ---\n'
ifconfig -a 2>/dev/null || ip addr show 2>/dev/null || echo "ifconfig/ip not present"
printf '\n--- NFS/GPFS mounts ---\n'
mount 2>/dev/null | grep -E 'nfs|gpfs' || echo "No NFS/GPFS mounts found"
printf '\n--- Error report (errpt) ---\n'
if command -v errpt >/dev/null 2>&1; then
errpt -a | head -n 20 2>/dev/null || echo "No errors in errpt"
else
echo "errpt not present"
fi
printf '\n--- Package validation (lppchk) ---\n'
if command -v lppchk >/dev/null 2>&1; then
lppchk -v 2>/dev/null || echo "No lppchk issues found"
else
echo "lppchk not present"
fi
printf '\n--- Disk multipathing ---\n'
lsdev -Cc disk 2>/dev/null || echo "lsdev not available or no disks listed"
printf '\n--- PowerPath status ---\n'
if [ -x /usr/sbin/powermt ]; then
/usr/sbin/powermt display dev=all 2>/dev/null || echo "powermt returned no data"
else
echo "PowerPath not present"
fi
printf '\n--- exportfs ---\n'
exportfs 2>/dev/null || echo "No exports or exportfs not available"
printf '\n--- showmount ---\n'
showmount -a 2>/dev/null || echo "No NFS clients or showmount not available"
printf '\n--- PowerPath reserve_policy ---\n'
if command -v lsattr >/dev/null 2>&1; then
lsattr -El powerpath0 -a reserve_policy 2>/dev/null || echo "powerpath0 not present"
else
echo "lsattr not available"
fi
printf '\n--- MPIO Other detection ---\n'
lsdev -Cc disk 2>/dev/null | grep 'Other' || echo "No MPIO 'Other' devices found"
printf '\n--- Cluster service status ---\n'
if command -v lssrc >/dev/null 2>&1; then
lssrc -ls clstrmgrES 2>/dev/null || echo "Cluster service not active"
else
echo "lssrc not available"
fi
printf '\n--- IPSec check ---\n'
if command -v lssrc >/dev/null 2>&1; then
lssrc -a 2>/dev/null | grep -i ipsec || echo "No IPSec service found"
else
echo "lssrc not present"
fi
REMOTE
SSH_EXIT=$?
if [ $SSH_EXIT -eq 0 ]; then
print_status ok
echo "Snapshot saved: $OUTFILE"
exit $EXIT_OK
else
print_status fail
echo "SSH or remote command failed (exit code: $SSH_EXIT). See $OUTFILE for details."
exit $EXIT_ERR
fi
==========================================================================
Example Console Output
Command:
# ./collect_pre_reboot_snapshot.ksh aixlpar01
Example Output:
-------------------------------------------------------------
Testing ping to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Testing SSH to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Collecting pre-reboot snapshot from aixlpar01
-------------------------------------------------------------
[ OK ]
Snapshot saved: /tmp/pre_reboot_snapshots/aixlpar01_pre_snap_20251108_143212.txt
If the host does not answer ping, the script stops before the SSH test:
-------------------------------------------------------------
Testing ping to aixlpar01
-------------------------------------------------------------
[ FAIL ]
Host aixlpar01 is not responding to ping.
If ping succeeds but the SSH test fails:
-------------------------------------------------------------
Testing SSH to aixlpar01
-------------------------------------------------------------
[ FAIL ]
Unable to SSH to root@aixlpar01
Example Snapshot File Output
File: /tmp/pre_reboot_snapshots/aixlpar01_pre_snap_20251108_143212.txt
--- Host identity ---
AIX aixlpar01 7 7100-05-02-1810 powerpc
--- dtsession saverTimeout/lockTimeout ---
/etc/dt/config/Xconfig: saverTimeout: 600
/etc/dt/config/Xconfig: lockTimeout: 900
--- TZ / NTP check ---
Subsystem Group PID Status
xntpd tcpip 12345 active
remote refid st t when poll reach delay offset jitter
==========================================================
*time1.ntp.ibm.co 192.168.1.1 2 u 256 1024 377 0.54 0.03 0.05
--- Default route ---
default 192.168.10.1 UG 0 36 en0 1500
--- HACMP resource group ---
Resource Group Name: rg_oracle01
State: Online
Node: aixlpar01
--- Paging space ---
Total Paging Space Percent Used
2048MB 12%
--- Network interfaces ---
en0: flags=4e080863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,GROUPRT>
inet 192.168.10.15 netmask 0xffffff00 broadcast 192.168.10.255
ether 0a:1b:2c:3d:4e:5f
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST>
inet 127.0.0.1 netmask 0xff000000
--- NFS/GPFS mounts ---
server01:/exports/data on /data type nfs (rw,soft,intr,proto=tcp)
--- Error report (errpt) ---
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
BFE4C025 1108225315 P H hdisk1 DISK OPERATION ERROR
BFE4C025 1108224915 P H hdisk0 DISK OPERATION ERROR
--- Package validation (lppchk) ---
lppchk: The following filesets are ok:
bos.rte, bos.mp64, devices.pci.14103302.rte
--- Disk multipathing ---
hdisk0 Available 00-00-01 MPIO Other FC Disk
hdisk1 Available 00-00-02 MPIO Other FC Disk
--- PowerPath status ---
Pseudo name=hdiskpower0
Symmetrix ID=000192600218
Logical device ID=1A00
state=alive; policy=SymmOpt; priority=0; queued-IOs=0
--- exportfs ---
/exports/data
--- showmount ---
All mount points on aixlpar01:
server01:/exports/data
--- PowerPath reserve_policy ---
reserve_policy no_reserve
--- MPIO Other detection ---
hdisk0 Available 00-00-01 MPIO Other FC Disk
hdisk1 Available 00-00-02 MPIO Other FC Disk
--- Cluster service status ---
Subsystem Group PID Status
clstrmgrES cluster 43120 active
--- IPSec check ---
ipsecconf tcpip 5319 inoperative
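Because every section starts with a `--- Title ---` header, a single section can be pulled out of a snapshot for a quick look. The following is a small sketch under that assumption; `show_section` is a hypothetical helper and the demo file is a shortened copy of the sample above.

```shell
#!/bin/sh
# Sketch only: show_section is a hypothetical helper.
# Headers are assumed to look like "--- Title ---", as in the sample file.

show_section() {
    # $1 = snapshot file, $2 = section title without the dashes
    # A header line switches printing on or off; other lines print while it is on.
    awk -v want="--- $2 ---" '
        /^--- .* ---$/ { printing = ($0 == want); next }
        printing
    ' "$1"
}

# Demo with a shortened snapshot
demo_file="/tmp/section_demo.$$"
cat > "$demo_file" <<'EOF'
--- Host identity ---
AIX aixlpar01 7 7100-05-02-1810 powerpc
--- Default route ---
default 192.168.10.1 UG 0 36 en0 1500
--- IPSec check ---
ipsecconf tcpip 5319 inoperative
EOF

show_section "$demo_file" "Default route"
rm -f "$demo_file"
```

In a real snapshot each header is preceded by a blank line, so the extracted section ends with one empty line.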
Script 2 — verify_post_reboot.ksh
==========================================================================
#!/usr/bin/ksh
###############################################################################
# verify_post_reboot.ksh <hostname>
# Author: adminCtrlX
#
# Comprehensive system validation script for AIX & VIOS servers.
# Collects a post-reboot snapshot from the target and compares it against the
# pre-reboot snapshot saved under:
# /tmp/pre_reboot_snapshots/<hostname>_pre_snap.txt
###############################################################################
EXIT_OK=0
EXIT_ERR=1
EXIT_WARN=2
VERBOSE=1
TMPDIR="/tmp/verify_snap.$$"
SSH_USER="root"
SNAP_BASE="/tmp/pre_reboot_snapshots"
usage() {
echo "Usage: $0 <hostname>"
exit $EXIT_ERR
}
# -------------------- Argument validation --------------------
if [[ $# -ne 1 ]]; then
usage
fi
HOST="$1"
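# Use the newest pre-reboot snapshot for this host (the collector adds a timestamp to the name)
PRE_SNAP=$(ls -t "${SNAP_BASE}/${HOST}"_pre_snap*.txt 2>/dev/null | head -n 1)
[ -n "$PRE_SNAP" ] || PRE_SNAP="${SNAP_BASE}/${HOST}_pre_snap.txt"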
POST_SNAP="${SNAP_BASE}/${HOST}_post_snap.txt"
DIFF_OUT="${SNAP_BASE}/${HOST}_verify_diff.txt"
# -------------------- Directory setup --------------------
mkdir -p "$SNAP_BASE" || {
echo "Failed to create $SNAP_BASE"
exit $EXIT_ERR
}
mkdir -p "$TMPDIR" || {
echo "Failed to create $TMPDIR"
exit $EXIT_ERR
}
trap 'rm -rf "$TMPDIR"' EXIT INT TERM
# -------------------- Helper functions --------------------
print_line() { printf '%s\n' "-------------------------------------------------------------"; }
print_status() {
case "$1" in
ok) printf '[ OK ]\n' ;;
warn) printf '[ WARN ]\n' ;;
fail) printf '[ FAIL ]\n' ;;
esac
}
section_header() {
print_line
printf '%s\n' "$1"
print_line
}
# Portable ping helper (tries different syntax options)
ping_host() {
if ping -c 1 -W 1 "$HOST" >/dev/null 2>&1; then return 0
elif ping -c 1 "$HOST" >/dev/null 2>&1; then return 0
elif ping -n 1 "$HOST" >/dev/null 2>&1; then return 0
else return 1
fi
}
# -------------------- Connectivity tests --------------------
section_header "Testing ping to $HOST"
if ping_host; then
print_status ok
else
print_status fail
echo "Host $HOST not responding to ping."
exit $EXIT_ERR
fi
section_header "Testing SSH to $HOST"
# BatchMode avoids interactive password prompts; ConnectTimeout limits wait time
if ssh -o BatchMode=yes -o ConnectTimeout=10 -q "$SSH_USER@$HOST" "echo ok" >/dev/null 2>&1; then
print_status ok
else
print_status fail
echo "Unable to SSH to $SSH_USER@$HOST"
exit $EXIT_ERR
fi
# -------------------- Snapshot collection --------------------
section_header "Collecting post-reboot snapshot from $HOST"
# Run remote commands via SSH heredoc (quoted to avoid local variable expansion)
ssh -o BatchMode=yes -o ConnectTimeout=30 "$SSH_USER@$HOST" 'ksh -s' <<'REMOTE' >"$POST_SNAP" 2>&1
# Keep this block identical to the one in collect_pre_reboot_snapshot.ksh;
# otherwise differing headers or fallback messages show up as false differences.
printf '--- Host identity ---\n'
uname -a || echo "Unable to get uname output"
printf '\n--- dtsession saverTimeout/lockTimeout ---\n'
if [ -d /etc/dt/config ]; then
grep -H -E 'saverTimeout|lockTimeout' /etc/dt/config/* 2>/dev/null || echo "No saver/lockTimeout settings found"
else
echo "/etc/dt/config not present"
fi
printf '\n--- TZ / NTP check ---\n'
if command -v lssrc >/dev/null 2>&1; then
lssrc -s xntpd 2>/dev/null
fi
if command -v ntpq >/dev/null 2>&1; then
ntpq -p 2>/dev/null
fi
printf '\n--- Default route ---\n'
if command -v netstat >/dev/null 2>&1; then
netstat -rn 2>/dev/null | grep -E 'default|^0.0.0.0' || echo "No default route found"
else
ip route show 2>/dev/null | grep default || echo "No default route command available"
fi
printf '\n--- HACMP resource group ---\n'
clRGinfo 2>/dev/null || echo "clRGinfo not available"
printf '\n--- Paging space ---\n'
if command -v lsps >/dev/null 2>&1; then
lsps -s 2>/dev/null
else
swapon -s 2>/dev/null || free -h 2>/dev/null || echo "Paging/Swap info not available"
fi
printf '\n--- Network interfaces ---\n'
ifconfig -a 2>/dev/null || ip addr show 2>/dev/null || echo "ifconfig/ip not present"
printf '\n--- NFS/GPFS mounts ---\n'
mount 2>/dev/null | grep -E 'nfs|gpfs' || echo "No NFS/GPFS mounts found"
printf '\n--- Error report (errpt) ---\n'
if command -v errpt >/dev/null 2>&1; then
errpt -a | head -n 20 2>/dev/null || echo "No errors in errpt"
else
echo "errpt not present"
fi
printf '\n--- Package validation (lppchk) ---\n'
if command -v lppchk >/dev/null 2>&1; then
lppchk -v 2>/dev/null || echo "No lppchk issues found"
else
echo "lppchk not present"
fi
printf '\n--- Disk multipathing ---\n'
lsdev -Cc disk 2>/dev/null || echo "lsdev not available or no disks listed"
printf '\n--- PowerPath status ---\n'
if [ -x /usr/sbin/powermt ]; then
/usr/sbin/powermt display dev=all 2>/dev/null || echo "powermt returned no data"
else
echo "PowerPath not present"
fi
printf '\n--- exportfs ---\n'
exportfs 2>/dev/null || echo "No exports or exportfs not available"
printf '\n--- showmount ---\n'
showmount -a 2>/dev/null || echo "No NFS clients or showmount not available"
printf '\n--- PowerPath reserve_policy ---\n'
if command -v lsattr >/dev/null 2>&1; then
lsattr -El powerpath0 -a reserve_policy 2>/dev/null || echo "powerpath0 not present"
else
echo "lsattr not available"
fi
printf '\n--- MPIO Other detection ---\n'
lsdev -Cc disk 2>/dev/null | grep 'Other' || echo "No MPIO 'Other' devices found"
printf '\n--- Cluster service status ---\n'
if command -v lssrc >/dev/null 2>&1; then
lssrc -ls clstrmgrES 2>/dev/null || echo "Cluster service not active"
else
echo "lssrc not available"
fi
printf '\n--- IPSec check ---\n'
if command -v lssrc >/dev/null 2>&1; then
lssrc -a 2>/dev/null | grep -i ipsec || echo "No IPSec service found"
else
echo "lssrc not present"
fi
REMOTE
SSH_EXIT=$?
if [ $SSH_EXIT -ne 0 ]; then
print_status fail
echo "Remote collection failed (SSH exit code $SSH_EXIT). See $POST_SNAP for details."
exit $EXIT_ERR
fi
print_status ok
# -------------------- Diff comparison --------------------
section_header "Comparing pre and post snapshots"
if [ ! -f "$PRE_SNAP" ]; then
echo "Pre-reboot snapshot not found: $PRE_SNAP"
print_status fail
exit $EXIT_ERR
fi
# Normalize transient lines before diff (remove timestamps, uptime, etc.)
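# The identity header is named differently in older pre-reboot snapshots, so both spellings are dropped
grep -v -E '^(--- (HOST|Host identity) ---|Date:|Uptime:|uptime:|Last login:|login:)' "$PRE_SNAP" >"$TMPDIR/pre_norm"
grep -v -E '^(--- (HOST|Host identity) ---|Date:|Uptime:|uptime:|Last login:|login:)' "$POST_SNAP" >"$TMPDIR/post_norm"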
diff -u "$TMPDIR/pre_norm" "$TMPDIR/post_norm" >"$DIFF_OUT" 2>/dev/null
DIFF_RC=$?
case $DIFF_RC in
0)
print_status ok
echo "No differences detected."
exit $EXIT_OK
;;
1)
echo "Differences detected (excerpt):"
# Skip the two file-header lines by position; matching on "---" would also hide removed section headers
awk 'NR > 2 && /^[-+]/' "$DIFF_OUT" | head -n 50
print_status warn
echo "Full diff saved: $DIFF_OUT"
exit $EXIT_WARN
;;
*)
print_status fail
echo "Diff failed (rc=$DIFF_RC). See $DIFF_OUT and $POST_SNAP for details."
exit $EXIT_ERR
;;
esac
==========================================================================
Example 1 - Successful run (no differences found)
Command:
# ./verify_post_reboot.ksh aixlpar01
Console Output:
-------------------------------------------------------------
Testing ping to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Testing SSH to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Collecting post-reboot snapshot from aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Comparing pre and post snapshots
-------------------------------------------------------------
[ OK ]
No differences detected.
Files generated:
/tmp/pre_reboot_snapshots/aixlpar01_pre_snap.txt
/tmp/pre_reboot_snapshots/aixlpar01_post_snap.txt
/tmp/pre_reboot_snapshots/aixlpar01_verify_diff.txt
Contents of aixlpar01_verify_diff.txt:
# Empty file – no differences detected
Example 2 — Differences detected (e.g. IP or NFS change)
Command:
# ./verify_post_reboot.ksh aixlpar01
Console Output:
-------------------------------------------------------------
Testing ping to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Testing SSH to aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Collecting post-reboot snapshot from aixlpar01
-------------------------------------------------------------
[ OK ]
-------------------------------------------------------------
Comparing pre and post snapshots
-------------------------------------------------------------
Differences detected (excerpt):
- inet 192.168.10.15 netmask 0xffffff00 broadcast 192.168.10.255
+ inet 192.168.10.25 netmask 0xffffff00 broadcast 192.168.10.255
-server01:/exports/data on /data type nfs (rw,soft,intr,proto=tcp)
+server02:/exports/data on /data type nfs (rw,soft,intr,proto=tcp)
[ WARN ]
Full diff saved: /tmp/pre_reboot_snapshots/aixlpar01_verify_diff.txt
Contents of aixlpar01_verify_diff.txt (first few lines):
--- /tmp/verify_snap.10231/pre_norm 2025-11-08 14:32:12.000000000 +0530
+++ /tmp/verify_snap.10231/post_norm 2025-11-08 14:32:56.000000000 +0530
@@ -145,7 +145,7 @@
 --- Network interfaces ---
 en0: flags=4e080863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,GROUPRT>
- inet 192.168.10.15 netmask 0xffffff00 broadcast 192.168.10.255
+ inet 192.168.10.25 netmask 0xffffff00 broadcast 192.168.10.255
@@ -198,7 +198,7 @@
 --- NFS/GPFS mounts ---
-server01:/exports/data on /data type nfs (rw,soft,intr,proto=tcp)
+server02:/exports/data on /data type nfs (rw,soft,intr,proto=tcp)
Interpretation:
Network IP changed from .15 → .25
NFS mount source changed from server01 → server02
These are flagged as [ WARN ], but not fatal errors.
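For a longer report, a count of added and removed lines gives a quick sense of scale before reading the full diff. This is a minimal sketch, assuming a single-file unified diff as produced by `diff -u`; `summarize_diff` is a hypothetical helper and the sample lines are shortened from the example above.

```shell
#!/bin/sh
# Sketch only: summarize_diff is a hypothetical helper.
# It assumes a single-file unified diff, whose first two lines are the
# "---" and "+++" file headers and must not be counted.

summarize_diff() {
    awk 'NR > 2 && /^\+/ { added++ }
         NR > 2 && /^-/  { removed++ }
         END { printf "added=%d removed=%d\n", added, removed }' "$1"
}

# Demo: build two small files and diff them
pre="/tmp/sum_pre.$$"; post="/tmp/sum_post.$$"; out="/tmp/sum_diff.$$"
printf 'inet 192.168.10.15\nserver01:/exports/data\nhd6\n' > "$pre"
printf 'inet 192.168.10.25\nserver02:/exports/data\nhd6\n' > "$post"
diff -u "$pre" "$post" > "$out" || true   # exit status 1 only means "files differ"
summarize_diff "$out"                     # prints: added=2 removed=2
rm -f "$pre" "$post" "$out"
```

A changed line counts once as removed and once as added, so equal totals usually indicate edits in place rather than devices or mounts that appeared or vanished.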
Example 3 — SSH or ping failure
Command:
$ ./verify_post_reboot.ksh aixlpar01
Console Output:
-------------------------------------------------------------
Testing ping to aixlpar01
-------------------------------------------------------------
[ FAIL ]
Host aixlpar01 not responding to ping.
or
-------------------------------------------------------------
Testing SSH to aixlpar01
-------------------------------------------------------------
[ FAIL ]
Unable to SSH to root@aixlpar01
In these cases, the script exits immediately with:
Exit code: 1 (EXIT_ERR)
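Since the exit codes are stable (0 for OK, 2 for WARN, 1 for ERR), a wrapper can run the verification across several hosts and tally the results. The sketch below illustrates the idea with a stub in place of the real script; `run_for_hosts`, `fake_verify`, and the host names are all hypothetical.

```shell
#!/bin/sh
# Sketch only: run_for_hosts and fake_verify are hypothetical.
# In practice the first argument would be ./verify_post_reboot.ksh.

run_for_hosts() {
    # $1 = command to run once per host; remaining arguments = hostnames
    cmd=$1; shift
    ok=0; warn=0; err=0
    for h in "$@"; do
        rc=0
        "$cmd" "$h" >/dev/null 2>&1 || rc=$?
        case $rc in
            0) ok=$((ok + 1)) ;;       # EXIT_OK: no differences
            2) warn=$((warn + 1)) ;;   # EXIT_WARN: differences found
            *) err=$((err + 1)) ;;     # EXIT_ERR or anything unexpected
        esac
    done
    echo "ok=$ok warn=$warn err=$err"
}

# Stub that imitates the three outcomes so the demo needs no remote hosts
fake_verify() {
    case "$1" in
        hostA) return 0 ;;
        hostB) return 2 ;;
        *)     return 1 ;;
    esac
}

run_for_hosts fake_verify hostA hostB hostC   # prints: ok=1 warn=1 err=1
```

Treating every unknown status as an error keeps the tally conservative if the script is ever terminated by a signal.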