This guide builds bulletproof Oracle infrastructure for banking, telecom, government—systems that can't go down. Zero data loss, automated everything, online patching ready.
Architecture Goals
- 99.99% uptime (26 minutes downtime/year max)
- Zero data loss (MAXIMUM PROTECTION Data Guard)
- <4hr backup windows
- Online patching & rolling upgrades
- Performance baselines with alerting
1. ASM: Your Storage Foundation
1.1 Redundancy Cheat Sheet
| Redundancy | Mirrors | When to Use |
|---|---|---|
| EXTERNAL | None | Storage arrays handle it (e.g., SAN) |
| NORMAL | 2x | Standard production (most common) |
| HIGH | 3x | Mission-critical, no single failure tolerated |
1.2 Create ASM Disks
Never use dd in production. Use parted:
# For each disk (/dev/sdb, sdc, etc.)
$ sudo parted /dev/sdb mklabel gpt
$ sudo parted /dev/sdb mkpart primary 1MiB 100%
# Create ASM disks
$ oracleasm createdisk DATA01 /dev/sdb1
$ oracleasm createdisk DATA02 /dev/sdc1
$ oracleasm createdisk FRA01 /dev/sdd1
$ oracleasm createdisk FRA02 /dev/sde1
$ oracleasm listdisks # Verify
1.3 Create Diskgroups
# DATA (8TB, NORMAL redundancy)
$ asmca -silent -createDiskGroup -diskGroupName DATA -disk '/dev/oracleasm/disks/DATA*' -redundancy NORMAL -au_size 4
# FRA (4TB backup/recovery)
$ asmca -silent -createDiskGroup -diskGroupName FRA -disk '/dev/oracleasm/disks/FRA*' -redundancy NORMAL
Pro tip: Separate DATA/FRA = no backup I/O contention.
1.4 Monitor Rebalance
-- Check progress
SELECT * FROM v$asm_operation;
-- Speed up (1-11, default=1)
ALTER DISKGROUP DATA REBALANCE POWER 8;
1.5 Failure Groups
CREATE DISKGROUP DATA NORMAL REDUNDANCY
FAILGROUP rack1 DISK '+DATA/DATA01'
FAILGROUP rack2 DISK '+DATA/DATA02';
2. RMAN: Enterprise Backup Mastery
2.1 Block Change Tracking (10x Faster Incrementals)
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '+DATA/bct.dbf';
SELECT status FROM v$block_change_tracking; -- SHOULD_READ
2.2 RMAN Catalog (Mandatory for Enterprise)
Recovery Catalog DB (separate small DB):
rman catalog rman/secret@catdb
REGISTER DATABASE;
2.3 Incremental Forever Strategy
Weekly full → Daily incrementals → Merge image copy:
RUN {
RECOVER COPY OF DATABASE WITH TAG 'incr_update';
BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY
WITH TAG 'incr_update' DATABASE;
}
Benefits: 30min backups, instant point-in-time recovery.
2.4 Validate Everything
RESTORE DATABASE VALIDATE;
BACKUP VALIDATE DATABASE ARCHIVELOG ALL;
2.5 Quarterly Restore Drill
RUN {
SET UNTIL TIME "SYSDATE-1";
RESTORE DATABASE;
RECOVER DATABASE;
}
Document your RTO/RPO results.
3. Data Guard Broker
3.1 Broker Setup
ALTER SYSTEM SET dg_broker_start=TRUE;
$ dgmgrl sys/secret
DGMGRL> CREATE CONFIGURATION 'DGCONFIG' AS
PRIMARY DATABASE IS 'racdb' CONNECT IDENTIFIER IS racdb;
DGMGRL> ADD DATABASE 'standbydb' AS
CONNECT IDENTIFIER IS standbydb MAINTAINED AS PHYSICAL;
DGMGRL> ENABLE CONFIGURATION;
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
3.2 Switchover Test
DGMGRL> SWITCHOVER TO standbydb;
4. Flashback & Guaranteed Restore Points
Enable Flashback Database:
ALTER DATABASE FLASHBACK ON;
Before patching/upgrades:
CREATE RESTORE POINT before_patch GUARANTEE FLASHBACK DATABASE;
Rollback:
FLASHBACK DATABASE TO RESTORE POINT before_patch;
5. Performance: Enterprise Gold Standard
5.1 Memory (Disable AMM Forever)
ALTER SYSTEM SET memory_target=0 SCOPE=SPFILE;
ALTER SYSTEM SET sga_target=8G SCOPE=SPFILE;
ALTER SYSTEM SET pga_aggregate_target=2G SCOPE=SPFILE;
5.2 SGA Breakdown
SELECT component, ROUND(current_size/1024/1024,0) "MB"
FROM v$sga_dynamic_components;
5.3 AWR Baselines
BEGIN
DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(
start_snap_id => 100, end_snap_id => 110,
baseline_name => 'MONTHLY_BASELINE');
END;
/
5.4 I/O Calibration
DECLARE
lat INTEGER; iops INTEGER; mbps INTEGER;
BEGIN
DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
num_physical_disks => 8, max_latency => 10,
max_iops => iops, max_mbps => mbps, actual_latency => lat);
DBMS_OUTPUT.PUT_LINE('Max IOPS: ' || iops);
END;
/
6. RAC-Specific Checks
Global Cache Health:
SELECT inst_id, event, total_waits
FROM gv$system_event WHERE event LIKE 'gc%';
Interconnect:
SELECT * FROM gv$cluster_interconnects;
7. TDE Encryption
-- Create wallet in ASM
ADMINISTER KEY MANAGEMENT CREATE KEYSTORE '+DATA' IDENTIFIED BY secret;
-- Open & create master key
ADMINISTER KEY MANAGEMENT SET KEYSTORE OPEN IDENTIFIED BY secret;
ADMINISTER KEY MANAGEMENT SET KEY IDENTIFIED BY secret WITH BACKUP;
-- Encrypt tablespace online
ALTER TABLESPACE users ENCRYPTION ONLINE USING 'AES256' ENCRYPT;
8. Health Check Script
Save as /home/oracle/scripts/health_check.sh and cron it daily:
#!/bin/bash
# Enterprise Oracle Health Check | 19c/23c/RAC/ASM
# Covers: OS, DB, ASM, RMAN, Performance, Blocking Sessions
DATE=$(date +"%Y-%m-%d_%H-%M-%S")
LOG_DIR="/u02/health_logs"
LOG_FILE="$LOG_DIR/health_$DATE.log"
mkdir -p "$LOG_DIR"
exec > >(tee -a "$LOG_FILE")
exec 2>&1
GREEN='\033[0;32m' RED='\033[0;31m' YELLOW='\033[1;33m' NC='\033[0m'
print_header() {
echo "=================================================="
echo " ORACLE ENTERPRISE HEALTH CHECK - $DATE"
echo "=================================================="
}
check_critical() {
if [ $? -ne 0 ]; then
echo -e "${RED}✗ FAIL: $1${NC}"
CRITICAL=1
else
echo -e "${GREEN}✔ PASS: $1${NC}"
fi
}
print_header
# [1] OS Health
echo -e "\n${YELLOW}[1] OS STATUS${NC}"
uptime
free -h | head -n 3
df -h /u0 /u01 /u02 /u03 2>/dev/null | head -n 10
check_critical "Disk space OK"
# [2] Oracle 19c/23c Detection & Status
for env_file in /home/oracle/scripts/setEnv*.sh; do
if [ -f "$env_file" ]; then
echo -e "\n${YELLOW}[2] $(basename $env_file)${NC}"
source "$env_file"
sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 FEEDBACK OFF ECHO OFF
SELECT 'DB: '||name||' - '||open_mode||' - '||log_mode FROM v\$database;
SELECT 'Instance: '||instance_name||' - '||status FROM v\$instance;
SELECT 'PDBS: '||COUNT(*)||' pluggable DBs' FROM v\$pdbs;
SELECT 'Jobs: '||COUNT(*)||' running jobs' FROM dba_scheduler_running_jobs;
EXIT;
EOF
check_critical "Oracle DB responsive"
fi
done
# [3] RAC Cluster (if exists)
if command -v crsctl >/dev/null 2>&1; then
echo -e "\n${YELLOW}[3] RAC CLUSTER${NC}"
crsctl check crs
srvctl status database -d * 2>/dev/null | head -n 20
check_critical "Clusterware OK"
fi
# [4] ASM Status
if command -v asmcmd >/dev/null 2>&1; then
echo -e "\n${YELLOW}[4] ASM DISKGROUPS${NC}"
asmcmd lsdg --suppress | head -n 20
asmcmd lsof | grep -v "TYPE.*ASMP" | head -n 10
check_critical "ASM healthy"
fi
# [5] FRA & Archivelog
echo -e "\n${YELLOW}[5] RECOVERY AREA${NC}"
sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0
COL used_gb FORMAT 999.0
COL total_gb FORMAT 999.0
SELECT 'FRA: '||name||' Used: '||ROUND(space_used/1024/1024/1024,1)||
'GB / Limit: '||ROUND(space_limit/1024/1024/1024,1)||'GB ('||
ROUND(space_used*100/space_limit,1)||'%)' used_gb
FROM v\$recovery_file_dest;
SELECT 'Archivelog: '||log_mode FROM v\$database;
SELECT 'Seq: '||thread#||'.'||sequence#||' Dest: '||dest_id||' Status: '||status
FROM v\$archived_log WHERE completion_time > SYSDATE-1 ORDER BY completion_time;
EXIT;
EOF
# [6] RMAN Backup Status (Last 7 days)
echo -e "\n${YELLOW}[6] RMAN BACKUPS${NC}"
rman target / <<EOF
SET ECHO OFF
LIST BACKUP SUMMARY WHERE completion_time > 'SYSDATE-7';
EXIT;
EOF
check_critical "RMAN accessible"
# [7] Blocking Sessions & Long Runners
echo -e "\n${YELLOW}[7] SESSIONS${NC}"
sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 LINESIZE 200
-- Blockers
SELECT 'BLOCKER: '||s1.sid||'->'||s2.sid||' SQL: '||
s2.sql_id||' USER: '||s2.username||' WAIT: '||s2.event
FROM v\$lock l1, v\$session s1, v\$lock l2, v\$session s2
WHERE s1.sid = l1.sid AND s2.sid = l2.sid
AND l1.BLOCK = 1 AND l2.request > 0
AND l1.id1 = l2.id1 AND l1.id2 = l2.id2;
-- Top CPU
SELECT 'CPU_HOG: '||sid||' '||username||' '||program||' CPU: '||cpu_time
FROM v\$session WHERE cpu_time > 3600 ORDER BY cpu_time DESC;
EXIT;
EOF
# [8] Performance Alerts
echo -e "\n${YELLOW}[8] PERFORMANCE${NC}"
sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0
-- Top waits (last hour)
SELECT 'WAIT: '||event||' '||total_waits||' waits '||ROUND(time_waited/100,1)||'s'
FROM (SELECT event, total_waits, time_waited FROM v\$system_event
WHERE time_waited > 0 AND event NOT LIKE '%SQL*Net%'
ORDER BY time_waited DESC) WHERE ROWNUM <= 5;
-- Invalid objects
SELECT 'INVALID: '||COUNT(*)||' objects' FROM dba_objects WHERE status='INVALID';
-- Tablespace usage
SELECT 'TS: '||tablespace_name||' '||ROUND(used_percent,1)||'% full'
FROM dba_tablespace_usage_metrics WHERE used_percent > 80;
EXIT;
EOF
# [9] Listener & Network
echo -e "\n${YELLOW}[9] LISTENER${NC}"
lsnrctl status | head -n 30
echo -e "\n${GREEN}==================================================${NC}"
echo -e "${GREEN}✓ HEALTH CHECK COMPLETE | Log: $LOG_FILE${NC}"
echo -e "${GREEN}✓ Run time: $(date -d 'now' --date='5 seconds ago' +%H:%M:%S)${NC}"
[ "${CRITICAL:-0}" -eq 1 ] && echo -e "${RED} CRITICAL ISSUES DETECTED${NC}"
exit ${CRITICAL:-0}
Cron: 0 6 * * * /home/oracle/scripts/health_check.sh
9. Data Guard Automation Scripts
dg_setup.sh - Full Broker Config
#!/bin/bash
# Data Guard Broker: Primary → Standby Setup
source /home/oracle/.bash_profile
# On PRIMARY
sqlplus / as sysdba <<EOF
ALTER SYSTEM SET dg_broker_start=TRUE;
ALTER SYSTEM SET log_archive_dest_2='SERVICE=standbydb VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standbydb';
ALTER SYSTEM SET fal_server=standbydb;
ALTER SYSTEM SET standby_file_management=AUTO;
EOF
# Broker config (run on PRIMARY)
dgmgrl sys/secret@primary <<EOF
CREATE CONFIGURATION 'DGCONFIG' AS PRIMARY DATABASE IS 'primary' CONNECT IDENTIFIER IS primary;
ADD DATABASE 'standby' AS CONNECT IDENTIFIER IS standby MAINTAINED AS PHYSICAL;
EDIT DATABASE 'standby' SET STATE='APPLY-ON';
EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
ENABLE CONFIGURATION;
SHOW CONFIGURATION;
EXIT;
EOF
dg_health.sh - Daily Guard Check
#!/bin/bash
dgmgrl -silent sys/secret <<EOF
SHOW CONFIGURATION VERBOSITY=1;
SHOW DATABASE VERBOSITY=1;
EXIT;
EOF | grep -E "(SUCCESS|OK|ERROR|WARNING)"
10. RAC-Specific Monitoring
rac_global_cache.sh - GC Performance
#!/bin/bash
sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 LINESIZE 200
COL inst FORMAT A10
COL event FORMAT A30
-- Global Cache Cr/Current stats (should be <5% of DB time)
SELECT inst_id inst, event, total_waits, time_waited
FROM gv\$system_event
WHERE event LIKE 'gc%'
ORDER BY inst_id, time_waited DESC;
-- GCS stats (high numbers = network bottleneck)
SELECT inst_id, SUM(value) blocks
FROM gv\$ges_statistics
WHERE name IN ('gc cr blocks received','gc current blocks received')
GROUP BY inst_id;
-- Interconnect
SELECT * FROM gv\$cluster_interconnects;
EXIT;
EOF
rac_balance_check.sh - Instance Load Balancing
#!/bin/bash
srvctl status database -d racdb -v
crsctl stat res -t | grep ora.racdb.db
sqlplus -s / as sysdba <<EOF
SELECT inst_id, instance_name, host_name, status, active_state FROM gv\$instance;
SELECT inst_id, COUNT(*) sessions FROM gv\$session GROUP BY inst_id;
EXIT;
EOF
11. Production Cron Schedule
# /etc/cron.d/oracle_monitoring
# Health checks
0 6 * * * oracle /home/oracle/scripts/master_control.sh
# RMAN Incremental Merge (2AM daily)
0 2 * * 1-5 oracle /home/oracle/scripts/rman_incr_merge.sh
# Data Guard health (hourly)
5 * * * * oracle /home/oracle/scripts/dg_health.sh >> /u02/logs/dg_health.log
# RAC cache monitoring (15min)
*/15 * * * * oracle /home/oracle/scripts/rac_global_cache.sh >> /u02/logs/rac_cache.log
# AWR export (midnight)
0 0 * * 1 oracle @$ORACLE_HOME/rdbms/admin/awrrpt.sql
12 Quick Deploy Commands
# Deploy all scripts
$ mkdir -p /home/oracle/scripts /u02/{health_logs,logs}
$ chown -R oracle:oinstall /home/oracle/scripts /u02
$ chmod +x /home/oracle/scripts/*.sh
# Test run
$ su - oracle -c "/home/oracle/scripts/master_control.sh"
# Add to crontab
$ su - oracle -c "crontab /home/oracle/scripts/oracle_cron"
No comments:
Post a Comment