Pages

Oracle 19c/23c Enterprise Ops: ASM, RMAN, Data Guard & RAC Monitoring

This guide builds bulletproof Oracle infrastructure for banking, telecom, government—systems that can't go down. Zero data loss, automated everything, online patching ready.

Architecture Goals
  • 99.99% uptime (26 minutes downtime/year max)
  • Zero data loss (MAXIMUM PROTECTION Data Guard)
  • <4hr backup windows
  • Online patching & rolling upgrades
  • Performance baselines with alerting
1. ASM: Your Storage Foundation
1.1 Redundancy Cheat Sheet
RedundancyMirrorsWhen to Use
EXTERNALNoneStorage arrays handle it (e.g., SAN)
NORMAL2xStandard production (most common)
HIGH3xMission-critical, no single failure tolerated

1.2 Create ASM Disks
Never use dd in production. Use parted:
# For each disk (/dev/sdb, sdc, etc.)
$ sudo parted /dev/sdb mklabel gpt
$ sudo parted /dev/sdb mkpart primary 1MiB 100%

# Create ASM disks
$ oracleasm createdisk DATA01 /dev/sdb1
$ oracleasm createdisk DATA02 /dev/sdc1
$ oracleasm createdisk FRA01 /dev/sdd1
$ oracleasm createdisk FRA02 /dev/sde1
$ oracleasm listdisks  # Verify

1.3 Create Diskgroups
# DATA (8TB, NORMAL redundancy)
$ asmca -silent -createDiskGroup -diskGroupName DATA -disk '/dev/oracleasm/disks/DATA*' -redundancy NORMAL -au_size 4

# FRA (4TB backup/recovery)
$ asmca -silent -createDiskGroup -diskGroupName FRA -disk '/dev/oracleasm/disks/FRA*' -redundancy NORMAL

Pro tip: Separate DATA/FRA = no backup I/O contention.

1.4 Monitor Rebalance
-- Check progress
SELECT * FROM v$asm_operation;

-- Speed up (1-11, default=1)
ALTER DISKGROUP DATA REBALANCE POWER 8;

1.5 Failure Groups
CREATE DISKGROUP DATA NORMAL REDUNDANCY
FAILGROUP rack1 DISK '+DATA/DATA01'
FAILGROUP rack2 DISK '+DATA/DATA02';

2. RMAN: Enterprise Backup Mastery
2.1 Block Change Tracking (10x Faster Incrementals)
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '+DATA/bct.dbf';
SELECT status FROM v$block_change_tracking;  -- SHOULD_READ
2.2 RMAN Catalog (Mandatory for Enterprise)

Recovery Catalog DB (separate small DB):
rman catalog rman/secret@catdb
REGISTER DATABASE;

2.3 Incremental Forever Strategy
Weekly full → Daily incrementals → Merge image copy:
RUN {
  RECOVER COPY OF DATABASE WITH TAG 'incr_update';
  BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY 
    WITH TAG 'incr_update' DATABASE;
}

Benefits: 30min backups, instant point-in-time recovery.

2.4 Validate Everything
RESTORE DATABASE VALIDATE;
BACKUP VALIDATE DATABASE ARCHIVELOG ALL;

2.5 Quarterly Restore Drill
RUN {
  SET UNTIL TIME "SYSDATE-1";
  RESTORE DATABASE;
  RECOVER DATABASE;
}

Document your RTO/RPO results.

3. Data Guard Broker
3.1 Broker Setup
ALTER SYSTEM SET dg_broker_start=TRUE;
$ dgmgrl sys/secret
DGMGRL> CREATE CONFIGURATION 'DGCONFIG' AS 
         PRIMARY DATABASE IS 'racdb' CONNECT IDENTIFIER IS racdb;
DGMGRL> ADD DATABASE 'standbydb' AS 
         CONNECT IDENTIFIER IS standbydb MAINTAINED AS PHYSICAL;
DGMGRL> ENABLE CONFIGURATION;
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;

3.2 Switchover Test
DGMGRL> SWITCHOVER TO standbydb;

4. Flashback & Guaranteed Restore Points
Enable Flashback Database:
ALTER DATABASE FLASHBACK ON;
Before patching/upgrades:
CREATE RESTORE POINT before_patch GUARANTEE FLASHBACK DATABASE;
Rollback:
FLASHBACK DATABASE TO RESTORE POINT before_patch;

5. Performance: Enterprise Gold Standard
5.1 Memory (Disable AMM Forever)
ALTER SYSTEM SET memory_target=0 SCOPE=SPFILE;
ALTER SYSTEM SET sga_target=8G SCOPE=SPFILE;
ALTER SYSTEM SET pga_aggregate_target=2G SCOPE=SPFILE;

5.2 SGA Breakdown
SELECT component, ROUND(current_size/1024/1024,0) "MB"
FROM v$sga_dynamic_components;

5.3 AWR Baselines
BEGIN
  DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(
    start_snap_id => 100, end_snap_id => 110,
    baseline_name => 'MONTHLY_BASELINE');
END;
/

5.4 I/O Calibration
DECLARE
  lat INTEGER; iops INTEGER; mbps INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 8, max_latency => 10,
    max_iops => iops, max_mbps => mbps, actual_latency => lat);
  DBMS_OUTPUT.PUT_LINE('Max IOPS: ' || iops);
END;
/

6. RAC-Specific Checks
Global Cache Health:
SELECT inst_id, event, total_waits
FROM gv$system_event WHERE event LIKE 'gc%';
Interconnect:
SELECT * FROM gv$cluster_interconnects;

7. TDE Encryption
-- Create wallet in ASM
ADMINISTER KEY MANAGEMENT CREATE KEYSTORE '+DATA' IDENTIFIED BY secret;
-- Open & create master key
ADMINISTER KEY MANAGEMENT SET KEYSTORE OPEN IDENTIFIED BY secret;
ADMINISTER KEY MANAGEMENT SET KEY IDENTIFIED BY secret WITH BACKUP;
-- Encrypt tablespace online
ALTER TABLESPACE users ENCRYPTION ONLINE USING 'AES256' ENCRYPT;

8. Health Check Script
Save as /home/oracle/scripts/health_check.sh and cron it daily:

#!/bin/bash
# Enterprise Oracle Health Check | 19c/23c/RAC/ASM
# Covers: OS, DB, ASM, RMAN, Performance, Blocking Sessions
DATE=$(date +"%Y-%m-%d_%H-%M-%S")
LOG_DIR="/u02/health_logs"
LOG_FILE="$LOG_DIR/health_$DATE.log"

mkdir -p "$LOG_DIR"
exec > >(tee -a "$LOG_FILE")
exec 2>&1

GREEN='\033[0;32m' RED='\033[0;31m' YELLOW='\033[1;33m' NC='\033[0m'

print_header() {
    echo "=================================================="
    echo " ORACLE ENTERPRISE HEALTH CHECK - $DATE"
    echo "=================================================="
}

check_critical() {
    if [ $? -ne 0 ]; then
        echo -e "${RED}✗ FAIL: $1${NC}"
        CRITICAL=1
    else
        echo -e "${GREEN}✔ PASS: $1${NC}"
    fi
}

print_header

# [1] OS Health
echo -e "\n${YELLOW}[1] OS STATUS${NC}"
uptime
free -h | head -n 3
df -h /u0 /u01 /u02 /u03 2>/dev/null | head -n 10
check_critical "Disk space OK"

# [2] Oracle 19c/23c Detection & Status
for env_file in /home/oracle/scripts/setEnv*.sh; do
    if [ -f "$env_file" ]; then
        echo -e "\n${YELLOW}[2] $(basename $env_file)${NC}"
        source "$env_file"
        sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 FEEDBACK OFF ECHO OFF
SELECT 'DB: '||name||' - '||open_mode||' - '||log_mode FROM v\$database;
SELECT 'Instance: '||instance_name||' - '||status FROM v\$instance;
SELECT 'PDBS: '||COUNT(*)||' pluggable DBs' FROM v\$pdbs;
SELECT 'Jobs: '||COUNT(*)||' running jobs' FROM dba_scheduler_running_jobs;
EXIT;
EOF
        check_critical "Oracle DB responsive"
    fi
done

# [3] RAC Cluster (if exists)
if command -v crsctl >/dev/null 2>&1; then
    echo -e "\n${YELLOW}[3] RAC CLUSTER${NC}"
    crsctl check crs
    srvctl status database -d * 2>/dev/null | head -n 20
    check_critical "Clusterware OK"
fi

# [4] ASM Status
if command -v asmcmd >/dev/null 2>&1; then
    echo -e "\n${YELLOW}[4] ASM DISKGROUPS${NC}"
    asmcmd lsdg --suppress | head -n 20
    asmcmd lsof | grep -v "TYPE.*ASMP" | head -n 10
    check_critical "ASM healthy"
fi

# [5] FRA & Archivelog
echo -e "\n${YELLOW}[5] RECOVERY AREA${NC}"
sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0
COL used_gb FORMAT 999.0
COL total_gb FORMAT 999.0
SELECT 'FRA: '||name||' Used: '||ROUND(space_used/1024/1024/1024,1)||
       'GB / Limit: '||ROUND(space_limit/1024/1024/1024,1)||'GB ('||
       ROUND(space_used*100/space_limit,1)||'%)' used_gb
FROM v\$recovery_file_dest;
SELECT 'Archivelog: '||log_mode FROM v\$database;
SELECT 'Seq: '||thread#||'.'||sequence#||' Dest: '||dest_id||' Status: '||status
FROM v\$archived_log WHERE completion_time > SYSDATE-1 ORDER BY completion_time;
EXIT;
EOF

# [6] RMAN Backup Status (Last 7 days)
echo -e "\n${YELLOW}[6] RMAN BACKUPS${NC}"
rman target / <<EOF
SET ECHO OFF
LIST BACKUP SUMMARY WHERE completion_time > 'SYSDATE-7';
EXIT;
EOF
check_critical "RMAN accessible"

# [7] Blocking Sessions & Long Runners
echo -e "\n${YELLOW}[7] SESSIONS${NC}"
sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 LINESIZE 200
-- Blockers
SELECT 'BLOCKER: '||s1.sid||'->'||s2.sid||' SQL: '||
       s2.sql_id||' USER: '||s2.username||' WAIT: '||s2.event
FROM v\$lock l1, v\$session s1, v\$lock l2, v\$session s2
WHERE s1.sid = l1.sid AND s2.sid = l2.sid
AND l1.BLOCK = 1 AND l2.request > 0
AND l1.id1 = l2.id1 AND l1.id2 = l2.id2;

-- Top CPU
SELECT 'CPU_HOG: '||sid||' '||username||' '||program||' CPU: '||cpu_time
FROM v\$session WHERE cpu_time > 3600 ORDER BY cpu_time DESC;

EXIT;
EOF

# [8] Performance Alerts
echo -e "\n${YELLOW}[8] PERFORMANCE${NC}"
sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0
-- Top waits (last hour)
SELECT 'WAIT: '||event||' '||total_waits||' waits '||ROUND(time_waited/100,1)||'s'
FROM (SELECT event, total_waits, time_waited FROM v\$system_event
      WHERE time_waited > 0 AND event NOT LIKE '%SQL*Net%'
      ORDER BY time_waited DESC) WHERE ROWNUM <= 5;

-- Invalid objects
SELECT 'INVALID: '||COUNT(*)||' objects' FROM dba_objects WHERE status='INVALID';

-- Tablespace usage
SELECT 'TS: '||tablespace_name||' '||ROUND(used_percent,1)||'% full'
FROM dba_tablespace_usage_metrics WHERE used_percent > 80;

EXIT;
EOF

# [9] Listener & Network
echo -e "\n${YELLOW}[9] LISTENER${NC}"
lsnrctl status | head -n 30

echo -e "\n${GREEN}==================================================${NC}"
echo -e "${GREEN}✓ HEALTH CHECK COMPLETE | Log: $LOG_FILE${NC}"
echo -e "${GREEN}✓ Run time: $(date -d 'now' --date='5 seconds ago' +%H:%M:%S)${NC}"

[ "${CRITICAL:-0}" -eq 1 ] && echo -e "${RED} CRITICAL ISSUES DETECTED${NC}"
exit ${CRITICAL:-0}

Cron: 0 6 * * * /home/oracle/scripts/health_check.sh

9. Data Guard Automation Scripts
dg_setup.sh - Full Broker Config

#!/bin/bash
# Data Guard Broker: Primary → Standby Setup
source /home/oracle/.bash_profile

# On PRIMARY
sqlplus / as sysdba <<EOF
ALTER SYSTEM SET dg_broker_start=TRUE;
ALTER SYSTEM SET log_archive_dest_2='SERVICE=standbydb VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standbydb';
ALTER SYSTEM SET fal_server=standbydb;
ALTER SYSTEM SET standby_file_management=AUTO;
EOF

# Broker config (run on PRIMARY)
dgmgrl sys/secret@primary <<EOF
CREATE CONFIGURATION 'DGCONFIG' AS PRIMARY DATABASE IS 'primary' CONNECT IDENTIFIER IS primary;
ADD DATABASE 'standby' AS CONNECT IDENTIFIER IS standby MAINTAINED AS PHYSICAL;
EDIT DATABASE 'standby' SET STATE='APPLY-ON';
EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
ENABLE CONFIGURATION;
SHOW CONFIGURATION;
EXIT;
EOF

dg_health.sh - Daily Guard Check

#!/bin/bash
dgmgrl -silent sys/secret <<EOF
SHOW CONFIGURATION VERBOSITY=1;
SHOW DATABASE VERBOSITY=1;
EXIT;
EOF | grep -E "(SUCCESS|OK|ERROR|WARNING)"

10. RAC-Specific Monitoring
rac_global_cache.sh - GC Performance

#!/bin/bash
sqlplus -s / as sysdba <<EOF
SET PAGESIZE 0 LINESIZE 200
COL inst FORMAT A10
COL event FORMAT A30

-- Global Cache Cr/Current stats (should be <5% of DB time)
SELECT inst_id inst, event, total_waits, time_waited
FROM gv\$system_event 
WHERE event LIKE 'gc%' 
ORDER BY inst_id, time_waited DESC;

-- GCS stats (high numbers = network bottleneck)
SELECT inst_id, SUM(value) blocks
FROM gv\$ges_statistics 
WHERE name IN ('gc cr blocks received','gc current blocks received')
GROUP BY inst_id;

-- Interconnect
SELECT * FROM gv\$cluster_interconnects;

EXIT;
EOF

rac_balance_check.sh - Instance Load Balancing

#!/bin/bash
srvctl status database -d racdb -v
crsctl stat res -t | grep ora.racdb.db
sqlplus -s / as sysdba <<EOF
SELECT inst_id, instance_name, host_name, status, active_state FROM gv\$instance;
SELECT inst_id, COUNT(*) sessions FROM gv\$session GROUP BY inst_id;
EXIT;
EOF

11. Production Cron Schedule
# /etc/cron.d/oracle_monitoring
# Health checks
0 6 * * * oracle /home/oracle/scripts/master_control.sh

# RMAN Incremental Merge (2AM daily)
0 2 * * 1-5 oracle /home/oracle/scripts/rman_incr_merge.sh

# Data Guard health (hourly)
5 * * * * oracle /home/oracle/scripts/dg_health.sh >> /u02/logs/dg_health.log

# RAC cache monitoring (15min)
*/15 * * * * oracle /home/oracle/scripts/rac_global_cache.sh >> /u02/logs/rac_cache.log

# AWR export (midnight)
0 0 * * 1 oracle @$ORACLE_HOME/rdbms/admin/awrrpt.sql

12 Quick Deploy Commands
# Deploy all scripts
$ mkdir -p /home/oracle/scripts /u02/{health_logs,logs}
$ chown -R oracle:oinstall /home/oracle/scripts /u02
$ chmod +x /home/oracle/scripts/*.sh

# Test run
$ su - oracle -c "/home/oracle/scripts/master_control.sh"

# Add to crontab
$ su - oracle -c "crontab /home/oracle/scripts/oracle_cron"

No comments:

Post a Comment