Composite Health Check Suite (health.sql)

What This Script Does

This composite health check script provides comprehensive database analysis by:

Executing a series of specialized diagnostic scripts
Analyzing random I/O performance patterns
Checking scattered read (full table scan) performance
Monitoring CPU utilization and health
Assessing RAC interconnect performance (if applicable)
Examining enqueue (lock) wait statistics

Script

set pause off;
set pagesize 200;
@random_io
@scattered_io
@cpu_health
@interconnect_health
@enqueue

Usage

SQL> @health.sql

Prerequisites

This script requires the following individual scripts to be available:

random_io.sql - Random I/O analysis
scattered_io.sql - Scattered read analysis
cpu_health.sql - CPU utilization monitoring
interconnect_health.sql - RAC interconnect diagnostics
enqueue.sql - Enqueue wait analysis

Required Privileges

SELECT on V$SYSTEM_EVENT
SELECT on V$SESSION_EVENT
SELECT on V$SYSSTAT
SELECT on V$OSSTAT
SELECT on V$INSTANCE
SELECT on GV$ views (for RAC environments)

Component Analysis

Random I/O Analysis

Purpose: Analyze single block read performance

Key Metrics: db file sequential read waits
Indicators: Average wait time, total waits
Targets: < 10ms average for good performance

Scattered I/O Analysis

Purpose: Analyze multi-block read performance

Key Metrics: db file scattered read waits
Indicators: Multi-block read efficiency
Targets: Consistent with storage capabilities

CPU Health Monitoring

Purpose: Assess CPU utilization patterns

Key Metrics: CPU time, wait ratios
Indicators: CPU bottlenecks, scheduling delays
Targets: CPU utilization < 80% sustained

Interconnect Health (RAC)

Purpose: Monitor cluster interconnect performance

Key Metrics: gc cr/current block receive times
Indicators: Network latency, cluster efficiency
Targets: < 5ms for local interconnect

Enqueue Analysis

Purpose: Identify locking and contention issues

Key Metrics: Enqueue wait events
Indicators: Lock contention, blocking sessions
Targets: Minimal enqueue waits

Sample Output Structure

Each component produces specific diagnostic output:

=== Random I/O Analysis ===
Event: db file sequential read
Total Waits: 1,234,567
Total Time: 5,678 seconds
Average Wait: 4.6 ms

=== Scattered I/O Analysis ===
Event: db file scattered read
Total Waits: 234,567
Total Time: 1,234 seconds
Average Wait: 5.3 ms

=== CPU Health Analysis ===
CPU Used by This Session: 45,678 seconds
DB Time: 123,456 seconds
CPU Ratio: 37%

=== Interconnect Health ===
gc cr block receive time: 2.1 ms avg
gc current block receive time: 1.8 ms avg

=== Enqueue Analysis ===
TX - Transaction Lock: 45 waits
TM - DML Lock: 12 waits

Interpretation Guidelines

Performance Baselines

Random I/O: 5-15ms typical for disk storage
Scattered I/O: Should be higher than random I/O
CPU Ratio: 50-80% optimal for OLTP systems
Interconnect: < 5ms for properly configured RAC

Warning Indicators

High Random I/O times: Storage performance issues
Excessive CPU waits: CPU bottleneck or scheduling issues
High interconnect times: Network or cluster problems
Frequent enqueue waits: Locking contention

Common Use Cases

Daily Health Checks
- Run as part of morning diagnostics
- Identify overnight performance issues
- Establish baseline performance metrics
Performance Troubleshooting
- First-line diagnostic for performance issues
- Quickly identify major bottleneck areas
- Prioritize detailed analysis efforts
Capacity Planning
- Monitor resource utilization trends
- Identify growing performance pressures
- Plan infrastructure upgrades

Customization Options

Adding Additional Checks

-- Enhanced version with additional diagnostics
set pause off;
set pagesize 200;
@random_io
@scattered_io
@cpu_health
@interconnect_health
@enqueue
@memory_health
@tablespace_health
@session_health

Automated Scheduling

-- Wrapper for automated execution
SPOOL /path/to/logs/health_check_$(date +%Y%m%d_%H%M).log
@health.sql
SPOOL OFF

Integration with Monitoring

AWR Integration

Consider running before/after AWR snapshots:

EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
@health.sql
-- Wait for activity
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

Alert Integration

Create alerts based on output thresholds:

-- Example threshold monitoring
SELECT CASE
  WHEN avg_wait > 20 THEN 'CRITICAL: High I/O waits'
  WHEN avg_wait > 10 THEN 'WARNING: Elevated I/O waits'
  ELSE 'OK'
END status
FROM (SELECT time_waited/total_waits avg_wait
      FROM v$system_event
      WHERE event = 'db file sequential read');

Troubleshooting Component Failures

Missing Scripts

If individual component scripts are missing:

-- Create minimal replacement scripts
-- For missing random_io.sql:
SELECT 'Random I/O Health Check' title FROM dual;
SELECT event, total_waits, time_waited,
       ROUND(time_waited/total_waits, 2) avg_wait_ms
FROM v$system_event
WHERE event = 'db file sequential read';

Permission Issues

Ensure proper privileges for all V$ views:

GRANT SELECT ON V$SYSTEM_EVENT TO monitoring_user;
GRANT SELECT ON V$SYSSTAT TO monitoring_user;
GRANT SELECT ON V$OSSTAT TO monitoring_user;

Automation and Scheduling

Cron Integration

#!/bin/bash
# Daily health check script
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
export ORACLE_SID=ORCL
cd /path/to/scripts
sqlplus -s / as sysdba @health.sql > /logs/daily_health_$(date +%Y%m%d).log

Enterprise Manager Integration

Schedule as Database Control job
Set up email notifications for critical thresholds
Store results in repository for trending

Best Practices

Regular Execution
- Run daily during off-peak hours
- Compare results over time
- Document baseline performance
Threshold Management
- Establish environment-specific thresholds
- Adjust for different workload patterns
- Document expected performance ranges
Response Procedures
- Create runbooks for common issues
- Define escalation procedures
- Maintain contact lists for critical issues