Skip to content

RAC Comprehensive Diagnostic (racdiag_9i.sql)

This is a comprehensive RAC (Real Application Clusters) diagnostic script originally developed by Oracle Support Services. It provides an all-in-one solution for diagnosing RAC hangs, performance issues, and cluster communication problems by collecting critical diagnostic information from all cluster nodes simultaneously.

-- NAME: RACDIAG.SQL
-- SYS OR INTERNAL USER, CATPARR.SQL ALREADY RUN, PARALLEL QUERY OPTION ON
-- ------------------------------------------------------------------------
-- AUTHOR:
-- Michael Polaski - Oracle Support Services - DataServer Group
-- Copyright 2002, Oracle Corporation
-- ------------------------------------------------------------------------
-- PURPOSE:
-- This script is intended to provide a user friendly guide to troubleshoot RAC
-- hung sessions or slow performance scenerios. The script includes information
-- to gather a variety of important debug information to determine the cause of an
-- RAC hang. The script will create a file called racdiag_<timestamp>.out
-- in your local directory while dumping hang analyze dumps in the user_dump_dest(s)
-- and background_dump_dest(s) on all nodes.
--
-- ------------------------------------------------------------------------
-- DISCLAIMER:
-- This script is provided for educational purposes only. It is NOT
-- supported by Oracle World Wide Technical Support.
-- The script has been tested and appears to work as intended.
-- You should always run new scripts on a test instance initially.
-- ------------------------------------------------------------------------
-- Script output is as follows:
set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,'Mondd_hhmi') timecol,
'.out' spool_extension from sys.dual;
column output new_value dbname
select value || '_' output
from v$parameter where name = 'db_name';
spool c:\racdiag_&&dbname&&timestamp&&suffix
set lines 200
set pagesize 35
set trim on
set trims on
alter session set nls_date_format = 'MON-DD-YYYY HH24:MI:SS';
alter session set timed_statistics = true;
set feedback on
select to_char(sysdate) time from dual;
set numwidth 5
column host_name format a20 tru
select inst_id, instance_name, host_name, version, status, startup_time
from gv$instance
order by inst_id;
set echo on
-- Taking Hang Analyze dumps
-- This may take a little while...
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
-- This part may take the longest, you can monitor bdump or udump to see if the
-- file is being generated.
oradebug -g all dump systemstate 10
-- WAITING SESSIONS:
-- The entries that are shown at the top are the sessions that have
-- waited the longest amount of time that are waiting for non-idle wait
-- events (event column). You can research and find out what the wait
-- event indicates (along with its parameters) by checking the Oracle
-- Server Reference Manual or look for any known issues or documentation
-- by searching Metalink for the event name in the search bar. Example
-- (include single quotes): [ 'buffer busy due to global cache' ].
-- Metalink and/or the Server Reference Manual should return some useful
-- information on each type of wait event. The inst_id column shows the
-- instance where the session resides and the SID is the unique identifier
-- for the session (gv$session). The p1, p2, and p3 columns will show
-- event specific information that may be important to debug the problem.
-- To find out what the p1, p2, and p3 indicates see the next section.
-- Items with wait_time of anything other than 0 indicate we do not know
-- how long these sessions have been waiting.
--
set numwidth 10
column state format a7 tru
column event format a25 tru
column last_sql format a40 tru
select sw.inst_id, sw.sid, sw.state, sw.event, sw.seconds_in_wait seconds,
sw.p1, sw.p2, sw.p3, sa.sql_text last_sql
from gv$session_wait sw, gv$session s, gv$sqlarea sa
where sw.event not in
('rdbms ipc message','smon timer','pmon timer',
'SQL*Net message from client','lock manager wait for remote message',
'ges remote message', 'gcs remote message', 'gcs for action', 'client message',
'pipe get', 'null event', 'PX Idle Wait', 'single-task message',
'PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue',
'listen endpoint status','slave wait','wakeup time manager')
and seconds_in_wait > 0
and (sw.inst_id = s.inst_id and sw.sid = s.sid)
and (s.inst_id = sa.inst_id and s.sql_address = sa.address)
order by seconds desc;
-- EVENT PARAMETER LOOKUP:
-- This section will give a description of the parameter names of the
-- events seen in the last section. p1test is the parameter value for
-- p1 in the WAITING SESSIONS section while p2text is the parameter
-- value for p3 and p3 text is the parameter value for p3. The
-- parameter values in the first section can be helpful for debugging
-- the wait event.
--
column event format a30 tru
column p1text format a25 tru
column p2text format a25 tru
column p3text format a25 tru
select distinct event, p1text, p2text, p3text
from gv$session_wait sw
where sw.event not in ('rdbms ipc message','smon timer','pmon timer',
'SQL*Net message from client','lock manager wait for remote message',
'ges remote message', 'gcs remote message', 'gcs for action', 'client message',
'pipe get', 'null event', 'PX Idle Wait', 'single-task message',
'PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue',
'listen endpoint status','slave wait','wakeup time manager')
and seconds_in_wait > 0
order by event;
-- PCM LOCK BLOCKERS:
-- This section will show us any sessions that are holding locks that
-- are blocking other users. The inst_id will show us the instance that
-- the session resides on while the sid will be a unique identifier for
-- the session. The grant_level will show us how the PCM lock is granted to
-- the user. The request_level will show us what status we are trying to obtain.
-- The lockstate column will show us what status the lock is in. The last column
-- shows how long this session has been waiting.
--
set numwidth 5
column lockstate format a16 tru;
column event format a30 tru;
select dl.inst_id, s.sid, p.spid, dl.resource_name1,
decode(substr(dl.grant_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',request_level) as grant_level,
decode(substr(dl.request_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',request_level) as request_level,
decode(substr(dl.lockstate,1,8),'KJUSERGR','Granted','KJUSEROP','Opening',
'KJUSERCA','Canceling','KJUSERCV','Converting') as lockstate,
s.sid, sw.event, sw.seconds_in_wait sec
from gv$dlm_locks dl, gv$process p, gv$session s, gv$session_wait sw
where blocker = 1
and (dl.inst_id = p.inst_id and dl.pid = p.spid)
and (p.inst_id = s.inst_id and p.addr = s.paddr)
and (s.inst_id = sw.inst_id and s.sid = sw.sid)
order by sw.seconds_in_wait desc;
-- [Additional sections continue for 549 total lines...]
-- Full script includes comprehensive analysis of:
-- - PCM Lock waiters and blockers
-- - Latch contention analysis
-- - Global cache performance metrics
-- - Instance communication diagnostics
-- - Session and lock analysis
-- - System state dumps and hang analysis
  • Creates timestamped output files
  • Generates hang analyze dumps across all nodes
  • Produces system state dumps for detailed analysis
  • Collects information from all RAC instances
  • Instance Status - All cluster nodes health
  • Waiting Sessions - Sessions with non-idle waits
  • Event Parameters - Wait event details and parameters
  • PCM Lock Analysis - Lock blockers and waiters
  • Latch Contention - Latch wait analysis
  • Global Cache - Inter-node communication metrics
  • Resource Analysis - Memory and process utilization
  • Parallel Cache Management (PCM) locks
  • Global Cache Service (GCS) analysis
  • Global Enqueue Service (GES) diagnostics
  • Inter-instance communication analysis
-- Connect as SYSDBA to any RAC instance
sqlplus / as sysdba
-- Run the diagnostic script
@racdiag_9i.sql
-- Output files created:
-- racdiag_<dbname>_<timestamp>.out (main report)
-- hanganalyze dumps in user_dump_dest
-- systemstate dumps in background_dump_dest
  • SYSDBA or INTERNAL connection required
  • Access to ORADEBUG commands
  • SELECT privileges on all GV$ views
  • CATPARR.SQL must be run (parallel query option)
INST_ID INSTANCE_NAME HOST_NAME VERSION STATUS STARTUP_TIME
------- -------------------- -------------------- ---------------- ------- -----------------
1 ORCL1 server1.domain.com 19.3.0.0.0 OPEN 01-JAN-2024 08:00:00
2 ORCL2 server2.domain.com 19.3.0.0.0 OPEN 01-JAN-2024 08:01:15
INST_ID SID STATE EVENT SECONDS P1 P2 P3 LAST_SQL
------- ----- ------- ----------------------- ------- ---------- ---------- ---------- --------
1 145 WAITING gc current block 2-way 1234 file# block# class# SELECT...
2 267 WAITING buffer busy waits 567 file# block# reason# UPDATE...
1 89 WAITING gc cr block 2-way 234 file# block# class# INSERT...
INST_ID SID SPID RESOURCE_NAME1 GRANT_LEVEL REQUEST_LEVEL LOCKSTATE EVENT SEC
------- ----- ----- ---------------- ----------- ------------- ---------- -------------------------- ---
1 145 5234 [0x15a][0x2c4] Exclusive Null Granted SQL*Net message from client 0
2 267 6789 [0x15a][0x2c4] Null Exclusive Converting gc current block 2-way 45

Sessions are ordered by wait time (longest first):

  • High seconds_in_wait = Serious performance issue
  • Event types indicate problem category
  • P1, P2, P3 provide event-specific details
  • Blockers = Sessions holding locks that others need
  • Waiters = Sessions waiting for blocked resources
  • Grant/Request levels show lock compatibility
  • gc current block 2-way - Global cache block requests
  • gc cr block 2-way - Consistent read block transfers
  • buffer busy waits - Block contention
  • gc current block busy - Hot block issues
-- Check instance status first
SELECT inst_id, status, host_name
FROM gv$instance;
-- Verify cluster interconnect
SELECT inst_id, name, ip_address
FROM gv$cluster_interconnects;
  • Execute during performance problem
  • Allow several minutes for completion
  • Monitor progress in dump directories
  • Review waiting sessions section first
  • Identify blocking chains
  • Examine wait event patterns
  • Check global cache statistics
-- Analyze global cache efficiency
SELECT inst_id,
gc_current_blocks_received,
gc_current_blocks_served,
gc_cr_blocks_received,
gc_cr_blocks_served
FROM gv$sysstat
WHERE name LIKE 'gc%blocks%';
-- Check interconnect latency
SELECT inst_id, name, value
FROM gv$sysstat
WHERE name LIKE '%gc%time%'
ORDER BY inst_id, name;
-- Examine block transfer patterns
SELECT inst_id,
ROUND(physical_reads/1000) phys_reads_k,
ROUND(gc_current_blocks_received/1000) gc_curr_k,
ROUND(gc_cr_blocks_received/1000) gc_cr_k
FROM gv$sysstat s1, gv$sysstat s2, gv$sysstat s3
WHERE s1.name = 'physical reads'
AND s2.name = 'gc current blocks received'
AND s3.name = 'gc cr blocks received'
AND s1.inst_id = s2.inst_id
AND s2.inst_id = s3.inst_id;
-- Identify hot objects
SELECT owner, object_name, object_type,
COUNT(*) gc_requests
FROM gv$session_wait sw, dba_objects o
WHERE sw.event LIKE 'gc%'
AND sw.p1 = o.data_object_id
GROUP BY owner, object_name, object_type
ORDER BY COUNT(*) DESC;
-- Check for interconnect errors
SELECT inst_id, name, value
FROM gv$sysstat
WHERE name LIKE '%interconnect%'
AND value > 0;
-- Analyze buffer cache pressure
SELECT inst_id,
ROUND(physical_reads/1000) phys_reads_k,
ROUND(db_block_gets/1000) block_gets_k,
ROUND(consistent_gets/1000) cons_gets_k
FROM gv$sysstat
WHERE name IN ('physical reads', 'db block gets', 'consistent gets');
  • Run during peak hours to establish baselines
  • Schedule regular diagnostic collections
  • Monitor trends over time
  • Document normal vs abnormal patterns
  • Run immediately when hangs occur
  • Collect multiple snapshots 5-10 minutes apart
  • Preserve dump files for Oracle Support
  • Document symptoms and timeline
  • Identify frequently waited-for objects
  • Optimize SQL causing global cache waits
  • Consider partitioning hot objects
  • Review application design patterns
-- Generate RAC-specific AWR
BEGIN
DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
END;
/
-- Run ADDM for RAC instance
BEGIN
DBMS_ADVISOR.CREATE_TASK(
advisor_name => 'ADDM',
task_name => 'RAC_ADDM_TASK'
);
END;
/
Terminal window
# Check cluster status
crsctl status resource -t
# Check interconnect
oifcfg getif

For Oracle 12c and later, consider these alternatives:

-- Enable automatic diagnostic collection
ALTER SYSTEM SET DIAGNOSTIC_DEST = '/u01/app/oracle/diag';
  • Automatic problem detection
  • Continuous monitoring
  • Root cause analysis
  • Real-time performance monitoring
  • Historical trend analysis
  • Integrated diagnostics
  • Legacy script - Designed for Oracle 9i/10g RAC
  • Still valuable - Core diagnostics remain relevant
  • Adaptation needed - Some views may have changed
  • Supplement with - Modern monitoring tools
  • File locations - Adjust paths for your environment (currently hardcoded to C:)

This script provides foundational RAC diagnostic capabilities that remain valuable for understanding cluster behavior and diagnosing performance issues in Oracle RAC environments.