RAC Comprehensive Diagnostic (racdiag_9i.sql)
What This Script Does
Section titled “What This Script Does”This is a comprehensive RAC (Real Application Clusters) diagnostic script originally developed by Oracle Support Services. It provides an all-in-one solution for diagnosing RAC hangs, performance issues, and cluster communication problems by collecting critical diagnostic information from all cluster nodes simultaneously.
Script
Section titled “Script”-- NAME: RACDIAG.SQL-- SYS OR INTERNAL USER, CATPARR.SQL ALREADY RUN, PARALLEL QUERY OPTION ON-- -------------------------------------------------------------------------- AUTHOR:-- Michael Polaski - Oracle Support Services - DataServer Group-- Copyright 2002, Oracle Corporation-- -------------------------------------------------------------------------- PURPOSE:-- This script is intended to provide a user friendly guide to troubleshoot RAC-- hung sessions or slow performance scenerios. The script includes information-- to gather a variety of important debug information to determine the cause of an-- RAC hang. The script will create a file called racdiag_<timestamp>.out-- in your local directory while dumping hang analyze dumps in the user_dump_dest(s)-- and background_dump_dest(s) on all nodes.---- -------------------------------------------------------------------------- DISCLAIMER:-- This script is provided for educational purposes only. It is NOT-- supported by Oracle World Wide Technical Support.-- The script has been tested and appears to work as intended.-- You should always run new scripts on a test instance initially.-- -------------------------------------------------------------------------- Script output is as follows:
set echo offset feedback offcolumn timecol new_value timestampcolumn spool_extension new_value suffixselect to_char(sysdate,'Mondd_hhmi') timecol,'.out' spool_extension from sys.dual;column output new_value dbnameselect value || '_' outputfrom v$parameter where name = 'db_name';spool c:\racdiag_&&dbname&×tamp&&suffixset lines 200set pagesize 35set trim onset trims onalter session set nls_date_format = 'MON-DD-YYYY HH24:MI:SS';alter session set timed_statistics = true;set feedback onselect to_char(sysdate) time from dual;
set numwidth 5column host_name format a20 truselect inst_id, instance_name, host_name, version, status, startup_timefrom gv$instanceorder by inst_id;
set echo on
-- Taking Hang Analyze dumps-- This may take a little while...oradebug setmypidoradebug unlimitoradebug -g all hanganalyze 3-- This part may take the longest, you can monitor bdump or udump to see if the-- file is being generated.oradebug -g all dump systemstate 10
-- WAITING SESSIONS:-- The entries that are shown at the top are the sessions that have-- waited the longest amount of time that are waiting for non-idle wait-- events (event column). You can research and find out what the wait-- event indicates (along with its parameters) by checking the Oracle-- Server Reference Manual or look for any known issues or documentation-- by searching Metalink for the event name in the search bar. Example-- (include single quotes): [ 'buffer busy due to global cache' ].-- Metalink and/or the Server Reference Manual should return some useful-- information on each type of wait event. The inst_id column shows the-- instance where the session resides and the SID is the unique identifier-- for the session (gv$session). The p1, p2, and p3 columns will show-- event specific information that may be important to debug the problem.-- To find out what the p1, p2, and p3 indicates see the next section.-- Items with wait_time of anything other than 0 indicate we do not know-- how long these sessions have been waiting.--set numwidth 10column state format a7 trucolumn event format a25 trucolumn last_sql format a40 truselect sw.inst_id, sw.sid, sw.state, sw.event, sw.seconds_in_wait seconds,sw.p1, sw.p2, sw.p3, sa.sql_text last_sqlfrom gv$session_wait sw, gv$session s, gv$sqlarea sawhere sw.event not in('rdbms ipc message','smon timer','pmon timer','SQL*Net message from client','lock manager wait for remote message','ges remote message', 'gcs remote message', 'gcs for action', 'client message','pipe get', 'null event', 'PX Idle Wait', 'single-task message','PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue','listen endpoint status','slave wait','wakeup time manager')and seconds_in_wait > 0and (sw.inst_id = s.inst_id and sw.sid = s.sid)and (s.inst_id = sa.inst_id and s.sql_address = sa.address)order by seconds desc;
-- EVENT PARAMETER LOOKUP:-- This section will give a description of the parameter names of the-- events seen in the last section. p1test is the parameter value for-- p1 in the WAITING SESSIONS section while p2text is the parameter-- value for p3 and p3 text is the parameter value for p3. The-- parameter values in the first section can be helpful for debugging-- the wait event.--column event format a30 trucolumn p1text format a25 trucolumn p2text format a25 trucolumn p3text format a25 truselect distinct event, p1text, p2text, p3textfrom gv$session_wait swwhere sw.event not in ('rdbms ipc message','smon timer','pmon timer','SQL*Net message from client','lock manager wait for remote message','ges remote message', 'gcs remote message', 'gcs for action', 'client message','pipe get', 'null event', 'PX Idle Wait', 'single-task message','PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue','listen endpoint status','slave wait','wakeup time manager')and seconds_in_wait > 0order by event;
-- PCM LOCK BLOCKERS:-- This section will show us any sessions that are holding locks that-- are blocking other users. The inst_id will show us the instance that-- the session resides on while the sid will be a unique identifier for-- the session. The grant_level will show us how the PCM lock is granted to-- the user. The request_level will show us what status we are trying to obtain.-- The lockstate column will show us what status the lock is in. The last column-- shows how long this session has been waiting.--set numwidth 5column lockstate format a16 tru;column event format a30 tru;select dl.inst_id, s.sid, p.spid, dl.resource_name1,decode(substr(dl.grant_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)','KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)','KJUSEREX','Exclusive',request_level) as grant_level,decode(substr(dl.request_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)','KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)','KJUSEREX','Exclusive',request_level) as request_level,decode(substr(dl.lockstate,1,8),'KJUSERGR','Granted','KJUSEROP','Opening','KJUSERCA','Canceling','KJUSERCV','Converting') as lockstate,s.sid, sw.event, sw.seconds_in_wait secfrom gv$dlm_locks dl, gv$process p, gv$session s, gv$session_wait swwhere blocker = 1and (dl.inst_id = p.inst_id and dl.pid = p.spid)and (p.inst_id = s.inst_id and p.addr = s.paddr)and (s.inst_id = sw.inst_id and s.sid = sw.sid)order by sw.seconds_in_wait desc;
-- [Additional sections continue for 549 total lines...]-- Full script includes comprehensive analysis of:-- - PCM Lock waiters and blockers-- - Latch contention analysis-- - Global cache performance metrics-- - Instance communication diagnostics-- - Session and lock analysis-- - System state dumps and hang analysis
Key Features
Section titled “Key Features”1. Automated Diagnostic Collection
Section titled “1. Automated Diagnostic Collection”- Creates timestamped output files
- Generates hang analyze dumps across all nodes
- Produces system state dumps for detailed analysis
- Collects information from all RAC instances
2. Comprehensive Analysis Sections
Section titled “2. Comprehensive Analysis Sections”- Instance Status - All cluster nodes health
- Waiting Sessions - Sessions with non-idle waits
- Event Parameters - Wait event details and parameters
- PCM Lock Analysis - Lock blockers and waiters
- Latch Contention - Latch wait analysis
- Global Cache - Inter-node communication metrics
- Resource Analysis - Memory and process utilization
3. RAC-Specific Diagnostics
Section titled “3. RAC-Specific Diagnostics”- Parallel Cache Management (PCM) locks
- Global Cache Service (GCS) analysis
- Global Enqueue Service (GES) diagnostics
- Inter-instance communication analysis
-- Connect as SYSDBA to any RAC instancesqlplus / as sysdba
-- Run the diagnostic script@racdiag_9i.sql
-- Output files created:-- racdiag_<dbname>_<timestamp>.out (main report)-- hanganalyze dumps in user_dump_dest-- systemstate dumps in background_dump_dest
Required Privileges
Section titled “Required Privileges”- SYSDBA or INTERNAL connection required
- Access to
ORADEBUG
commands - SELECT privileges on all GV$ views
- CATPARR.SQL must be run (parallel query option)
Sample Output Sections
Section titled “Sample Output Sections”1. Instance Overview
Section titled “1. Instance Overview”INST_ID INSTANCE_NAME HOST_NAME VERSION STATUS STARTUP_TIME------- -------------------- -------------------- ---------------- ------- ----------------- 1 ORCL1 server1.domain.com 19.3.0.0.0 OPEN 01-JAN-2024 08:00:00 2 ORCL2 server2.domain.com 19.3.0.0.0 OPEN 01-JAN-2024 08:01:15
2. Waiting Sessions Analysis
Section titled “2. Waiting Sessions Analysis”INST_ID SID STATE EVENT SECONDS P1 P2 P3 LAST_SQL------- ----- ------- ----------------------- ------- ---------- ---------- ---------- -------- 1 145 WAITING gc current block 2-way 1234 file# block# class# SELECT... 2 267 WAITING buffer busy waits 567 file# block# reason# UPDATE... 1 89 WAITING gc cr block 2-way 234 file# block# class# INSERT...
3. PCM Lock Blockers
Section titled “3. PCM Lock Blockers”INST_ID SID SPID RESOURCE_NAME1 GRANT_LEVEL REQUEST_LEVEL LOCKSTATE EVENT SEC------- ----- ----- ---------------- ----------- ------------- ---------- -------------------------- --- 1 145 5234 [0x15a][0x2c4] Exclusive Null Granted SQL*Net message from client 0 2 267 6789 [0x15a][0x2c4] Null Exclusive Converting gc current block 2-way 45
Understanding the Output
Section titled “Understanding the Output”1. Waiting Sessions Priority
Section titled “1. Waiting Sessions Priority”Sessions are ordered by wait time (longest first):
- High seconds_in_wait = Serious performance issue
- Event types indicate problem category
- P1, P2, P3 provide event-specific details
2. PCM Lock Analysis
Section titled “2. PCM Lock Analysis”- Blockers = Sessions holding locks that others need
- Waiters = Sessions waiting for blocked resources
- Grant/Request levels show lock compatibility
3. Common RAC Wait Events
Section titled “3. Common RAC Wait Events”gc current block 2-way
- Global cache block requestsgc cr block 2-way
- Consistent read block transfersbuffer busy waits
- Block contentiongc current block busy
- Hot block issues
Diagnostic Workflow
Section titled “Diagnostic Workflow”1. Initial Assessment
Section titled “1. Initial Assessment”-- Check instance status firstSELECT inst_id, status, host_nameFROM gv$instance;
-- Verify cluster interconnectSELECT inst_id, name, ip_addressFROM gv$cluster_interconnects;
2. Run RACDIAG Script
Section titled “2. Run RACDIAG Script”- Execute during performance problem
- Allow several minutes for completion
- Monitor progress in dump directories
3. Analyze Results
Section titled “3. Analyze Results”- Review waiting sessions section first
- Identify blocking chains
- Examine wait event patterns
- Check global cache statistics
Advanced Analysis
Section titled “Advanced Analysis”1. Global Cache Hit Ratios
Section titled “1. Global Cache Hit Ratios”-- Analyze global cache efficiencySELECT inst_id, gc_current_blocks_received, gc_current_blocks_served, gc_cr_blocks_received, gc_cr_blocks_servedFROM gv$sysstatWHERE name LIKE 'gc%blocks%';
2. Interconnect Performance
Section titled “2. Interconnect Performance”-- Check interconnect latencySELECT inst_id, name, valueFROM gv$sysstatWHERE name LIKE '%gc%time%'ORDER BY inst_id, name;
3. Block Transfer Analysis
Section titled “3. Block Transfer Analysis”-- Examine block transfer patternsSELECT inst_id, ROUND(physical_reads/1000) phys_reads_k, ROUND(gc_current_blocks_received/1000) gc_curr_k, ROUND(gc_cr_blocks_received/1000) gc_cr_kFROM gv$sysstat s1, gv$sysstat s2, gv$sysstat s3WHERE s1.name = 'physical reads'AND s2.name = 'gc current blocks received'AND s3.name = 'gc cr blocks received'AND s1.inst_id = s2.inst_idAND s2.inst_id = s3.inst_id;
Troubleshooting Common Issues
Section titled “Troubleshooting Common Issues”1. High Global Cache Waits
Section titled “1. High Global Cache Waits”-- Identify hot objectsSELECT owner, object_name, object_type, COUNT(*) gc_requestsFROM gv$session_wait sw, dba_objects oWHERE sw.event LIKE 'gc%'AND sw.p1 = o.data_object_idGROUP BY owner, object_name, object_typeORDER BY COUNT(*) DESC;
2. Interconnect Problems
Section titled “2. Interconnect Problems”-- Check for interconnect errorsSELECT inst_id, name, valueFROM gv$sysstatWHERE name LIKE '%interconnect%'AND value > 0;
3. Memory Pressure
Section titled “3. Memory Pressure”-- Analyze buffer cache pressureSELECT inst_id, ROUND(physical_reads/1000) phys_reads_k, ROUND(db_block_gets/1000) block_gets_k, ROUND(consistent_gets/1000) cons_gets_kFROM gv$sysstatWHERE name IN ('physical reads', 'db block gets', 'consistent gets');
Best Practices
Section titled “Best Practices”1. Regular Monitoring
Section titled “1. Regular Monitoring”- Run during peak hours to establish baselines
- Schedule regular diagnostic collections
- Monitor trends over time
- Document normal vs abnormal patterns
2. Emergency Response
Section titled “2. Emergency Response”- Run immediately when hangs occur
- Collect multiple snapshots 5-10 minutes apart
- Preserve dump files for Oracle Support
- Document symptoms and timeline
3. Performance Tuning
Section titled “3. Performance Tuning”- Identify frequently waited-for objects
- Optimize SQL causing global cache waits
- Consider partitioning hot objects
- Review application design patterns
Integration with Other Tools
Section titled “Integration with Other Tools”1. AWR Reports
Section titled “1. AWR Reports”-- Generate RAC-specific AWRBEGIN DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();END;/
2. ADDM Analysis
Section titled “2. ADDM Analysis”-- Run ADDM for RAC instanceBEGIN DBMS_ADVISOR.CREATE_TASK( advisor_name => 'ADDM', task_name => 'RAC_ADDM_TASK' );END;/
3. Cluster Ready Services
Section titled “3. Cluster Ready Services”# Check cluster statuscrsctl status resource -t
# Check interconnectoifcfg getif
Related Scripts
Section titled “Related Scripts”- AWR System Wait Event Analysis (dhsysev.md) - Historical wait analysis
- Blocking Locks (gvlockb.md) - Current lock analysis
- Active Queue Analysis (gvaq.md) - Queue analysis
- RAC Instance Status (../database-info/gvinst.md) - Instance overview
Modern RAC Alternatives
Section titled “Modern RAC Alternatives”For Oracle 12c and later, consider these alternatives:
1. Cluster Health Monitor (CHM)
Section titled “1. Cluster Health Monitor (CHM)”-- Enable automatic diagnostic collectionALTER SYSTEM SET DIAGNOSTIC_DEST = '/u01/app/oracle/diag';
2. Real-Time ADDM
Section titled “2. Real-Time ADDM”- Automatic problem detection
- Continuous monitoring
- Root cause analysis
3. Database Performance Hub
Section titled “3. Database Performance Hub”- Real-time performance monitoring
- Historical trend analysis
- Integrated diagnostics
- Legacy script - Designed for Oracle 9i/10g RAC
- Still valuable - Core diagnostics remain relevant
- Adaptation needed - Some views may have changed
- Supplement with - Modern monitoring tools
- File locations - Adjust paths for your environment (currently hardcoded to C:)
This script provides foundational RAC diagnostic capabilities that remain valuable for understanding cluster behavior and diagnosing performance issues in Oracle RAC environments.