Histogram Analysis (dhstgrmv.sql)
What This Script Does
Section titled “What This Script Does”This script provides Oracle database administration functionality via the dhstgrmv.sql script.
The Script
Section titled “The Script”rem dhstgrm.sqlremset linesize 132remttitle 'Histograms'col OWNER format a10 heading 'OWNER'col TABLE_NAME format a20 heading 'TABLE NAME'col COLUMN_NAME format a20 heading 'COLUMN NAME'col DATA_TYPE format a9 heading 'DATA TYPE'col nullable format a1 heading 'NULL?'col num_distinct format 9,999,999 heading 'NUM|DISTINCT'col low_value heading 'LOW VALUE'col high_value heading 'HIGH VALUE'col density format 0.9999 heading 'DENSITY'col num_nulls format 99,999,999 heading 'NUM|NULLS'col num_buckets format 999 heading 'BUCKETS'col ENDPOINT_NUMBER format 9,999,999 heading 'ENDPOINT#'rem col ENDPOINT_VALUE format 999999999999999999 heading 'ENDPOINT VALUE'col ENDPOINT_VALUE heading 'ENDPOINT VALUE'col ENDPOINT_actual_VALUE format a10 trunc heading 'ENDPOINT|ACTUAL|VALUE'remclear breakbreak on owner on table_name on column_name - on DATA_TYPE - on nullable - on num_distinct - on low_value - on high_value - on density - on num_nulls - on num_buckets skip 1remselect c.OWNER, c.TABLE_NAME, c.COLUMN_NAME, c.DATA_TYPE, decode(c.NULLABLE, 'Y', null, c.NULLABLE) nullable, c.num_distinct,-- c.low_value,-- c.high_value, c.density, c.num_nulls, c.num_buckets, h.ENDPOINT_NUMBER, h.ENDPOINT_actual_VALUEfrom dba_histograms h, dba_tab_columns cwhere h.owner like nvl(upper('&owner'),'%') and h.table_name like nvl(upper('&table'),'%') and h.column_name like nvl(upper('&column'),'%') and c.owner=h.owner and c.table_name=h.table_name and c.column_name=h.column_nameorder by c.OWNER, c.TABLE_NAME, c.COLUMN_ID, h.ENDPOINT_NUMBER/remset linesize 80
What This Script Does
Section titled “What This Script Does”The dhstgrmv.sql script analyzes column histogram statistics in Oracle databases. It provides detailed information about data distribution patterns, including histogram buckets, endpoint values, and column statistics. This is essential for understanding query optimizer behavior and identifying columns with skewed data distributions.
Run the script and provide input when prompted:
SQL> @dhstgrmv.sqlEnter value for owner: SALESEnter value for table: CUSTOMERSEnter value for column: STATUS
Parameters:
owner
: Schema owner name (supports wildcard % matching)table
: Table name pattern (supports wildcard % matching)column
: Column name pattern (supports wildcard % matching)
Required Privileges
Section titled “Required Privileges”SELECT
privilege onDBA_HISTOGRAMS
andDBA_TAB_COLUMNS
(usually requires DBA role)- For non-DBA users, modify script to use
USER_HISTOGRAMS
andUSER_TAB_COLUMNS
Sample Output
Section titled “Sample Output”Histograms
OWNER TABLE NAME COLUMN NAME DATA TYPE NULL? NUM DISTINCT DENSITY NUM NULLS BUCKETS ENDPOINT# ENDPOINT ACTUAL VALUE---------- -------------------- -------------------- --------- ----- ------------ ------- --------- ------- --------- --------------------SALES CUSTOMERS STATUS VARCHAR2 4 0.2500 0 4 1 ACTIVE 2 INACTIVE 875 PENDING 1000 SUSPENDED
SALES CUSTOMERS REGION_ID NUMBER 25 0.0400 0 25 1 1 2 2 50 3 150 4 300 5 ... (more buckets)
Key Output Columns
Section titled “Key Output Columns”- OWNER: Schema that owns the table
- TABLE NAME: Name of the table containing the column
- COLUMN NAME: Name of the column with histogram data
- DATA TYPE: Data type of the column
- NULL?: Whether the column allows null values
- NUM DISTINCT: Number of distinct values in the column
- DENSITY: Selectivity estimate (1/num_distinct for uniform distribution)
- NUM NULLS: Number of null values in the column
- BUCKETS: Number of histogram buckets
- ENDPOINT#: Cumulative frequency count up to this bucket
- ENDPOINT ACTUAL VALUE: The actual data value at this endpoint
Understanding Histogram Types
Section titled “Understanding Histogram Types”Height-Balanced Histograms
Section titled “Height-Balanced Histograms”- Buckets > 1: Multiple buckets indicate height-balanced histogram
- Popular Values: Buckets with large endpoint number jumps
- Uniform Distribution: Even endpoint number increments
Frequency Histograms
Section titled “Frequency Histograms”- Buckets = Distinct Values: Each distinct value gets its own bucket
- Skewed Data: Large differences in endpoint numbers reveal data skew
Performance Optimization Insights
Section titled “Performance Optimization Insights”Data Skew Analysis
Section titled “Data Skew Analysis”- High Endpoint Jumps: Indicate very popular values
- Low Density: Suggests good selectivity for equality predicates
- Many Buckets: Complex data distribution requiring detailed statistics
Query Optimizer Impact
Section titled “Query Optimizer Impact”- Cardinality Estimation: Histograms improve cost-based optimizer estimates
- Index Usage: Skewed columns may benefit from conditional indexing
- Partition Pruning: Histogram data helps with partition elimination
Common Use Cases
Section titled “Common Use Cases”-
Query Optimization Analysis
- Identify columns with skewed data distributions
- Understand why certain execution plans are chosen
-
Index Design Decisions
- Determine if columns benefit from indexing based on selectivity
- Identify candidates for function-based indexes
-
Statistics Quality Assessment
- Verify that statistics accurately represent data distribution
- Identify columns needing more detailed histogram analysis
-
Performance Troubleshooting
- Investigate why queries perform differently than expected
- Analyze cost-based optimizer decision factors
Interpreting Results
Section titled “Interpreting Results”Uniform Distribution
Section titled “Uniform Distribution”ENDPOINT# ENDPOINT VALUE--------- -------------- 200 100 400 200 600 300 800 400 1000 500
Even increments indicate uniform data distribution.
Skewed Distribution
Section titled “Skewed Distribution”ENDPOINT# ENDPOINT VALUE--------- -------------- 950 'ACTIVE' 975 'INACTIVE' 995 'PENDING' 1000 'SUSPENDED'
Large jump to 950 indicates ‘ACTIVE’ is very common (95% of rows).
Troubleshooting Guide
Section titled “Troubleshooting Guide”No Histogram Data
Section titled “No Histogram Data”- Check Statistics: Verify that table statistics have been gathered
- Method Parameter: Histograms require
METHOD_OPT
to include histogram collection - Column Selection: Oracle may not create histograms for all columns
Gathering Histograms
Section titled “Gathering Histograms”-- Gather statistics with histogramsEXEC DBMS_STATS.GATHER_TABLE_STATS( ownname => 'SALES', tabname => 'CUSTOMERS', method_opt => 'FOR ALL COLUMNS SIZE AUTO');
Histogram Quality Issues
Section titled “Histogram Quality Issues”- Too Few Buckets: May not capture data skew accurately
- Outdated Statistics: Histograms may not reflect current data distribution
- Wrong Histogram Type: Height-balanced vs. frequency choice impacts accuracy
Advanced Analysis
Section titled “Advanced Analysis”Popular Value Detection
Section titled “Popular Value Detection”Look for large jumps in ENDPOINT# values:
-- Calculate frequency of each valueSELECT endpoint_actual_value, endpoint_number - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_number) as frequencyFROM user_histogramsWHERE table_name = 'CUSTOMERS' AND column_name = 'STATUS';
Selectivity Calculation
Section titled “Selectivity Calculation”- Uniform Values:
1 / num_distinct
- Popular Values:
(endpoint# - previous_endpoint#) / total_rows
- Density: Overall average selectivity
Related Scripts
Section titled “Related Scripts”- dtabcol.sql - Column statistics analysis
- dindstats.sql - Index statistics with histograms
- dtable.sql - General table analysis
Script Features
Section titled “Script Features”- Hierarchical Display: Groups output by table and column for easy reading
- Flexible Filtering: Supports wildcard patterns for targeted analysis
- Comprehensive Statistics: Shows both column metadata and histogram details
- Formatted Output: Clean display of numeric and text values
Best Practices
Section titled “Best Practices”- Regular Review: Monitor histogram accuracy after significant data changes
- Selective Collection: Only gather histograms for columns with known skew
- Performance Testing: Validate that histogram collection improves query performance
- Documentation: Track which columns benefit from histogram statistics
Limitations
Section titled “Limitations”- Requires current table statistics for accurate results
- Histogram quality depends on statistics gathering parameters
- May not reflect real-time data distribution changes
- Limited to columns where histograms have been explicitly collected