Histogram Analysis (dhstgrmv.sql)

What This Script Does

This script provides Oracle database administration functionality via the dhstgrmv.sql script.

The Script

rem dhstgrm.sql
rem
set linesize 132
rem
ttitle 'Histograms'
col OWNER format a10 heading 'OWNER'
col TABLE_NAME format a20 heading 'TABLE NAME'
col COLUMN_NAME format a20 heading 'COLUMN NAME'
col DATA_TYPE format a9 heading 'DATA TYPE'
col nullable format a1 heading 'NULL?'
col num_distinct format 9,999,999 heading 'NUM|DISTINCT'
col low_value  heading 'LOW VALUE'
col high_value heading 'HIGH VALUE'
col density format 0.9999 heading 'DENSITY'
col num_nulls format 99,999,999 heading 'NUM|NULLS'
col num_buckets format 999 heading 'BUCKETS'
col ENDPOINT_NUMBER format 9,999,999 heading 'ENDPOINT#'
rem col ENDPOINT_VALUE format 999999999999999999 heading 'ENDPOINT VALUE'
col ENDPOINT_VALUE heading 'ENDPOINT VALUE'
col ENDPOINT_actual_VALUE format a10 trunc heading 'ENDPOINT|ACTUAL|VALUE'
rem
clear break
break on owner on table_name on column_name -
 on DATA_TYPE -
 on nullable -
 on num_distinct -
 on low_value -
 on high_value -
 on density -
 on num_nulls -
 on num_buckets skip 1
rem
select
 c.OWNER,
 c.TABLE_NAME,
 c.COLUMN_NAME,
 c.DATA_TYPE,
 decode(c.NULLABLE, 'Y', null, c.NULLABLE) nullable,
 c.num_distinct,
-- c.low_value,
-- c.high_value,
 c.density,
 c.num_nulls,
 c.num_buckets,
 h.ENDPOINT_NUMBER,
 h.ENDPOINT_actual_VALUE
from dba_histograms h, dba_tab_columns c
where h.owner like nvl(upper('&owner'),'%')
  and h.table_name like nvl(upper('&table'),'%')
  and h.column_name like nvl(upper('&column'),'%')
  and c.owner=h.owner
  and c.table_name=h.table_name
  and c.column_name=h.column_name
order by
 c.OWNER,
 c.TABLE_NAME,
 c.COLUMN_ID,
 h.ENDPOINT_NUMBER
/
rem
set linesize 80

What This Script Does

The dhstgrmv.sql script analyzes column histogram statistics in Oracle databases. It provides detailed information about data distribution patterns, including histogram buckets, endpoint values, and column statistics. This is essential for understanding query optimizer behavior and identifying columns with skewed data distributions.

Usage

Run the script and provide input when prompted:

SQL> @dhstgrmv.sql
Enter value for owner: SALES
Enter value for table: CUSTOMERS
Enter value for column: STATUS

Parameters:

owner: Schema owner name (supports wildcard % matching)
table: Table name pattern (supports wildcard % matching)
column: Column name pattern (supports wildcard % matching)

Required Privileges

SELECT privilege on DBA_HISTOGRAMS and DBA_TAB_COLUMNS (usually requires DBA role)
For non-DBA users, modify script to use USER_HISTOGRAMS and USER_TAB_COLUMNS

Sample Output

Histograms

OWNER      TABLE NAME           COLUMN NAME          DATA TYPE NULL? NUM DISTINCT DENSITY NUM NULLS BUCKETS ENDPOINT# ENDPOINT ACTUAL VALUE
---------- -------------------- -------------------- --------- ----- ------------ ------- --------- ------- --------- --------------------
SALES      CUSTOMERS            STATUS               VARCHAR2            4        0.2500       0      4         1 ACTIVE
                                                                                                                    2 INACTIVE
                                                                                                                  875 PENDING
                                                                                                                 1000 SUSPENDED

SALES      CUSTOMERS            REGION_ID            NUMBER              25       0.0400       0     25         1 1
                                                                                                                    2 2
                                                                                                                   50 3
                                                                                                                  150 4
                                                                                                                  300 5
                                                                                                                  ... (more buckets)

Key Output Columns

OWNER: Schema that owns the table
TABLE NAME: Name of the table containing the column
COLUMN NAME: Name of the column with histogram data
DATA TYPE: Data type of the column
NULL?: Whether the column allows null values
NUM DISTINCT: Number of distinct values in the column
DENSITY: Selectivity estimate (1/num_distinct for uniform distribution)
NUM NULLS: Number of null values in the column
BUCKETS: Number of histogram buckets
ENDPOINT#: Cumulative frequency count up to this bucket
ENDPOINT ACTUAL VALUE: The actual data value at this endpoint

Understanding Histogram Types

Height-Balanced Histograms

Buckets > 1: Multiple buckets indicate height-balanced histogram
Popular Values: Buckets with large endpoint number jumps
Uniform Distribution: Even endpoint number increments

Frequency Histograms

Buckets = Distinct Values: Each distinct value gets its own bucket
Skewed Data: Large differences in endpoint numbers reveal data skew

Performance Optimization Insights

Data Skew Analysis

High Endpoint Jumps: Indicate very popular values
Low Density: Suggests good selectivity for equality predicates
Many Buckets: Complex data distribution requiring detailed statistics

Query Optimizer Impact

Cardinality Estimation: Histograms improve cost-based optimizer estimates
Index Usage: Skewed columns may benefit from conditional indexing
Partition Pruning: Histogram data helps with partition elimination

Common Use Cases

Query Optimization Analysis
- Identify columns with skewed data distributions
- Understand why certain execution plans are chosen
Index Design Decisions
- Determine if columns benefit from indexing based on selectivity
- Identify candidates for function-based indexes
Statistics Quality Assessment
- Verify that statistics accurately represent data distribution
- Identify columns needing more detailed histogram analysis
Performance Troubleshooting
- Investigate why queries perform differently than expected
- Analyze cost-based optimizer decision factors

Interpreting Results

Uniform Distribution

ENDPOINT#    ENDPOINT VALUE
---------    --------------
      200    100
      400    200
      600    300
      800    400
     1000    500

Even increments indicate uniform data distribution.

Skewed Distribution

ENDPOINT#    ENDPOINT VALUE
---------    --------------
      950    'ACTIVE'
      975    'INACTIVE'
      995    'PENDING'
     1000    'SUSPENDED'

Large jump to 950 indicates ‘ACTIVE’ is very common (95% of rows).

Troubleshooting Guide

No Histogram Data

Check Statistics: Verify that table statistics have been gathered
Method Parameter: Histograms require METHOD_OPT to include histogram collection
Column Selection: Oracle may not create histograms for all columns

Gathering Histograms

-- Gather statistics with histograms
EXEC DBMS_STATS.GATHER_TABLE_STATS(
  ownname => 'SALES',
  tabname => 'CUSTOMERS',
  method_opt => 'FOR ALL COLUMNS SIZE AUTO'
);

Histogram Quality Issues

Too Few Buckets: May not capture data skew accurately
Outdated Statistics: Histograms may not reflect current data distribution
Wrong Histogram Type: Height-balanced vs. frequency choice impacts accuracy

Advanced Analysis

Popular Value Detection

Look for large jumps in ENDPOINT# values:

-- Calculate frequency of each value
SELECT endpoint_actual_value,
       endpoint_number - LAG(endpoint_number, 1, 0)
       OVER (ORDER BY endpoint_number) as frequency
FROM user_histograms
WHERE table_name = 'CUSTOMERS' AND column_name = 'STATUS';

Selectivity Calculation

Uniform Values: 1 / num_distinct
Popular Values: (endpoint# - previous_endpoint#) / total_rows
Density: Overall average selectivity

dtabcol.sql - Column statistics analysis
dindstats.sql - Index statistics with histograms
dtable.sql - General table analysis

Script Features

Hierarchical Display: Groups output by table and column for easy reading
Flexible Filtering: Supports wildcard patterns for targeted analysis
Comprehensive Statistics: Shows both column metadata and histogram details
Formatted Output: Clean display of numeric and text values

Best Practices

Regular Review: Monitor histogram accuracy after significant data changes
Selective Collection: Only gather histograms for columns with known skew
Performance Testing: Validate that histogram collection improves query performance
Documentation: Track which columns benefit from histogram statistics

Limitations

Requires current table statistics for accurate results
Histogram quality depends on statistics gathering parameters
May not reflect real-time data distribution changes
Limited to columns where histograms have been explicitly collected

Histogram Analysis (dhstgrmv.sql)

What This Script Does

The Script

What This Script Does

Usage

Required Privileges

Sample Output

Key Output Columns

Understanding Histogram Types

Height-Balanced Histograms

Frequency Histograms

Performance Optimization Insights

Data Skew Analysis

Query Optimizer Impact

Common Use Cases

Interpreting Results

Uniform Distribution

Skewed Distribution

Troubleshooting Guide

No Histogram Data

Gathering Histograms

Histogram Quality Issues

Advanced Analysis

Popular Value Detection

Selectivity Calculation

Related Scripts

Script Features

Best Practices

Limitations