Skip to content

Histogram Analysis (dhstgrmv.sql)

This script provides Oracle database administration functionality via the dhstgrmv.sql script.

rem dhstgrm.sql
rem
set linesize 132
rem
ttitle 'Histograms'
col OWNER format a10 heading 'OWNER'
col TABLE_NAME format a20 heading 'TABLE NAME'
col COLUMN_NAME format a20 heading 'COLUMN NAME'
col DATA_TYPE format a9 heading 'DATA TYPE'
col nullable format a1 heading 'NULL?'
col num_distinct format 9,999,999 heading 'NUM|DISTINCT'
col low_value heading 'LOW VALUE'
col high_value heading 'HIGH VALUE'
col density format 0.9999 heading 'DENSITY'
col num_nulls format 99,999,999 heading 'NUM|NULLS'
col num_buckets format 999 heading 'BUCKETS'
col ENDPOINT_NUMBER format 9,999,999 heading 'ENDPOINT#'
rem col ENDPOINT_VALUE format 999999999999999999 heading 'ENDPOINT VALUE'
col ENDPOINT_VALUE heading 'ENDPOINT VALUE'
col ENDPOINT_actual_VALUE format a10 trunc heading 'ENDPOINT|ACTUAL|VALUE'
rem
clear break
break on owner on table_name on column_name -
on DATA_TYPE -
on nullable -
on num_distinct -
on low_value -
on high_value -
on density -
on num_nulls -
on num_buckets skip 1
rem
select
c.OWNER,
c.TABLE_NAME,
c.COLUMN_NAME,
c.DATA_TYPE,
decode(c.NULLABLE, 'Y', null, c.NULLABLE) nullable,
c.num_distinct,
-- c.low_value,
-- c.high_value,
c.density,
c.num_nulls,
c.num_buckets,
h.ENDPOINT_NUMBER,
h.ENDPOINT_actual_VALUE
from dba_histograms h, dba_tab_columns c
where h.owner like nvl(upper('&owner'),'%')
and h.table_name like nvl(upper('&table'),'%')
and h.column_name like nvl(upper('&column'),'%')
and c.owner=h.owner
and c.table_name=h.table_name
and c.column_name=h.column_name
order by
c.OWNER,
c.TABLE_NAME,
c.COLUMN_ID,
h.ENDPOINT_NUMBER
/
rem
set linesize 80

The dhstgrmv.sql script analyzes column histogram statistics in Oracle databases. It provides detailed information about data distribution patterns, including histogram buckets, endpoint values, and column statistics. This is essential for understanding query optimizer behavior and identifying columns with skewed data distributions.

Run the script and provide input when prompted:

SQL> @dhstgrmv.sql
Enter value for owner: SALES
Enter value for table: CUSTOMERS
Enter value for column: STATUS

Parameters:

  • owner: Schema owner name (supports wildcard % matching)
  • table: Table name pattern (supports wildcard % matching)
  • column: Column name pattern (supports wildcard % matching)
  • SELECT privilege on DBA_HISTOGRAMS and DBA_TAB_COLUMNS (usually requires DBA role)
  • For non-DBA users, modify script to use USER_HISTOGRAMS and USER_TAB_COLUMNS
Histograms
OWNER TABLE NAME COLUMN NAME DATA TYPE NULL? NUM DISTINCT DENSITY NUM NULLS BUCKETS ENDPOINT# ENDPOINT ACTUAL VALUE
---------- -------------------- -------------------- --------- ----- ------------ ------- --------- ------- --------- --------------------
SALES CUSTOMERS STATUS VARCHAR2 4 0.2500 0 4 1 ACTIVE
2 INACTIVE
875 PENDING
1000 SUSPENDED
SALES CUSTOMERS REGION_ID NUMBER 25 0.0400 0 25 1 1
2 2
50 3
150 4
300 5
... (more buckets)
  • OWNER: Schema that owns the table
  • TABLE NAME: Name of the table containing the column
  • COLUMN NAME: Name of the column with histogram data
  • DATA TYPE: Data type of the column
  • NULL?: Whether the column allows null values
  • NUM DISTINCT: Number of distinct values in the column
  • DENSITY: Selectivity estimate (1/num_distinct for uniform distribution)
  • NUM NULLS: Number of null values in the column
  • BUCKETS: Number of histogram buckets
  • ENDPOINT#: Cumulative frequency count up to this bucket
  • ENDPOINT ACTUAL VALUE: The actual data value at this endpoint
  • Buckets > 1: Multiple buckets indicate height-balanced histogram
  • Popular Values: Buckets with large endpoint number jumps
  • Uniform Distribution: Even endpoint number increments
  • Buckets = Distinct Values: Each distinct value gets its own bucket
  • Skewed Data: Large differences in endpoint numbers reveal data skew
  • High Endpoint Jumps: Indicate very popular values
  • Low Density: Suggests good selectivity for equality predicates
  • Many Buckets: Complex data distribution requiring detailed statistics
  • Cardinality Estimation: Histograms improve cost-based optimizer estimates
  • Index Usage: Skewed columns may benefit from conditional indexing
  • Partition Pruning: Histogram data helps with partition elimination
  1. Query Optimization Analysis

    • Identify columns with skewed data distributions
    • Understand why certain execution plans are chosen
  2. Index Design Decisions

    • Determine if columns benefit from indexing based on selectivity
    • Identify candidates for function-based indexes
  3. Statistics Quality Assessment

    • Verify that statistics accurately represent data distribution
    • Identify columns needing more detailed histogram analysis
  4. Performance Troubleshooting

    • Investigate why queries perform differently than expected
    • Analyze cost-based optimizer decision factors
ENDPOINT# ENDPOINT VALUE
--------- --------------
200 100
400 200
600 300
800 400
1000 500

Even increments indicate uniform data distribution.

ENDPOINT# ENDPOINT VALUE
--------- --------------
950 'ACTIVE'
975 'INACTIVE'
995 'PENDING'
1000 'SUSPENDED'

Large jump to 950 indicates ‘ACTIVE’ is very common (95% of rows).

  • Check Statistics: Verify that table statistics have been gathered
  • Method Parameter: Histograms require METHOD_OPT to include histogram collection
  • Column Selection: Oracle may not create histograms for all columns
-- Gather statistics with histograms
EXEC DBMS_STATS.GATHER_TABLE_STATS(
ownname => 'SALES',
tabname => 'CUSTOMERS',
method_opt => 'FOR ALL COLUMNS SIZE AUTO'
);
  • Too Few Buckets: May not capture data skew accurately
  • Outdated Statistics: Histograms may not reflect current data distribution
  • Wrong Histogram Type: Height-balanced vs. frequency choice impacts accuracy

Look for large jumps in ENDPOINT# values:

-- Calculate frequency of each value
SELECT endpoint_actual_value,
endpoint_number - LAG(endpoint_number, 1, 0)
OVER (ORDER BY endpoint_number) as frequency
FROM user_histograms
WHERE table_name = 'CUSTOMERS' AND column_name = 'STATUS';
  • Uniform Values: 1 / num_distinct
  • Popular Values: (endpoint# - previous_endpoint#) / total_rows
  • Density: Overall average selectivity
  • Hierarchical Display: Groups output by table and column for easy reading
  • Flexible Filtering: Supports wildcard patterns for targeted analysis
  • Comprehensive Statistics: Shows both column metadata and histogram details
  • Formatted Output: Clean display of numeric and text values
  1. Regular Review: Monitor histogram accuracy after significant data changes
  2. Selective Collection: Only gather histograms for columns with known skew
  3. Performance Testing: Validate that histogram collection improves query performance
  4. Documentation: Track which columns benefit from histogram statistics
  • Requires current table statistics for accurate results
  • Histogram quality depends on statistics gathering parameters
  • May not reflect real-time data distribution changes
  • Limited to columns where histograms have been explicitly collected