Skip to content

Column Cardinality Analyzer (qcoluniq.sql)

This script analyzes column cardinality by comparing total row count with distinct value count, essential for index design decisions and query optimization strategies.

rem qcoluniq.sql
rem
ttitle 'Distinct Column Values'
rem
select count(*) row_count,
count( distinct &column ) distinct_values
from &table;

This script provides column cardinality analysis by:

  • Total Row Count: Shows the complete number of rows in the table
  • Distinct Value Count: Counts unique values in the specified column
  • Cardinality Assessment: Helps determine column selectivity
  • Index Evaluation: Assists in deciding index usefulness
  • Interactive Analysis: Prompts for table and column names
-- Run the script and enter table and column when prompted
@qcoluniq.sql
Enter value for column: customer_id
Enter value for table: orders
-- Or specify values directly in SQL*Plus
@qcoluniq.sql
-- When prompted:
-- &column = status
-- &table = customer_orders
-- For direct usage in scripts:
SELECT count(*) row_count,
count(distinct status) distinct_values
FROM customer_orders;
  • SELECT privilege on the target table
  • DESCRIBE privilege helpful for column verification
Distinct Column Values
ROW_COUNT DISTINCT_VALUES
--------- ---------------
1500000 1247
-- High Cardinality Example (customer_id):
ROW_COUNT DISTINCT_VALUES
--------- ---------------
1500000 1499823
-- Low Cardinality Example (status):
ROW_COUNT DISTINCT_VALUES
--------- ---------------
1500000 4
-- Medium Cardinality Example (state):
ROW_COUNT DISTINCT_VALUES
--------- ---------------
1500000 50
  • Distinct Values: Close to total row count
  • Selectivity: > 90% unique values
  • Index Benefit: Excellent candidate for B-tree indexes
  • Examples: Primary keys, customer IDs, order numbers

Medium Cardinality (Moderate Indexing Value)

Section titled “Medium Cardinality (Moderate Indexing Value)”
  • Distinct Values: 10-90% of total rows
  • Selectivity: Moderate selectivity
  • Index Benefit: May benefit from indexing depending on query patterns
  • Examples: States, categories, product types

Low Cardinality (Poor for Traditional Indexing)

Section titled “Low Cardinality (Poor for Traditional Indexing)”
  • Distinct Values: < 10% of total rows
  • Selectivity: Very low selectivity
  • Index Benefit: Traditional B-tree indexes not effective
  • Examples: Status flags, boolean columns, gender
-- Should show near 100% cardinality
SELECT count(*) row_count,
count(distinct customer_id) distinct_values,
ROUND((count(distinct customer_id) / count(*)) * 100, 2) as cardinality_pct
FROM customers;
-- Typically shows low cardinality
SELECT count(*) row_count,
count(distinct status) distinct_values,
ROUND((count(distinct status) / count(*)) * 100, 2) as cardinality_pct
FROM orders;
-- Show actual distinct values
SELECT status, count(*)
FROM orders
GROUP BY status
ORDER BY count(*) DESC;
-- Check cardinality of date columns
SELECT count(*) row_count,
count(distinct TRUNC(order_date)) distinct_dates,
count(distinct TO_CHAR(order_date, 'YYYY-MM')) distinct_months
FROM orders;
-- Analyze combination cardinality
SELECT count(*) row_count,
count(distinct state) distinct_states,
count(distinct city) distinct_cities,
count(distinct state||city) distinct_state_city_combinations
FROM customers;

High Cardinality Columns (Good for Indexes)

Section titled “High Cardinality Columns (Good for Indexes)”
-- Columns with > 80% cardinality
SELECT
'customer_id' as column_name,
count(*) as total_rows,
count(distinct customer_id) as distinct_values,
ROUND((count(distinct customer_id) / count(*)) * 100, 2) as cardinality_pct,
CASE
WHEN (count(distinct customer_id) / count(*)) > 0.8 THEN 'Excellent for B-tree index'
WHEN (count(distinct customer_id) / count(*)) > 0.4 THEN 'Good for indexing'
WHEN (count(distinct customer_id) / count(*)) > 0.1 THEN 'Consider composite index'
ELSE 'Poor for traditional indexing'
END as index_recommendation
FROM orders;
-- For low cardinality columns, show value distribution
SELECT
status,
count(*) as row_count,
ROUND((count(*) / SUM(count(*)) OVER()) * 100, 2) as percentage
FROM orders
GROUP BY status
ORDER BY count(*) DESC;
-- Estimate query selectivity for different column values
WITH column_stats AS (
SELECT
count(*) as total_rows,
count(distinct status) as distinct_status_values
FROM orders
)
SELECT
o.status,
count(*) as matching_rows,
cs.total_rows,
ROUND((count(*) / cs.total_rows) * 100, 2) as selectivity_pct,
CASE
WHEN (count(*) / cs.total_rows) < 0.05 THEN 'Highly selective - good for index'
WHEN (count(*) / cs.total_rows) < 0.20 THEN 'Moderately selective'
ELSE 'Poor selectivity - consider other options'
END as index_effectiveness
FROM orders o, column_stats cs
GROUP BY o.status, cs.total_rows, cs.distinct_status_values
ORDER BY count(*);
-- For low cardinality columns, evaluate bitmap index potential
SELECT
'status' as column_name,
count(*) as total_rows,
count(distinct status) as distinct_values,
ROUND(count(*) / count(distinct status), 0) as avg_rows_per_value,
CASE
WHEN count(distinct status) < 100 THEN 'Good candidate for bitmap index'
WHEN count(distinct status) < 1000 THEN 'Consider bitmap index'
ELSE 'B-tree index more appropriate'
END as bitmap_recommendation
FROM orders;
-- Include null analysis in cardinality assessment
SELECT
count(*) as total_rows,
count(middle_name) as non_null_values,
count(distinct middle_name) as distinct_non_null_values,
count(*) - count(middle_name) as null_values,
ROUND((count(*) - count(middle_name)) / count(*) * 100, 2) as null_percentage
FROM customers;
-- Analyze cardinality within data ranges
SELECT
CASE
WHEN order_date >= SYSDATE - 30 THEN 'Last 30 days'
WHEN order_date >= SYSDATE - 90 THEN 'Last 90 days'
WHEN order_date >= SYSDATE - 365 THEN 'Last year'
ELSE 'Older than 1 year'
END as date_range,
count(*) as row_count,
count(distinct customer_id) as distinct_customers,
ROUND((count(distinct customer_id) / count(*)) * 100, 2) as cardinality_pct
FROM orders
GROUP BY
CASE
WHEN order_date >= SYSDATE - 30 THEN 'Last 30 days'
WHEN order_date >= SYSDATE - 90 THEN 'Last 90 days'
WHEN order_date >= SYSDATE - 365 THEN 'Last year'
ELSE 'Older than 1 year'
END
ORDER BY MIN(order_date) DESC;
-- Check if multiple columns together provide better cardinality
SELECT
count(*) as total_rows,
count(distinct city) as distinct_cities,
count(distinct state) as distinct_states,
count(distinct city || state) as distinct_city_state,
count(distinct postal_code) as distinct_postal_codes,
ROUND((count(distinct city || state) / count(*)) * 100, 2) as combined_cardinality_pct
FROM customers;
-- Template for comprehensive cardinality analysis
SELECT
'&column_name' as column_analyzed,
count(*) as total_rows,
count(&column_name) as non_null_count,
count(distinct &column_name) as distinct_values,
count(*) - count(&column_name) as null_count,
ROUND((count(distinct &column_name) / count(&column_name)) * 100, 2) as cardinality_pct,
ROUND((count(*) - count(&column_name)) / count(*) * 100, 2) as null_pct,
CASE
WHEN count(distinct &column_name) / count(&column_name) > 0.8 THEN 'High cardinality - excellent for B-tree index'
WHEN count(distinct &column_name) / count(&column_name) > 0.4 THEN 'Medium cardinality - good for indexing'
WHEN count(distinct &column_name) / count(&column_name) > 0.1 THEN 'Low-medium cardinality - consider composite index'
WHEN count(distinct &column_name) < 100 THEN 'Low cardinality - consider bitmap index'
ELSE 'Very low cardinality - indexing may not be beneficial'
END as index_recommendation
FROM &table_name;
-- Before creating indexes, analyze all potential columns
SELECT 'Run cardinality analysis on: ' || column_name as analysis_needed
FROM user_tab_columns
WHERE table_name = 'ORDERS'
AND data_type IN ('VARCHAR2', 'NUMBER', 'DATE')
ORDER BY column_name;

Use cardinality results to:

  • Choose optimal WHERE clause predicates
  • Design composite indexes with high-cardinality leading columns
  • Identify candidates for function-based indexes
-- Analyze potential partition key cardinality
SELECT
TO_CHAR(order_date, 'YYYY-MM') as partition_key,
count(*) as rows_per_partition,
MIN(order_date) as partition_start,
MAX(order_date) as partition_end
FROM orders
GROUP BY TO_CHAR(order_date, 'YYYY-MM')
ORDER BY partition_key;

This script works well with:

  1. Before Index Creation: Always analyze column cardinality first
  2. Regular Assessment: Re-evaluate cardinality as data grows
  3. Composite Analysis: Consider column combinations for composite indexes
  4. Query Pattern Alignment: Match index design to actual query patterns
  5. Performance Testing: Test index performance with realistic data volumes

This simple but powerful script forms the foundation for intelligent index design and query optimization decisions.