Column Cardinality Analyzer (qcoluniq.sql)
Column Cardinality Analyzer
Section titled “Column Cardinality Analyzer”This script analyzes column cardinality by comparing total row count with distinct value count, essential for index design decisions and query optimization strategies.
Script: qcoluniq.sql
Section titled “Script: qcoluniq.sql”rem qcoluniq.sqlremttitle 'Distinct Column Values'remselect count(*) row_count, count( distinct &column ) distinct_values from &table;
What This Script Does
Section titled “What This Script Does”This script provides column cardinality analysis by:
- Total Row Count: Shows the complete number of rows in the table
- Distinct Value Count: Counts unique values in the specified column
- Cardinality Assessment: Helps determine column selectivity
- Index Evaluation: Assists in deciding index usefulness
- Interactive Analysis: Prompts for table and column names
-- Run the script and enter table and column when prompted@qcoluniq.sqlEnter value for column: customer_idEnter value for table: orders
-- Or specify values directly in SQL*Plus@qcoluniq.sql-- When prompted:-- &column = status-- &table = customer_orders
-- For direct usage in scripts:SELECT count(*) row_count, count(distinct status) distinct_valuesFROM customer_orders;
Required Privileges
Section titled “Required Privileges”- SELECT privilege on the target table
- DESCRIBE privilege helpful for column verification
Sample Output
Section titled “Sample Output” Distinct Column Values
ROW_COUNT DISTINCT_VALUES--------- --------------- 1500000 1247
-- High Cardinality Example (customer_id):ROW_COUNT DISTINCT_VALUES--------- --------------- 1500000 1499823
-- Low Cardinality Example (status):ROW_COUNT DISTINCT_VALUES--------- --------------- 1500000 4
-- Medium Cardinality Example (state):ROW_COUNT DISTINCT_VALUES--------- --------------- 1500000 50
Understanding Cardinality Results
Section titled “Understanding Cardinality Results”High Cardinality (Good for Indexing)
Section titled “High Cardinality (Good for Indexing)”- Distinct Values: Close to total row count
- Selectivity: > 90% unique values
- Index Benefit: Excellent candidate for B-tree indexes
- Examples: Primary keys, customer IDs, order numbers
Medium Cardinality (Moderate Indexing Value)
Section titled “Medium Cardinality (Moderate Indexing Value)”- Distinct Values: 10-90% of total rows
- Selectivity: Moderate selectivity
- Index Benefit: May benefit from indexing depending on query patterns
- Examples: States, categories, product types
Low Cardinality (Poor for Traditional Indexing)
Section titled “Low Cardinality (Poor for Traditional Indexing)”- Distinct Values: < 10% of total rows
- Selectivity: Very low selectivity
- Index Benefit: Traditional B-tree indexes not effective
- Examples: Status flags, boolean columns, gender
Cardinality Analysis Examples
Section titled “Cardinality Analysis Examples”1. Primary Key Analysis
Section titled “1. Primary Key Analysis”-- Should show near 100% cardinalitySELECT count(*) row_count, count(distinct customer_id) distinct_values, ROUND((count(distinct customer_id) / count(*)) * 100, 2) as cardinality_pctFROM customers;
2. Status Column Analysis
Section titled “2. Status Column Analysis”-- Typically shows low cardinalitySELECT count(*) row_count, count(distinct status) distinct_values, ROUND((count(distinct status) / count(*)) * 100, 2) as cardinality_pctFROM orders;
-- Show actual distinct valuesSELECT status, count(*)FROM ordersGROUP BY statusORDER BY count(*) DESC;
3. Date Column Analysis
Section titled “3. Date Column Analysis”-- Check cardinality of date columnsSELECT count(*) row_count, count(distinct TRUNC(order_date)) distinct_dates, count(distinct TO_CHAR(order_date, 'YYYY-MM')) distinct_monthsFROM orders;
4. Composite Column Analysis
Section titled “4. Composite Column Analysis”-- Analyze combination cardinalitySELECT count(*) row_count, count(distinct state) distinct_states, count(distinct city) distinct_cities, count(distinct state||city) distinct_state_city_combinationsFROM customers;
Index Design Decisions
Section titled “Index Design Decisions”High Cardinality Columns (Good for Indexes)
Section titled “High Cardinality Columns (Good for Indexes)”-- Columns with > 80% cardinalitySELECT 'customer_id' as column_name, count(*) as total_rows, count(distinct customer_id) as distinct_values, ROUND((count(distinct customer_id) / count(*)) * 100, 2) as cardinality_pct, CASE WHEN (count(distinct customer_id) / count(*)) > 0.8 THEN 'Excellent for B-tree index' WHEN (count(distinct customer_id) / count(*)) > 0.4 THEN 'Good for indexing' WHEN (count(distinct customer_id) / count(*)) > 0.1 THEN 'Consider composite index' ELSE 'Poor for traditional indexing' END as index_recommendationFROM orders;
Low Cardinality Optimization
Section titled “Low Cardinality Optimization”-- For low cardinality columns, show value distributionSELECT status, count(*) as row_count, ROUND((count(*) / SUM(count(*)) OVER()) * 100, 2) as percentageFROM ordersGROUP BY statusORDER BY count(*) DESC;
Performance Impact Assessment
Section titled “Performance Impact Assessment”1. Query Selectivity Analysis
Section titled “1. Query Selectivity Analysis”-- Estimate query selectivity for different column valuesWITH column_stats AS ( SELECT count(*) as total_rows, count(distinct status) as distinct_status_values FROM orders)SELECT o.status, count(*) as matching_rows, cs.total_rows, ROUND((count(*) / cs.total_rows) * 100, 2) as selectivity_pct, CASE WHEN (count(*) / cs.total_rows) < 0.05 THEN 'Highly selective - good for index' WHEN (count(*) / cs.total_rows) < 0.20 THEN 'Moderately selective' ELSE 'Poor selectivity - consider other options' END as index_effectivenessFROM orders o, column_stats csGROUP BY o.status, cs.total_rows, cs.distinct_status_valuesORDER BY count(*);
2. Bitmap Index Evaluation
Section titled “2. Bitmap Index Evaluation”-- For low cardinality columns, evaluate bitmap index potentialSELECT 'status' as column_name, count(*) as total_rows, count(distinct status) as distinct_values, ROUND(count(*) / count(distinct status), 0) as avg_rows_per_value, CASE WHEN count(distinct status) < 100 THEN 'Good candidate for bitmap index' WHEN count(distinct status) < 1000 THEN 'Consider bitmap index' ELSE 'B-tree index more appropriate' END as bitmap_recommendationFROM orders;
Advanced Cardinality Analysis
Section titled “Advanced Cardinality Analysis”1. Null Value Impact
Section titled “1. Null Value Impact”-- Include null analysis in cardinality assessmentSELECT count(*) as total_rows, count(middle_name) as non_null_values, count(distinct middle_name) as distinct_non_null_values, count(*) - count(middle_name) as null_values, ROUND((count(*) - count(middle_name)) / count(*) * 100, 2) as null_percentageFROM customers;
2. Cardinality by Data Ranges
Section titled “2. Cardinality by Data Ranges”-- Analyze cardinality within data rangesSELECT CASE WHEN order_date >= SYSDATE - 30 THEN 'Last 30 days' WHEN order_date >= SYSDATE - 90 THEN 'Last 90 days' WHEN order_date >= SYSDATE - 365 THEN 'Last year' ELSE 'Older than 1 year' END as date_range, count(*) as row_count, count(distinct customer_id) as distinct_customers, ROUND((count(distinct customer_id) / count(*)) * 100, 2) as cardinality_pctFROM ordersGROUP BY CASE WHEN order_date >= SYSDATE - 30 THEN 'Last 30 days' WHEN order_date >= SYSDATE - 90 THEN 'Last 90 days' WHEN order_date >= SYSDATE - 365 THEN 'Last year' ELSE 'Older than 1 year' ENDORDER BY MIN(order_date) DESC;
3. Column Correlation Analysis
Section titled “3. Column Correlation Analysis”-- Check if multiple columns together provide better cardinalitySELECT count(*) as total_rows, count(distinct city) as distinct_cities, count(distinct state) as distinct_states, count(distinct city || state) as distinct_city_state, count(distinct postal_code) as distinct_postal_codes, ROUND((count(distinct city || state) / count(*)) * 100, 2) as combined_cardinality_pctFROM customers;
Index Strategy Recommendations
Section titled “Index Strategy Recommendations”Create Analysis Template
Section titled “Create Analysis Template”-- Template for comprehensive cardinality analysisSELECT '&column_name' as column_analyzed, count(*) as total_rows, count(&column_name) as non_null_count, count(distinct &column_name) as distinct_values, count(*) - count(&column_name) as null_count, ROUND((count(distinct &column_name) / count(&column_name)) * 100, 2) as cardinality_pct, ROUND((count(*) - count(&column_name)) / count(*) * 100, 2) as null_pct, CASE WHEN count(distinct &column_name) / count(&column_name) > 0.8 THEN 'High cardinality - excellent for B-tree index' WHEN count(distinct &column_name) / count(&column_name) > 0.4 THEN 'Medium cardinality - good for indexing' WHEN count(distinct &column_name) / count(&column_name) > 0.1 THEN 'Low-medium cardinality - consider composite index' WHEN count(distinct &column_name) < 100 THEN 'Low cardinality - consider bitmap index' ELSE 'Very low cardinality - indexing may not be beneficial' END as index_recommendationFROM &table_name;
Common Use Cases
Section titled “Common Use Cases”1. Pre-Index Analysis
Section titled “1. Pre-Index Analysis”-- Before creating indexes, analyze all potential columnsSELECT 'Run cardinality analysis on: ' || column_name as analysis_neededFROM user_tab_columnsWHERE table_name = 'ORDERS' AND data_type IN ('VARCHAR2', 'NUMBER', 'DATE')ORDER BY column_name;
2. Query Optimization
Section titled “2. Query Optimization”Use cardinality results to:
- Choose optimal WHERE clause predicates
- Design composite indexes with high-cardinality leading columns
- Identify candidates for function-based indexes
3. Partition Key Selection
Section titled “3. Partition Key Selection”-- Analyze potential partition key cardinalitySELECT TO_CHAR(order_date, 'YYYY-MM') as partition_key, count(*) as rows_per_partition, MIN(order_date) as partition_start, MAX(order_date) as partition_endFROM ordersGROUP BY TO_CHAR(order_date, 'YYYY-MM')ORDER BY partition_key;
Integration with Other Scripts
Section titled “Integration with Other Scripts”This script works well with:
- Index Analysis (dindex.sql) - For existing index evaluation
- Table Statistics (dstattab.sql) - For comprehensive table analysis
- Column Statistics (dtabcol.sql) - For detailed column information
- Performance Analysis (explain-sqlid.sql) - For execution plan analysis
Best Practices
Section titled “Best Practices”- Before Index Creation: Always analyze column cardinality first
- Regular Assessment: Re-evaluate cardinality as data grows
- Composite Analysis: Consider column combinations for composite indexes
- Query Pattern Alignment: Match index design to actual query patterns
- Performance Testing: Test index performance with realistic data volumes
This simple but powerful script forms the foundation for intelligent index design and query optimization decisions.