Low Cardinality Index Analysis (dlowcardind.sql)
What This Script Does
Section titled “What This Script Does”This script identifies problematic low cardinality indexes by:
- Finding indexes where the leading column has low selectivity
- Calculating filter factor (distinct values / total rows) for each index
- Filtering to show only indexes with < 15% selectivity and < 50 distinct values
- Focusing on single-column indexes (position 1) for analysis
- Highlighting potential candidates for index removal or redesign
Script
Section titled “Script”rem dindex.sqlremremremttitle 'Index Storage Parameters'remcol table_owner format a8 heading 'OWNER'col table_type format a1 heading 'T'col table_name format a20 heading 'TABLE NAME'col status format a1 heading 'S'col u format a1 heading 'U'col index_name format a18 heading 'INDEX NAME'col tablespace_name format a10 heading 'TABLESPACE'col pct_free format 999 heading 'PCT|FREE'col ini_trans format 99 heading 'INI|TRX'col initial_extent format 999999 heading 'INITIAL|EXTENT|(K)'col next_extent format 999999 heading 'NEXT|EXTENT|(K)'col pct_increase format 999 heading 'PCT|INC'col min_extents format 99 heading 'MIN|EXT'col max_extents format 999 heading 'MAX|EXT'col freelists format 999 heading 'FREE|LIST'col BLEVEL format 9999 head 'BLEVEL'col LEAF_BLOCKS format 999,999 heading 'LEAF|BOCKS'col DISTINCT_KEYS format 999,999,999 heading 'DISTINCT|KEYS'col AVG_LEAF_BLOCKS_PER_KEY format 9,999 heading 'AVG|LEAF|BLOCKS|PER|KEY'col AVG_DATA_BLOCKS_PER_KEY format 9,999,999 heading 'AVG|DATA|BLOCKS|PER|KEY'col CLUSTERING_FACTOR format 99,999,999 heading 'CLUSTERING|FACTOR'col NUM_ROWS format 999,999,999 heading 'NUM ROWS'col SAMPLE_SIZE format 999,999 heading 'SAMPLE'col LAST_ANALYZED heading 'LAST|ANALYZED'col FILTER_FACTOR format 9.999 HEADING 'FILTER|FACTOR'-- col DEGREE VARCHAR2(40)-- col INSTANCES VARCHAR2(40)-- col PARTITIONED VARCHAR2(3)-- col TEMPORARY VARCHAR2(1)-- col GENERATED VARCHAR2(1)-- col BUFFER_POOL VARCHAR2(7)rembreak on table_owner skip 1 on table_name on table_typeremselect i.table_owner, i.table_name, i.index_name, c.column_position, c.column_name, COL.NUM_DISTINCT, TAB.NUM_ROWS, (COL.NUM_DISTINCT / TAB.NUM_ROWS) FILTER_FACTOR from sys.dba_ind_columns c, sys.dba_indexes i, DBA_TAB_COLUMNS COL, DBA_TABLES TAB where i.table_owner like nvl( upper('&owner'), '%') and i.table_name like nvl(upper('&table'),'%') AND I.OWNER = TAB.OWNER AND I.TABLE_NAME = TAB.TABLE_NAME AND C.TABLE_OWNER = TAB.OWNER AND C.TABLE_NAME = TAB.TABLE_NAME and i.owner = c.index_owner AND I.owner = COL.OWNER AND I.TABLE_NAME = COL.TABLE_NAME AND C.COLUMN_NAME = COL.COLUMN_NAME and i.index_name = c.index_name AND c.column_position = 1 AND TAB.NUM_ROWS > 0 AND (COL.NUM_DISTINCT / TAB.NUM_ROWS) < 0.15 AND COL.NUM_DISTINCT < 50 order by i.table_owner, i.table_name, 2, i.index_name, c.column_position;rem
SQL> @dlowcardind.sqlEnter value for owner: HREnter value for table: %
Parameters
Section titled “Parameters”- owner: Schema owner (use % for all schemas or specific schema name)
- table: Table name pattern (use % for all tables or specific table name)
Required Privileges
Section titled “Required Privileges”- SELECT on SYS.DBA_IND_COLUMNS
- SELECT on SYS.DBA_INDEXES
- SELECT on SYS.DBA_TAB_COLUMNS
- SELECT on SYS.DBA_TABLES
Sample Output
Section titled “Sample Output”Index Storage Parameters
OWNER TABLE NAME INDEX NAME COLUMN_POSITION COLUMN_NAME NUM_DISTINCT NUM ROWS FILTER FACTOR-------- -------------------- ------------------ --------------- --------------- ------------- ---------- -------------HR EMPLOYEES EMP_STATUS_IDX 1 STATUS 3 107,000 0.000 EMPLOYEES EMP_GENDER_IDX 1 GENDER 2 107,000 0.000
SALES ORDERS ORD_PRIORITY_IDX 1 PRIORITY 5 45,000 0.000 ORDERS ORD_STATUS_IDX 1 ORDER_STATUS 8 45,000 0.000
INV PRODUCTS PROD_CATEGORY_IDX 1 CATEGORY 12 25,000 0.000 PRODUCTS PROD_ACTIVE_IDX 1 ACTIVE_FLAG 2 25,000 0.000
Key Output Columns
Section titled “Key Output Columns”- OWNER: Schema that owns the table and index
- TABLE NAME: Name of the table being indexed
- INDEX NAME: Name of the index with low cardinality
- COLUMN_POSITION: Position in index (script focuses on position 1)
- COLUMN_NAME: Name of the column being indexed
- NUM_DISTINCT: Number of distinct values in the column
- NUM ROWS: Total number of rows in the table
- FILTER FACTOR: Selectivity ratio (NUM_DISTINCT / NUM_ROWS)
Understanding Low Cardinality Issues
Section titled “Understanding Low Cardinality Issues”Why Low Cardinality Indexes Are Problematic
Section titled “Why Low Cardinality Indexes Are Problematic”-
Poor Selectivity
- Oracle may choose full table scan over index scan
- Index provides little filtering benefit
- Wasted storage and maintenance overhead
-
Performance Impact
- Slower DML operations (INSERT, UPDATE, DELETE)
- Increased buffer cache usage
- Unnecessary index maintenance
-
Storage Waste
- Index consumes space without providing benefit
- Additional I/O for index maintenance
- Backup and recovery overhead
Filter Factor Analysis
Section titled “Filter Factor Analysis”- < 0.01 (1%): Extremely poor selectivity
- 0.01 - 0.05 (1-5%): Very poor selectivity
- 0.05 - 0.15 (5-15%): Poor selectivity (script threshold)
- > 0.15 (15%+): Acceptable selectivity
Common Low Cardinality Examples
Section titled “Common Low Cardinality Examples”Typical Candidates for Removal
Section titled “Typical Candidates for Removal”-- Status flags (Y/N, T/F)ACTIVE_FLAG: 2 distinct values (ACTIVE, INACTIVE)
-- Gender columnsGENDER: 2-3 distinct values (M, F, U)
-- Priority levelsPRIORITY: 3-5 distinct values (LOW, MEDIUM, HIGH)
-- Boolean indicatorsIS_DELETED: 2 distinct values (0, 1)
Advanced Analysis
Section titled “Advanced Analysis”Check index usage frequency:
Section titled “Check index usage frequency:”SELECT i.owner, i.table_name, i.index_name, s.last_analyzed, s.num_rows, NVL(u.total_access_count, 0) access_countFROM dba_indexes iLEFT JOIN dba_index_usage u ON i.owner = u.owner AND i.index_name = u.index_nameJOIN dba_ind_statistics s ON i.owner = s.owner AND i.index_name = s.index_nameWHERE i.uniqueness = 'NONUNIQUE'ORDER BY access_count, s.num_rows DESC;
Estimate space savings:
Section titled “Estimate space savings:”SELECT i.owner, i.table_name, i.index_name, ROUND(s.leaf_blocks * ts.block_size / 1024 / 1024, 2) size_mb, s.distinct_keys, s.num_rows, ROUND(s.distinct_keys / s.num_rows * 100, 2) selectivity_pctFROM dba_indexes i, dba_ind_statistics s, dba_tablespaces tsWHERE i.owner = s.ownerAND i.index_name = s.index_nameAND i.tablespace_name = ts.tablespace_nameAND s.distinct_keys < 50AND s.num_rows > 1000ORDER BY size_mb DESC;
Decision Matrix for Index Removal
Section titled “Decision Matrix for Index Removal”Safe to Remove
Section titled “Safe to Remove”- Filter Factor < 0.05 AND Never Used AND Not Unique/Primary Key
- Distinct Values < 10 AND Table Size > 10,000 rows
- Boolean columns in large tables
Consider Composite Index
Section titled “Consider Composite Index”- Low cardinality leading column BUT Used in multi-column predicates
- Status + Date combinations (e.g., STATUS = ‘ACTIVE’ AND CREATED_DATE > DATE)
Keep for Specific Use Cases
Section titled “Keep for Specific Use Cases”- Small tables (< 1000 rows) - overhead is minimal
- Frequently updated status columns with specific query patterns
- Columns used in ORDER BY with small result sets
Index Optimization Strategies
Section titled “Index Optimization Strategies”Alternative Approaches
Section titled “Alternative Approaches”- Composite Indexes
-- Instead of: CREATE INDEX idx_status ON orders(status);-- Consider: CREATE INDEX idx_status_date ON orders(status, order_date);
- Function-Based Indexes
-- For specific value searchesCREATE INDEX idx_active_customers ON customers(CASE WHEN status = 'ACTIVE' THEN customer_id END);
- Partial Indexes (Oracle 12c+)
-- Index only specific valuesCREATE INDEX idx_pending_orders ON orders(order_id) WHERE status = 'PENDING';
Cleanup Recommendations
Section titled “Cleanup Recommendations”Generate Drop Statements
Section titled “Generate Drop Statements”SELECT 'DROP INDEX ' || owner || '.' || index_name || ';' as drop_statementFROM ( SELECT DISTINCT i.owner, i.index_name, c.num_distinct, t.num_rows, ROUND(c.num_distinct / t.num_rows, 4) filter_factor FROM dba_indexes i, dba_ind_columns ic, dba_tab_columns c, dba_tables t WHERE i.owner = ic.index_owner AND i.index_name = ic.index_name AND ic.column_position = 1 AND i.owner = c.owner AND i.table_name = c.table_name AND ic.column_name = c.column_name AND i.owner = t.owner AND i.table_name = t.table_name AND i.uniqueness = 'NONUNIQUE' AND t.num_rows > 1000 AND c.num_distinct < 20 AND c.num_distinct / t.num_rows < 0.05 AND i.owner NOT IN ('SYS', 'SYSTEM'));
Testing Before Removal
Section titled “Testing Before Removal”Monitor Performance Impact
Section titled “Monitor Performance Impact”- Identify dependent queries
- Test with index invisible (Oracle 11g+)
- Monitor execution plans
- Check performance metrics
-- Make index invisible for testingALTER INDEX hr.emp_status_idx INVISIBLE;
-- Monitor for impact, then decide-- Either make visible again or drop
Best Practices
Section titled “Best Practices”-
Regular Analysis
- Run monthly on large schemas
- Focus on tables with > 10,000 rows
- Prioritize by space savings potential
-
Before Removal
- Check index usage statistics
- Review application query patterns
- Test performance impact
-
Documentation
- Document removed indexes
- Track space savings
- Monitor for performance regression