Skip to content

dian-farah/Index_Condition_Pushdown_Research

Repository files navigation

Index Condition Pushdown (ICP) — MySQL Query Optimisation

Note: This repository a group project completed for ISSS625 Query Processing & Optimisation at Singapore Management University (SMU). This repo contains the shared SQL script, dataset, and slides that I was involved in.


Project Context

This project investigates Index Condition Pushdown (ICP) as a MySQL 8.4 InnoDB optimiser strategy, applied to a real-world animal shelter dataset. The goal was to evaluate how ICP reduces I/O cost compared to a full table scan baseline, and to understand when and why ICP activates.

Target engine: MySQL 8.4 InnoDB
Dataset: Austin Animal Center Outcomes (City of Austin Open Data Portal)
Course: ISSS625 Query Processing & Optimisation, SMU MITB


What is ICP?

Index Condition Pushdown is a MySQL optimiser feature that pushes WHERE clause filtering of secondary indexed columns down from the server layer into the InnoDB storage engine.

Without ICP With ICP
How it works Fetch full rows → server filters → discard unmatched Filter inside B-tree leaf nodes → fetch only matching rows
I/O cost High — reads all rows before filtering Low — skips non-matching rows before disk access
EXPLAIN Extra Using where Using index condition
Big-O O(n) O(k·log n)

ICP activates when: a composite index has an equality condition on the leading column and a range condition on the second column.

SELECT * FROM animals
WHERE outcome_type = 'Euthanasia' AND age_in_days > 365;

Dataset

Source: Austin Animal Center Outcomes — catalog.data.gov
Licence: Public Domain (U.S. Government open data)
Period: October 2013 – May 2025

Stat Value
Total rows 173,775
Outcome types 11
Outcome subtypes 27
Animal types 5
Distinct breeds 3,010

Key columns used:

  • outcome_type — equality filter (e.g. 'Euthanasia')
  • outcome_subtype — secondary filter
  • animal_type — categorical
  • age_in_days — derived column computed via UPDATE CASE (years×365, months×30, weeks×7, days×1)

Load method: LOAD DATA INFILE into InnoDB table with STATS_SAMPLE_PAGES=25, STATS_PERSISTENT=1, STATS_AUTO_RECALC=1


Problem Statement

Animal shelter staff run frequent quarterly euthanasia audit reports to identify euthanised animals based on specific condition subsets (e.g. age). Without index optimisation, these queries perform a full table scan across 173,775 rows every time — slow and expensive at scale.


Techniques Considered

Before selecting ICP, three other strategies were evaluated and rejected:

Technique Verdict Reason
Composite ref lookup ✗ Rejected Only works when both columns use equality (=). Our query uses a range condition (age_in_days > 365)
Partitioning ✗ Rejected Euthanasia records are spread evenly across all 13 years (4–10% per year) — partitioning would still scan almost every partition
Covering index ✗ Rejected Valid, but requires naming specific columns in SELECT. Our SELECT * makes it inapplicable
Index Condition Pushdown Selected Single table · measurable EXPLAIN difference · distinct Big-O · works with AND range patterns

Index Design

-- Composite index for ICP (equality + range)
CREATE INDEX idx_icp_age ON animals (outcome_type, age_in_days);

-- Single-column comparison baseline
CREATE INDEX idx_outcome ON animals (outcome_type);
Index Column Cardinality Purpose
idx_icp_age outcome_type 11 Equality entry point
idx_icp_age age_in_days 242 Range filter (ICP target)
idx_outcome outcome_type 11 Single-column comparison

Results

Head-to-head comparison (averaged across 10 runs)

Strategy Cost Avg time (ms) Rows examined Big-O EXPLAIN Extra vs Full Scan
Full table scan (baseline) 17,614 674.40 173,775 O(n) Using where
ICP OFF (age > 365) 2,436 81.46 5,413 O(k·log n) Using where 88% faster
ICP ON (age > 365) 2,436 74.19 5,413 O(k·log n) Using index condition 89% faster
ICP ON tighter (age > 730) 1,062 21.72 2,359 O(k·log n) Using index condition 97% faster

Scale-up test

Table size Strategy Cost Actual time (ms) Result rows Cost reduction
350K rows Full scan 35,705 809 10,826 Baseline
350K rows After ICP 12,530 1,976 10,826 64.9%
700K rows Full scan 73,243 2,609 21,652 Baseline
700K rows After ICP 38,757 3,302 21,652 47.1%

ICP maintained its cost advantage as data doubled — consistent with indexed-access behaviour scaling more favourably than full scans.


Key Takeaways

ICP is an optimiser strategy, not just an index — ICP ON and ICP OFF return identical results (5,413 rows for age > 365), but ICP ON filters earlier inside the index rather than after row fetch. This reduced runtime by ~9% vs ICP OFF and ~89% vs the full table scan.

Selectivity drives ICP's benefit — tightening the range from age > 365 to age > 730 dropped the result set from 5,413 rows to 2,359 rows, and runtime improved by ~71%. The narrower the qualifying range, the more filtering MySQL can do inside the B-tree before fetching rows.

ICP scales — cost reduction held at 350K and 700K rows, while full scan cost rose sharply. ICP is most valuable on large, frequently queried tables with selective range conditions.


Benefits & Limitations

Benefits

  • Reduces memory usage — filters using only index data inside B-tree, avoids loading full table pages into buffer pool
  • Smarter range handling — leverages cardinality to use secondary columns in composite indexes
  • Overcomes optimiser limits for "leading column equality + trailing column range" patterns

Limitations

  • Strict column order dependency — if the leading column doesn't narrow the search effectively, ICP walks through a large amount of index data and can be slower than a table scan
  • Not compatible with virtual/generated columns or complex expressions — falls back to fetch-then-filter behaviour

Repository Structure

├── data/                  # Austin Animal Center dataset
├── sql/                   # SQL scripts (index creation, queries, EXPLAIN analysis)
├── slides/                # Group presentation slides
└── README.md

References


About

Part of ISSS625 Query Processing & Optimisation, SMU MITB programme.
Full group project: Index Condition Pushdown (ICP) — G2T9 Save the Animals.
Group members: Dian Farah Binte Riduan, Foo Jun Hao, Huang Kang-chih, Tan Wenying Audrey, Yang Xinqi.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors