Skip to content

[FEATURE] Plugin-style Global Range Filter (GRF) module for scan/range pruning #98

@feichai0017

Description

@feichai0017

🚀 Is your feature request related to a problem?

Current point/range read paths still rely mostly on per-SST (table-level) filters and regular level traversal. For range-heavy and scan-heavy workloads, repeated filter probes and cross-level checks can add avoidable CPU overhead.

We want a cleaner, reusable way to experiment with global-level filtering strategies (inspired by the GRF paper) without tightly coupling the logic to one specific LSM implementation path.

Paper reference:

💡 Proposed Solution

Introduce a plugin-style Global Range Filter (GRF) module in NoKV that can be enabled/disabled and evolved independently.

High-level goals:

  • Provide a pluggable GRF interface for run/level pruning decisions.
  • Keep compatibility with existing Bloom/table filter path.
  • Support A/B benchmarking via runtime config flags.
  • Maintain correctness-first semantics with fallback to existing path on uncertainty.

🎨 Alternatives Considered

  1. Hardwire GRF into current levels/table traversal.
  • Pros: fast to prototype.
  • Cons: high coupling, harder rollback/testing, difficult to compare algorithms.
  1. Keep only per-table filters.
  • Pros: simpler.
  • Cons: misses potential CPU reduction on range/scan-heavy workloads.
  1. Plugin-style GRF module (proposed).
  • Pros: easier experimentation, clean abstraction, safer rollout.
  • Cons: additional interfaces and integration complexity.

📝 Technical Details / Implementation Plan

  • Define GlobalFilter interface (build/load/query/metrics hooks).
  • Add GRF config options (enable flag, memory budget, build mode, fallback strategy).
  • Integrate GRF query path into levels/range pruning decision pipeline.
  • Ensure correctness fallback: when GRF result is uncertain, use legacy traversal.
  • Add manifest/version metadata for GRF state compatibility.
  • Add unit tests for key-shape edge cases (long keys, boundary keys, empty ranges).
  • Add integration tests for point/range/scan correctness under compaction and restart.
  • Add benchmark matrix (with/without GRF, varying scan length and levels).
  • Add observability counters (hit/miss/fallback/pruned-runs/query-latency).
  • Document architecture and usage in docs.

🧐 Additional Context

This issue tracks the plugin architecture and engineering integration for GRF.

Suggested milestones:

  1. Interface + no-op implementation + wiring.
  2. Baseline GRF implementation and correctness tests.
  3. Performance validation and tuning.
  4. Optional advanced variants (learned/shape-aware enhancements).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    Status

    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions