(clp-s::search): Scan ERT tables using column-based filtering #2281

Open
ShangDanLuXian wants to merge 2 commits into y-scope:main from ShangDanLuXian:column_based_scan

@ShangDanLuXian ShangDanLuXian commented May 14, 2026

Description

This PR adds an initial column-scan filter execution path for clp_s search.

The implementation is intentionally kept small and focused. QueryRunner now prepares the schema-table filter and attempts to use ColumnScan for supported predicates. If column scan cannot handle the query expression, the existing row-scan path is used as a fallback.
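The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Expr`, `Filter`, `ColumnScanFilter`, and `Runner` are hypothetical stand-ins, and the real `QueryRunner::prepare_filter` and `ColumnScan::try_create` signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a query expression. `supported` models the result
// of the AST support check; `matches` models the per-message match results.
struct Expr {
    bool supported;
    std::vector<bool> matches;
};

// Common interface shared by the column-scan and row-scan paths (assumed).
class Filter {
public:
    virtual ~Filter() = default;
    virtual bool filter(std::size_t message_idx) = 0;
};

class ColumnScanFilter : public Filter {
public:
    // Returns nullptr when the expression cannot be handled by column scan.
    static std::unique_ptr<ColumnScanFilter> try_create(Expr const& expr) {
        if (false == expr.supported) {
            return nullptr;
        }
        return std::unique_ptr<ColumnScanFilter>(new ColumnScanFilter{expr.matches});
    }

    // Per-message filtering is a single bitmap lookup.
    bool filter(std::size_t message_idx) override {
        assert(message_idx < m_bitmap.size());
        return m_bitmap[message_idx];
    }

private:
    explicit ColumnScanFilter(std::vector<bool> bitmap) : m_bitmap{std::move(bitmap)} {}

    std::vector<bool> m_bitmap;
};

class Runner : public Filter {
public:
    // Try the column-scan path first; otherwise return this object so the
    // caller uses the row-scan path.
    Filter* prepare_filter(Expr const& expr) {
        m_expr = expr;
        m_column_scan = ColumnScanFilter::try_create(expr);
        if (nullptr != m_column_scan) {
            return m_column_scan.get();
        }
        return this;
    }

    // Stand-in for evaluating the expression against one message at a time.
    bool filter(std::size_t message_idx) override { return m_expr.matches.at(message_idx); }

private:
    Expr m_expr{false, {}};
    std::unique_ptr<ColumnScanFilter> m_column_scan;
};
```

In this sketch the caller only ever sees a `Filter*`, so the message loop does not need to know which path was chosen.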

Main changes:

  • Add ColumnScan under components/core/src/clp_s/search/.
  • Support building per-schema match bitmaps for supported filter ASTs.
  • Support column scan for simple scalar/string filter predicates and AND / OR expression trees.
  • Integrate column scan through QueryRunner::prepare_filter.
  • Add CMake wiring for the new column-scan source files.
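As a rough illustration of building a match bitmap for an AND / OR expression tree, the sketch below evaluates each leaf predicate over a whole column and then combines the child bitmaps. The `Node`, `Table`, and `evaluate` names, the integer-only columns, and the single "greater than" predicate are assumptions made for the example; they are not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Bitmap = std::vector<bool>;          // one bit per message in the schema table
using Column = std::vector<std::int64_t>;  // hypothetical integer column
using Table = std::vector<Column>;

enum class NodeType { And, Or, GreaterThan };

// Hypothetical AST node: And/Or use `children`; GreaterThan uses `column` and `operand`.
struct Node {
    NodeType type;
    std::vector<Node> children;
    std::size_t column;
    std::int64_t operand;
};

Node make_gt(std::size_t column, std::int64_t operand) {
    return Node{NodeType::GreaterThan, {}, column, operand};
}

Node make_op(NodeType type, std::vector<Node> children) {
    return Node{type, std::move(children), 0, 0};
}

Bitmap evaluate(Node const& node, Table const& table, std::size_t num_rows) {
    if (NodeType::GreaterThan == node.type) {
        // Leaf: scan one column and set a bit for every matching row.
        Column const& col = table.at(node.column);
        assert(col.size() == num_rows);
        Bitmap bits(num_rows);
        for (std::size_t i = 0; i < num_rows; ++i) {
            bits[i] = col[i] > node.operand;
        }
        return bits;
    }

    // Inner node: start from the identity (all ones for AND, all zeros for OR)
    // and fold in each child's bitmap.
    bool const is_and = NodeType::And == node.type;
    Bitmap result(num_rows, is_and);
    for (Node const& child : node.children) {
        Bitmap const child_bits = evaluate(child, table, num_rows);
        for (std::size_t i = 0; i < num_rows; ++i) {
            result[i] = is_and ? (result[i] && child_bits[i]) : (result[i] || child_bits[i]);
        }
    }
    return result;
}
```

A production implementation would likely use a packed or compressed bitmap rather than `std::vector<bool>`; the container here is chosen only to keep the example short.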

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Tested representative MongoDB search queries locally.
  • For query ID 22419, total search time dropped from roughly 30s to roughly 15s on the test dataset, with the table scan costing 1-2s.
  • Sanity-checked all queries for the MongoDB dataset in the paper.

Summary by CodeRabbit

  • New Features

    • Enhanced search filtering with optimized message evaluation using column-based scanning.
  • Chores

    • Updated build system to include new search components.

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ba5ced04-7571-4172-8a66-923cd7180384

📥 Commits

Reviewing files that changed from the base of the PR and between 4fe21a9 and fd9541a.

📒 Files selected for processing (6)
  • components/core/src/clp_s/search/CMakeLists.txt
  • components/core/src/clp_s/search/ColumnScan.cpp
  • components/core/src/clp_s/search/ColumnScan.hpp
  • components/core/src/clp_s/search/Output.cpp
  • components/core/src/clp_s/search/QueryRunner.cpp
  • components/core/src/clp_s/search/QueryRunner.hpp

Walkthrough

This PR introduces ColumnScan, a bitmap-based filter class for evaluating search AST expressions over indexed columns. It integrates into QueryRunner via a prepare_filter factory method and refactors Output to consume the prepared filter, enabling efficient per-message filtering without rebuilding evaluation state.

Changes

ColumnScan bitmap-based filtering

| Layer / File(s) | Summary |
|---|---|
| ColumnScan class contract and interface (components/core/src/clp_s/search/ColumnScan.hpp) | ColumnScan defines a FilterClass-derived component with a static try_create factory, init and filter overrides, and private helpers for AST validation and bitmap construction from expression trees and reader/query/match lookups. |
| ColumnScan implementation: validation and bitmap evaluation (components/core/src/clp_s/search/ColumnScan.cpp) | Implements construction, per-message filtering via bitmap lookup, iterative AST validation, filter buildability checks, and bitmap evaluation for AND/OR compositions and typed filter operations (numeric, boolean, CLP string, variable string, EXISTS/NEXISTS). |
| QueryRunner prepare_filter integration (components/core/src/clp_s/search/QueryRunner.hpp, components/core/src/clp_s/search/QueryRunner.cpp) | QueryRunner adds prepare_filter public method, ColumnScan member, and includes; prepare_filter conditionally constructs ColumnScan and returns either the scan or QueryRunner itself as the active filter for a schema. |
| Output filter consumption (components/core/src/clp_s/search/Output.cpp) | Output refactors filter() to call prepare_filter once per schema and pass the returned filter into get_next_message calls, replacing the prior initialize_filter call. |
| Build system compilation setup (components/core/src/clp_s/search/CMakeLists.txt) | CMakeLists adds ColumnScan.cpp and ColumnScan.hpp to CLP_S_SEARCH_SOURCES for compilation. |
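
The "iterative AST validation" mentioned in the summary could look something like the sketch below, which walks the expression tree with an explicit stack instead of recursion and rejects the tree as soon as it meets a node that column scan cannot handle. `Kind`, `AstNode`, and `is_column_scannable` are hypothetical names for illustration; the actual checks in ColumnScan.cpp are not shown on this page and may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node kinds: And/Or are inner nodes, Filter is a supported leaf
// predicate, and Unsupported stands for anything column scan cannot evaluate.
enum class Kind { And, Or, Filter, Unsupported };

struct AstNode {
    Kind kind;
    std::vector<AstNode> children;
};

// Returns true only if every node reachable from `root` is supported.
bool is_column_scannable(AstNode const& root) {
    std::vector<AstNode const*> pending{&root};
    while (false == pending.empty()) {
        AstNode const* node = pending.back();
        pending.pop_back();
        assert(nullptr != node);

        switch (node->kind) {
            case Kind::Filter:
                break;
            case Kind::And:
            case Kind::Or:
                // Assumption for this sketch: an operator with no operands is rejected.
                if (node->children.empty()) {
                    return false;
                }
                for (AstNode const& child : node->children) {
                    pending.push_back(&child);
                }
                break;
            default:
                return false;
        }
    }
    return true;
}
```

An explicit stack keeps the traversal depth independent of the call stack, which matters for deeply nested expressions. When the check fails, the caller would fall back to the row-scan path.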

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 15.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding column-based filtering for scanning ERT tables in clp-s search, which aligns with the PR's primary objective of introducing ColumnScan component. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@ShangDanLuXian ShangDanLuXian changed the title from "feat(clp-s::search): Add column-scan filter evaluation" to "eat(clp-s::search): Scan ERT tables using column-based filtering" May 14, 2026
@ShangDanLuXian ShangDanLuXian changed the title from "eat(clp-s::search): Scan ERT tables using column-based filtering" to "(clp-s::search): Scan ERT tables using column-based filtering" May 14, 2026
@ShangDanLuXian ShangDanLuXian marked this pull request as ready for review May 14, 2026 20:51
@ShangDanLuXian ShangDanLuXian requested review from a team and gibber9809 as code owners May 14, 2026 20:51