optimize index selection #23213

iamlinjunhong · 2025-12-04T03:37:45Z

User description

What type of PR is this?

Which issue(s) this PR fixes:

issue #https://github.com/matrixorigin/MO-Cloud/issues/6723

What this PR does / why we need it:

Enhanced SQL parser to support new vector query syntax.
Refactored IVFFlat and HNSW index application logic for better query optimization.
Added comprehensive test coverage for vector IVF mode functionality.

PR Type

Enhancement, Tests

Description

Refactored IVFFlat and HNSW index application logic by extracting context preparation into dedicated methods (prepareIvfIndexContext, prepareHnswIndexContext) for improved code organization and reusability
Separated RankOption from Limit in the AST structure (Select struct) to better represent SQL semantics with dedicated RankOption field and updated parser grammar accordingly
Introduced vectorSortContext struct to encapsulate vector sort query context information, improving parameter passing and code clarity
Implemented special index guard system with methods (prepareSpecialIndexGuards, resetSpecialIndexGuards, collectSpecialIndexGuards) to protect scan nodes from inappropriate index application
Enhanced parser to support "force" mode in addition to "pre" and "post" modes for vector index execution control
Updated QueryBuilder to integrate guard management and RankOption separation with new fields (protectedScans, projectSpecialGuards)
Added comprehensive test coverage for vector IVF mode functionality including basic operations, advanced edge cases, and complex query scenarios with multiple distance metrics and filter combinations
Removed legacy IVF test cases and consolidated into new comprehensive test suite

Diagram Walkthrough

flowchart LR
  Parser["Parser Grammar<br/>Separate RankOption"] -->|"rankOption token"| AST["AST Structure<br/>RankOption field"]
  AST -->|"vectorSortContext"| VectorOptimizer["Vector Index<br/>Optimization"]
  VectorOptimizer -->|"guard system"| GuardMgmt["Special Index<br/>Guard Management"]
  GuardMgmt -->|"protect scans"| IndexApp["Index Application<br/>IVFFlat/HNSW"]
  IndexApp -->|"execution modes<br/>pre/post/force"| QueryExec["Query<br/>Execution"]
  QueryExec -->|"test coverage"| Tests["Comprehensive<br/>Test Suite"]

File Walkthrough

Relevant files

Enhancement

8 files

apply_indices_ivfflat.go `Refactor IVFFlat index optimization with context extraction` pkg/sql/plan/apply_indices_ivfflat.go Refactored IVFFlat index application logic by extracting context preparation into `prepareIvfIndexContext` method Updated `applyIndicesForSortUsingIvfflat` to accept `vectorSortContext` instead of individual node parameters Added support for rebinding scan nodes and applying regular indexes to secondary scans when `mode=pre` Introduced helper methods `buildPkExprFromNode`, `getColName`, `rebindScanNode`, `replaceColRefTag`, `canApplyRegularIndex`, and `colRefsWithin` for better code organization	+285/-94
apply_indices.go `Add special index guard system for scan protection` pkg/sql/plan/apply_indices.go Added `specialIndexKind` and `specialIndexGuard` types to track protected scan nodes Implemented guard management methods: `prepareSpecialIndexGuards`, `resetSpecialIndexGuards`, `collectSpecialIndexGuards`, `registerProjectGuard`, `clearProjectGuard`, `isScanProtected` Added `detectFullTextGuard` and `detectVectorGuard` methods to identify scans protected by special indexes Introduced `collectVectorIndexes` helper method to extract vector indexes from scan nodes Modified `applyIndicesForFilters` and `applyIndicesForJoins` to skip protected scans Updated vector index application to use `vectorSortContext` and new helper methods	+233/-30
query_builder.go `Integrate RankOption separation and guard management` pkg/sql/plan/query_builder.go Added `protectedScans` and `projectSpecialGuards` fields to `QueryBuilder` struct initialization Updated `buildUnion` method signature to accept `astRankOption` parameter separately from `astLimit` Modified `bindLimit` to accept both `astLimit` and `astRankOption` parameters Enhanced `parseRankOption` to support "force" mode in addition to "pre" and "post" modes Added guard preparation and reset calls around `applyIndices` in `createQuery` method	+64/-61
apply_indices_hnsw.go `Refactor HNSW index optimization with context extraction` pkg/sql/plan/apply_indices_hnsw.go Refactored HNSW index application logic by extracting context preparation into `prepareHnswIndexContext` method Updated `applyIndicesForSortUsingHnsw` to accept `vectorSortContext` instead of individual node parameters Simplified method signatures and improved code reusability with context structure	+77/-49
select.go `Separate RankOption from Limit in AST structure` pkg/sql/parsers/tree/select.go Added `RankOption` field to `Select` struct Implemented `RankOption.Format` method for SQL formatting with "by rank" clause Removed `ByRank`, `Option`, and `Mode` fields from `Limit` struct Created new `RankOption` struct with `Option` map field Updated `Select.Format` to output `RankOption` separately from `Limit`	+37/-35
apply_indices_vector.go `Introduce vectorSortContext for query context management` pkg/sql/plan/apply_indices_vector.go Added `vectorSortContext` struct to encapsulate vector sort query context information Implemented `buildVectorSortContext` method to construct context from projection node Added `pickVectorLimit` helper function to extract limit and rank option from query nodes	+70/-0
types.go `Add guard tracking fields to QueryBuilder` pkg/sql/plan/types.go Added `protectedScans` map field to track protected scan node counts Added `projectSpecialGuards` map field to store special index guards per project node	+2/-0
mysql_sql.y `Refactor parser grammar to separate rank option` pkg/sql/parsers/dialect/mysql/mysql_sql.y Added `rankOption` type declaration to parser union Changed grammar rule from `limit_rank_suffix` to `rank_opt` for cleaner separation Updated `select_no_parens` production rules to include `rank_opt` as separate token after `limit_opt` Modified `limit_clause` rules to remove rank option handling Updated `limit_rank_suffix` rule to create `RankOption` struct instead of `Limit` struct	+18/-33

Tests

6 files

mysql_sql_test.go `Update tests for RankOption separation from Limit` pkg/sql/parsers/dialect/mysql/mysql_sql_test.go Updated test assertions to check `selectStmt.RankOption` instead of `selectStmt.Limit.ByRank` Changed assertions to verify `selectStmt.RankOption.Option` instead of `selectStmt.Limit.Option` Updated all six test cases in `TestLimitByRank` to reflect the new AST structure	+18/-18
query_builder_test.go `Update tests for RankOption parameter and force mode` pkg/sql/plan/query_builder_test.go Updated `TestQueryBuilder_bindLimit` to pass `nil` for new `astRankOption` parameter Added new test case `valid mode force` to verify "force" mode support in `parseRankOption` Updated error message assertion to expect "mode must be 'pre', 'post', or 'force'"	+12/-2
vector_ivf_mode.result `Add comprehensive vector IVF mode test coverage` test/distributed/cases/vector/vector_ivf_mode.result Added comprehensive test cases for vector IVF mode functionality with three modes: "pre", "post", and "force" Created test tables with vector indexes and various filter conditions Verified vector search results with different rank modes and filter combinations Tested edge cases including OFFSET, composite indexes, and multiple filter conditions	+314/-0
vector_ivf_mode.sql `Comprehensive IVF vector index mode test suite` test/distributed/cases/vector/vector_ivf_mode.sql Added comprehensive test suite for IVF (Inverted File) vector index mode with 242 lines of SQL test cases Tests cover three execution modes: `mode=pre` (pre-filtering), `mode=post` (post-filtering), and `mode=force` (disable index) Includes test cases for vector distance functions (`l2_distance`, `cosine_distance`), various LIMIT sizes, OFFSET clauses, and complex WHERE conditions Tests combinations of vector indexes with regular indexes and composite indexes to validate query optimization	+242/-0
vector_ivf_mode_advanced.sql `Advanced IVF mode edge cases and complex query tests` test/distributed/cases/vector/vector_ivf_mode_advanced.sql Added 265 lines of advanced test cases for IVF mode covering edge cases and complex scenarios Tests highly selective filters, OR/AND conditions, range filters, LIKE patterns, IN clauses, and NULL value handling Includes tests for different distance metrics (`l2_distance`, `cosine_distance`, `inner_product`) with various over-fetch factors Validates result consistency across different execution modes and tests OFFSET with complex filters	+265/-0
vector_ivf_mode_advanced.result `Expected test results for advanced IVF mode tests` test/distributed/cases/vector/vector_ivf_mode_advanced.result Added expected output results for 261 lines of advanced IVF mode test cases Contains query results showing correct row ordering and filtering for various test scenarios Includes error handling test case showing invalid `op_type` error message Demonstrates consistent results across `mode=pre`, `mode=post`, and `mode=force` execution paths	+261/-0

Additional files

3 files

mysql_sql.go	+3135/-3131
pre.result	+0/-29
pre.sql	+0/-17

qodo-code-review · 2025-12-04T03:38:45Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢	No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
⚪	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: No Audit Logs: New vector index application paths and guard mechanisms add critical behaviors (index selection, mode handling) without any logging for traceability. Referred Code // RankOption.Mode controls vector index behavior: // - "pre": Enable vector index with BloomFilter pushdown for better filtering // - "force": Disable vector index, force full table scan (for debugging/comparison) // - nil/other: Enable vector index with default behavior if vecCtx.rankOption != nil && vecCtx.rankOption.Mode == "force" { return nil, nil } metaDef := multiTableIndex.IndexDefs[catalog.SystemSI_IVFFLAT_TblType_Metadata] idxDef := multiTableIndex.IndexDefs[catalog.SystemSI_IVFFLAT_TblType_Centroids] entriesDef := multiTableIndex.IndexDefs[catalog.SystemSI_IVFFLAT_TblType_Entries] if metaDef == nil \|\| idxDef == nil \|\| entriesDef == nil { return nil, nil } opTypeAst, err := sonic.Get([]byte(metaDef.IndexAlgoParams), catalog.IndexAlgoParamOpType) if err != nil { return nil, nil } opType, err := opTypeAst.StrictString() if err != nil { ... (clipped 50 lines) Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Silent Failures: Multiple early returns on error or nil conditions swallow context and skip optimization without logging or surfacing actionable error details to callers. Referred Code opTypeAst, err := sonic.Get([]byte(metaDef.IndexAlgoParams), catalog.IndexAlgoParamOpType) if err != nil { return nil, nil } opType, err := opTypeAst.StrictString() if err != nil { return nil, nil } origFuncName := vecCtx.distFnExpr.Func.ObjName if opType != metric.DistFuncOpTypes[origFuncName] { return nil, nil } keyPart := idxDef.Parts[0] partPos := vecCtx.scanNode.TableDef.Name2ColIndex[keyPart] _, vecLitArg, found := builder.getArgsFromDistFn(vecCtx.distFnExpr, partPos) if !found { return nil, nil } ... (clipped 35 lines) Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: Option Validation: Parser accepts BY RANK options and stores arbitrary key-values with limited downstream validation (only 'mode'), lacking bounds/type checks for other options like 'nprobe' or 'fudge_factor'. Referred Code } } $$ = &tree.RankOption{ Option: optionMap, } } Learn more about managing compliance generic rules or creating your own custom rules
Update

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review · 2025-12-04T03:39:49Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Correct the invalid index operator Correct the invalid index operator `op_type` from 'vector_inner_product' to 'vector_ip_ops' to fix an error and allow the test case to run. test/distributed/cases/vector/vector_ivf_mode_advanced.sql [212] -CREATE INDEX idx_inner USING ivfflat ON inner_product_test(vec) lists=4 op_type 'vector_inner_product'; +CREATE INDEX idx_inner USING ivfflat ON inner_product_test(vec) lists=4 op_type 'vector_ip_ops'; Apply / Chat Suggestion importance[1-10]: 8 __ Why: The suggestion correctly identifies an invalid `op_type` 'vector_inner_product' which causes an error, as confirmed by the `.result` file, and provides the correct value 'vector_ip_ops', which is crucial for the test case to function as intended.	Medium
	Search both children of a join Update `buildPkExprFromNode` to search both children of a `JOIN` node for the primary key, not just the left child. pkg/sql/plan/apply_indices_ivfflat.go [551-554] case plan.Node_JOIN: if len(node.Children) > 0 { - return builder.buildPkExprFromNode(node.Children[0], pkType, pkName) + if pkExpr := builder.buildPkExprFromNode(node.Children[0], pkType, pkName); pkExpr != nil { + return pkExpr + } + } + if len(node.Children) > 1 { + return builder.buildPkExprFromNode(node.Children[1], pkType, pkName) } Apply / Chat Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that the function only searches the left child of a `JOIN` node and proposes a fix to search both children, making the logic more robust.	Low
	Add nil check to prevent panic Add a `nil` check for `builder.protectedScans` in `clearProjectGuard` to prevent a potential panic. pkg/sql/plan/apply_indices.go [176-198] func (builder QueryBuilder) clearProjectGuard(projID int32) { - if builder.projectSpecialGuards == nil { + if builder.projectSpecialGuards == nil \|\| builder.protectedScans == nil { return } guard, ok := builder.projectSpecialGuards[projID] if !ok { return } - if builder.protectedScans != nil { - for _, scanID := range guard.scanNodeIDs { - if cnt, ok := builder.protectedScans[scanID]; ok { - if cnt <= 1 { - delete(builder.protectedScans, scanID) - } else { - builder.protectedScans[scanID] = cnt - 1 - } + for _, scanID := range guard.scanNodeIDs { + if cnt, ok := builder.protectedScans[scanID]; ok { + if cnt <= 1 { + delete(builder.protectedScans, scanID) + } else { + builder.protectedScans[scanID] = cnt - 1 } } } delete(builder.projectSpecialGuards, projID) } Apply / Chat* Suggestion importance[1-10]: 5 __ Why: The suggestion correctly points out a missing `nil` check that could lead to a panic, improving the function's robustness, although the likelihood of this occurring is low.	Low
Update

mergify · 2025-12-08T03:39:27Z

Merge Queue Status

✅ The pull request has been merged

This pull request spent 8 seconds in the queue, with no time running CI.
The checks were run in-place.

Required conditions to merge

iamlinjunhong requested review from aunjgr, heni02 and ouyuanning as code owners December 4, 2025 03:37

iamlinjunhong had a problem deploying to ci December 4, 2025 03:38 — with GitHub Actions Error

iamlinjunhong temporarily deployed to ci December 4, 2025 03:38 — with GitHub Actions Inactive

iamlinjunhong had a problem deploying to ci December 4, 2025 03:38 — with GitHub Actions Error

mergify bot requested a review from XuPeng-SH December 4, 2025 03:38

iamlinjunhong had a problem deploying to ci December 4, 2025 03:38 — with GitHub Actions Error

qodo-code-review bot added the Review effort 4/5 label Dec 4, 2025

iamlinjunhong force-pushed the d-6723 branch from f60a25a to 8ee68eb Compare December 4, 2025 03:39

iamlinjunhong temporarily deployed to ci December 4, 2025 03:39 — with GitHub Actions Inactive

iamlinjunhong had a problem deploying to ci December 4, 2025 03:39 — with GitHub Actions Failure

iamlinjunhong temporarily deployed to ci December 4, 2025 03:39 — with GitHub Actions Inactive

iamlinjunhong had a problem deploying to ci December 4, 2025 03:39 — with GitHub Actions Failure

iamlinjunhong had a problem deploying to ci December 4, 2025 03:41 — with GitHub Actions Failure

iamlinjunhong temporarily deployed to ci December 4, 2025 03:41 — with GitHub Actions Inactive

XuPeng-SH temporarily deployed to ci December 6, 2025 03:42 — with GitHub Actions Inactive

Merge branch '3.0-dev' into d-6723

2ee8a2a

iamlinjunhong temporarily deployed to ci December 8, 2025 02:52 — with GitHub Actions Inactive

mergify bot added the queued label Dec 8, 2025

mergify bot merged commit a11a064 into matrixorigin:3.0-dev Dec 8, 2025
19 checks passed

mergify bot temporarily deployed to ci December 8, 2025 03:39 Inactive

mergify bot had a problem deploying to ci December 8, 2025 03:39 Failure

mergify bot temporarily deployed to ci December 8, 2025 03:39 Inactive

mergify bot removed the queued label Dec 8, 2025

mergify bot temporarily deployed to ci December 8, 2025 03:46 Inactive

mergify bot temporarily deployed to ci December 8, 2025 04:00 Inactive

mergify bot had a problem deploying to ci December 8, 2025 04:05 Failure

mergify bot temporarily deployed to ci December 8, 2025 04:28 Inactive

mergify bot temporarily deployed to ci December 8, 2025 04:52 Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

optimize index selection #23213

optimize index selection #23213

Uh oh!

iamlinjunhong commented Dec 4, 2025 •

edited by qodo-code-review bot

Loading

Uh oh!

qodo-code-review bot commented Dec 4, 2025 •

edited

Loading

Uh oh!

qodo-code-review bot commented Dec 4, 2025 •

edited

Loading

Uh oh!

mergify bot commented Dec 8, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

optimize index selection #23213

optimize index selection #23213

Uh oh!

Conversation

iamlinjunhong commented Dec 4, 2025 • edited by qodo-code-review bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

qodo-code-review bot commented Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Compliance Guide 🔍

Uh oh!

qodo-code-review bot commented Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Uh oh!

mergify bot commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge Queue Status

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

iamlinjunhong commented Dec 4, 2025 •

edited by qodo-code-review bot

Loading

qodo-code-review bot commented Dec 4, 2025 •

edited

Loading

qodo-code-review bot commented Dec 4, 2025 •

edited

Loading

mergify bot commented Dec 8, 2025 •

edited

Loading