Skip to content

Conversation

@iamlinjunhong
Copy link
Contributor

@iamlinjunhong iamlinjunhong commented Dec 4, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #https://github.com/matrixorigin/MO-Cloud/issues/6723

What this PR does / why we need it:

Enhanced SQL parser to support new vector query syntax.
Refactored IVFFlat and HNSW index application logic for better query optimization.
Added comprehensive test coverage for vector IVF mode functionality.


PR Type

Enhancement, Tests


Description

  • Refactored IVFFlat and HNSW index application logic by extracting context preparation into dedicated methods (prepareIvfIndexContext, prepareHnswIndexContext) for improved code organization and reusability

  • Separated RankOption from Limit in the AST structure (Select struct) to better represent SQL semantics with dedicated RankOption field and updated parser grammar accordingly

  • Introduced vectorSortContext struct to encapsulate vector sort query context information, improving parameter passing and code clarity

  • Implemented special index guard system with methods (prepareSpecialIndexGuards, resetSpecialIndexGuards, collectSpecialIndexGuards) to protect scan nodes from inappropriate index application

  • Enhanced parser to support "force" mode in addition to "pre" and "post" modes for vector index execution control

  • Updated QueryBuilder to integrate guard management and RankOption separation with new fields (protectedScans, projectSpecialGuards)

  • Added comprehensive test coverage for vector IVF mode functionality including basic operations, advanced edge cases, and complex query scenarios with multiple distance metrics and filter combinations

  • Removed legacy IVF test cases and consolidated into new comprehensive test suite


Diagram Walkthrough

flowchart LR
  Parser["Parser Grammar<br/>Separate RankOption"] -->|"rankOption token"| AST["AST Structure<br/>RankOption field"]
  AST -->|"vectorSortContext"| VectorOptimizer["Vector Index<br/>Optimization"]
  VectorOptimizer -->|"guard system"| GuardMgmt["Special Index<br/>Guard Management"]
  GuardMgmt -->|"protect scans"| IndexApp["Index Application<br/>IVFFlat/HNSW"]
  IndexApp -->|"execution modes<br/>pre/post/force"| QueryExec["Query<br/>Execution"]
  QueryExec -->|"test coverage"| Tests["Comprehensive<br/>Test Suite"]
Loading

File Walkthrough

Relevant files
Enhancement
8 files
apply_indices_ivfflat.go
Refactor IVFFlat index optimization with context extraction

pkg/sql/plan/apply_indices_ivfflat.go

  • Refactored IVFFlat index application logic by extracting context
    preparation into prepareIvfIndexContext method
  • Updated applyIndicesForSortUsingIvfflat to accept vectorSortContext
    instead of individual node parameters
  • Added support for rebinding scan nodes and applying regular indexes to
    secondary scans when mode=pre
  • Introduced helper methods buildPkExprFromNode, getColName,
    rebindScanNode, replaceColRefTag, canApplyRegularIndex, and
    colRefsWithin for better code organization
+285/-94
apply_indices.go
Add special index guard system for scan protection             

pkg/sql/plan/apply_indices.go

  • Added specialIndexKind and specialIndexGuard types to track protected
    scan nodes
  • Implemented guard management methods: prepareSpecialIndexGuards,
    resetSpecialIndexGuards, collectSpecialIndexGuards,
    registerProjectGuard, clearProjectGuard, isScanProtected
  • Added detectFullTextGuard and detectVectorGuard methods to identify
    scans protected by special indexes
  • Introduced collectVectorIndexes helper method to extract vector
    indexes from scan nodes
  • Modified applyIndicesForFilters and applyIndicesForJoins to skip
    protected scans
  • Updated vector index application to use vectorSortContext and new
    helper methods
+233/-30
query_builder.go
Integrate RankOption separation and guard management         

pkg/sql/plan/query_builder.go

  • Added protectedScans and projectSpecialGuards fields to QueryBuilder
    struct initialization
  • Updated buildUnion method signature to accept astRankOption parameter
    separately from astLimit
  • Modified bindLimit to accept both astLimit and astRankOption
    parameters
  • Enhanced parseRankOption to support "force" mode in addition to "pre"
    and "post" modes
  • Added guard preparation and reset calls around applyIndices in
    createQuery method
+64/-61 
apply_indices_hnsw.go
Refactor HNSW index optimization with context extraction 

pkg/sql/plan/apply_indices_hnsw.go

  • Refactored HNSW index application logic by extracting context
    preparation into prepareHnswIndexContext method
  • Updated applyIndicesForSortUsingHnsw to accept vectorSortContext
    instead of individual node parameters
  • Simplified method signatures and improved code reusability with
    context structure
+77/-49 
select.go
Separate RankOption from Limit in AST structure                   

pkg/sql/parsers/tree/select.go

  • Added RankOption field to Select struct
  • Implemented RankOption.Format method for SQL formatting with "by rank"
    clause
  • Removed ByRank, Option, and Mode fields from Limit struct
  • Created new RankOption struct with Option map field
  • Updated Select.Format to output RankOption separately from Limit
+37/-35 
apply_indices_vector.go
Introduce vectorSortContext for query context management 

pkg/sql/plan/apply_indices_vector.go

  • Added vectorSortContext struct to encapsulate vector sort query
    context information
  • Implemented buildVectorSortContext method to construct context from
    projection node
  • Added pickVectorLimit helper function to extract limit and rank option
    from query nodes
+70/-0   
types.go
Add guard tracking fields to QueryBuilder                               

pkg/sql/plan/types.go

  • Added protectedScans map field to track protected scan node counts
  • Added projectSpecialGuards map field to store special index guards per
    project node
+2/-0     
mysql_sql.y
Refactor parser grammar to separate rank option                   

pkg/sql/parsers/dialect/mysql/mysql_sql.y

  • Added rankOption type declaration to parser union
  • Changed grammar rule from limit_rank_suffix to rank_opt for cleaner
    separation
  • Updated select_no_parens production rules to include rank_opt as
    separate token after limit_opt
  • Modified limit_clause rules to remove rank option handling
  • Updated limit_rank_suffix rule to create RankOption struct instead of
    Limit struct
+18/-33 
Tests
6 files
mysql_sql_test.go
Update tests for RankOption separation from Limit               

pkg/sql/parsers/dialect/mysql/mysql_sql_test.go

  • Updated test assertions to check selectStmt.RankOption instead of
    selectStmt.Limit.ByRank
  • Changed assertions to verify selectStmt.RankOption.Option instead of
    selectStmt.Limit.Option
  • Updated all six test cases in TestLimitByRank to reflect the new AST
    structure
+18/-18 
query_builder_test.go
Update tests for RankOption parameter and force mode         

pkg/sql/plan/query_builder_test.go

  • Updated TestQueryBuilder_bindLimit to pass nil for new astRankOption
    parameter
  • Added new test case valid mode force to verify "force" mode support in
    parseRankOption
  • Updated error message assertion to expect "mode must be 'pre', 'post',
    or 'force'"
+12/-2   
vector_ivf_mode.result
Add comprehensive vector IVF mode test coverage                   

test/distributed/cases/vector/vector_ivf_mode.result

  • Added comprehensive test cases for vector IVF mode functionality with
    three modes: "pre", "post", and "force"
  • Created test tables with vector indexes and various filter conditions
  • Verified vector search results with different rank modes and filter
    combinations
  • Tested edge cases including OFFSET, composite indexes, and multiple
    filter conditions
+314/-0 
vector_ivf_mode.sql
Comprehensive IVF vector index mode test suite                     

test/distributed/cases/vector/vector_ivf_mode.sql

  • Added comprehensive test suite for IVF (Inverted File) vector index
    mode with 242 lines of SQL test cases
  • Tests cover three execution modes: mode=pre (pre-filtering), mode=post
    (post-filtering), and mode=force (disable index)
  • Includes test cases for vector distance functions (l2_distance,
    cosine_distance), various LIMIT sizes, OFFSET clauses, and complex
    WHERE conditions
  • Tests combinations of vector indexes with regular indexes and
    composite indexes to validate query optimization
+242/-0 
vector_ivf_mode_advanced.sql
Advanced IVF mode edge cases and complex query tests         

test/distributed/cases/vector/vector_ivf_mode_advanced.sql

  • Added 265 lines of advanced test cases for IVF mode covering edge
    cases and complex scenarios
  • Tests highly selective filters, OR/AND conditions, range filters, LIKE
    patterns, IN clauses, and NULL value handling
  • Includes tests for different distance metrics (l2_distance,
    cosine_distance, inner_product) with various over-fetch factors
  • Validates result consistency across different execution modes and
    tests OFFSET with complex filters
+265/-0 
vector_ivf_mode_advanced.result
Expected test results for advanced IVF mode tests               

test/distributed/cases/vector/vector_ivf_mode_advanced.result

  • Added expected output results for 261 lines of advanced IVF mode test
    cases
  • Contains query results showing correct row ordering and filtering for
    various test scenarios
  • Includes error handling test case showing invalid op_type error
    message
  • Demonstrates consistent results across mode=pre, mode=post, and
    mode=force execution paths
+261/-0 
Additional files
3 files
mysql_sql.go +3135/-3131
pre.result +0/-29   
pre.sql +0/-17   

@qodo-code-review
Copy link

qodo-code-review bot commented Dec 4, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No Audit Logs: New vector index application paths and guard mechanisms add critical behaviors (index
selection, mode handling) without any logging for traceability.

Referred Code
// RankOption.Mode controls vector index behavior:
// - "pre": Enable vector index with BloomFilter pushdown for better filtering
// - "force": Disable vector index, force full table scan (for debugging/comparison)
// - nil/other: Enable vector index with default behavior
if vecCtx.rankOption != nil && vecCtx.rankOption.Mode == "force" {
	return nil, nil
}

metaDef := multiTableIndex.IndexDefs[catalog.SystemSI_IVFFLAT_TblType_Metadata]
idxDef := multiTableIndex.IndexDefs[catalog.SystemSI_IVFFLAT_TblType_Centroids]
entriesDef := multiTableIndex.IndexDefs[catalog.SystemSI_IVFFLAT_TblType_Entries]
if metaDef == nil || idxDef == nil || entriesDef == nil {
	return nil, nil
}

opTypeAst, err := sonic.Get([]byte(metaDef.IndexAlgoParams), catalog.IndexAlgoParamOpType)
if err != nil {
	return nil, nil
}
opType, err := opTypeAst.StrictString()
if err != nil {


 ... (clipped 50 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Silent Failures: Multiple early returns on error or nil conditions swallow context and skip optimization
without logging or surfacing actionable error details to callers.

Referred Code
opTypeAst, err := sonic.Get([]byte(metaDef.IndexAlgoParams), catalog.IndexAlgoParamOpType)
if err != nil {
	return nil, nil
}
opType, err := opTypeAst.StrictString()
if err != nil {
	return nil, nil
}

origFuncName := vecCtx.distFnExpr.Func.ObjName
if opType != metric.DistFuncOpTypes[origFuncName] {
	return nil, nil
}

keyPart := idxDef.Parts[0]
partPos := vecCtx.scanNode.TableDef.Name2ColIndex[keyPart]
_, vecLitArg, found := builder.getArgsFromDistFn(vecCtx.distFnExpr, partPos)
if !found {
	return nil, nil
}



 ... (clipped 35 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Option Validation: Parser accepts BY RANK options and stores arbitrary key-values with limited downstream
validation (only 'mode'), lacking bounds/type checks for other options like
'nprobe' or 'fudge_factor'.

Referred Code
        }
    }

    $$ = &tree.RankOption{
        Option: optionMap,
    }
}

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

qodo-code-review bot commented Dec 4, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Correct the invalid index operator

Correct the invalid index operator op_type from 'vector_inner_product' to
'vector_ip_ops' to fix an error and allow the test case to run.

test/distributed/cases/vector/vector_ivf_mode_advanced.sql [212]

-CREATE INDEX idx_inner USING ivfflat ON inner_product_test(vec) lists=4 op_type 'vector_inner_product';
+CREATE INDEX idx_inner USING ivfflat ON inner_product_test(vec) lists=4 op_type 'vector_ip_ops';
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies an invalid op_type 'vector_inner_product' which causes an error, as confirmed by the .result file, and provides the correct value 'vector_ip_ops', which is crucial for the test case to function as intended.

Medium
Search both children of a join

Update buildPkExprFromNode to search both children of a JOIN node for the
primary key, not just the left child.

pkg/sql/plan/apply_indices_ivfflat.go [551-554]

 	case plan.Node_JOIN:
 		if len(node.Children) > 0 {
-			return builder.buildPkExprFromNode(node.Children[0], pkType, pkName)
+			if pkExpr := builder.buildPkExprFromNode(node.Children[0], pkType, pkName); pkExpr != nil {
+				return pkExpr
+			}
+		}
+		if len(node.Children) > 1 {
+			return builder.buildPkExprFromNode(node.Children[1], pkType, pkName)
 		}
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that the function only searches the left child of a JOIN node and proposes a fix to search both children, making the logic more robust.

Low
Add nil check to prevent panic

Add a nil check for builder.protectedScans in clearProjectGuard to prevent a
potential panic.

pkg/sql/plan/apply_indices.go [176-198]

 func (builder *QueryBuilder) clearProjectGuard(projID int32) {
-	if builder.projectSpecialGuards == nil {
+	if builder.projectSpecialGuards == nil || builder.protectedScans == nil {
 		return
 	}
 	guard, ok := builder.projectSpecialGuards[projID]
 	if !ok {
 		return
 	}
 
-	if builder.protectedScans != nil {
-		for _, scanID := range guard.scanNodeIDs {
-			if cnt, ok := builder.protectedScans[scanID]; ok {
-				if cnt <= 1 {
-					delete(builder.protectedScans, scanID)
-				} else {
-					builder.protectedScans[scanID] = cnt - 1
-				}
+	for _, scanID := range guard.scanNodeIDs {
+		if cnt, ok := builder.protectedScans[scanID]; ok {
+			if cnt <= 1 {
+				delete(builder.protectedScans, scanID)
+			} else {
+				builder.protectedScans[scanID] = cnt - 1
 			}
 		}
 	}
 
 	delete(builder.projectSpecialGuards, projID)
 }
  • Apply / Chat
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly points out a missing nil check that could lead to a panic, improving the function's robustness, although the likelihood of this occurring is low.

Low
  • Update

@mergify
Copy link
Contributor

mergify bot commented Dec 8, 2025

Merge Queue Status

✅ The pull request has been merged

This pull request spent 8 seconds in the queue, with no time running CI.
The checks were run in-place.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Release on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Release on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Release on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Review effort 4/5 size/XXL Denotes a PR that changes 2000+ lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants