Skip to content

Conversation

@iamlinjunhong
Copy link
Contributor

@iamlinjunhong iamlinjunhong commented Dec 4, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #https://github.com/matrixorigin/MO-Cloud/issues/6723

What this PR does / why we need it:

Enhanced SQL parser to support new vector query syntax.
Refactored IVFFlat and HNSW index application logic for better query optimization.
Added comprehensive test coverage for vector IVF mode functionality.


PR Type

Enhancement, Tests


Description

  • Refactored SQL parser to separate RankOption from Limit clause in AST structure, enabling independent handling of rank modes ("pre", "post", "force")

  • Enhanced IVFFlat and HNSW index application logic by extracting context preparation into dedicated functions (prepareIvfIndexContext, prepareHnswIndexContext, buildVectorSortContext)

  • Implemented special index guard mechanism to protect scan nodes from inappropriate regular index application when vector indexes are present

  • Added vectorSortContext struct to encapsulate vector sort query context information for improved index optimization tracking

  • Modified grammar rules in mysql_sql.y to parse RankOption independently from Limit clause

  • Updated QueryBuilder to track protected scans and special index guards per project node

  • Added comprehensive test coverage for IVFFlat vector index functionality including basic, advanced, and edge case scenarios with multiple distance metrics and filter combinations

  • Migrated test cases from legacy ivf/ directory structure to new vector/vector_ivf_mode* test suite


Diagram Walkthrough

flowchart LR
  Parser["Parser<br/>Separate RankOption<br/>from Limit"] --> AST["AST Structure<br/>RankOption field<br/>in Select"]
  AST --> QueryBuilder["QueryBuilder<br/>Track protected scans<br/>& special guards"]
  QueryBuilder --> VectorContext["Vector Context<br/>buildVectorSortContext<br/>pickVectorLimit"]
  VectorContext --> IVFFlat["IVFFlat Index<br/>prepareIvfIndexContext<br/>applyIndicesForSortUsingIvfflat"]
  VectorContext --> HNSW["HNSW Index<br/>prepareHnswIndexContext<br/>applyIndicesForSortUsingHnsw"]
  IVFFlat --> Guard["Special Index Guard<br/>Protect scans from<br/>regular indexes"]
  HNSW --> Guard
  Guard --> Tests["Comprehensive Tests<br/>IVF mode coverage<br/>pre/post/force modes"]
Loading

File Walkthrough

Relevant files
Enhancement
8 files
apply_indices_ivfflat.go
Refactor IVFFlat index application with context extraction

pkg/sql/plan/apply_indices_ivfflat.go

  • Refactored IVFFlat index application logic by extracting context
    preparation into prepareIvfIndexContext function
  • Modified applyIndicesForSortUsingIvfflat to accept vectorSortContext
    instead of individual node parameters
  • Added support for colRefCnt and idxColMap parameters for better index
    optimization tracking
  • Implemented helper functions buildPkExprFromNode, getColName,
    rebindScanNode, replaceColRefTag, canApplyRegularIndex, and
    colRefsWithin for improved node manipulation and validation
  • Added logic to apply regular indexes on secondary scan nodes when
    pushdownEnabled is true
+285/-94
apply_indices.go
Add special index guard mechanism for scan protection       

pkg/sql/plan/apply_indices.go

  • Added specialIndexKind enum and specialIndexGuard struct to track
    protected scan nodes
  • Implemented prepareSpecialIndexGuards, resetSpecialIndexGuards,
    collectSpecialIndexGuards, and related guard management functions
  • Added detectFullTextGuard and detectVectorGuard functions to identify
    scans protected by special indexes
  • Implemented collectVectorIndexes helper to gather vector indexes from
    scan nodes
  • Added protection checks in applyIndicesForFilters and
    applyIndicesForJoins to prevent applying regular indexes to protected
    scans
  • Refactored vector index application to use buildVectorSortContext and
    pass additional parameters
+233/-30
query_builder.go
Separate RankOption from Limit and add guard preparation 

pkg/sql/plan/query_builder.go

  • Added protectedScans and projectSpecialGuards fields to QueryBuilder
    struct initialization
  • Modified buildUnion function signature to accept astRankOption
    parameter separately from astLimit
  • Updated bindSelect to extract and handle RankOption independently from
    Limit clause
  • Modified bindLimit function to accept both astLimit and astRankOption
    parameters
  • Updated parseRankOption to support "force" mode in addition to "pre"
    and "post" modes
  • Added calls to prepareSpecialIndexGuards and resetSpecialIndexGuards
    around applyIndices invocation
+65/-62 
apply_indices_hnsw.go
Refactor HNSW index application with context extraction   

pkg/sql/plan/apply_indices_hnsw.go

  • Extracted HNSW index context preparation into prepareHnswIndexContext
    function
  • Modified applyIndicesForSortUsingHnsw to accept vectorSortContext
    instead of individual node parameters
  • Simplified function signature by removing redundant node parameters
  • Added support for "force" mode in RankOption to disable vector index
    usage
+77/-49 
select.go
Separate RankOption from Limit in AST structure                   

pkg/sql/parsers/tree/select.go

  • Added RankOption field to Select struct
  • Removed ByRank, Option, and Mode fields from Limit struct
  • Created new RankOption struct with Option map field
  • Implemented Format method for RankOption to handle SQL formatting
  • Updated Select.Format to output RankOption separately from Limit
+37/-35 
apply_indices_vector.go
Add vector sort context extraction utilities                         

pkg/sql/plan/apply_indices_vector.go

  • Added vectorSortContext struct to encapsulate vector sort query
    context information
  • Implemented buildVectorSortContext function to construct context from
    projection node
  • Implemented pickVectorLimit helper function to extract limit and rank
    option from query nodes
+70/-0   
types.go
Add guard tracking fields to QueryBuilder                               

pkg/sql/plan/types.go

  • Added protectedScans map field to track protected scan node counts
  • Added projectSpecialGuards map field to store special index guards per
    project node
+2/-0     
mysql_sql.y
Refactor grammar to separate rank option from limit           

pkg/sql/parsers/dialect/mysql/mysql_sql.y

  • Added rankOption type declaration to yacc grammar
  • Renamed limit_rank_suffix rule to rank_opt for clarity
  • Updated select_no_parens production rules to include rank_opt as
    separate clause after limit_opt
  • Modified grammar to parse RankOption independently from Limit clause
  • Simplified limit_clause rules by removing rank suffix handling
+18/-33 
Tests
6 files
mysql_sql_test.go
Update parser tests for RankOption separation                       

pkg/sql/parsers/dialect/mysql/mysql_sql_test.go

  • Updated test assertions to check selectStmt.RankOption instead of
    selectStmt.Limit.ByRank
  • Changed assertions to verify RankOption.Option instead of Limit.Option
  • Updated all test cases in TestLimitByRank to reflect the new AST
    structure
+18/-18 
query_builder_test.go
Update query builder tests for RankOption parameter           

pkg/sql/plan/query_builder_test.go

  • Updated TestQueryBuilder_bindLimit to pass nil for new astRankOption
    parameter
  • Added test case for "force" mode in TestParseRankOption
  • Updated error message assertions to include "force" as valid mode
    option
+12/-2   
vector_ivf_mode.result
Add comprehensive vector IVF mode test coverage                   

test/distributed/cases/vector/vector_ivf_mode.result

  • Added comprehensive test cases for IVFFlat vector index with different
    rank modes ("pre", "post", "force")
  • Created test tables with vector columns and IVFFlat indexes
  • Added test queries demonstrating vector search with filters, multiple
    indexes, and composite indexes
  • Verified correct query results for all three rank modes with various
    filter conditions
+314/-0 
vector_ivf_mode.sql
Comprehensive IVF vector index mode test coverage               

test/distributed/cases/vector/vector_ivf_mode.sql

  • Added comprehensive test suite for IVF (Inverted File) vector index
    mode with 242 lines of SQL test cases
  • Tests three execution modes: mode=pre (pre-filtering), mode=post
    (post-filtering), and mode=force (disable index)
  • Covers vector distance functions (l2_distance, cosine_distance),
    various LIMIT sizes, and over-fetch factor scenarios
  • Tests combination of vector indexes with regular indexes and composite
    indexes on multiple tables
+242/-0 
vector_ivf_mode_advanced.sql
Advanced IVF mode edge cases and complex query testing     

test/distributed/cases/vector/vector_ivf_mode_advanced.sql

  • Added 265 lines of advanced test cases for IVF mode edge cases and
    complex scenarios
  • Tests highly selective filters, OR/AND conditions, range filters, LIKE
    patterns, and NULL value handling
  • Covers different distance metrics (l2_distance, cosine_distance,
    inner_product) with various filter combinations
  • Tests boundary conditions for LIMIT values and OFFSET with complex
    WHERE clauses
+265/-0 
vector_ivf_mode_advanced.result
Expected results for advanced IVF vector index tests         

test/distributed/cases/vector/vector_ivf_mode_advanced.result

  • Added 261 lines of expected test results for advanced IVF mode test
    cases
  • Contains query output showing correct result sets for various filter
    and distance metric combinations
  • Includes error handling verification (e.g., invalid op_type error for
    vector_inner_product)
  • Validates result consistency across different execution modes
    (mode=pre, mode=post, mode=force)
+261/-0 
Additional files
3 files
mysql_sql.go +7448/-7482
pre.result +0/-29   
pre.sql +0/-17   

@qodo-code-review
Copy link

qodo-code-review bot commented Dec 4, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No Auditing: The new vector index preparation and application flows add operational behaviors (e.g.,
variable resolution, guard registration, index application) without any audit logging of
these critical actions or outcomes.

Referred Code
func (builder *QueryBuilder) prepareIvfIndexContext(vecCtx *vectorSortContext, multiTableIndex *MultiTableIndex) (*ivfIndexContext, error) {
	if vecCtx == nil || multiTableIndex == nil {
		return nil, nil
	}
	if vecCtx.distFnExpr == nil {
		return nil, nil
	}

	// RankOption.Mode controls vector index behavior:
	// - "pre": Enable vector index with BloomFilter pushdown for better filtering
	// - "force": Disable vector index, force full table scan (for debugging/comparison)
	// - nil/other: Enable vector index with default behavior
	if vecCtx.rankOption != nil && vecCtx.rankOption.Mode == "force" {
		return nil, nil
	}

	metaDef := multiTableIndex.IndexDefs[catalog.SystemSI_IVFFLAT_TblType_Metadata]
	idxDef := multiTableIndex.IndexDefs[catalog.SystemSI_IVFFLAT_TblType_Centroids]
	entriesDef := multiTableIndex.IndexDefs[catalog.SystemSI_IVFFLAT_TblType_Entries]
	if metaDef == nil || idxDef == nil || entriesDef == nil {
		return nil, nil


 ... (clipped 94 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Silent Skips: Several early returns convert errors or invalid states into nil (e.g., parsing algo
params, opType mismatch), potentially hiding actionable issues and making debugging
difficult.

Referred Code
opTypeAst, err := sonic.Get([]byte(metaDef.IndexAlgoParams), catalog.IndexAlgoParamOpType)
if err != nil {
	return nil, nil
}
opType, err := opTypeAst.StrictString()
if err != nil {
	return nil, nil
}

origFuncName := vecCtx.distFnExpr.Func.ObjName
if opType != metric.DistFuncOpTypes[origFuncName] {
	return nil, nil
}

keyPart := idxDef.Parts[0]
partPos := vecCtx.scanNode.TableDef.Name2ColIndex[keyPart]
_, vecLitArg, found := builder.getArgsFromDistFn(vecCtx.distFnExpr, partPos)
if !found {
	return nil, nil
}

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Option Validation: RankOption parsing accepts arbitrary key-value options at grammar level without validation
here, relying on later parsing, which may allow unsupported options to pass through
silently.

Referred Code
        }
    }

    $$ = &tree.RankOption{
        Option: optionMap,
    }
}

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

qodo-code-review bot commented Dec 4, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Re-evaluate the global index guard

The suggestion proposes replacing the complex, global "special index guard" with
a more localized or cost-based method for deciding when to apply regular
indexes, aiming for a simpler and more robust design.

Examples:

pkg/sql/plan/apply_indices.go [102-214]
func (builder *QueryBuilder) prepareSpecialIndexGuards(rootID int32) {
	if builder.protectedScans == nil {
		builder.protectedScans = make(map[int32]int)
	} else {
		for k := range builder.protectedScans {
			delete(builder.protectedScans, k)
		}
	}

	if builder.projectSpecialGuards == nil {

 ... (clipped 103 lines)
pkg/sql/plan/query_builder.go [2079-2081]
		builder.prepareSpecialIndexGuards(rootID)
		rootID, err = builder.applyIndices(rootID, colRefCnt, make(map[[2]int32]*plan.Expr))
		builder.resetSpecialIndexGuards()

Solution Walkthrough:

Before:

// In QueryBuilder.createQuery
builder.prepareSpecialIndexGuards(rootID) // Pass 1: Traverse tree, find special indexes, populate builder.protectedScans
rootID, err = builder.applyIndices(rootID, ...)
builder.resetSpecialIndexGuards()

// In QueryBuilder.collectSpecialIndexGuards
// ... recursively traverses tree
// if node is PROJECT:
//   if detectVectorGuard(node):
//     registerProjectGuard(...) -> populates builder.protectedScans

// In QueryBuilder.applyIndicesForFilters
func applyIndicesForFilters(node *plan.Node, ...) {
    if builder.isScanProtected(node.NodeId) { // Check global guard
        return // Skip applying regular indexes
    }
    // ... apply regular filter indexes
}

After:

// In QueryBuilder.createQuery
// No more global guard preparation/reset
rootID, err = builder.applyIndices(rootID, ...)

// In QueryBuilder.applyIndices (conceptual change)
func applyIndices(nodeID int32, ...) {
    // ...
    // Try to apply vector indexes first for a project node
    newNodeID, applied := builder.applyVectorIndices(nodeID, ...)
    if applied {
        // The vector index application has transformed the plan.
        // The original scan node is either replaced or tagged,
        // so subsequent index logic won't apply to it.
        return newNodeID
    }

    // ...
    // If no vector index was applied, proceed with regular index application.
    // The check is now implicit in the plan structure, not a global map.
    return builder.applyIndicesForFilters(nodeID, ...)
}
Suggestion importance[1-10]: 9

__

Why: This is a critical design suggestion that correctly identifies the complexity and potential brittleness of the new global special index guard mechanism, proposing a more robust and maintainable localized alternative.

High
Possible issue
Fix invalid index operator type

Change the invalid index operator type vector_inner_product to the correct
vector_ip_ops to ensure the inner product test case runs against an indexed
table as intended.

test/distributed/cases/vector/vector_ivf_mode_advanced.sql [212]

-CREATE INDEX idx_inner USING ivfflat ON inner_product_test(vec) lists=4 op_type 'vector_inner_product';
+CREATE INDEX idx_inner USING ivfflat ON inner_product_test(vec) lists=4 op_type 'vector_ip_ops';
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies an invalid op_type that causes an error, as confirmed by the .result file, and prevents a test case from validating its intended indexed query path.

Medium
Fix mismatched index and distance

Modify the cosine_distance test to query the mini_embed_data table, which has a
compatible vector_cosine_ops index, instead of the mini_vector_data table with
its vector_l2_ops index.

test/distributed/cases/vector/vector_ivf_mode.sql [4-61]

 CREATE INDEX `idx_vec` USING ivfflat ON `mini_vector_data` (`vec`) lists = 16 op_type 'vector_l2_ops';
 ...
 -- Test Case: DESC ordering with cosine_distance and mode=pre
-WITH q AS (SELECT id, text, cosine_distance(vec, '[0.1,-0.2,0.3,0.4,-0.1,0.2,0.0,0.5]') AS dist FROM mini_vector_data) SELECT * FROM q ORDER BY dist DESC LIMIT 3 by rank with option 'mode=pre';
+WITH q AS (SELECT id, content, cosine_distance(embedding, '[0.1,-0.2,0.3,0.4,-0.1,0.2,0.0,0.5]') AS dist FROM mini_embed_data) SELECT * FROM q ORDER BY dist DESC LIMIT 3 by rank with option 'mode=pre';

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a test case that uses a cosine_distance function on a table with an l2_distance index, which means the test is not validating the intended indexed query path.

Medium
Use safe type assertion

Replace the unsafe type assertion nThread.(int64) with a safe "comma ok"
assertion to prevent a potential panic if the system variable ivf_threads_search
has an unexpected type.

pkg/sql/plan/apply_indices_ivfflat.go [89-121]

-	nThread, err := builder.compCtx.ResolveVariable("ivf_threads_search", true, false)
+	nThreadAny, err := builder.compCtx.ResolveVariable("ivf_threads_search", true, false)
 	if err != nil {
 		return nil, err
 	}
+	nThread, ok := nThreadAny.(int64)
+	if !ok {
+		return nil, moerr.NewInternalErrorNoCtx("ResolveVariable: ivf_threads_search is not int64")
+	}
 ...
-		nThread:      nThread.(int64),
+		nThread:      nThread,

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential panic due to an unsafe type assertion and proposes a robust fix, which aligns with best practices and existing patterns in the codebase.

Medium
Search both children of joins

Modify buildPkExprFromNode to search both children of a JOIN node for the
primary key, not just the left child.

pkg/sql/plan/apply_indices_ivfflat.go [551-554]

 	case plan.Node_JOIN:
 		if len(node.Children) > 0 {
-			return builder.buildPkExprFromNode(node.Children[0], pkType, pkName)
+			if pkExpr := builder.buildPkExprFromNode(node.Children[0], pkType, pkName); pkExpr != nil {
+				return pkExpr
+			}
+		}
+		if len(node.Children) > 1 {
+			return builder.buildPkExprFromNode(node.Children[1], pkType, pkName)
 		}
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the function only searches the left child of a JOIN node, which could miss the primary key if it's in the right child, and proposes a correct fix to search both children.

Medium
  • Update

@matrix-meow matrix-meow added the size/XXL Denotes a PR that changes 2000+ lines label Dec 4, 2025
@mergify
Copy link
Contributor

mergify bot commented Dec 5, 2025

Merge Queue Status

✅ The pull request has been merged

This pull request spent 56 minutes 37 seconds in the queue, including 56 minutes 22 seconds running CI.
The checks were run in-place.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

@mergify mergify bot added the queued label Dec 5, 2025
@mergify mergify bot merged commit 21c74cc into matrixorigin:main Dec 5, 2025
19 checks passed
@mergify mergify bot removed the queued label Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Review effort 4/5 size/XXL Denotes a PR that changes 2000+ lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants