Skip to content

Conversation

@gouhongshen
Copy link
Contributor

@gouhongshen gouhongshen commented Dec 4, 2025

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue ##21979

What this PR does / why we need it:

  1. Change the minimal unit in single SQL size from batch to rows.
  2. Support validating the stage directory.
  3. data branch create table/database

@qodo-code-review
Copy link

qodo-code-review bot commented Dec 4, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
SQL construction risk

Description: Using a regex to extract destination and source from execCtx.input.sql and then formatting
into a raw SQL string may risk malformed input or edge cases; ensure the identifiers and
snapshot options are properly parsed and quoted to prevent injection via crafted table
names or options.
data_branch.go [440-455]

Referred Code
//data branch create table xxx from yyy snap_opt to_account_op;
re := regexp.MustCompile(`(?i)^DATA\s+BRANCH\s+CREATE\s+TABLE\s+(\S+)\s+FROM\s+(.+?);?$`)
srcAndDst := re.FindStringSubmatch(execCtx.input.sql)
if srcAndDst == nil {
	return moerr.NewInternalErrorNoCtxf("cannot find src and dst table: %s", execCtx.input.sql)
}

sql := fmt.Sprintf("CREATE TABLE %s CLONE %s", srcAndDst[1], srcAndDst[2])

execCtx.reqCtx = context.WithValue(execCtx.reqCtx, tree.CloneLevelCtxKey{}, tree.NormalCloneLevelTable)

tempExecCtx = &ExecCtx{
	reqCtx: execCtx.reqCtx,
	input:  &UserInput{sql: sql},
}
Ticket Compliance
🟡
🎫 #21979
🟢 Add support for new data branching syntax in SQL (e.g., data branch create ...).
Ensure end-to-end execution of new branching syntax for tables and databases.
Provide tests demonstrating the new branching syntax works across scenarios, including
with snapshots.
Maintain or improve related functionality such as diff/merge to work with new syntax.
Validate stage/output directory paths more robustly when exporting diffs/files.
Prefer row-based batching limits for SQL construction instead of batch-size based limits.
Persist metadata/receipts for branch tracking when performing clone/branch operations.
Validate that the regex-based parsing of source/destination in dataBranchCreateTable
covers all valid identifier and snapshot option formats and edge cases.
Confirm cross-account clone/branch behavior matches product requirements and permissions
in multi-tenant deployments.
Run integration tests against object storage providers to ensure List/Stat-based directory
validation works across S3-compatible and other backends.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: New critical operations (e.g., dataBranchCreateTable, dataBranchCreateDatabase, directory
validation decisions) execute without any added auditing/log entries capturing who
performed the action, the target resources, and outcomes.

Referred Code
func dataBranchCreateTable(
	execCtx *ExecCtx,
	ses *Session,
	stmt *tree.DataBranchCreateTable,
) (err error) {
	var (
		bh          BackgroundExec
		deferred    func(error) error
		receipt     cloneReceipt
		cloneStmt   *tree.CloneTable
		tempExecCtx *ExecCtx
	)

	if bh, deferred, err = getBackExecutor(execCtx.reqCtx, ses); err != nil {
		return
	}

	defer func() {
		if deferred != nil {
			err = deferred(err)
		}


 ... (clipped 74 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Regex parsing risk: dataBranchCreateTable relies on a regex over raw SQL input and returns internal errors on
mismatch, which may be brittle and lacks explicit edge-case handling for unusual
whitespace/quoting or malformed inputs.

Referred Code
re := regexp.MustCompile(`(?i)^DATA\s+BRANCH\s+CREATE\s+TABLE\s+(\S+)\s+FROM\s+(.+?);?$`)
srcAndDst := re.FindStringSubmatch(execCtx.input.sql)
if srcAndDst == nil {
	return moerr.NewInternalErrorNoCtxf("cannot find src and dst table: %s", execCtx.input.sql)
}

sql := fmt.Sprintf("CREATE TABLE %s CLONE %s", srcAndDst[1], srcAndDst[2])

execCtx.reqCtx = context.WithValue(execCtx.reqCtx, tree.CloneLevelCtxKey{}, tree.NormalCloneLevelTable)

tempExecCtx = &ExecCtx{
	reqCtx: execCtx.reqCtx,
	input:  &UserInput{sql: sql},
}

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Detailed internal error: Errors like "cannot find src and dst table: <full SQL>" and directory
validation messages echo user-supplied SQL/paths which could expose internal details if
surfaced to end users.

Referred Code
if srcAndDst == nil {
	return moerr.NewInternalErrorNoCtxf("cannot find src and dst table: %s", execCtx.input.sql)
}

sql := fmt.Sprintf("CREATE TABLE %s CLONE %s", srcAndDst[1], srcAndDst[2])

execCtx.reqCtx = context.WithValue(execCtx.reqCtx, tree.CloneLevelCtxKey{}, tree.NormalCloneLevelTable)

tempExecCtx = &ExecCtx{
	reqCtx: execCtx.reqCtx,
	input:  &UserInput{sql: sql},
}

if receipt, err = handleCloneTable(tempExecCtx, ses, cloneStmt, bh); err != nil {
	return
}

if err = updateBranchMetaTable(execCtx.reqCtx, ses, bh, receipt); err != nil {
	return
}



 ... (clipped 15 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Path validation nuances: The new validateOutputDirPath logic introduces List-based detection on service roots and
parents which may vary across backends and could allow ambiguous paths; additional
normalization/authorization checks may be needed.

Referred Code
if len(strings.Trim(targetPath, "/")) == 0 {
	// service root: try listing to ensure the bucket/root is reachable
	for _, err = range targetFS.List(ctx, targetPath) {
		if err != nil {
			if moerr.IsMoErrCode(err, moerr.ErrFileNotFound) {
				return moerr.NewInvalidInputNoCtxf("output directory %s does not exist", inputDirPath)
			}
			return err
		}
		// any entry means list works
		break
	}
	return nil
}

if entry, err = targetFS.StatFile(ctx, targetPath); err != nil {
	if moerr.IsMoErrCode(err, moerr.ErrFileNotFound) {
		// fallthrough to List-based directory detection
	} else {
		return
	}


 ... (clipped 42 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@gouhongshen gouhongshen changed the title update data branch: use row-based size limit and validate stage directory update data branch: data branch create table and database Dec 4, 2025
@qodo-code-review
Copy link

qodo-code-review bot commented Dec 4, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Use parsed statement to build SQL

Reconstruct the CLONE TABLE SQL statement from the parsed
tree.DataBranchCreateTable statement instead of using a regular expression on
the raw input SQL.

pkg/frontend/data_branch.go [440-447]

-	//data branch create table xxx from yyy snap_opt to_account_op;
-	re := regexp.MustCompile(`(?i)^DATA\s+BRANCH\s+CREATE\s+TABLE\s+(\S+)\s+FROM\s+(.+?);?$`)
-	srcAndDst := re.FindStringSubmatch(execCtx.input.sql)
-	if srcAndDst == nil {
-		return moerr.NewInternalErrorNoCtxf("cannot find src and dst table: %s", execCtx.input.sql)
+	sql := fmt.Sprintf("CREATE TABLE %s CLONE %s",
+		stmt.CreateTable.Table.String(),
+		stmt.SrcTable.String(),
+	)
+	if stmt.AtTsExpr != nil {
+		sql += " " + stmt.AtTsExpr.String()
+	}
+	if stmt.ToAccountOpt != nil {
+		sql += " " + stmt.ToAccountOpt.String()
 	}
 
-	sql := fmt.Sprintf("CREATE TABLE %s CLONE %s", srcAndDst[1], srcAndDst[2])
-
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly points out that using a regular expression to parse SQL is fragile and proposes a more robust approach by building the SQL from the parsed AST, which is a good practice.

Low
  • Update

@mergify mergify bot added the queued label Dec 8, 2025
@mergify mergify bot merged commit 0f0d389 into matrixorigin:main Dec 8, 2025
19 checks passed
@mergify
Copy link
Contributor

mergify bot commented Dec 8, 2025

Merge Queue Status

✅ The pull request has been merged

This pull request spent 6 seconds in the queue, with no time running CI.
The checks were run in-place.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working kind/enhancement kind/feature Review effort 4/5 size/XL Denotes a PR that changes [1000, 1999] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants