Skip to content

feat(parser): Snowflake SAMPLE / TABLESAMPLE (#483)#501

Open
ajitpratap0 wants to merge 2 commits intomainfrom
feat/snowflake-sample-483
Open

feat(parser): Snowflake SAMPLE / TABLESAMPLE (#483)#501
ajitpratap0 wants to merge 2 commits intomainfrom
feat/snowflake-sample-483

Conversation

@ajitpratap0
Copy link
Copy Markdown
Owner

Summary

Snowflake `SAMPLE`/`TABLESAMPLE` clause on table references. 6 test shapes, race-tested green. Part of #483.

🤖 Generated with Claude Code

@github-actions github-actions bot added the parser label Apr 9, 2026
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gosqlx-website Ready Ready Preview, Comment Apr 9, 2026 6:47pm

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Claude finished @ajitpratap0's task —— View job


Code Review: Snowflake SAMPLE/TABLESAMPLE Implementation

I've completed a comprehensive review of the PR implementing Snowflake SAMPLE/TABLESAMPLE clause parsing. Here's my detailed feedback:

Review Checklist

  • Read repository guidelines and setup
  • Analyze code changes in all 3 files
  • Check code quality and best practices
  • Evaluate potential bugs and issues
  • Review performance considerations
  • Assess security concerns
  • Examine test coverage
  • Provide comprehensive feedback

Strengths

Code Quality & Best Practices

  • Excellent adherence to repository conventions: Follows established error handling patterns with p.expectedError() and proper position tracking via p.currentLocation() in pivot.go:383-395
  • Consistent code style: Matches existing parser patterns with recursive descent parsing and one-token lookahead strategy
  • Proper documentation: Well-commented code explaining the purpose and syntax of SAMPLE/TABLESAMPLE clauses in select_subquery.go:98-131

Security & Robustness

  • Safe type checking: Uses strings.EqualFold() for case-insensitive keyword matching instead of direct comparisons in select_subquery.go:103-104
  • Proper boundary checking: Implements EOF protection in the token consumption loop in select_subquery.go:114-130
  • No SQL injection risks: Parser-level implementation doesn't introduce any string concatenation vulnerabilities

Performance Considerations

  • Zero memory allocation: Follows the repository's zero-copy design principles
  • Efficient parsing: Uses simple token consumption without creating unnecessary intermediate objects
  • Proper dialect gating: Only activates parsing in Snowflake context via existing isSampleKeyword() guard in pivot.go:382-395

🔧 Areas for Improvement

1. Critical Issue: Missing AST Modeling

The implementation consumes SAMPLE/TABLESAMPLE tokens but doesn't store the parsed information in the AST:

// Current implementation (select_subquery.go:103-131) only consumes tokens:
if strings.EqualFold(p.currentToken.Token.Value, "SAMPLE") ||
   strings.EqualFold(p.currentToken.Token.Value, "TABLESAMPLE") {
    p.advance() // Consumes but doesn't store
    // ... consumes method and parentheses but loses information
}

Recommended Fix: Add a Sample *SampleClause field to TableReference struct and populate it:

// In ast.go TableReference struct
Sample *SampleClause  // Snowflake/ANSI sampling clause

// In parsing logic  
if strings.EqualFold(p.currentToken.Token.Value, "SAMPLE") {
    sampleClause, err := p.parseSampleClause()
    if err != nil {
        return tableRef, err
    }
    tableRef.Sample = sampleClause
}

2. Incomplete Parsing Logic

  • Missing method capture: The method name (BERNOULLI, SYSTEM, etc.) is consumed but not stored
  • Missing value parsing: The sampling value/percentage isn't extracted from parentheses
  • No validation: Doesn't validate sampling syntax according to Snowflake specs

3. Guard Function Location Mismatch

isSampleKeyword() is defined in pivot.go:382-395 but used in select_subquery.go:151. Should be moved to a more appropriate location (parser.go) for better organization.

4. Test Coverage Gaps

The test file snowflake_sample_test.go:17-32 only tests successful parsing, missing:

  • Error cases: Invalid syntax, missing parentheses, wrong dialect
  • Edge cases: Empty parentheses, invalid methods, malformed values
  • Negative tests: Ensuring SAMPLE isn't consumed as alias in non-Snowflake dialects

📋 Suggested Enhancements

Test Coverage Expansion

// Add to snowflake_sample_test.go
func TestSnowflakeSampleErrors(t *testing.T) {
    errorCases := []struct{
        name  string
        query string  
        expectedError string
    }{
        {"missing_parens", `SELECT * FROM users SAMPLE 10`, "expected ("},
        {"invalid_method", `SELECT * FROM users SAMPLE INVALID (10)`, ""},
        {"wrong_dialect", `SELECT * FROM users SAMPLE (10)`, ""}, // using default dialect
    }
    // ... test error cases
}

AST Completeness

Complete the parsing by actually using the existing SampleClause struct that's already defined in ast.go. The infrastructure exists but isn't being utilized.

Documentation

Add examples to CHANGELOG.md showing the new Snowflake SAMPLE syntax support.


Performance & Memory

  • No performance regressions: Simple token consumption doesn't impact parsing speed
  • Memory efficient: No unnecessary allocations in current implementation
  • Race condition free: Follows established thread-safe parsing patterns

🏁 Recommendation

Conditional Approval - This PR provides a solid foundation for Snowflake SAMPLE support but requires completion:

  1. Must Fix: Implement actual AST population using the existing SampleClause struct
  2. Should Fix: Add comprehensive error test cases
  3. Nice to Have: Reorganize guard function location

The current implementation correctly handles token consumption and maintains parser state, but stops short of providing the complete feature. With the AST modeling completed, this will be a solid addition to the Snowflake dialect support.


Job run | Branch: feat/snowflake-sample-483

ajitpratap0 pushed a commit that referenced this pull request Apr 9, 2026
Bumps go.opentelemetry.io/otel/sdk and related packages from v1.42.0
to v1.43.0 to remediate CVE-2026-39883 flagged by the Trivy repository
scan. This CVE is the root cause of the Trivy / Security Scan Summary
failures on PRs #500 and #501.
ajitpratap0 added a commit that referenced this pull request Apr 9, 2026
* fix(deps): upgrade opentelemetry SDK to v1.43.0 (CVE-2026-39883)

Bumps go.opentelemetry.io/otel/sdk and related packages from v1.42.0
to v1.43.0 to remediate CVE-2026-39883 flagged by the Trivy repository
scan. This CVE is the root cause of the Trivy / Security Scan Summary
failures on PRs #500 and #501.

* fix(security): add CVE-2026-39883 to trivyignore (resolved by otel v1.43.0 upgrade)

---------

Co-authored-by: Ajit Pratap Singh <ajitpratapsingh@Ajits-Mac-mini-2655.local>
@ajitpratap0 ajitpratap0 force-pushed the feat/snowflake-sample-483 branch from b5b4077 to 5836e08 Compare April 9, 2026 18:33
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Claude finished @ajitpratap0's task —— View job


Code Review: Snowflake SAMPLE/TABLESAMPLE Implementation

I've completed a comprehensive review of the PR implementing Snowflake SAMPLE/TABLESAMPLE clause parsing. Here's my detailed feedback:

Review Checklist

  • Read repository guidelines and setup
  • Analyze code changes in all 3 files
  • Check code quality and best practices
  • Evaluate potential bugs and issues
  • Review performance considerations
  • Assess security concerns
  • Examine test coverage
  • Provide comprehensive feedback

Strengths

Code Quality & Best Practices

  • Excellent adherence to repository conventions: Follows established error handling patterns with p.expectedError() and proper position tracking via p.currentLocation() in pivot.go:383-395
  • Consistent code style: Matches existing parser patterns with recursive descent parsing and one-token lookahead strategy
  • Proper documentation: Well-commented code explaining the purpose and syntax of SAMPLE/TABLESAMPLE clauses in select_subquery.go:98-131

Security & Robustness

  • Safe type checking: Uses strings.EqualFold() for case-insensitive keyword matching instead of direct comparisons in select_subquery.go:103-104
  • Proper boundary checking: Implements EOF protection in the token consumption loop in select_subquery.go:114-130
  • No SQL injection risks: Parser-level implementation doesn't introduce any string concatenation vulnerabilities

Performance Considerations

  • Zero memory allocation: Follows the repository's zero-copy design principles
  • Efficient parsing: Uses simple token consumption without creating unnecessary intermediate objects
  • Proper dialect gating: Only activates parsing in Snowflake context via existing isSampleKeyword() guard in pivot.go:382-395

🔧 Areas for Improvement

1. Critical Issue: Missing AST Modeling

The implementation consumes SAMPLE/TABLESAMPLE tokens but doesn't store the parsed information in the AST:

// Current implementation (select_subquery.go:103-131) only consumes tokens:
if strings.EqualFold(p.currentToken.Token.Value, "SAMPLE") ||
   strings.EqualFold(p.currentToken.Token.Value, "TABLESAMPLE") {
    p.advance() // Consumes but doesn't store
    // ... consumes method and parentheses but loses information
}

Recommended Fix: Add a Sample *SampleClause field to TableReference struct and populate it:

// In ast.go TableReference struct
Sample *SampleClause  // Snowflake/ANSI sampling clause

// In parsing logic  
if strings.EqualFold(p.currentToken.Token.Value, "SAMPLE") {
    sampleClause, err := p.parseSampleClause()
    if err != nil {
        return tableRef, err
    }
    tableRef.Sample = sampleClause
}

2. Incomplete Parsing Logic

  • Missing method capture: The method name (BERNOULLI, SYSTEM, etc.) is consumed but not stored
  • Missing value parsing: The sampling value/percentage isn't extracted from parentheses
  • No validation: Doesn't validate sampling syntax according to Snowflake specs

3. Guard Function Location Mismatch

isSampleKeyword() is defined in pivot.go:382-395 but used in select_subquery.go:151. Should be moved to a more appropriate location (parser.go) for better organization.

4. Test Coverage Gaps

The test file snowflake_sample_test.go:17-32 only tests successful parsing, missing:

  • Error cases: Invalid syntax, missing parentheses, wrong dialect
  • Edge cases: Empty parentheses, invalid methods, malformed values
  • Negative tests: Ensuring SAMPLE isn't consumed as alias in non-Snowflake dialects

📋 Suggested Enhancements

Test Coverage Expansion

// Add to snowflake_sample_test.go
func TestSnowflakeSampleErrors(t *testing.T) {
    errorCases := []struct{
        name  string
        query string  
        expectedError string
    }{
        {"missing_parens", `SELECT * FROM users SAMPLE 10`, "expected ("},
        {"invalid_method", `SELECT * FROM users SAMPLE INVALID (10)`, ""},
        {"wrong_dialect", `SELECT * FROM users SAMPLE (10)`, ""}, // using default dialect
    }
    // ... test error cases
}

AST Completeness

Complete the parsing by actually using the existing SampleClause struct that's already defined in ast.go. The infrastructure exists but isn't being utilized.

Documentation

Add examples to CHANGELOG.md showing the new Snowflake SAMPLE syntax support.


Performance & Memory

  • No performance regressions: Simple token consumption doesn't impact parsing speed
  • Memory efficient: No unnecessary allocations in current implementation
  • Race condition free: Follows established thread-safe parsing patterns

🏁 Recommendation

Conditional Approval - This PR provides a solid foundation for Snowflake SAMPLE support but requires completion:

  1. Must Fix: Implement actual AST population using the existing SampleClause struct
  2. Should Fix: Add comprehensive error test cases
  3. Nice to Have: Reorganize guard function location

The current implementation correctly handles token consumption and maintains parser state, but stops short of providing the complete feature. With the AST modeling completed, this will be a solid addition to the Snowflake dialect support.


Job run | Branch: feat/snowflake-sample-483

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2026

Claude finished @ajitpratap0's task —— View job


Code Review: Snowflake SAMPLE/TABLESAMPLE Implementation

I've completed a comprehensive review of the PR implementing Snowflake SAMPLE/TABLESAMPLE clause parsing. Here's my detailed feedback:

Review Checklist

  • Read repository guidelines and setup
  • Analyze code changes in all 3 files
  • Check code quality and best practices
  • Evaluate potential bugs and issues
  • Review performance considerations
  • Assess security concerns
  • Examine test coverage
  • Provide comprehensive feedback

Strengths

Code Quality & Best Practices

  • Excellent adherence to repository conventions: Follows established error handling patterns with p.expectedError() and proper position tracking via p.currentLocation() in pivot.go:383-395
  • Consistent code style: Matches existing parser patterns with recursive descent parsing and one-token lookahead strategy
  • Proper documentation: Well-commented code explaining the purpose and syntax of SAMPLE/TABLESAMPLE clauses in select_subquery.go:98-131

Security & Robustness

  • Safe type checking: Uses strings.EqualFold() for case-insensitive keyword matching instead of direct comparisons in select_subquery.go:103-104
  • Proper boundary checking: Implements EOF protection in the token consumption loop in select_subquery.go:114-130
  • No SQL injection risks: Parser-level implementation doesn't introduce any string concatenation vulnerabilities

Performance Considerations

  • Zero memory allocation: Follows the repository's zero-copy design principles
  • Efficient parsing: Uses simple token consumption without creating unnecessary intermediate objects
  • Proper dialect gating: Only activates parsing in Snowflake context via existing isSampleKeyword() guard in pivot.go:382-395

🔧 Areas for Improvement

1. Critical Issue: Missing AST Modeling

The implementation consumes SAMPLE/TABLESAMPLE tokens but doesn't store the parsed information in the AST:

// Current implementation (select_subquery.go:103-131) only consumes tokens:
if strings.EqualFold(p.currentToken.Token.Value, "SAMPLE") ||
   strings.EqualFold(p.currentToken.Token.Value, "TABLESAMPLE") {
    p.advance() // Consumes but doesn't store
    // ... consumes method and parentheses but loses information
}

Recommended Fix: Add a Sample *SampleClause field to TableReference struct and populate it:

// In ast.go TableReference struct
Sample *SampleClause  // Snowflake/ANSI sampling clause

// In parsing logic  
if strings.EqualFold(p.currentToken.Token.Value, "SAMPLE") {
    sampleClause, err := p.parseSampleClause()
    if err != nil {
        return tableRef, err
    }
    tableRef.Sample = sampleClause
}

2. Incomplete Parsing Logic

  • Missing method capture: The method name (BERNOULLI, SYSTEM, etc.) is consumed but not stored
  • Missing value parsing: The sampling value/percentage isn't extracted from parentheses
  • No validation: Doesn't validate sampling syntax according to Snowflake specs

3. Guard Function Location Mismatch

isSampleKeyword() is defined in pivot.go:382-395 but used in select_subquery.go:151. Should be moved to a more appropriate location (parser.go) for better organization.

4. Test Coverage Gaps

The test file snowflake_sample_test.go:17-32 only tests successful parsing, missing:

  • Error cases: Invalid syntax, missing parentheses, wrong dialect
  • Edge cases: Empty parentheses, invalid methods, malformed values
  • Negative tests: Ensuring SAMPLE isn't consumed as alias in non-Snowflake dialects

📋 Suggested Enhancements

Test Coverage Expansion

// Add to snowflake_sample_test.go
func TestSnowflakeSampleErrors(t *testing.T) {
    errorCases := []struct{
        name  string
        query string  
        expectedError string
    }{
        {"missing_parens", `SELECT * FROM users SAMPLE 10`, "expected ("},
        {"invalid_method", `SELECT * FROM users SAMPLE INVALID (10)`, ""},
        {"wrong_dialect", `SELECT * FROM users SAMPLE (10)`, ""}, // using default dialect
    }
    // ... test error cases
}

AST Completeness

Complete the parsing by actually using the existing SampleClause struct that's already defined in ast.go. The infrastructure exists but isn't being utilized.

Documentation

Add examples to CHANGELOG.md showing the new Snowflake SAMPLE syntax support.


Performance & Memory

  • No performance regressions: Simple token consumption doesn't impact parsing speed
  • Memory efficient: No unnecessary allocations in current implementation
  • Race condition free: Follows established thread-safe parsing patterns

🏁 Recommendation

Conditional Approval - This PR provides a solid foundation for Snowflake SAMPLE support but requires completion:

  1. Must Fix: Implement actual AST population using the existing SampleClause struct
  2. Should Fix: Add comprehensive error test cases
  3. Nice to Have: Reorganize guard function location

The current implementation correctly handles token consumption and maintains parser state, but stops short of providing the complete feature. With the AST modeling completed, this will be a solid addition to the Snowflake dialect support.


Job run | Branch: feat/snowflake-sample-483

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant