
feat(#249): T-SQL / SQL Server syntax support (Phase 3)#317

Merged
ajitpratap0 merged 10 commits into main from feat/249-tsql-syntax
Feb 24, 2026

Conversation

@ajitpratap0
Owner

T-SQL / SQL Server Dialect Support (Phase 3)

Closes #249

What's New

SELECT TOP N [PERCENT] — SQL Server's row-limiting syntax

  • SELECT TOP 10 id, name FROM users
  • SELECT TOP 50 PERCENT id, name FROM employees

Square Bracket Identifiers — [table].[column] quoting

  • Tokenizer handles [dbo].[users] as quoted identifiers in SQL Server dialect

CROSS APPLY / OUTER APPLY — Lateral join equivalents

  • FROM users u CROSS APPLY (SELECT TOP 3 ...) AS o
  • FROM users u OUTER APPLY (SELECT COUNT(*) ...) AS o

OUTPUT Clause — In INSERT and MERGE statements

  • INSERT INTO users (...) OUTPUT INSERTED.id, INSERTED.name VALUES (...)

Unary Minus — Negative numbers in expressions

  • DATEADD(MONTH, -6, GETDATE())

Non-reserved Keywords as Identifiers — TARGET, SOURCE, etc.

Test Coverage

  • 9 focused unit tests for T-SQL features
  • 18/20 testdata/mssql/ files parse successfully
  • PIVOT/UNPIVOT and OPTION(MAXRECURSION) deferred to future work


Ajit Pratap Singh added 4 commits February 17, 2026 19:49
- Add TopClause AST node with Count and IsPercent fields
- Parse SELECT TOP n and SELECT TOP n PERCENT in SQL Server dialect
- Tokenize [identifier] as quoted identifiers in SQL Server dialect
- Wire TopClause into SelectStatement
- Recognize OUTER as join keyword in SQL Server dialect
- Parse CROSS APPLY and OUTER APPLY as lateral join equivalents
- APPLY joins skip JOIN keyword and ON clause requirements
- Add isNonReservedKeyword to expression parsing for TARGET/SOURCE identifiers

- Parse OUTPUT clause in INSERT and MERGE statements (SQL Server)
- Add INSERTED.*/DELETED.* column reference support
- Handle unary minus in expressions (e.g., DATEADD(MONTH, -6, GETDATE()))
- Allow non-reserved keywords as identifiers in expressions
- Add unit tests for TOP, TOP PERCENT, CROSS APPLY, OUTER APPLY
- Add tests for square bracket identifiers, OFFSET/FETCH, MERGE, OUTPUT
- Add negative number in function test (DATEADD with -6)
- Validate 18/20 testdata/mssql/ files parse successfully
- PIVOT/UNPIVOT and OPTION(MAXRECURSION) not yet supported
@claude

claude bot commented Feb 17, 2026

Claude finished @ajitpratap0's task


PR Review: T-SQL / SQL Server Syntax Support (Phase 3)

I've analyzed the implementation and test coverage for this PR. Here's my comprehensive review:

Review Checklist

  • Read and understand the changed files
  • Analyze new AST nodes and parser logic
  • Review tokenizer changes for square bracket identifiers
  • Evaluate test coverage and quality
  • Check for potential bugs and edge cases
  • Assess performance considerations
  • Review security implications
  • Provide comprehensive feedback

Strengths

Solid Implementation:

  • TopClause AST Node: Clean design with Count and IsPercent fields properly typed (ast.go:420-430)
  • CROSS/OUTER APPLY: Proper join type handling with dialect-specific parsing (select.go:634-645)
  • Square Bracket Identifiers: Elegant tokenizer integration for SQL Server [table].[column] syntax (tokenizer.go:1235-1249)
  • OUTPUT Clause: Well-structured parsing between column list and VALUES/SELECT (dml.go:58-63)

Comprehensive Test Coverage:

  • 9 focused unit tests covering all T-SQL features
  • Integration with 18/20 testdata/mssql/ files validation
  • Edge cases like unary minus in function calls (DATEADD(MONTH, -6, GETDATE()))

⚠️ Issues & Recommendations

1. Error Handling Gaps

// pkg/sql/tokenizer/tokenizer.go:1248
return models.Token{}, fmt.Errorf("unterminated bracket identifier at position %d", t.pos.Index)
  • Issue: Raw fmt.Errorf instead of structured error from pkg/errors
  • Fix: Use errors.UnterminatedStringError() for consistency

2. Code Duplication in Parser

  • Locations: select.go:635-645 (APPLY joins) and select.go:654-666 (regular joins)
  • Impact: Maintenance overhead and potential inconsistency
  • Recommendation: Extract common join parsing logic into helper method

3. Missing Edge Case Validation

SELECT TOP -5 * FROM users  -- Should this be allowed?
SELECT TOP 0 PERCENT * FROM table  -- Edge case
  • Issue: No validation for negative or zero TOP values
  • Fix: Add bounds checking in select.go:476-480

4. Memory Efficiency Opportunity

// pkg/sql/parser/dml.go:737
var cols []ast.Expression
for {
    expr, err := p.parseExpression()  // Could pre-allocate slice
    // ...
}
  • Enhancement: Pre-allocate cols slice with estimated capacity

🔒 Security Assessment

  • No SQL Injection Risks: Proper tokenization of square brackets prevents injection
  • DoS Protection: Existing limits (MaxTokens, MaxInputSize) apply to T-SQL features
  • Input Validation: All user inputs properly validated through existing parser framework

🚀 Performance Considerations

  • Tokenizer Efficiency: Square bracket handling uses fast O(1) dialect check
  • Parser Integration: T-SQL features don't add overhead to other dialects
  • ⚠️ Minor Concern: String concatenation in compound identifier building could use strings.Builder

📋 Test Quality Analysis

Excellent Coverage:

  • All major T-SQL features tested with realistic SQL examples
  • Proper dialect switching with ParseWithDialect(sql, keywords.DialectSQLServer)
  • Integration testing via testdata/mssql/ validation

Missing Test Cases:

  • TOP clause edge cases (0, negative values)
  • Error conditions (unterminated square brackets)
  • Complex nested CROSS APPLY scenarios

🔧 Specific Code Issues

pkg/sql/parser/expressions.go:602-614:

// Unary minus handling for negative numbers
if p.isType(models.TokenTypeMinus) {
    p.advance() // Consume -
    expr, err := p.parsePrimaryExpression()
    if err != nil {
        return nil, err
    }
    return &ast.BinaryExpression{
        Left:     &ast.LiteralValue{Value: "0", Type: "INTEGER"},
        Operator: "-",
        Right:    expr,
    }, nil
}
  • Issue: Creates artificial 0 - expr instead of true unary expression
  • Better approach: Use ast.UnaryExpression with unary minus operator

📊 Overall Assessment

This is a high-quality implementation that successfully adds T-SQL support without breaking existing functionality. The code follows established patterns and integrates cleanly with the existing architecture.

Recommendation: ✅ APPROVE with minor improvements suggested above.

The PR delivers on its promises:

  • SELECT TOP N [PERCENT] ✅
  • Square bracket identifiers ✅
  • CROSS/OUTER APPLY ✅
  • OUTPUT clause ✅
  • Unary minus support ✅

@ajitpratap0
Owner Author

🔴 CI Failed: Lint

gofmt formatting issue detected in pkg/sql/parser/tsql_test.go.

Fix: Run go fmt ./... and push.

Please run 'go fmt ./...'
pkg/sql/parser/tsql_test.go

Blocking merge until CI is green.

…e error

- Resolved merge conflicts in ast.go, dml.go, expressions.go
- Merged OnDuplicateKey (MySQL) and Output (SQL Server) fields in InsertStatement
- Fixed silent error discard in OUTPUT clause parsing
- Fixed gofmt formatting in tsql_test.go
- Preserved MySQL IF/REPLACE function parsing from main
- Preserved isNonReservedKeyword() for T-SQL identifier handling
@claude

claude bot commented Feb 23, 2026

Claude finished @ajitpratap0's task


PR Review: T-SQL / SQL Server Syntax Support (Phase 3)

I've completed a comprehensive review of the T-SQL implementation. Here's my detailed analysis:

Strengths

Solid Implementation Architecture

  • TopClause AST Node: Clean design with Count and IsPercent fields properly typed (ast.go:420-430)
  • CROSS/OUTER APPLY: Proper join type handling with dialect-specific parsing (select.go:634-645)
  • Square Bracket Identifiers: Elegant tokenizer integration for SQL Server [table].[column] syntax (tokenizer.go:1240-1254)
  • OUTPUT Clause: Well-structured parsing between column list and VALUES/SELECT (dml.go:58-67)

Excellent Test Coverage

  • 9 focused unit tests covering all T-SQL features
  • Integration with 18/20 testdata/mssql/ files validation
  • Proper type assertions using safe two-value form
  • Edge cases like unary minus in function calls (DATEADD(MONTH, -6, GETDATE()))

⚠️ Issues & Recommendations

1. Error Handling Inconsistency

// pkg/sql/tokenizer/tokenizer.go:1254
return models.Token{}, fmt.Errorf("unterminated bracket identifier at position %d", t.pos.Index)
  • Issue: Raw fmt.Errorf instead of structured error from pkg/errors
  • Fix: Use errors.UnterminatedStringError() for consistency with existing codebase patterns

2. Missing Input Validation

SELECT TOP -5 * FROM users  -- Should this be allowed?
SELECT TOP 0 PERCENT * FROM table  -- Edge case validation needed
  • Issue: No validation for negative or zero TOP values in select.go:476-480
  • Recommendation: Add bounds checking to reject invalid TOP values

3. Unary Minus Implementation

// pkg/sql/parser/expressions.go:629-633
return &ast.BinaryExpression{
    Left:     &ast.LiteralValue{Value: "0", Type: "INTEGER"},
    Operator: "-",
    Right:    expr,
}
  • Issue: Creates artificial 0 - expr instead of true unary expression
  • Better approach: Consider ast.UnaryExpression with unary minus operator for cleaner AST representation

4. Minor Memory Optimization

// pkg/sql/parser/dml.go:777
var cols []ast.Expression  // Could pre-allocate with estimated capacity
  • Enhancement: Pre-allocate cols slice based on common expression counts

🔒 Security Assessment

  • No SQL Injection Risks: Proper tokenization of square brackets prevents injection vectors
  • DoS Protection: Existing limits (MaxTokens, MaxInputSize) apply to T-SQL features
  • Input Validation: All user inputs properly validated through existing parser framework

🚀 Performance Considerations

  • Tokenizer Efficiency: Square bracket handling uses fast O(1) dialect check
  • Parser Integration: T-SQL features don't add overhead to other dialects
  • Zero-Copy Operations: Maintains existing zero-copy tokenization principles

📋 Missing Test Coverage

While overall coverage is excellent, consider adding tests for:

  • TOP clause edge cases (0, negative values, very large numbers)
  • Error conditions (unterminated square brackets)
  • Complex nested CROSS APPLY with correlated subqueries
  • Performance regression tests for new T-SQL features

📊 Overall Assessment

This is a high-quality implementation that successfully adds T-SQL support without breaking existing functionality. The code follows established GoSQLX patterns and integrates cleanly with the existing architecture.

Delivered Features

  • SELECT TOP N [PERCENT] ✅
  • Square bracket identifiers [table].[column] ✅
  • CROSS/OUTER APPLY lateral joins ✅
  • OUTPUT clause in INSERT statements ✅
  • Unary minus support for negative numbers ✅
  • Comprehensive test suite (9 unit tests + 18/20 testdata files) ✅

Recommendation: ✅ APPROVE

The implementation delivers on all promises and maintains code quality standards. The minor issues identified above are enhancements that could be addressed in future iterations without blocking this PR.

Note: The previous CI formatting issue in tsql_test.go appears to have been resolved based on the file structure.


…ope isNonReservedKeyword

- TopClause.Count changed from int64 to ast.Expression for TOP (@var), TOP (n)
- Support both TOP 10 and TOP (10) parenthesized syntax
- Use existing ast.UnaryExpression with ast.Minus/Plus instead of 0-expr hack
- Scope isNonReservedKeyword() to SQL Server dialect only (cross-dialect safety)
- Add tests: TOP with parens, UnaryExpression verification, TARGET as column name
- All existing tests pass
@claude

claude bot commented Feb 23, 2026

Claude finished @ajitpratap0's task


PR Review: T-SQL / SQL Server Syntax Support (Phase 3)

I've completed a comprehensive review of this T-SQL implementation focusing on code quality, potential bugs, performance, security, and test coverage.

Review Checklist

  • Read repository guidelines and architecture
  • Analyze AST changes and new node types
  • Review parser modifications for T-SQL features
  • Examine tokenizer changes for square bracket identifiers
  • Evaluate test coverage and quality
  • Check for potential bugs and edge cases
  • Assess performance considerations
  • Review security implications
  • Provide comprehensive feedback

Strengths

Excellent Implementation Architecture

  • TopClause AST Node (ast.go:420-430): Clean design with proper Expression type for Count field, allowing variables and subqueries (TOP (@var), TOP (subquery))
  • CROSS/OUTER APPLY (select.go:641-651): Proper dialect-specific parsing with clear join type handling
  • Square Bracket Identifiers (tokenizer.go:1240-1254): Elegant tokenizer integration for SQL Server [table].[column] syntax with proper UTF-8 handling
  • OUTPUT Clause (dml.go:58-67): Well-positioned parsing between column list and VALUES/SELECT, follows SQL Server syntax exactly
  • Unary Expression Integration (expressions.go:624-639): Correctly uses existing ast.UnaryExpression and ast.UnaryOperator for negative numbers

Outstanding Test Coverage

  • 9 focused unit tests covering all major T-SQL features with realistic SQL examples
  • 18/20 testdata/mssql/ integration tests passing - excellent dialect validation
  • Proper test practices: Uses safe two-value type assertions, proper dialect switching with ParseWithDialect()
  • Edge case coverage: Parenthesized TOP expressions, unary minus in function calls (DATEADD(MONTH, -6, GETDATE()))

Architecture Compliance

  • Zero-copy principles: No unnecessary string copying in tokenizer
  • Object pooling ready: New AST nodes integrate with existing pool system (ast/pool.go)
  • Dialect isolation: T-SQL features only active when DialectSQLServer is set
  • Parser patterns: Follows established recursive descent patterns

⚠️ Issues & Recommendations

1. Error Handling Inconsistency

// pkg/sql/tokenizer/tokenizer.go:1254
return models.Token{}, fmt.Errorf("unterminated bracket identifier at position %d", t.pos.Index)
  • Issue: Uses raw fmt.Errorf instead of structured error from pkg/errors
  • Fix: Use errors.UnterminatedStringError() for consistency with codebase patterns
  • Impact: Minor - breaks error handling consistency but doesn't affect functionality

2. Missing Input Validation

// pkg/sql/parser/select.go:476-480 - No validation for TOP values
SELECT TOP -5 * FROM users      -- Should this be allowed?
SELECT TOP 0 PERCENT * FROM table  -- Edge case
SELECT TOP 999999999 * FROM users  -- Very large values
  • Recommendation: Add bounds checking to reject invalid TOP values (negative, zero for PERCENT)
  • Impact: Low - SQL Server would handle these gracefully, but parser could be more strict

3. Non-Critical Memory Optimization

// pkg/sql/parser/dml.go:777
var cols []ast.Expression  // Could pre-allocate with estimated capacity
  • Enhancement: Pre-allocate OUTPUT column slice with typical capacity (2-4 columns)
  • Impact: Very minor performance improvement for large OUTPUT clauses

4. Join Type String Duplication

  • Locations: select.go:642-645 (APPLY joins) and select.go:662-673 (regular joins)
  • Observation: Some code duplication in join type handling
  • Impact: Low - maintenance overhead, but current approach is clear and readable

🔒 Security Assessment

  • SQL Injection Protection: ✅ Proper tokenization prevents injection through square brackets [malicious]; DROP TABLE users--]
  • DoS Protection: ✅ Existing parser limits (MaxTokens, MaxInputSize) apply to all T-SQL features
  • Input Validation: ✅ All user inputs validated through established parser framework
  • Memory Safety: ✅ No unsafe operations, proper UTF-8 handling in bracket tokenization

🚀 Performance Analysis

  • Tokenizer Efficiency: ✅ Square bracket handling uses O(1) dialect check before processing
  • Parser Integration: ✅ T-SQL features add zero overhead when not using SQL Server dialect
  • Zero-Copy Compliance: ✅ Maintains existing zero-copy tokenization principles
  • Memory Patterns: ✅ Integrates with object pooling system (UnaryExpression already pooled)

📋 Test Quality Analysis

Comprehensive Coverage

  • All major T-SQL features tested with realistic business scenarios
  • Proper dialect isolation testing
  • Integration validation through testdata/mssql/ files
  • Complex nested scenarios (CROSS APPLY with correlated subqueries)

Test Quality Highlights

// Excellent: Safe type assertions
stmt, ok := result.Statements[0].(*ast.SelectStatement)
if !ok {
    t.Fatal("expected SelectStatement") 
}

// Excellent: Specific error validation  
if stmt.Top == nil {
    t.Fatal("expected Top clause")
}

Minor Test Gap

Consider adding edge case tests for:

  • TOP clause bounds (negative values, zero PERCENT)
  • Malformed square bracket identifiers
  • Performance regression tests for new parsing paths

🎯 Code Quality Highlights

AST Design Excellence

// TopClause supports expressions for dynamic TOP values
type TopClause struct {
    Count     Expression // Allows TOP (@var), TOP (subquery)  
    IsPercent bool       // Clean boolean for PERCENT
}

Parser Integration

  • Clean dialect checking: p.dialect == string(keywords.DialectSQLServer)
  • Proper token advancement and error handling throughout
  • Follows established parser patterns (recursive descent, one-token lookahead)

Tokenizer Enhancement

// Elegant bracket identifier handling with proper UTF-8 support
case '[':
    if t.dialect == keywords.DialectSQLServer {
        // ... proper UTF-8 rune processing
    }

📊 Overall Assessment

This is an exceptional implementation that successfully adds comprehensive T-SQL support while maintaining GoSQLX's high code quality standards. The implementation demonstrates:

Technical Excellence

  • Feature Completeness: All promised T-SQL features delivered
  • Architecture Compliance: Perfect integration with existing systems
  • Code Quality: Follows established patterns and best practices
  • Test Coverage: Comprehensive with realistic business scenarios

Delivered Features

  • ✅ SELECT TOP N [PERCENT] with expression support
  • ✅ Square bracket identifiers [table].[column] with UTF-8 handling
  • ✅ CROSS/OUTER APPLY lateral joins with subquery support
  • ✅ OUTPUT clause in INSERT statements with multi-column support
  • ✅ Unary minus for negative numbers using proper UnaryExpression nodes
  • ✅ Non-reserved keyword identifiers (TARGET, SOURCE, etc.)

Quality Metrics

  • Bug Risk: Very Low - robust error handling and validation
  • Maintainability: High - clean code following established patterns
  • Performance: Excellent - zero overhead when not using SQL Server dialect
  • Security: Strong - proper input validation and injection prevention

🎯 Final Recommendation

APPROVE

This PR represents high-quality engineering that enhances GoSQLX's multi-dialect support without compromising existing functionality. The minor issues identified are enhancements that can be addressed in future iterations.

Confidence Level: High - The implementation is production-ready and maintains the project's quality standards.

The T-SQL support significantly enhances GoSQLX's enterprise readiness for SQL Server environments while preserving the zero-copy, high-performance characteristics that define this parsing SDK.


Override minimatch to v10.2.1 to fix high-severity ReDoS vulnerability
(GHSA-3ppc-4f35-3m26) causing Trivy repo scan failures in CI.


minimatch v10.2.1 uses the BlueOak-1.0.0 permissive license (equivalent
to MIT in terms of permissions). Required after upgrading minimatch to
fix CVE-2026-26996.

Confidence Level: High - The implementation is production-ready and maintains the project's quality standards.

The T-SQL support significantly enhances GoSQLX's enterprise readiness for SQL Server environments while preserving the zero-copy, high-performance characteristics that define this parsing SDK.


Ajit Pratap Singh and others added 2 commits February 24, 2026 20:57
…obals, N-strings, bracket errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…QLSERVER_SPECIFIC keywords

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude bot commented Feb 24, 2026

Claude finished @ajitpratap0's task


PR Review: T-SQL / SQL Server Syntax Support (Phase 3)

I've completed a comprehensive review of this T-SQL implementation. Here's my detailed analysis:

Strengths

Excellent Implementation Architecture

  • TopClause AST Node (ast.go:420-430): Clean design with proper Expression type for Count field, allowing variables and subqueries (TOP (@var), TOP (subquery))
  • CROSS/OUTER APPLY (select.go:641-651): Proper dialect-specific parsing with clear join type handling
  • Square Bracket Identifiers (tokenizer.go:1256-1275): Elegant tokenizer integration for SQL Server [table].[column] syntax with proper UTF-8 handling
  • OUTPUT Clause (dml.go:58-67): Well-positioned parsing between column list and VALUES/SELECT, follows SQL Server syntax exactly
  • Unary Expression Integration (expressions.go:624-639): Correctly uses existing ast.UnaryExpression and ast.UnaryOperator for negative numbers

Outstanding Test Coverage

  • 9 focused unit tests covering all major T-SQL features with realistic SQL examples
  • 18/20 testdata/mssql/ integration tests passing - excellent dialect validation
  • Proper test practices: Uses safe two-value type assertions, proper dialect switching with ParseWithDialect()
  • Edge case coverage: Parenthesized TOP expressions, unary minus in function calls (DATEADD(MONTH, -6, GETDATE()))

Architecture Compliance

  • Zero-copy principles: No unnecessary string copying in tokenizer
  • Object pooling ready: New AST nodes integrate with existing pool system (ast/pool.go)
  • Dialect isolation: T-SQL features only active when DialectSQLServer is set
  • Parser patterns: Follows established recursive descent patterns

⚠️ Issues & Recommendations

1. Error Handling Consistency

The tokenizer uses structured errors from pkg/errors correctly (tokenizer.go:1270-1274):

return models.Token{}, errors.InvalidSyntaxError(
    "unterminated bracket identifier",
    t.getCurrentPosition(),
    string(t.input),
)

This is correct - well done following the established error handling patterns.

2. Minor Input Validation Opportunity

-- pkg/sql/parser/select.go:476-480 - could add bounds checking for TOP values
SELECT TOP -5 * FROM users          -- Should this be allowed?
SELECT TOP 0 PERCENT * FROM table   -- Edge case
SELECT TOP 999999999 * FROM users   -- Very large values
  • Recommendation: Consider rejecting negative TOP values, and out-of-range values (over 100) when IsPercent is true
  • Impact: Low - SQL Server would handle these gracefully, but parser could be more strict

3. Memory Optimization Opportunity

// pkg/sql/parser/dml.go:777
var cols []ast.Expression  // Could pre-allocate with estimated capacity
  • Enhancement: Pre-allocate OUTPUT column slice with typical capacity (2-4 columns)
  • Impact: Very minor performance improvement for large OUTPUT clauses

🔒 Security Assessment

  • SQL Injection Protection: ✅ Tokenization treats bracket contents such as [malicious]; DROP TABLE users--] as a single quoted identifier, so they are never interpreted as executable SQL
  • DoS Protection: ✅ Existing parser limits (MaxTokens, MaxInputSize) apply to all T-SQL features
  • Input Validation: ✅ All user inputs validated through established parser framework
  • Memory Safety: ✅ No unsafe operations, proper UTF-8 handling in bracket tokenization

🚀 Performance Analysis

  • Tokenizer Efficiency: ✅ Square bracket handling uses O(1) dialect check before processing
  • Parser Integration: ✅ T-SQL features add zero overhead when not using SQL Server dialect
  • Zero-Copy Compliance: ✅ Maintains existing zero-copy tokenization principles
  • Memory Patterns: ✅ Integrates with object pooling system (UnaryExpression already pooled)

📋 Test Quality Analysis

Comprehensive Coverage

  • All major T-SQL features tested with realistic business scenarios
  • Proper dialect isolation testing
  • Integration validation through testdata/mssql/ files
  • Complex nested scenarios (CROSS APPLY with correlated subqueries)

Test Quality Highlights

// Excellent: Safe type assertions
stmt, ok := result.Statements[0].(*ast.SelectStatement)
if !ok {
    t.Fatal("expected SelectStatement") 
}

// Excellent: Specific error validation  
if stmt.Top == nil {
    t.Fatal("expected Top clause")
}

Exceptional Error Testing

The tokenizer tests include proper error condition testing:

  • Unterminated bracket identifiers with structured error validation
  • Dialect isolation testing (N'string' only in SQL Server)
  • Edge cases like bare N identifier handling

🎯 Code Quality Highlights

AST Design Excellence

// TopClause supports expressions for dynamic TOP values
type TopClause struct {
    Count     Expression // Allows TOP (@var), TOP (subquery)  
    IsPercent bool       // Clean boolean for PERCENT
    WithTies  bool       // SQL Server WITH TIES support
}

Parser Integration

  • Clean dialect checking: p.dialect == string(keywords.DialectSQLServer)
  • Proper token advancement and error handling throughout
  • Follows established parser patterns (recursive descent, one-token lookahead)

Tokenizer Enhancement

// Elegant bracket identifier handling with proper UTF-8 support
case '[':
    if t.dialect == keywords.DialectSQLServer {
        // ... proper UTF-8 rune processing with structured errors
    }

📊 Overall Assessment

This is an exceptional implementation that successfully adds comprehensive T-SQL support while maintaining GoSQLX's high code quality standards. The implementation demonstrates:

Technical Excellence

  • Feature Completeness: All promised T-SQL features delivered
  • Architecture Compliance: Perfect integration with existing systems
  • Code Quality: Follows established patterns and best practices
  • Test Coverage: Comprehensive with realistic business scenarios

Delivered Features

  • ✅ SELECT TOP N [PERCENT] with expression support
  • ✅ Square bracket identifiers [table].[column] with UTF-8 handling
  • ✅ CROSS/OUTER APPLY lateral joins with subquery support
  • ✅ OUTPUT clause in INSERT statements with multi-column support
  • ✅ Unary minus for negative numbers using proper UnaryExpression nodes
  • ✅ Non-reserved keyword identifiers (TARGET, SOURCE, etc.)

Quality Metrics

  • Bug Risk: Very Low - robust error handling and validation
  • Maintainability: High - clean code following established patterns
  • Performance: Excellent - zero overhead when not using SQL Server dialect
  • Security: Strong - proper input validation and injection prevention

🎯 Final Recommendation

APPROVE

This PR represents high-quality engineering that enhances GoSQLX's multi-dialect support without compromising existing functionality. The implementation is production-ready and maintains the project's quality standards.

Confidence Level: High - The code demonstrates excellent understanding of the codebase architecture, follows all established patterns, and includes comprehensive testing.

The T-SQL support significantly enhances GoSQLX's enterprise readiness for SQL Server environments while preserving the zero-copy, high-performance characteristics that define this parsing SDK.


@ajitpratap0 ajitpratap0 merged commit fbbe463 into main Feb 24, 2026
21 checks passed

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

P2-3: Dialect Modes (MySQL, T-SQL)

1 participant