Refactor: decouple statement splitting from AST tokenizer #756

@TCeason

Background

SqlParser::parse_statements currently relies on the AST tokenizer
to find ; boundaries. This creates a tight coupling: any change
in tokenizer error-recovery behaviour (e.g. the 0.2.5 change to
/* handling) can silently break statement splitting.

A pre-scan (unclosed_block_comment_start) was added as a
workaround, but it reimplements a subset of the tokenizer's lexical
rules (', ", $$, --) and cannot cover all literal forms
the tokenizer accepts (backtick-quoted identifiers, @ literals,
backslash escapes).

Proposal

Replace the tokenizer-based splitting with a standalone state
machine whose only job is to find ; outside of:

  • single-, double-, and backtick-quoted strings and identifiers (with escape handling)
  • dollar-quoted strings ($$...$$)
  • block comments (/* ... */)
  • line comments (-- ...\n)

The tokenizer is then only used downstream for syntax validation,
highlighting, and formatting — never for splitting.
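
A minimal sketch of what such a splitter could look like, to make the proposal concrete. The names `State`, `split_statements`, and `push_trimmed` are hypothetical and not part of the existing code. The sketch assumes that backslash escapes and doubled delimiters apply inside all three quote styles, that block comments do not nest, and that dollar quotes are untagged ($$ only); the real rules would need to match whatever the tokenizer accepts.

```rust
/// Lexical state of the hypothetical splitter.
#[derive(Clone, Copy)]
enum State {
    Normal,
    Quoted(u8),   // inside '...', "..." or `...`; holds the closing delimiter
    DollarQuoted, // inside $$...$$
    BlockComment, // inside /* ... */
    LineComment,  // inside -- ... up to the next newline
}

/// Push a fragment unless it is empty after trimming whitespace.
fn push_trimmed<'a>(out: &mut Vec<&'a str>, piece: &'a str) {
    let piece = piece.trim();
    if !piece.is_empty() {
        out.push(piece);
    }
}

/// Split `sql` on every `;` that is outside quotes and comments.
/// Scanning bytes is safe here: every delimiter is ASCII, and UTF-8
/// continuation bytes never collide with ASCII values.
pub fn split_statements(sql: &str) -> Vec<&str> {
    let bytes = sql.as_bytes();
    let mut state = State::Normal;
    let mut out = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        let next = bytes.get(i + 1).copied();
        match state {
            State::Normal => match (b, next) {
                (b';', _) => {
                    push_trimmed(&mut out, &sql[start..i]);
                    start = i + 1;
                }
                (b'\'' | b'"' | b'`', _) => state = State::Quoted(b),
                (b'$', Some(b'$')) => {
                    state = State::DollarQuoted;
                    i += 1; // consume the second '$'
                }
                (b'/', Some(b'*')) => {
                    state = State::BlockComment;
                    i += 1; // consume the '*'
                }
                (b'-', Some(b'-')) => {
                    state = State::LineComment;
                    i += 1; // consume the second '-'
                }
                _ => {}
            },
            State::Quoted(q) => {
                if b == b'\\' {
                    i += 1; // assumption: backslash escapes the next byte
                } else if b == q {
                    if next == Some(q) {
                        i += 1; // doubled delimiter, e.g. '' inside '...'
                    } else {
                        state = State::Normal;
                    }
                }
            }
            State::DollarQuoted => {
                if b == b'$' && next == Some(b'$') {
                    state = State::Normal;
                    i += 1;
                }
            }
            State::BlockComment => {
                if b == b'*' && next == Some(b'/') {
                    state = State::Normal;
                    i += 1;
                }
            }
            State::LineComment => {
                if b == b'\n' {
                    state = State::Normal;
                }
            }
        }
        i += 1;
    }
    // Whatever remains (including an unterminated quote or comment)
    // is returned as the final fragment.
    push_trimmed(&mut out, &sql[start..]);
    out
}
```

In this sketch an unterminated quote or comment simply swallows the rest of the input into the last fragment, so the splitter itself never fails; reporting the error is left to the tokenizer downstream. A fragment that contains only a comment is still returned as a statement, and the caller would have to decide whether to drop it.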

This also unblocks multi-statement support: the splitter produces
N statements, and the upper layer decides whether to execute them
sequentially or in batch.
