Description
Background
SqlParser::parse_statements currently relies on the AST tokenizer
to find ; boundaries. This creates a tight coupling: any change
in tokenizer error-recovery behaviour (e.g. the 0.2.5 change to
/* handling) can silently break statement splitting.
A pre-scan (unclosed_block_comment_start) was added as a
workaround, but it reimplements a subset of the tokenizer's lexical
rules (', ", $$, --) and cannot cover all literal forms
the tokenizer accepts (backtick-quoted identifiers, @ literals,
backslash escapes).
Proposal
Replace the tokenizer-based splitting with a standalone state
machine whose only job is to find ; outside of:
- single/double/backtick-quoted strings (with escape handling)
- dollar-quoted strings ($$...$$)
- block comments (/* ... */)
- line comments (-- ...\n)
The tokenizer is then only used downstream for syntax validation,
highlighting, and formatting — never for splitting.
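A minimal sketch of what such a splitter could look like, to make the proposal concrete. The names `State`, `split_statements`, and `push_trimmed` are hypothetical and not part of the existing code. The sketch assumes that a backslash escapes the next byte inside quotes, that a doubled quote character is an escaped quote, and that only the bare `$$` dollar-quote tag is supported (not `$tag$`). The tokenizer's actual rules may differ.

```rust
/// Lexical state of the hypothetical splitter.
#[derive(Clone, Copy)]
enum State {
    Normal,
    Quoted(u8),   // inside '...', "..." or `...`; holds the opening quote byte
    DollarQuoted, // inside $$...$$
    BlockComment, // inside /* ... */
    LineComment,  // inside -- ... up to the next newline
}

/// Push a statement unless it is empty after trimming whitespace.
fn push_trimmed<'a>(out: &mut Vec<&'a str>, s: &'a str) {
    let s = s.trim();
    if !s.is_empty() {
        out.push(s);
    }
}

/// Split `sql` on every `;` that lies outside quotes and comments.
/// All delimiters are ASCII, so slicing at their byte offsets is UTF-8 safe.
fn split_statements(sql: &str) -> Vec<&str> {
    let bytes = sql.as_bytes();
    let mut state = State::Normal;
    let mut out = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        let c = bytes[i];
        let next = bytes.get(i + 1).copied();
        match state {
            State::Normal => match c {
                b'\'' | b'"' | b'`' => state = State::Quoted(c),
                b'$' if next == Some(b'$') => { state = State::DollarQuoted; i += 1; }
                b'/' if next == Some(b'*') => { state = State::BlockComment; i += 1; }
                b'-' if next == Some(b'-') => { state = State::LineComment; i += 1; }
                b';' => {
                    push_trimmed(&mut out, &sql[start..i]);
                    start = i + 1;
                }
                _ => {}
            },
            State::Quoted(q) => {
                if c == b'\\' {
                    i += 1; // assumption: backslash escapes the next byte
                } else if c == q {
                    if next == Some(q) {
                        i += 1; // doubled quote stays inside the literal
                    } else {
                        state = State::Normal;
                    }
                }
            }
            State::DollarQuoted => {
                if c == b'$' && next == Some(b'$') { state = State::Normal; i += 1; }
            }
            State::BlockComment => {
                if c == b'*' && next == Some(b'/') { state = State::Normal; i += 1; }
            }
            State::LineComment => {
                if c == b'\n' { state = State::Normal; }
            }
        }
        i += 1;
    }
    // Whatever follows the last `;` (including an unterminated literal or
    // comment) is returned as the final statement rather than dropped.
    push_trimmed(&mut out, &sql[start..]);
    out
}
```

In this sketch an unclosed `/*` never causes a split inside the comment: the remainder of the input simply becomes the last statement, and reporting the error is left to the tokenizer downstream. Comments are kept as part of the statement text they appear in.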
This also unblocks multi-statement support: the splitter produces
N statements, and the upper layer decides whether to execute them
sequentially or in batch.
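One way the upper layer's choice could be expressed, as a sketch only: `ExecMode` and `plan_round_trips` are hypothetical names, and the code only groups already-split statements into round trips. It does not execute anything or model a real driver API.

```rust
/// Hypothetical execution policy chosen by the upper layer.
#[derive(Clone, Copy)]
enum ExecMode {
    Sequential, // one statement per round trip
    Batch,      // all statements in a single round trip
}

/// Group already-split statements into round trips according to `mode`.
/// Each inner Vec is the set of statements sent together.
fn plan_round_trips<'a>(statements: &[&'a str], mode: ExecMode) -> Vec<Vec<&'a str>> {
    if statements.is_empty() {
        return Vec::new();
    }
    match mode {
        ExecMode::Sequential => statements.iter().map(|s| vec![*s]).collect(),
        ExecMode::Batch => vec![statements.to_vec()],
    }
}
```

The splitter stays unaware of execution policy in this arrangement; it only returns the statement list.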