Skip to content

Add support for parsing with semicolons optional #1843

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

aharpervc
Copy link
Contributor

@aharpervc aharpervc commented May 7, 2025

Note: this PR is temporarily rebased on #1834 before that is merged


This PR introduces support for parsing SQL without requiring semicolon statement delimiters. The implementation is as follows:

  1. new ParserOption require_semicolon_statement_delimiter that defaults to a new Dialect function supports_statements_without_semicolon_delimiter
  2. semicolon statement delimiters remain required by default for all dialects except SQL Server, where they're optional

To make this work, the following supplementary changes were required:

  1. RETURN statement parsing was tightened up. I think I suggested in the prior PR for that function that it might need attention in the future (turns out the future is now). The reason this needed work is that without a semicolon to terminate the statement, it's hard to know if the thing following RETURN is part of RETURN or it's own thing. That ambiguity is addressed by having parse_return only allow a strict list of "returnable expr's" instead of any expr. This could have been it's own stand-alone PR but seems reasonable to review together instead of having a deeper stack of branches.
  2. There are many tests that assert a parse statement error in the form of Expected: end of statement, found: asdf. When semicolons are optional, the parse error becomes Expected: an SQL statement, found: asdf. Since this is otherwise an acceptable error for that kind of test, a new helper was introduced assert_err_parse_statements that splits the dialects based on semicolon optional/required and asserts the appropriate error. The helper also has a pleasant side effect of reducing the volume of extremely similar error assertions.
  3. Additionally for testing, a statements_without_semicolons_parse_to helper was added. It's the same as statements_parse_to but it will strip semicolons from the SQL text. This makes it easier to add supplementary assertions for both SQL variants.
  4. Finally, I added a test_supports_statements_without_semicolon_delimiter test case to the SQL Server tests, to assert the proper AST for several examples not using semicolons. This is a belt and suspenders test to validate that parsing still produces the expected tree even without semicolons.

I understand this is a big change, and possibly controversial. However like GO batch delimiters in #1809, this enhancement takes a BIG step towards having this library be able to successfully parse real-world SQL files.


Fixes #1800

aharpervc added 3 commits May 5, 2025 11:36
- formerly, a semicolon after the last statement in a procedure was non-canonical (because they were added via `join`); a `BeginEndStatements` statements list will always write them out
- `BeginEndStatements` begin/end tokens won't be written when empty
- EOF now concludes parsing a statement list
…tokens

- this further consolidates with existing patterns
- plus, a corresponding `supports_statements_without_semicolon_delimiter` Dialect trait function
- this is optional for SQL Server, so it's set to true
- for the implementation, `RETURN` parsing needs to be tightened up to avoid ambiguity & tests that formerly asserted "end of statement" now maybe need to assert "an SQL statement"
- a new `assert_err_parse_statements` splits the dialects based on semicolon requirements & asserts the expected error message accordingly
@aharpervc
Copy link
Contributor Author

Keeping this in draft at the moment to work out the "compile-no-std" ci job, and possibly some more test cases. However, anyone feel free to post a review/thoughts

- at least all of select/insert/update/delete (plus exec) can be added
@@ -666,6 +666,23 @@ fn parse_select_with_table_alias() {
);
}

#[test]
fn parse_consecutive_queries() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the best approach here is, or if there's even a better solution apart from disallowing any unquoted keyword as an alias

Copy link
Contributor Author

@aharpervc aharpervc May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out doing that seems reasonable and reduces ambiguity. Dialects that require semicolon delimiters remain unaffected, but dialects where it's optional would need to quote the keyword as an identifier. That seems completely reasonable from a parser perspective.

I'm willing to drop the commit that added keywords to RESERVED_FOR_TABLE_ALIAS since this approach addresses that concern.

@aharpervc aharpervc force-pushed the parse-statement-semicolon-requirement branch from 6eeebfe to 2585c5b Compare May 8, 2025 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL Server: Increase support for multi statement object definitions (without semicolons)
1 participant