Add DuckDB Dialect Support #738
Thanks for the PR. A few quick questions and thoughts:
I'm pretty busy this week... not sure how much time I have to properly review this. |
For bonus points, you can update the wiki with information about DuckDB. That will also make it easier for me to review this. Otherwise I'll have to go and figure out all of this about DuckDB by myself. |
I was about to submit a PR as well. I will leave some comments on your code. |
Thanks for the comments above. The errors from the test suite are now down to five:
@nene - I'll look into filling out the wiki 👍 |
To fix
To fix the
I would guess the builtin PS. Make sure to run |
The test suite is passing completely now 🎉 I'll collate some notes for the wiki over the next week. Are there any other steps needed before merging? |
Thanks. I don't think there's anything else. |
Sorry for the delay...
Took a bit of a look at this and noticed that several things (most notably all these operators) are not actually supported by DuckDB.
Also, when I simply compare it to the PostgreSQL implementation, it looks almost the same (ignoring the keyword and function name lists). But my very brief scan of the DuckDB documentation revealed several things that are different in DuckDB.
'EXPLAIN',
'FETCH',
'GRANT',
'INSTALL',
Looks like INSTALL is the only name added to this list compared to PostgreSQL. I would suspect there are more differences between the statements supported by PostgreSQL and DuckDB.
Hi @nene - you're correct. DuckDB uses the PostgreSQL parser. That's why DuckDB’s SQL dialect closely follows the conventions of the PostgreSQL dialect with only a few exceptions as listed here.
Many PostgreSQL features are unsupported in DuckDB, so I've removed the unsupported elements from the formatter.
Hey @nene - let me know if it's OK to resolve this conversation.
I've also added some specific DuckDB features below.
For future readers: deviations from the PostgreSQL dialect are now listed here.
Hello, |
Thanks for the interest. This thing has indeed been sitting here for a while now. Will need to dig back into this to see if there are any reasons why it hasn't been merged yet. I think one of the main reasons was that it seemed really, really similar to PostgreSQL. So a question to you @Zank94, if you just configure SQL Formatter to treat your SQL as PostgreSQL, would that solve your problems (e.g. the |
I see, thank you for the advice I will give it a try 👍 |
I was using the postgres formatter for duckdb queries, but it fails when the query has |
Well, the thing is that this PR has a load of failing tests. Interestingly, all failing tests were fixed at one point in eb7d9b1, but after that a bunch more changes were added and now it has a total of 92 failing tests. (Don't know why GitHub says that "All checks have passed".)
I personally have no real knowledge of DuckDB and the documentation of DuckDB seems to be lacking. For example I tried to find information about that
Initially I had the false impression that DuckDB is pretty much just PostgreSQL with a few minor differences; I have now come to the conclusion that it's more like DuckDB supports some small subset of PostgreSQL syntax (plus some DuckDB-specific additions like
Might be that this whole PR should be started from scratch. I would personally start with filling out the wiki with information about DuckDB. But because of DuckDB's lack of documentation, it seems like an inconvenient task to undertake. |
@nene , I have to disagree with you here. I think DuckDB has fantastic documentation, but it has a LOT of functionality above and beyond Postgres, so the documentation can be pretty dense for the uninitiated. @karanpopat , what the heck is the |
@riziles it's an inet extension https://duckdb.org/docs/stable/extensions/inet.html#-predicate |
@riziles It might very well be that the documentation is great and I just don't know how to use it. Like this page which at first glance seems to document differences from PostgreSQL, but it's a pretty short page. I guess it actually tries to document the "important" differences for ordinary users. I think I now finally found that most of the operators are documented in the Functions section. But not all operators can be found there. For example the |
@karanpopat , that seems like a pretty niche requirement. Can't you just use the "denseOperators" flag? |
I think the One can instead just extend the postgresql formatter configuration with an additional operator. Something like:
import { formatDialect, postgresql } from 'sql-formatter';

// Reuse the PostgreSQL dialect and append the two extra operators.
const duckdb = {
  ...postgresql,
  tokenizerOptions: {
    ...postgresql.tokenizerOptions,
    operators: [...postgresql.tokenizerOptions.operators, '<<=', '>>='],
  },
};
formatDialect('SELECT foo <<= bar FROM tbl', { dialect: duckdb });
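The same idea can be wrapped in a small reusable helper. The sketch below is only an illustration: `withExtraOperators`, `DialectLike` and `stubPostgresql` are hypothetical names that are not part of sql-formatter, and the stub object stands in for the real `postgresql` export so that the snippet runs on its own.

```typescript
// Minimal structural stand-in for a sql-formatter dialect object.
// Only the fields this helper touches are declared (an assumption).
interface DialectLike {
  name: string;
  tokenizerOptions: { operators?: string[] };
}

// Hypothetical helper (not part of sql-formatter): returns a copy of `base`
// whose operator list also contains `extra`, without mutating `base`.
function withExtraOperators<T extends DialectLike>(base: T, extra: string[]): T {
  const existing = base.tokenizerOptions.operators ?? [];
  // Skip operators the base dialect already lists, to avoid duplicates.
  const added = extra.filter((op) => existing.indexOf(op) === -1);
  return {
    ...base,
    tokenizerOptions: {
      ...base.tokenizerOptions,
      operators: [...existing, ...added],
    },
  } as T;
}

// Stand-in for the real `postgresql` export, so the sketch is self-contained.
const stubPostgresql = {
  name: 'postgresql',
  tokenizerOptions: { operators: ['<<', '>>', '||'] },
};

const duckdbLike = withExtraOperators(stubPostgresql, ['<<=', '>>=', '||']);
console.log(duckdbLike.tokenizerOptions.operators);
```

With the real library, one would presumably pass the imported `postgresql` object instead of the stub and hand the result to `formatDialect(sql, { dialect: ... })` as in the comment above.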
I have now dug a bit deeper into the DuckDB syntax and I think I was misled earlier when I read from the docs that DuckDB uses the PostgreSQL parser. I frankly can't find that in the docs any more. I guess that part was removed. Turns out they instead used a (likely heavily modified) fork of the PostgreSQL parser, and for all I know they might be using a completely custom parser by now. There are just so many differences in syntax that I think it only makes sense to treat it as a completely different dialect. Some of the most notable DuckDB-specific syntax I've found so far:
|
It's definitely based on Postgres syntax, i.e. most vanilla Postgres queries would work fine in DuckDB, but it has capabilities far above and beyond for analytics workflows: |
So, I ended up putting this PR aside and creating a new DuckDB configuration from scratch: #857. I ended up using the functions, keywords and data types lists from this PR. Thanks for that @hughcameron. Also thanks to everybody else who has provided information about DuckDB in this thread. |
Created a brand new DuckDB configuration to replace the old #738 pull request. This should now support the most important bits of DuckDB. There are some caveats though:
- No support for the percentage syntax, like `LIMIT 10%` (conflicts with modulo operator).
- No support for named parameters like `$foo` (conflicts with $$-quoted strings).
- No support for array-slice operator (conflicts with `:` in struct literals and prefix aliases).
There's definitely quite a bit more than the above three little things. But I think at least for a start we have some DuckDB support that should work for most users.
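To illustrate why the percentage caveat is awkward for a token-based formatter, here is a toy tokenizer sketch. It is not sql-formatter's actual implementation; `tokenize`, `escapeRegExp` and the `Token` type are made up for this example. It only shows that, without grammar context, `10%` and `10 % 3` start with the same two tokens.

```typescript
// Toy tokenizer for illustration only; NOT sql-formatter's implementation.
type Token = { type: 'number' | 'operator' | 'word'; text: string };

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Splits `sql` into numbers, operators and words.
// Assumes `operators` is a non-empty list of non-empty strings.
function tokenize(sql: string, operators: string[]): Token[] {
  // Try longer operators first so that '<<=' wins over '<<' and '<'.
  const alternatives = [...operators]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join('|');
  const pattern = new RegExp(`\\s*(?:(\\d+)|(${alternatives})|(\\w+))`, 'y');
  const tokens: Token[] = [];
  let position = 0;
  while (position < sql.length) {
    pattern.lastIndex = position;
    const match = pattern.exec(sql);
    if (match === null) break; // trailing whitespace or an unknown character
    if (match[1] !== undefined) tokens.push({ type: 'number', text: match[1] });
    else if (match[2] !== undefined) tokens.push({ type: 'operator', text: match[2] });
    else if (match[3] !== undefined) tokens.push({ type: 'word', text: match[3] });
    position = pattern.lastIndex;
  }
  return tokens;
}

// Both inputs produce a number token followed by a '%' operator token.
const percentage = tokenize('LIMIT 10%', ['%']);
const modulo = tokenize('SELECT 10 % 3', ['%']);
console.log(percentage.slice(1), modulo.slice(1, 3));
```

Telling the two uses of `%` apart would need knowledge of the surrounding clause, which is presumably why the percentage syntax was left out for now.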
This pull request adds support for the DuckDB SQL dialect to the SQL Formatter library.
Description:
Benefits:
Testing:
Please review the changes and provide feedback.