-
Notifications
You must be signed in to change notification settings - Fork 517
Description
I apologize in advance for the long issue report. The TL;DR is:
- it would be good to crosslink here to https://duckdb.org/docs/stable/sql/dialect/keywords_and_identifiers.
- it can't hurt to also describe what it means if a word is listed as keyword.
- the meaning of the values occurring in the keyword_category should be described (and hopefully explained).
The remainder of this issue report is background for these 3 items. Again I apologize for the length; it serves mainly to explain my confusion and hence the perceived need for documentation. I can fully understand it if the docs choose for a more pragmatic solution and simply say that it's probably a good idea to avoid using any of the keywords as identifiers without going into the fine details of distinguishing between different kinds of keywords. Here goes:
Re. 1) crosslink to keywords and identifiers page
Seems self-evident, but since that page only explicitly mentions "reserved keyword", it might be more complete by mentioning the other keyword categories, or at least to mention that the restriction does not apply to any of the other types of keywords.
Re. 2) describe what it means to be keyword
The page on keywords and identifiers clearly states that a reserved keyword cannot be and identifier. The obvious assumption would be that that is because those words have a special meaning in the SQL language. However, there items listed as keyword that do not appear to be part of the SQL language (or at least not implemented by DuckDb, examples from reserved category: PLACING, SYMMETRIC, many examples in unreserved category.).
The postgres docs have some explicit documentation to explain this: https://www.postgresql.org/docs/current/sql-keywords-appendix.html:
the fact that a key word is not reserved in PostgreSQL does not mean that the feature related to the word is not implemented. Conversely, the presence of a key word does not indicate the existence of a feature.
A similar piece of text might help complement the duckdb documentation too.
Re. 3) meaning of the values in keyword_category
The description of the keyword_category column reads:
keyword_category | Indicates the category of the keyword. Values are
column_name,reserved,type_functionandunreserved.
Considering that identifiers are restricted and that the category says something about the context/scope where the restriction applies. With this in mind, I thought I had a clear understanding of the meaning of the various keyword categories. However I found it relatively easy to create counter-examples that falsify my assumptions about the context/scope communicated by the keyword category. Hence, my request for documentation that describes the categories more precisely.
Examples below:
column_name
Assumption: this means the keyword cannot be used as column name. To test the assumption, find a keyword in this category:
select *
from duckdb_keywords()
where keyword_category = 'column_name'
order by keyword_name
limit 1;Result:
┌──────────────┬──────────────────┐
│ keyword_name │ keyword_category │
│ varchar │ varchar │
├──────────────┼──────────────────┤
│ between │ column_name │
└──────────────┴──────────────────┘
Attempt to use between as column name (expected to fail):
select between
from (values(0)) as t(between);The statement succeeds. This demonstrates that between can be used as the name of a column, so the assumption cannot be correct.
Other examples where between is used as column name:
create table t(between int);
select between from t;Alternative assumption: category column_name means the keyword can be used as column name, but not for other identifiers.
Counterexamples:
create table between(c int);
select * from between;Both succeed. This demonstrates that between can be used as table name. So the alternative assumption is also incorrect.
So, conundrum. What does a column_name value in keyword_category really mean?
Maybe it means that even though it currently happens to be possible to use these keywords as a column name, we shouldn't rely on that to always be the case?
reserved
Assumption: reserved means the keyword cannot be used at all as identifier, in any context. Find a keyword in this category:
select *
from duckdb_keywords()
where keyword_category = 'reserved'
order by keyword_name
limit 1;Result:
┌──────────────┬──────────────────┐
│ keyword_name │ keyword_category │
│ varchar │ varchar │
├──────────────┼──────────────────┤
│ all │ reserved │
└──────────────┴──────────────────┘
Attempts to use a bare all as table or column name fail (as expected). Attempt to use a quoted "all" succeeds (also as expected). But once we have a table with a column named "all", we can use a bare all to select that column:
create table t("all" int);
select t . all from t;
It was expected that this would fail, and that it would be required to quote all also in the select list.
There's another example that does not rely on quoting the reserved word. In this case the keyword as:
select c as as from (values(0)) as t(c);(bare unquoted as succeeds when used for a column alias)
type_function
My initial assumption was that this means the word cannot be used as a function name. But after crosschecking some of the words in this category with the postgres docs (https://www.postgresql.org/docs/current/sql-keywords-appendix.html) I think this means "reserved (can be function or type)".
unreserved
This is the largest category. My assumption is that it means the word can always be used as identifier and I found no counter example.
Page URL: https://duckdb.org/docs/stable/sql/meta/duckdb_table_functions#duckdb_keywords
It's possible any modifications here would also invite modifications to https://duckdb.org/docs/stable/sql/dialect/keywords_and_identifiers