Commit 1f60253

Fix DuckDB reserved keyword quoting for column names
DuckDB reserves several keywords that PostgreSQL does not — PIVOT, QUALIFY,
LAMBDA, SHOW, SUMMARIZE, DESCRIBE, UNPIVOT, PIVOT_LONGER, PIVOT_WIDER.
Column names matching these words are valid unquoted identifiers in
PostgreSQL but cause parser errors when the same SQL is forwarded to
pgduck_server.

Changes
-------

keywords.c / keywords.h
  Rewrites keywords.c to use a DuckDB-accurate keyword table
  (duckdb_kwlist.h) instead of PostgreSQL's parser/kwlist.h.

  Adds duckdb_quote_identifier(), which behaves like quote_identifier()
  but additionally double-quotes identifiers that are RESERVED_KEYWORD
  in DuckDB.

  Adds RequoteDuckDBReservedInSQL(), a SQL-string scanner that
  post-processes output from pg_get_querydef(): it walks the SQL text,
  skips single- and double-quoted regions, and wraps any unquoted token
  that DuckDB marks RESERVED_KEYWORD (but PostgreSQL does not) in
  double-quotes. Standard SQL literals such as true/false/null are
  unaffected because PostgreSQL's quote_identifier() already quotes
  them, triggering a short-circuit in the scanner.

deparse_ruleutils.c
  Calls RequoteDuckDBReservedInSQL() after pg_get_querydef() in
  DeparseQualifiedQuery(). This covers the PreparePGDuckSQLTemplate code
  path used by all FDW scans and query pushdown.

read_data.c / write_data.c / iceberg_query_validation.c
  Replaces quote_identifier() with duckdb_quote_identifier() at all call
  sites that build SQL sent directly to pgduck_server (file-scan
  expressions, COPY-to-file SQL, Iceberg validation queries).

tools/generate_duckdb_kwlist.py / duckdb_kwlist.h
  New script that parses the vendored DuckDB kwlist.hpp and emits the
  sorted C include fragment. The generated file is checked in; re-run
  after a DuckDB version bump.

Tests
-----

Adds test_duckdb_reserved_keywords.py (pg_lake_table) covering
parquet/CSV/JSON foreign tables and Iceberg tables with reserved-keyword
column names, including filter-pushdown assertions via EXPLAIN.

Adds test_duckdb_reserved_keyword_copy.py (pg_lake_copy) covering COPY
TO/FROM for parquet, CSV, and JSON formats. Removes the unsupported COPY
iceberg format test.

Fixes #277.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: David Christensen <david.christensen@snowflake.com>
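The scanning behaviour the commit message describes can be sketched in Python. This is a minimal illustration of the algorithm, not the actual C implementation; the function name and the hard-coded keyword set are stand-ins (the real table is generated from DuckDB's kwlist.hpp):

```python
# Illustrative subset of keywords reserved by DuckDB but not PostgreSQL.
DUCKDB_ONLY_RESERVED = {
    "pivot", "qualify", "lambda", "show", "summarize",
    "describe", "unpivot", "pivot_longer", "pivot_wider",
}


def requote_duckdb_reserved(sql: str) -> str:
    """Walk the SQL text, copy single- and double-quoted regions verbatim,
    and double-quote any unquoted token found in DUCKDB_ONLY_RESERVED."""
    out = []
    i, n = 0, len(sql)
    while i < n:
        c = sql[i]
        if c in ("'", '"'):
            # Quoted region: copy as-is, honoring doubled quotes as escapes.
            j = i + 1
            while j < n:
                if sql[j] == c:
                    if j + 1 < n and sql[j + 1] == c:
                        j += 2
                        continue
                    j += 1
                    break
                j += 1
            out.append(sql[i:j])
            i = j
        elif c.isalpha() or c == "_":
            # Unquoted identifier/keyword token.
            j = i
            while j < n and (sql[j].isalnum() or sql[j] == "_"):
                j += 1
            token = sql[i:j]
            if token.lower() in DUCKDB_ONLY_RESERVED:
                out.append('"' + token + '"')
            else:
                out.append(token)
            i = j
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

Because already-quoted regions are copied verbatim, SQL that quote_identifier() has previously quoted passes through unchanged, which is the short-circuit the commit message mentions.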
1 parent 4a99db5 commit 1f60253

File tree

11 files changed (+1635, -59 lines)


Makefile

Lines changed: 12 additions & 1 deletion
@@ -14,7 +14,7 @@ CUSTOM_TARGETS = check-pg_lake_engine installcheck-pg_lake_engine check-pg_exten
 DUCKDB_BUILD_USE_CACHE ?= 0
 
 # other phony targets go here
-.PHONY: all fast install install-fast installcheck clean check submodules uninstall check-indent reindent installcheck-postgres installcheck-postgres-with_extensions_created
+.PHONY: all fast install install-fast installcheck clean check submodules uninstall check-indent reindent installcheck-postgres installcheck-postgres-with_extensions_created generate-duckdb-kwlist check-duckdb-kwlist
 .PHONY: $(ALL_TARGETS)
 .PHONY: $(PHONY_TARGETS)
 
@@ -220,6 +220,17 @@ uninstall-avro:
 	rm -f $(PG_LIBDIR)/libavro.*
 	rm -rf $(PG_INCLUDEDIR)/avro*
 
+## DuckDB keyword list maintenance
+# Regenerate the checked-in keyword table from the vendored DuckDB kwlist.hpp.
+# Re-run whenever duckdb_pglake/duckdb is updated to a new DuckDB release.
+generate-duckdb-kwlist:
+	python3 tools/generate_duckdb_kwlist.py
+
+# Verify that the checked-in keyword table matches the current kwlist.hpp.
+# Run in CI to catch stale keyword tables after a DuckDB version bump.
+check-duckdb-kwlist:
+	python3 tools/generate_duckdb_kwlist.py --check
+
 ## Other targets
 check-isolation_pg_lake_table:
 	$(MAKE) -C pg_lake_table check-isolation
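The generator invoked by these targets could look roughly like the following Python sketch. It assumes the vendored kwlist.hpp uses PostgreSQL-style PG_KEYWORD(...) macro lines; the DUCKDB_KEYWORD output macro name is hypothetical, chosen only to illustrate the shape of a generated include fragment:

```python
import re

# Assumed macro format in the vendored kwlist.hpp; adjust the regex if the
# vendored header differs.
KEYWORD_RE = re.compile(
    r'PG_KEYWORD\("(?P<name>[a-z_]+)",\s*\w+,\s*(?P<cat>\w+_KEYWORD)'
)


def generate_kwlist_fragment(header_text: str) -> str:
    """Parse PG_KEYWORD entries out of a kwlist header and emit a sorted
    C include fragment, one entry per line."""
    entries = sorted(
        (m.group("name"), m.group("cat"))
        for m in KEYWORD_RE.finditer(header_text)
    )
    lines = [f'DUCKDB_KEYWORD("{name}", {cat})' for name, cat in entries]
    return "\n".join(lines) + "\n"
```

A --check mode, as used by the check-duckdb-kwlist target, would simply regenerate the fragment and compare it byte-for-byte against the checked-in file.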
test_duckdb_reserved_keyword_copy.py (pg_lake_copy)

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
"""
Tests for correct identifier quoting during COPY TO / COPY FROM when table
columns are named after DuckDB reserved keywords.

DuckDB reserves several keywords (PIVOT, QUALIFY, LAMBDA, etc.) that
PostgreSQL does not. These are legal unquoted column names in PostgreSQL but
require quoting in the SQL that pg_lake generates for pgduck_server.

See: https://github.com/Snowflake-Labs/pg_lake/issues/277
"""

import pytest
from utils_pytest import *


# DuckDB-only reserved keywords that are safe as unquoted PostgreSQL
# identifiers.
DUCKDB_ONLY_RESERVED = [
    "describe",
    "lambda",
    "pivot",
    "pivot_longer",
    "pivot_wider",
    "qualify",
    "show",
    "summarize",
    "unpivot",
]

# Smaller set used in multi-column tests for conciseness.
TEST_KEYWORDS = ["pivot", "qualify", "lambda", "show"]


def test_copy_to_parquet_with_reserved_keyword_columns(pg_conn, s3):
    """
    COPY ... TO parquet must succeed when the table has DuckDB-reserved
    keyword column names, and COPY ... FROM must read the file back correctly.
    """
    url = f"s3://{TEST_BUCKET}/test_kw_copy_parquet/kw_cols.parquet"

    try:
        run_command(
            """
            CREATE TABLE test_kw_copy_src (
                pivot int,
                qualify int,
                lambda int,
                show int
            );
            INSERT INTO test_kw_copy_src VALUES (1, 2, 3, 4), (10, 20, 30, 40);
            """,
            pg_conn,
        )

        run_command(
            f"COPY test_kw_copy_src TO '{url}' WITH (format 'parquet')",
            pg_conn,
        )

        run_command(
            f"""
            CREATE TABLE test_kw_copy_dst (
                pivot int,
                qualify int,
                lambda int,
                show int
            );
            COPY test_kw_copy_dst FROM '{url}' WITH (format 'parquet');
            """,
            pg_conn,
        )

        result = run_query(
            "SELECT pivot, qualify, lambda, show FROM test_kw_copy_dst ORDER BY pivot",
            pg_conn,
        )
        assert result == [[1, 2, 3, 4], [10, 20, 30, 40]]
    finally:
        pg_conn.rollback()


def test_copy_to_csv_with_reserved_keyword_columns(pg_conn, s3):
    """
    COPY ... TO CSV and back must work with DuckDB-reserved keyword columns.
    """
    url = f"s3://{TEST_BUCKET}/test_kw_copy_csv/kw_cols.csv"

    try:
        run_command(
            f"""
            CREATE TABLE test_kw_csv_src (pivot int, qualify int, lambda int, show int);
            INSERT INTO test_kw_csv_src VALUES (3, 6, 9, 12), (30, 60, 90, 120);
            COPY test_kw_csv_src TO '{url}' WITH (format 'csv', header on);

            CREATE TABLE test_kw_csv_dst (pivot int, qualify int, lambda int, show int);
            COPY test_kw_csv_dst FROM '{url}' WITH (format 'csv', header on);
            """,
            pg_conn,
        )

        result = run_query(
            "SELECT pivot, qualify, lambda, show FROM test_kw_csv_dst ORDER BY pivot",
            pg_conn,
        )
        assert result == [[3, 6, 9, 12], [30, 60, 90, 120]]
    finally:
        pg_conn.rollback()


def test_copy_to_json_with_reserved_keyword_columns(pg_conn, s3):
    """
    COPY ... TO JSON and back must work with DuckDB-reserved keyword columns.
    """
    url = f"s3://{TEST_BUCKET}/test_kw_copy_json/kw_cols.json"

    try:
        run_command(
            f"""
            CREATE TABLE test_kw_json_src (pivot int, qualify int, lambda int, show int);
            INSERT INTO test_kw_json_src VALUES (7, 14, 21, 28), (70, 140, 210, 280);
            COPY test_kw_json_src TO '{url}' WITH (format 'json');

            CREATE TABLE test_kw_json_dst (pivot int, qualify int, lambda int, show int);
            COPY test_kw_json_dst FROM '{url}' WITH (format 'json');
            """,
            pg_conn,
        )

        result = run_query(
            "SELECT pivot, qualify, lambda, show FROM test_kw_json_dst ORDER BY pivot",
            pg_conn,
        )
        assert result == [[7, 14, 21, 28], [70, 140, 210, 280]]
    finally:
        pg_conn.rollback()


# ---------------------------------------------------------------------------
# Parameterised round-trip tests — each keyword, each format
# ---------------------------------------------------------------------------


@pytest.mark.parametrize("keyword", DUCKDB_ONLY_RESERVED)
def test_copy_roundtrip_parquet(pg_conn, s3, keyword):
    """Each DuckDB-only reserved keyword survives a parquet round-trip."""
    url = f"s3://{TEST_BUCKET}/test_kw_roundtrip_parquet/{keyword}/data.parquet"

    try:
        run_command(
            f"""
            CREATE TABLE test_kw_pq_src_{keyword} ({keyword} int);
            INSERT INTO test_kw_pq_src_{keyword} VALUES (42);
            COPY test_kw_pq_src_{keyword} TO '{url}' WITH (format 'parquet');

            CREATE TABLE test_kw_pq_dst_{keyword} ({keyword} int);
            COPY test_kw_pq_dst_{keyword} FROM '{url}' WITH (format 'parquet');
            """,
            pg_conn,
        )

        result = run_query(f"SELECT {keyword} FROM test_kw_pq_dst_{keyword}", pg_conn)
        assert result == [[42]]
    finally:
        pg_conn.rollback()


@pytest.mark.parametrize("keyword", DUCKDB_ONLY_RESERVED)
def test_copy_roundtrip_csv(pg_conn, s3, keyword):
    """Each DuckDB-only reserved keyword survives a CSV round-trip."""
    url = f"s3://{TEST_BUCKET}/test_kw_roundtrip_csv/{keyword}/data.csv"

    try:
        run_command(
            f"""
            CREATE TABLE test_kw_csv_src_{keyword} ({keyword} int);
            INSERT INTO test_kw_csv_src_{keyword} VALUES (42);
            COPY test_kw_csv_src_{keyword} TO '{url}' WITH (format 'csv', header on);

            CREATE TABLE test_kw_csv_dst_{keyword} ({keyword} int);
            COPY test_kw_csv_dst_{keyword} FROM '{url}' WITH (format 'csv', header on);
            """,
            pg_conn,
        )

        result = run_query(f"SELECT {keyword} FROM test_kw_csv_dst_{keyword}", pg_conn)
        assert result == [[42]]
    finally:
        pg_conn.rollback()


@pytest.mark.parametrize("keyword", DUCKDB_ONLY_RESERVED)
def test_copy_roundtrip_json(pg_conn, s3, keyword):
    """Each DuckDB-only reserved keyword survives a JSON round-trip."""
    url = f"s3://{TEST_BUCKET}/test_kw_roundtrip_json/{keyword}/data.json"

    try:
        run_command(
            f"""
            CREATE TABLE test_kw_json_src_{keyword} ({keyword} int);
            INSERT INTO test_kw_json_src_{keyword} VALUES (42);
            COPY test_kw_json_src_{keyword} TO '{url}' WITH (format 'json');

            CREATE TABLE test_kw_json_dst_{keyword} ({keyword} int);
            COPY test_kw_json_dst_{keyword} FROM '{url}' WITH (format 'json');
            """,
            pg_conn,
        )

        result = run_query(f"SELECT {keyword} FROM test_kw_json_dst_{keyword}", pg_conn)
        assert result == [[42]]
    finally:
        pg_conn.rollback()

pg_lake_engine/include/pg_lake/pgduck/keywords.h

Lines changed: 25 additions & 0 deletions
@@ -17,4 +17,29 @@
 
 #pragma once
 
+/*
+ * IsDuckDBReservedWord — returns true for any keyword that is not
+ * UNRESERVED_KEYWORD in DuckDB (i.e., RESERVED, COL_NAME, or
+ * TYPE_FUNC_NAME). Used for struct field-access quoting.
+ */
 PGDLLEXPORT bool IsDuckDBReservedWord(char *candidateWord);
+
+/*
+ * duckdb_quote_identifier — like quote_identifier() but also quotes
+ * identifiers that are RESERVED_KEYWORD in DuckDB but not in PostgreSQL
+ * (e.g. LAMBDA, PIVOT, QUALIFY, SUMMARIZE, DESCRIBE, SHOW, UNPIVOT).
+ *
+ * Use this for all identifiers (column names, field names, relation names)
+ * that will appear in SQL sent to pgduck_server.
+ */
+PGDLLEXPORT const char *duckdb_quote_identifier(const char *ident);
+
+/*
+ * RequoteDuckDBReservedInSQL — post-process a SQL string from pg_get_querydef
+ * and wrap any unquoted identifier that is RESERVED_KEYWORD in DuckDB but
+ * not reserved in PostgreSQL (e.g. PIVOT, QUALIFY, LAMBDA, SHOW, SUMMARIZE).
+ *
+ * Use this on every SQL string produced by pg_get_querydef() before sending
+ * it to pgduck_server.
+ */
+PGDLLEXPORT char *RequoteDuckDBReservedInSQL(const char *sql);
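The contract of duckdb_quote_identifier() documented in this header can be illustrated with a small Python sketch. This is not the C implementation: the regex stands in for PostgreSQL's "safe unquoted identifier" rule, the keyword set is an illustrative subset, and a full version would also quote PostgreSQL's own reserved words:

```python
import re

# PostgreSQL-style identifiers that may appear unquoted.
PG_STYLE_UNQUOTED = re.compile(r"^[a-z_][a-z0-9_]*$")

# Illustrative subset of keywords reserved by DuckDB but not PostgreSQL.
DUCKDB_ONLY_RESERVED = {
    "lambda", "pivot", "qualify", "summarize",
    "describe", "show", "unpivot",
}


def duckdb_quote_identifier(ident: str) -> str:
    """Quote an identifier if it is not a safe unquoted name, or if DuckDB
    (but not PostgreSQL) reserves it. Embedded double-quotes are doubled,
    per standard SQL. A full implementation would also quote PostgreSQL's
    own reserved keywords here."""
    needs_quoting = (
        not PG_STYLE_UNQUOTED.match(ident)
        or ident in DUCKDB_ONLY_RESERVED
    )
    if needs_quoting:
        return '"' + ident.replace('"', '""') + '"'
    return ident
```

For example, a column named pivot becomes "pivot" in SQL bound for pgduck_server, while an ordinary name like file_path passes through unquoted.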
