Commit 6074ae9

Add duckdb_quote_identifier() to fix identifier quoting for DuckDB pushdown
DuckDB reserves several keywords that PostgreSQL does not (LAMBDA, PIVOT, PIVOT_LONGER, PIVOT_WIDER, QUALIFY, SUMMARIZE, DESCRIBE, SHOW, UNPIVOT). Using PostgreSQL's quote_identifier() on column names sent to pgduck_server silently omits the extra quoting those reserved words require, causing query failures at runtime.

This commit:

- Adds tools/generate_duckdb_kwlist.py, a script that parses the vendored DuckDB kwlist.hpp and emits pg_lake_engine/src/pgduck/duckdb_kwlist.h, a sorted C include fragment used to initialise the keyword lookup table. The generated file is checked in; re-run `make generate-duckdb-kwlist` after a DuckDB version bump.
- Adds `make generate-duckdb-kwlist` and `make check-duckdb-kwlist` targets. The check target verifies the generated file is up to date and can be wired into CI.
- Rewrites keywords.c to use the DuckDB-accurate keyword table instead of PostgreSQL's parser/kwlist.h. Adds duckdb_quote_identifier(), which falls through to quote_identifier() for normal identifiers and additionally quotes identifiers that are RESERVED_KEYWORD in DuckDB.
- Exports duckdb_quote_identifier() from keywords.h.
- Replaces quote_identifier() with duckdb_quote_identifier() in all files that generate SQL sent to pgduck_server:
  - pg_lake_engine/src/pgduck/read_data.c (23 call sites)
  - pg_lake_engine/src/pgduck/write_data.c (5 call sites)
  - pg_lake_engine/src/pgduck/iceberg_query_validation.c (4 call sites)
  - pg_lake_table/src/fdw/deparse.c (7 call sites, plus audit TODO note)

PostgreSQL-only callers (pg_extension_updater, base_worker_launcher, data_files_catalog, EXPLAIN output in pg_lake_table.c) are left unchanged.

Fixes #277.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: David Christensen <david.christensen@snowflake.com>
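
The quoting rule described in the commit message can be modelled in a few lines of standalone C. This is only a sketch, not the code in keywords.c: the `sketch_*` names are hypothetical, the keyword list is just the nine words named in the message rather than the generated table, and the "plain identifier" test is a simplified stand-in for quote_identifier() that skips PostgreSQL's own keyword check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset: only the DuckDB-only reserved words listed in the
 * commit message. The real table is generated from kwlist.hpp. */
static const char *const sketch_duckdb_reserved[] = {
    "describe", "lambda", "pivot", "pivot_longer", "pivot_wider",
    "qualify", "show", "summarize", "unpivot",
};

bool
sketch_is_duckdb_reserved(const char *ident)
{
    size_t n = sizeof(sketch_duckdb_reserved) / sizeof(sketch_duckdb_reserved[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(ident, sketch_duckdb_reserved[i]) == 0)
            return true;
    return false;
}

/* Simplified stand-in for quote_identifier()'s character test: an
 * identifier is plain if it matches [a-z_][a-z0-9_]*. (Assumption: the
 * PostgreSQL keyword check is omitted here.) */
bool
sketch_is_plain(const char *ident)
{
    if (!((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_'))
        return false;
    for (const char *p = ident + 1; *p; p++)
        if (!((*p >= 'a' && *p <= 'z') || (*p >= '0' && *p <= '9') || *p == '_'))
            return false;
    return true;
}

/* Copies ident into out, double-quoting it when it is not plain or when it
 * is a DuckDB reserved word. Embedded double quotes are doubled.
 * Returns false if out is too small. */
bool
sketch_duckdb_quote(const char *ident, char *out, size_t outlen)
{
    size_t len = strlen(ident);
    size_t o = 0;

    if (sketch_is_plain(ident) && !sketch_is_duckdb_reserved(ident))
    {
        if (len + 1 > outlen)
            return false;
        memcpy(out, ident, len + 1);
        return true;
    }

    if (outlen < 3)             /* need at least "" plus the terminator */
        return false;
    out[o++] = '"';
    for (const char *p = ident; *p; p++)
    {
        size_t need = (*p == '"') ? 2 : 1;

        /* keep room for the closing quote and the terminator */
        if (o + need + 2 > outlen)
            return false;
        if (*p == '"')
            out[o++] = '"';
        out[o++] = *p;
    }
    out[o++] = '"';
    out[o] = '\0';
    return true;
}
```

Under these assumptions, `sketch_duckdb_quote("price", buf, sizeof buf)` leaves the name untouched, while `"pivot"` comes back wrapped in double quotes even though it passes the plain-identifier test. That second case is the one the commit message says plain quote_identifier() misses.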
1 parent 5638b9f commit 6074ae9

9 files changed: 832 additions & 66 deletions

Makefile

Lines changed: 12 additions & 1 deletion
@@ -14,7 +14,7 @@ CUSTOM_TARGETS = check-pg_lake_engine installcheck-pg_lake_engine check-pg_exten
 DUCKDB_BUILD_USE_CACHE ?= 0
 
 # other phony targets go here
-.PHONY: all fast install install-fast installcheck clean check submodules uninstall check-indent reindent installcheck-postgres installcheck-postgres-with_extensions_created
+.PHONY: all fast install install-fast installcheck clean check submodules uninstall check-indent reindent installcheck-postgres installcheck-postgres-with_extensions_created generate-duckdb-kwlist check-duckdb-kwlist
 .PHONY: $(ALL_TARGETS)
 .PHONY: $(PHONY_TARGETS)
 
@@ -220,6 +220,17 @@ uninstall-avro:
 	rm -f $(PG_LIBDIR)/libavro.*
 	rm -rf $(PG_INCLUDEDIR)/avro*
 
+## DuckDB keyword list maintenance
+# Regenerate the checked-in keyword table from the vendored DuckDB kwlist.hpp.
+# Re-run whenever duckdb_pglake/duckdb is updated to a new DuckDB release.
+generate-duckdb-kwlist:
+	python3 tools/generate_duckdb_kwlist.py
+
+# Verify that the checked-in keyword table matches the current kwlist.hpp.
+# Run in CI to catch stale keyword tables after a DuckDB version bump.
+check-duckdb-kwlist:
+	python3 tools/generate_duckdb_kwlist.py --check
+
 ## Other targets
 check-isolation_pg_lake_table:
 	$(MAKE) -C pg_lake_table check-isolation

pg_lake_engine/include/pg_lake/pgduck/keywords.h

Lines changed: 15 additions & 0 deletions
@@ -17,4 +17,19 @@
 
 #pragma once
 
+/*
+ * IsDuckDBReservedWord — returns true for any keyword that is not
+ * UNRESERVED_KEYWORD in DuckDB (i.e., RESERVED, COL_NAME, or
+ * TYPE_FUNC_NAME). Used for struct field-access quoting.
+ */
 PGDLLEXPORT bool IsDuckDBReservedWord(char *candidateWord);
+
+/*
+ * duckdb_quote_identifier — like quote_identifier() but also quotes
+ * identifiers that are RESERVED_KEYWORD in DuckDB but not in PostgreSQL
+ * (e.g. LAMBDA, PIVOT, QUALIFY, SUMMARIZE, DESCRIBE, SHOW, UNPIVOT).
+ *
+ * Use this for all identifiers (column names, field names, relation names)
+ * that will appear in SQL sent to pgduck_server.
+ */
+PGDLLEXPORT const char *duckdb_quote_identifier(const char *ident);
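
The comment on IsDuckDBReservedWord implies a table that stores a category for each keyword, with the function returning true for every category except unreserved. One plausible shape for such a lookup is a sorted array searched with `bsearch`. The sketch below is an assumption about that shape, not the contents of keywords.c or duckdb_kwlist.h. All `Sketch*`/`sketch_*` names are hypothetical, and the six entries and their categories are illustrative only and have not been checked against kwlist.hpp.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the four keyword categories named in the comment. */
typedef enum
{
    SKETCH_UNRESERVED,
    SKETCH_COL_NAME,
    SKETCH_TYPE_FUNC_NAME,
    SKETCH_RESERVED
} SketchKwCategory;

typedef struct
{
    const char *name;
    SketchKwCategory category;
} SketchKeyword;

/* Illustrative entries only. The array must stay sorted by name in
 * strcmp() order, otherwise bsearch() gives undefined results. */
static const SketchKeyword sketch_keywords[] = {
    {"abort", SKETCH_UNRESERVED},
    {"between", SKETCH_COL_NAME},
    {"join", SKETCH_TYPE_FUNC_NAME},
    {"pivot", SKETCH_RESERVED},
    {"qualify", SKETCH_RESERVED},
    {"select", SKETCH_RESERVED},
};

/* bsearch() comparator: key is the candidate word, elem is a table entry. */
static int
sketch_kw_cmp(const void *key, const void *elem)
{
    return strcmp((const char *) key, ((const SketchKeyword *) elem)->name);
}

/* Returns the matching table entry, or NULL if word is not a keyword. */
const SketchKeyword *
sketch_lookup(const char *word)
{
    size_t n = sizeof(sketch_keywords) / sizeof(sketch_keywords[0]);

    return bsearch(word, sketch_keywords, n, sizeof(SketchKeyword), sketch_kw_cmp);
}

/* True for any keyword whose category is not unreserved. */
bool
sketch_is_reserved_word(const char *word)
{
    const SketchKeyword *kw = sketch_lookup(word);

    return kw != NULL && kw->category != SKETCH_UNRESERVED;
}
```

Keeping the category in the table lets two callers share it: a broad check like the one above treats every non-unreserved category as reserved, while a narrower check in the style of duckdb_quote_identifier() could test for `SKETCH_RESERVED` alone.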
