Commit eae3308
Make fork compatible with DuckDB v1.5.3 (#21)
* Publish DuckLake release binaries on tags (#7)
* ci: bigger runner for build, bigger ccache for tests (#8)
Speed up CI via two independent levers:
1. MainDistributionPipeline.yml: route the linux_amd64 / linux_arm64
build matrix entries to 16-core Depot managed runners
(depot-ubuntu-24.04-16 / depot-ubuntu-24.04-arm-16) via the upstream
workflow's runners JSON input. Cuts the build from ~25-30min to
~5-10min, including cold-cache cases.
Requires PostHog/ducklake to be added to the Depot GitHub App
installation; until then the build jobs will queue-time-out at 10m.
2. Five per-PR test workflows (Catalogs, ConfigTests, DeletionVectors,
MinIO, NoInline): bump hendrikmuhs/ccache-action max-size from the
500MB default to 1.5G. DuckDB's full release build produces 1-2 GB
of object files, so the default forces aggressive LRU eviction and
keeps effective hit rate at 30-50%. 1.5G fits most of the build,
pushing warm-cache hit rates into the 80-90% range.
5 caches x 1.5G = 7.5 GB, leaving headroom in the 10 GB GitHub
Actions cache budget per repo. Cold-cache cases (DuckDB submodule
bumps) will still take ~25-30min but are infrequent.
Debug.yml is now nightly-only after #9 and is left alone.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* [codex] Route Postgres DuckLake metadata through passthrough (#5)
* Route Postgres metadata reads through passthrough
* Fix Postgres metadata passthrough regressions
* Keep Postgres thread cache setting test-scoped
* Fix Postgres test pool setting init order
* Update postgres scanner for pool defaults
* Route Postgres global stats through passthrough (#11)
* Remove DuckDB extension deploy job
* Pin inlined table registry to snapshot schema
* Fix duplicate Query(string&) after v1.5.3 rebase
Rebasing the Postgres metadata passthrough patch (#5) onto DuckDB
v1.5.3 replayed an old DuckLakeMetadataManager::Query(string&) body
(returning transaction.Query(query)) as a second definition alongside
upstream's new implementation, which uses SubstituteCatalogPlaceholders
+ transaction.ExecuteRaw. Drop the stale duplicate; upstream's version
is correct for the v1.5.3 metadata-manager architecture.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: pin GitHub Actions to commit SHAs
The v1.5.3 workflows inherited from upstream use tag/branch refs
(actions/checkout@v4, ccache-action@main, etc.), which the PostHog org
ruleset rejects ("all actions must be pinned to a full-length commit
SHA"). Pin every action to its commit SHA, reusing the SHAs vetted in
#17 on main where applicable. No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix infinite recursion in Postgres metadata passthrough
PostgresMetadataManager::ExecuteQuery builds a `CALL postgres_query(...)`
passthrough statement and ran it via transaction.Query(). In DuckDB
v1.5.3's metadata architecture transaction.Query() delegates back to
metadata_manager->Query(), i.e. PostgresMetadataManager::Query(), which
calls ExecuteQuery() again and wraps the statement in yet another
CALL postgres_query(...) — recursing without bound and hanging the
"Test Postgres" suite (SQLite, which has no passthrough, was unaffected).
Run the fully-formed passthrough call via transaction.ExecuteRaw() so it
goes straight to the metadata connection without re-entering the manager.
This is the load-bearing change that #19 made (as RawQuery) and that was
lost when #19 was skipped during the v1.5.3 rebase.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Re-derive metadata query-intent split (#19) on v1.5.3 architecture
The Postgres metadata passthrough (#5/#11) depends on a read/write/raw
routing split that #19 introduced and that was dropped during the v1.5.3
rebase. v1.5.3 had collapsed all metadata access onto transaction.Query,
which for a Postgres backend wrongly routed everything through the
postgres_query passthrough — breaking DDL (uuid()), the metadata-exists
probe, local utility queries (ATTACH, duckdb_secrets), and recursing on
the passthrough CALL itself.
Restore the split, adapted to v1.5.3's manager-side substitution and
ExecuteRaw executor:
- Transaction-level Execute/SnapshotQuery/CurrentQuery/RawQuery run RAW on
the metadata connection (DuckDB executes against the attached Postgres
catalog, handling uuid()/LIST/STRUCT and queries that embed
postgres_query CTEs).
- Metadata-manager-level Execute/SnapshotQuery/CurrentQuery do the
postgres_execute/postgres_query passthrough, used (unqualified) only by
the snapshot-pinned file/stats reads that benefit from server-side
execution (#5/#11). Simple committed reads (e.g. global stats, inlined
file deletions) also use the manager passthrough.
- Call sites relabeled by intent: writes -> Execute, snapshot reads ->
SnapshotQuery, current-state reads -> CurrentQuery, local utilities ->
RawQuery. Notably ReadInlinedFileDeletions and other inlined-deletion
reads were running through Execute (postgres_execute returns no rows),
which double-applied inlined deletions.
- Port the check_metadata_query_intent.py guard and wire it into CI.
Verified locally against DuckDB, SQLite and Postgres backends
(postgres_query/postgres_execute via Docker): all three suites green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Materialize embedded postgres_query reads (v1.5.3 streaming limit)
DuckDB v1.5.3 rejects multiple streaming scans, or a streaming scan
alongside an insert/CTAS, in one query. The Postgres metadata reads that
embed postgres_query() run raw, so an un-materialized postgres_query
streams:
- GetLatestSnapshotQuery() is evaluated during write statements, so its
streaming postgres_query scan collided with the insert ("Failed to
query most recent snapshot ... Multiple streaming scans"). Wrap it in a
MATERIALIZED CTE.
- The file-column-stats CTE used NOT MATERIALIZED for single-reference
columns; multiple such columns produced multiple streaming
postgres_query scans. Always mark it MATERIALIZED.
Negligible cost (small metadata results). Only surfaces with a streaming
postgres_scanner build (CI); a materializing local build masked it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Route GetSnapshot through CALL passthrough, not streaming postgres_query
The MATERIALIZED-CTE attempt didn't help: an embedded
"SELECT * FROM postgres_query(...)" is a streaming scan, and GetSnapshot()
runs during write statements, so it tripped DuckDB v1.5.3's "streaming
scan + insert in the same query" restriction. Wrapping it in a
MATERIALIZED CTE just turned it into "streaming scan + CTAS".
Use the metadata-manager passthrough instead, which executes
CALL postgres_query(...) as a separate statement returning a materialized
result (the same path GetSnapshot(at_clause) already uses and that global
stats use successfully in CI):
- GetSnapshot(): call the manager-level CurrentQuery() (passthrough)
instead of transaction.CurrentQuery() (raw embedded scan).
- Drop the Postgres GetLatestSnapshotQuery() override so the plain base
SELECT is used; the CALL passthrough wraps it.
- Revert the file-column-stats CTE back to its original MATERIALIZED hint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: build via PostHog/extension-ci-tools fork (pinned actions)
The PostHog org ruleset rejects any unpinned action ref, enforced
transitively into reusable workflows. duckdb/extension-ci-tools uses
unpinned internal actions (actions/checkout@v4, ...), so every run
startup-failed at "Prepare all required actions".
Point the distribution call at the PostHog extension-ci-tools fork,
whose main (d3e0ee09) pins every internal action to a SHA and exposes
the `runners` input, and pull ci-tools makefiles from the same fork/ref
via override_ci_tools_repository.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: build linux_arm64 on GitHub-hosted runner
The depot arm64 runner was unresponsive (job lost at the 10-minute
runner timeout). This repo is public, so use the free GitHub-hosted
ubuntu-24.04-arm runner for arm64; keep amd64 on the depot runner.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Pin metadata postgres_scanner to single-threaded (materialize scans)
DuckLake metadata reads (e.g. GetCatalogForSnapshot) scan several attached
Postgres tables in one statement using DuckDB LIST/STRUCT syntax, so they
must run on the metadata connection rather than server-side. DuckDB v1.5.3
rejects multiple streaming postgres_scanner scans (and streaming + insert)
in a single query, which these multi-table reads triggered.
postgres_scanner only streams a scan when max_threads > 1, where
max_threads = pages / pg_pages_per_task. After attaching the Postgres
metadata catalog, set pg_pages_per_task to a very large value so
max_threads is pinned to 1 and these reads materialize instead of
streaming. Metadata results are tiny, so there is no throughput cost.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Align postgres_scanner to upstream v1.5.3 pin (c0e9256)
#5 bumped postgres_scanner forward to dd71d196 (the "Connection pool"
rework, which adds the database-connector submodule) on the DuckDB 1.5.2
line. That newer scanner is incompatible with DuckDB v1.5.3's restriction
on streaming postgres scans: DuckLake's multi-table metadata reads
(GetCatalogForSnapshot, ...) failed with "Multiple streaming scans or
streaming scans + CTAS / insert in the same query are not currently
supported".
Pin postgres_scanner to c0e9256 -- the exact version upstream DuckLake's
v1.5-variegata (DuckDB v1.5.3) is tested against -- and drop the
ineffective pg_pages_per_task workaround. Revisit the connection-pool
scanner bump separately against a v1.5.3-compatible scanner if needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Pin postgres_scanner to the DuckDB v1.5.3 release version (6b2b12c)
Root cause of the remaining Postgres CI failures: CI built an older
postgres_scanner that streams multi-table scans, which DuckDB v1.5.3
rejects ("Multiple streaming scans or streaming scans + CTAS / insert in
the same query are not currently supported") for DuckLake's metadata
reads (GetCatalogForSnapshot scans 4 metadata tables in one query).
6b2b12c is the postgres_scanner the DuckDB v1.5.3 extension release ships;
it materializes those scans instead of streaming. It is 29 commits ahead
of #5's dd71d196 (so it keeps the connection-pool work) and 79 ahead of
c0e9256. Verified: the full Postgres test suite passes locally against
this scanner (428 test cases, 0 failures), alongside the green DuckDB and
SQLite suites.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: test Postgres against the prebuilt scanner, not a source build
Root cause of the persistent Postgres CI failures: the test job set
ENABLE_POSTGRES_SCANNER=ON and statically built postgres_scanner from
source. That source build streams multi-table postgres scans, which
DuckDB v1.5.3 rejects ("Multiple streaming scans or streaming scans +
CTAS / insert in the same query are not currently supported") for
DuckLake's metadata reads (GetCatalogForSnapshot scans 4 tables at once).
The officially released postgres_scanner materializes those scans.
Stop building postgres_scanner from source for the test job; install the
released extension (INSTALL postgres FROM core) instead, exactly as a real
deployment would load it. The full Postgres suite passes locally with the
prebuilt extension (428 test cases, 0 failures). SQLite/Quack still build
from source as before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(quack): route catalog-load reads through the quack passthrough
GetCatalogForSnapshot runs six snapshot-pinned reads via the raw
transaction-level SnapshotQuery, which scans the attached metadata
catalog directly. For the quack backend that produces multiple
streaming quack_query scans in one statement, and quack's optimizer
rejects that ("Multiple streaming scans ... not currently supported"),
failing 350/394 tests in the Test Quack step.
Add a SnapshotCatalogQuery seam on the metadata manager. The base
implementation keeps the existing raw behavior (correct for the DuckDB
backend, and for Postgres where postgres_scanner materializes the
multi-table scan). QuackMetadataManager overrides it to route through
CALL quack_query_by_name, which executes the DuckDB SQL server-side in
a real DuckDB and returns a materialized result.
Verified locally on the DuckDB, Postgres, and SQLite backends; quack is
verified via CI (cannot build the quack variant locally). Intent guard
still passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(quack): route all multi-table metadata reads through the passthrough
The first CI run after the GetCatalogForSnapshot fix moved the failure
forward: passes went 44 -> 113, the table-information error disappeared,
and the next raw multi-table reads surfaced the same quack streaming-scan
error (500 data-file-list, 220 extended-file-list, 16 column lookup, 16
orphan-cleanup).
Generalize the seam introduced for GetCatalogForSnapshot: add a
CurrentCatalogQuery counterpart to SnapshotCatalogQuery (both default to
the raw transaction read; quack overrides both to its quack_query_by_name
passthrough) and route the remaining raw multi-table reads through them:
GetFilesForTable, the extended file list, IsColumnCreatedWithTable,
GetOrphanFilesForCleanup, and GetTableSizes.
Single-table raw reads are left as-is: a single streaming quack scan is
allowed by quack's optimizer. The default keeps DuckDB/Postgres/SQLite on
the raw path (verified: stats, compaction, alter, remove_orphans,
add_files pass on all three backends). Intent guard still passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(quack): route metadata cleanup through passthrough
* ci: clarify postgres_scanner.cmake is the local/source-build path only
CI no longer builds postgres_scanner from source (the Postgres test job
installs the released extension via `INSTALL postgres FROM core`), so the
GIT_TAG pin here is only exercised by local/manual builds that set
ENABLE_POSTGRES_SCANNER. Document that and why the pin matters, so the
file doesn't read as the active CI scanner version.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: skip flaky Quack concurrent catalog creation
* test: skip Quack deletion inlining retry flake
---------
Co-authored-by: Bill Guowei Yang <bill@posthog.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>1 parent 04a91e8 commit eae3308
23 files changed
Lines changed: 1097 additions & 254 deletions
File tree
- .github
- config/extensions
- workflows
- scripts
- src
- functions
- include
- metadata_manager
- storage
- metadata_manager
- storage
- test
- configs
- sql
- data_inlining
- metadata
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
2 | 8 | | |
3 | 9 | | |
4 | 10 | | |
5 | 11 | | |
6 | | - | |
| 12 | + | |
| 13 | + | |
7 | 14 | | |
8 | 15 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
27 | | - | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
28 | 31 | | |
29 | 32 | | |
30 | 33 | | |
| |||
35 | 38 | | |
36 | 39 | | |
37 | 40 | | |
38 | | - | |
| 41 | + | |
39 | 42 | | |
40 | 43 | | |
41 | 44 | | |
| |||
44 | 47 | | |
45 | 48 | | |
46 | 49 | | |
47 | | - | |
| 50 | + | |
48 | 51 | | |
49 | 52 | | |
50 | 53 | | |
| |||
55 | 58 | | |
56 | 59 | | |
57 | 60 | | |
58 | | - | |
| 61 | + | |
59 | 62 | | |
60 | 63 | | |
61 | 64 | | |
| 65 | + | |
62 | 66 | | |
63 | 67 | | |
64 | | - | |
| 68 | + | |
65 | 69 | | |
66 | 70 | | |
67 | 71 | | |
| |||
88 | 92 | | |
89 | 93 | | |
90 | 94 | | |
91 | | - | |
92 | | - | |
93 | | - | |
94 | | - | |
95 | | - | |
96 | | - | |
97 | | - | |
98 | | - | |
99 | | - | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
100 | 100 | | |
101 | 101 | | |
102 | 102 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
33 | | - | |
| 33 | + | |
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
38 | 42 | | |
39 | 43 | | |
40 | 44 | | |
41 | 45 | | |
42 | 46 | | |
43 | 47 | | |
44 | | - | |
| 48 | + | |
45 | 49 | | |
46 | 50 | | |
47 | 51 | | |
| 52 | + | |
48 | 53 | | |
49 | 54 | | |
50 | | - | |
| 55 | + | |
51 | 56 | | |
52 | 57 | | |
53 | 58 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
40 | 40 | | |
41 | 41 | | |
42 | 42 | | |
43 | | - | |
| 43 | + | |
44 | 44 | | |
45 | 45 | | |
46 | 46 | | |
| |||
51 | 51 | | |
52 | 52 | | |
53 | 53 | | |
54 | | - | |
| 54 | + | |
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
58 | 58 | | |
59 | 59 | | |
60 | | - | |
| 60 | + | |
61 | 61 | | |
62 | 62 | | |
63 | 63 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
33 | | - | |
| 33 | + | |
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
| |||
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
44 | | - | |
| 44 | + | |
45 | 45 | | |
46 | 46 | | |
47 | 47 | | |
| 48 | + | |
48 | 49 | | |
49 | 50 | | |
50 | | - | |
| 51 | + | |
51 | 52 | | |
52 | 53 | | |
53 | 54 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
| 21 | + | |
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
| |||
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
30 | | - | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
31 | 35 | | |
32 | 36 | | |
33 | | - | |
| 37 | + | |
| 38 | + | |
34 | 39 | | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
35 | 45 | | |
36 | | - | |
37 | | - | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
38 | 49 | | |
39 | | - | |
40 | | - | |
41 | | - | |
42 | | - | |
43 | | - | |
44 | | - | |
45 | | - | |
46 | | - | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
39 | 39 | | |
40 | 40 | | |
41 | 41 | | |
42 | | - | |
| 42 | + | |
43 | 43 | | |
44 | 44 | | |
45 | 45 | | |
| |||
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
58 | | - | |
| 58 | + | |
59 | 59 | | |
60 | 60 | | |
61 | 61 | | |
62 | 62 | | |
63 | | - | |
| 63 | + | |
64 | 64 | | |
| 65 | + | |
| 66 | + | |
65 | 67 | | |
66 | 68 | | |
67 | 69 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
33 | | - | |
| 33 | + | |
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
| |||
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
44 | | - | |
| 44 | + | |
45 | 45 | | |
46 | 46 | | |
47 | 47 | | |
| 48 | + | |
48 | 49 | | |
49 | 50 | | |
50 | | - | |
| 51 | + | |
51 | 52 | | |
52 | 53 | | |
53 | 54 | | |
| |||
0 commit comments