fix: emit Delta liquid clustering for clustered_by (#187)#188
Merged
Conversation
`{{ config(clustered_by=[...]) }}` was silently a no-op on Delta tables
because `fabricspark__clustered_cols` only emitted the clause when
`buckets` was also set (Hive bucketing). Fabric Spark accepts
`CLUSTER BY (cols)` on Delta CTAS for liquid clustering — this commit
wires the existing `clustered_by` config through to that DDL without
introducing a new config name.
Override `fabricspark__clustered_cols` and `fabricspark__file_format_clause`
per the reference impl in #187:
- `clustered_by` + `buckets` → unchanged Hive bucketing
- `clustered_by` alone on Delta → `cluster by (cols)` + `using delta`
- `clustered_by` + `partition_by` on Delta → compile-time error
(mutually exclusive on Delta)
- Non-`delta` `file_format` → unchanged `using <fmt>`, no clustering
Seeds inherit the new behavior automatically through adapter dispatch
(`seed.sql` calls the same `clustered_cols` and `file_format_clause`
macros).
Adds 10 unit tests covering the four acceptance-criteria branches and
adds `clustered_by` to two existing end-to-end test models — the
`persist_docs` Fabric delta table model and the local-e2e jaffle_shop
`orders.sql` — so both the Fabric Spark and local Spark 3.5.1 + Delta
3.2.0 paths exercise the new clause on every CI run.
Closes #187.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #187.
Problem
{{ config(clustered_by=[...]) }}is silently a no-op on Delta tables today.fabricspark__clustered_colsonly emits the clause whenbucketsis also set (Hive bucketing). For Delta, the right behavior is to emitCLUSTER BY (cols)for Fabric Spark liquid clustering.Fix
Drop in the reference impl from #187 — override
fabricspark__clustered_colsandfabricspark__file_format_clauseso the existingclustered_byconfig opts a Delta table into liquid clustering. No new config name introduced.Semantics matrix:
clustered_bybucketsfile_formatpartition_byclustered by (cols) into N bucketsdelta/unsetcluster by (cols)+using deltaemitted byfile_format_clausedelta/unsetexceptions.raise_compiler_error(mutually exclusive)deltausing <fmt>emittedSeeds inherit the new behavior automatically through adapter dispatch —
seed.sqlcalls the sameclustered_colsandfile_format_clausemacros.Expected DDL (per #187 acceptance criteria)
Validation
npx nx run dbt-fabricspark:testran end-to-end on this branch — all targets green:no_schemawith_schema(incl. cross-workspace)End-to-end smoke models added per user direction:
tests/functional/adapter/persist_docs/fixtures.py::_MODELS__TABLE_DELTA_MODEL→clustered_by=['id'](Fabric).tests/fixtures/dbt-jaffle-shop/models/orders.sql→clustered_by=['order_id'](local Livy).A successful
dbt runagainst both Fabric Spark and local Spark+Delta with the new clause is the live confirmation that the macros emit valid liquid-clustering SQL.Version
Bumps
1.12.0→1.12.1, CHANGELOG updated.