fix: cross-workspace view/snapshot/MLV materializations omit workspace prefix (#172)#182
Merged
Conversation
…e prefix (#172) The cross-workspace 4-part naming feature (introduced in #167/#168) forwards `workspace_name` from the model config onto `target_relation` for table and incremental materializations only. View, snapshot, and materialized_lake_view materializations built their target relation via paths that drop the workspace, emitting 3-part DDL against a 4-part target: create or replace view `OtherLakehouse`.`dbt`.test_model as ... --> Database Error: Artifact not found: `MainWorkspace`.`OtherLakehouse` Root causes (per materialization): * view (`mv.sql`): delegated to default `create_or_replace_view()` which builds `api.Relation.create(...)` without forwarding workspace. * snapshot (`snapshot.sql`): `get_or_create_relation(...)` returns a relation with no `workspace` field — both the cache hit and the `api.Relation.create` fallback drop it. Staging view in `spark_build_snapshot_staging_table` had the same omission. * materialized_lake_view (`materialized_lake_view.sql`): built `target_relation` via `api.Relation.create(... type='materialized_view')` without workspace. Fix: * `mv.sql` — full fabricspark view materialization mirroring `table.sql`: builds `target_relation` with `workspace=config.get('workspace_name')`. * `snapshot.sql` — re-incorporates workspace onto the target after `get_or_create_relation`; staging `tmp_relation` also forwards `target_relation.workspace` so MERGE INTO sees a 4-part staging view. * `materialized_lake_view.sql` — adds `workspace=workspace_name` to the `target_relation` build. (MLV REST API lakehouse-id resolution is unaffected; cross-workspace MLV via REST is a separate concern.) Tests (`tests/functional/adapter/test_cross_workspace.py`): * `TestCrossWorkspace4PartWriteView` — compile + first-run + idempotent re-run, with runtime 4-part SELECT verification against WS2. * `TestCrossWorkspace4PartWriteSnapshot` — CTAS path (first run) plus MERGE INTO path (mutated source between runs); asserts SCD2 history (4 → 5 rows with `dbt_valid_to` populated on the closed-out row). Both classes skip in `no_schema` mode (Fabric Livy 4-part naming is schema-enabled-only). Reuses the existing `cross_ws_write` schema in WS2. Repro repo: https://github.com/cheyney-w/dbt-fabricspark-cross-workspace-demo Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The new fabricspark view materialization called `persist_docs` which emits `ALTER TABLE ... CHANGE COLUMN` for column comments — invalid against a view (Spark error: 'expects a table'). The pre-fix mv.sql delegated to dbt-core's `create_or_replace_view()` which does NOT call persist_docs; restoring that behavior unblocks the TestPersistDocsDeltaView::test_delta_comments functional test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Collaborator
Author
Repro validation against @cheyney-w's demo repoRan the issue author's dbt-fabricspark-cross-workspace-demo end-to-end against this branch ( Setup
Steps (per the demo's README)
The originally failing models — now greencc @cheyney-w — happy to hand over the substituted project for your own verification if useful. |
Collaborator
Author
|
@cheyney-w - the above was posted by AI, it validated your project repro is fixed 🙂 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #172.
The cross-workspace 4-part naming feature (introduced in #167/#168) forwards
workspace_namefrom a model'sconfig()ontotarget_relationfor tableand incremental materializations only. View, snapshot, and
materialized_lake_view materializations built their target relation via
paths that drop the workspace, emitting 3-part DDL against a 4-part target:
Manual reproduction (issue scenario)
Verified on a live Fabric tenant with two schema-enabled lakehouses (one in
WS1, one in WS2). Repro project mirrors @cheyney-w's
dbt-fabricspark-cross-workspace-demomodels:
Before the fix:
Same error for
dbt snapshotagainst the same cross-workspace target.After the fix:
Rendered DDL is now correct:
Note that the snapshot staging view (
__dbt_tmp) is also workspace-qualifiednow — this is the second snapshot.sql change.
Root causes
viewcreate_or_replace_view()api.Relation.create(identifier, schema, database, type='view')— no workspace kwargsnapshotget_or_create_relation(...)api.Relation.create(...)also drops workspacesnapshotstaging viewspark_build_snapshot_staging_tableapi.Relation.create(... type='view')copied schema+database from target but not workspacematerialized_lake_viewapi.Relation.create(... type='materialized_view')The
tableandincrementalmaterializations avoid the bug because theyeither pass
workspace=config.get('workspace_name')explicitly (table.sql) oruse
this, which is built viaFabricSparkRelation.create_fromand auto-pullsworkspace_name.Fix
mv.sql— full fabricspark view materialization mirroringtable.sql:builds
target_relationwithworkspace=config.get('workspace_name').Preserves the existing
fabricspark__handle_existing_tablecleanup forstale tables.
snapshot.sql— re-incorporates workspace ontotarget_relationafterget_or_create_relation(usesincorporate(workspace=workspace_name),which round-trips through
FabricSparkRelation.from_dictand preserves thefield). Staging
tmp_relationalso forwardstarget_relation.workspaceso MERGE INTO sees a 4-part staging view.
materialized_lake_view.sql— addsworkspace=workspace_nameto thetarget_relationbuild. (The MLV REST API lakehouse-id resolution stilluses the profile workspace; cross-workspace MLV via REST is a separate
concern and is not in this PR's scope.)
Tests
Adds two functional test classes to
tests/functional/adapter/test_cross_workspace.py:TestCrossWorkspace4PartWriteView— compile + first-run + idempotentre-run, with runtime 4-part SELECT verification against WS2. Primary
regression signal is
test_cross_workspace_view_executeswhich failsoutright on the un-fixed code path.
TestCrossWorkspace4PartWriteSnapshot— exercises both the CTAS path(first run) and the MERGE INTO path (mutated source between runs). Asserts
SCD2 history (4 → 5 rows with
dbt_valid_topopulated on the closed-outrow). The MERGE-into-WS2 path validates the staging-view workspace fix.
Both classes skip in
no_schemamode (Fabric Livy 4-part naming isschema-enabled-only). Reuses the existing
cross_ws_writeschema in WS2.Validation
Run against a live Fabric tenant on this branch (WS1 + WS2 both
schema-enabled):
npx nx run dbt-fabricspark:lintnpx nx run dbt-fabricspark:buildnpx nx run dbt-fabricspark:test:unitpytest tests/functional/adapter/test_cross_workspace.py(all 17 tests)pytest tests/functional/adapter/basic/test_snapshot_*(regression)pytest tests/functional/adapter/basic/test_base.py(view regression)npx nx run dbt-fabricspark:test:local-e2eOut of scope
the relation-rendering layer for parity, but the MLV REST API path resolves
lakehouse IDs against the profile workspace. Cross-workspace MLV via REST
is a larger change and is intentionally not addressed here.
adapter.get_relation— the existing cross-workspaceincremental MERGE INTO tests already prove the cache flow works
cross-workspace, so no Python-side change is needed for snapshot
existence detection.
Closes #172.
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com