Skip to content

Add cross-workspace 4-part naming support for FabricSpark #233

@sdebruyn

Description

@sdebruyn

Context

Microsoft Fabric Spark SQL supports 4-part naming (workspace.lakehouse.schema.table) for cross-workspace queries and writes in schema-enabled lakehouses. This allows a single dbt project to read from and write to multiple Fabric workspaces without requiring OneLake shortcuts.

Currently our adapter renders relations as 3-part names (lakehouse.schema.table), limiting operations to the current workspace.

Confirmed working (tested 2026-05-17 via ad-hoc Livy statements):

Statement 4-part support
SELECT * FROM \ws`.`lh`.`schema`.`table`` Yes
INSERT INTO \ws`.`lh`.`schema`.`table`` Yes
CREATE TABLE \ws`.`lh`.`schema`.`table` AS SELECT ...` Yes
MERGE INTO \ws`.`lh`.`schema`.`table`` Yes
CREATE OR REPLACE VIEW \ws`.`lh`.`schema`.`view`` Yes
CREATE OR REPLACE MATERIALIZED LAKE VIEW ... Untested (REST API path)
DROP TABLE/VIEW IF EXISTS \ws`.`lh`.`schema`.`obj`` Yes

Requirements

  • The session's default lakehouse must be schema-enabled (or unset)
  • User must have permissions on the target workspace/lakehouse
  • Non-schema lakehouses use 3-part cross-workspace names (workspace.lakehouse.table) — we don't need to support this since our adapter requires schema-enabled lakehouses

Prior art

Microsoft's upstream dbt-fabricspark implemented this in:

  • PR #167 — Initial 4-part naming (read + table/incremental write)
  • PR #182 — Fix for view/snapshot/MLV materializations that lost the workspace prefix

Design

User interface

Users set workspace_name in their model config to target a different workspace:

{{ config(
    materialized='table',
    workspace_name='other-workspace',
    database='other-lakehouse',
    schema='my_schema'
) }}

SELECT * FROM {{ ref('local_model') }}

When workspace_name is not set, behavior is unchanged (3-part names within the current workspace).

Implementation plan

1. Add workspace field to FabricSparkRelation

@dataclass(frozen=True, eq=False, repr=False)
class FabricSparkRelation(BaseRelation):
    workspace: str | None = None
    ...

The field is render-only — it does NOT participate in cache identity (the cache key remains database.schema.identifier).

2. Override render() to prepend workspace

def render(self) -> str:
    base = super().render()
    if self.workspace and self.include_policy.database:
        quoted_ws = self.quoted(self.workspace)
        return f"{quoted_ws}.{base}" if base else quoted_ws
    return base

The workspace prefix is only emitted when include_policy.database is True — temp tables and CTEs that use .include(database=False) remain unqualified.

3. Override create_from() to pull workspace_name from model config

@classmethod
def create_from(cls, quoting, relation_config, **kwargs):
    if "workspace" not in kwargs:
        cfg = getattr(relation_config, "config", None)
        if cfg is not None:
            ws_name = cfg.get("workspace_name")
            if ws_name:
                kwargs["workspace"] = ws_name
    return super().create_from(quoting, relation_config, **kwargs)

This ensures this (used in incremental, snapshot, etc.) automatically carries the workspace. No per-materialization changes needed for the common path.

4. Handle incorporate() for workspace propagation

Override incorporate() (or ensure from_dict round-trips the workspace field) so that macros using target_relation.incorporate(...) preserve the workspace.

5. Update _build_spark_relation_list and catalog queries

We already extract the workspace from the 3-part namespace in _build_spark_relation_list and store it as catalog. Map this to the workspace field instead. Update SHOW TABLE EXTENDED queries to optionally use 4-part names when listing relations in other workspaces.

6. Validation

  • Reject workspace_name when schema-enabled mode is off (if we ever support non-schema mode)
  • Consider a parse-time warning/error if workspace_name targets the current workspace (redundant)

7. Update materializations (minimal)

Thanks to create_from() handling the workspace via this, most materializations work without changes. Verify:

  • table.sql — uses this.incorporate(type='table') → workspace preserved ✓
  • incremental.sql — uses this.incorporate(type='table') → workspace preserved ✓
  • materialized_view.sql — uses this.incorporate(type=...) → workspace preserved ✓
  • snapshot.sql — uses get_or_create_relation() which doesn't carry workspace. Need to re-incorporate workspace onto target_relation after get_or_create_relation (same fix as upstream PR Add FabricSpark integration tests for dbt-profiler package #182).
  • Staging tables in snapshot — forward target_relation.workspace to tmp_relation.

8. ensure_database_exists / CREATE DATABASE IF NOT EXISTS

The existing ensure_database_exists macro should forward the workspace so that CREATE DATABASE IF NOT EXISTS \ws`.`lh`.`schema`` works cross-workspace.

9. Tests

  • Unit tests: relation rendering with/without workspace
  • Integration tests: cross-workspace CTAS, MERGE, view creation (requires a second workspace in CI)

10. Documentation

  • Add docs page for cross-workspace models
  • Document the workspace_name config option
  • Note that this only works with schema-enabled lakehouses

Out of scope

  • Cross-workspace materialized lake views via REST API (the MLV REST API resolves lakehouse IDs against the profile workspace — a separate concern)
  • Non-schema lakehouse support (3-part cross-workspace names)
  • Workspace-aware relation cache (cache key remains 3-part)

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestfabricsparkRelated to the FabricSpark (Spark SQL / Lakehouse) adapter

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions