12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,17 @@
# Changelog

## v1.12.0

### New Features

- **High-concurrency Livy support** for true parallel statement execution. Each dbt thread acquires its own REPL inside one underlying Livy session via [Fabric's HC Livy API](https://learn.microsoft.com/en-us/fabric/data-engineering/high-concurrency-livy) (`/highConcurrencySessions` + `/repls/{replId}/statements`). All threads in a process share a deterministic `sessionTag` derived from `(workspaceid, lakehouseid)`; with `reuse_session: true`, Fabric therefore snap-attaches new REPLs onto the still-warm underlying session across runs. On the second run of the issue's repro this gave a **3.6× wall-clock speedup** (442s → 122s). The new `high_concurrency` flag defaults to `true` in Fabric mode and is a no-op in local mode; set `high_concurrency: false` to keep the singleton mode. See the new "High-concurrency Livy" section in the README for the table of cross-REPL state when `threads > 5` (#185, #186)

### Infrastructure

- Refactored the Livy backend behind a new `LivyBackend` ABC with two implementations, `singleton_livy.py` (the existing single-session path) and `concurrent_livy.py` (the new HC path), selected at connect time by the `high_concurrency` credential. Shared auth, header, retry, and lakehouse-property helpers remain in `livysession.py`, which also continues to re-export the existing class names for backwards compatibility with downstream importers and the test patch surface (#186)

---

## v1.11.0

### New Features
46 changes: 46 additions & 0 deletions README.md
@@ -327,6 +327,7 @@ Each segment is independently backtick-quoted, so workspace names with spaces or
| `reuse_session` | bool | `false` | Keep Livy sessions alive for reuse across runs |
| `session_id_file` | string | `./livy-session-id.txt` | Path to file storing session ID for reuse |
| `session_idle_timeout` | string | `30m` | Livy session idle timeout (e.g. `30m`, `1h`) |
| `high_concurrency` | bool | `true` | Use high-concurrency Livy API so each dbt thread gets its own REPL — see [High-concurrency Livy](#high-concurrency-livy) |
| **Timeouts & Polling** | | | |
| `connect_retries` | int | `1` | Number of connection retries |
| `connect_timeout` | int | `10` | Connection timeout in seconds |
@@ -350,6 +351,51 @@ Each segment is independently backtick-quoted, so workspace names with spaces or
| **Service Principal** | `SPN` | CI/CD and automation. Uses Azure AD app registration. | `client_id`, `tenant_id`, `client_secret` |
| **Fabric Notebook** | `fabric_notebook` | Running dbt inside a Fabric notebook. Uses `notebookutils.credentials`. | None (runs in Fabric runtime) |

### High-concurrency Livy

By default the adapter uses Fabric's [high-concurrency Livy API](https://learn.microsoft.com/en-us/fabric/data-engineering/high-concurrency-livy)
(`high_concurrency: true`). Each dbt thread acquires its own HC session, and therefore its own REPL, inside a single underlying Livy session
shared via a deterministic `sessionTag` derived from `(workspaceid, lakehouseid)`. Statements from different REPLs execute in
parallel inside the same Spark application, so raising `threads` increases statement throughput.
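What matters about the tag is that it is a pure function of the workspace and lakehouse IDs: every thread, and every later run, computes the same value and so asks Fabric for the same underlying session. A minimal sketch, assuming a SHA-256 hash and a `dbt-fabricspark-` prefix; `derive_session_tag` and the tag format are hypothetical, not necessarily what the adapter sends:

```python
import hashlib


def derive_session_tag(workspace_id: str, lakehouse_id: str) -> str:
    """Hypothetical: map (workspaceid, lakehouseid) to a stable session tag.

    The same pair always yields the same tag, so all dbt threads (and later
    runs) target the same underlying Livy session; other lakehouses get a
    different tag and therefore a different session.
    """
    # Normalise case so upper- and lower-case GUIDs map to the same tag.
    key = f"{workspace_id.lower()}:{lakehouse_id.lower()}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Hypothetical prefix + truncated digest keeps the tag short and readable.
    return f"dbt-fabricspark-{digest[:32]}"
```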

When `reuse_session: true`, the underlying Livy session also stays warm between dbt invocations (until Fabric's
`spark.livy.session.idle.timeout` elapses), so the next run skips Spark cold-start entirely.
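Whether a new run lands on a warm session comes down to a time comparison. The sketch below is hypothetical: `parse_idle_timeout` and `session_still_warm` are illustrative names, it assumes the timeout uses the same `30m` / `1h` strings as `session_idle_timeout` in the option table, and it ignores anything else that might end a session early:

```python
import re
from datetime import datetime, timedelta

# Assumed unit suffixes; the option table only shows "30m" and "1h".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_idle_timeout(value: str) -> timedelta:
    """Parse a timeout such as '30m' or '1h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported timeout: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def session_still_warm(last_activity: datetime, now: datetime, idle_timeout: str = "30m") -> bool:
    """True if a run starting at `now` falls inside the idle window, so it can
    reuse the warm session instead of paying Spark cold-start again."""
    return now - last_activity < parse_idle_timeout(idle_timeout)
```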

Set `high_concurrency: false` to fall back to single-session-per-process mode, where one Livy session
serves every thread and statements queue FIFO within it. This mode is useful as an escape hatch
when debugging problems with the high-concurrency API.
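The mode switch amounts to choosing one of two backends at connect time, in the spirit of the `LivyBackend` ABC introduced in this release. A minimal sketch; the subclass names, the `session_layout` method, and the assumption that local mode always takes the single-session path are hypothetical:

```python
from abc import ABC, abstractmethod


class LivyBackend(ABC):
    """Hypothetical interface: each backend describes how threads map to Livy."""

    @abstractmethod
    def session_layout(self, threads: int) -> str:
        """Return a human-readable description of the thread-to-session mapping."""


class SingletonLivyBackend(LivyBackend):
    # One Livy session per process; statements from all threads queue FIFO.
    def session_layout(self, threads: int) -> str:
        return f"1 Livy session shared by {threads} threads (FIFO)"


class ConcurrentLivyBackend(LivyBackend):
    # One HC session (and REPL) per dbt thread on a shared underlying session.
    def session_layout(self, threads: int) -> str:
        return f"{threads} REPLs, one per dbt thread"


def select_backend(high_concurrency: bool = True, local_mode: bool = False) -> LivyBackend:
    """Pick a backend at connect time from the `high_concurrency` setting."""
    # HC Livy is Fabric-only, so the flag is ignored in local mode
    # (assumption: local mode uses the single-session path).
    if local_mode or not high_concurrency:
        return SingletonLivyBackend()
    return ConcurrentLivyBackend()
```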

Fabric packs up to **5 REPLs onto one underlying Livy session** (see the
["Limits"](https://learn.microsoft.com/en-us/fabric/data-engineering/high-concurrency-livy#key-concepts)
note in the Microsoft Learn HC Livy docs). With `threads > 5`, dbt still
works correctly: Fabric spins up a second underlying Livy session
to host the 6th and later REPLs, and the shared `sessionTag` makes future
acquires snap-attach to whichever underlying session has room.

What that means in practice:

| Property | Shared across underlying sessions? |
| ----------------------------------------------------- | ---------------------------------- |
| OneLake Delta tables (dbt model outputs) | Yes — same lakehouse storage |
| Catalog / metastore (`SELECT FROM <other_model>`) | Yes — same Fabric catalog |
| Temp views (`CREATE TEMPORARY VIEW ...`) | No — REPL/session-local |
| Session-level Spark configs (`SET spark.sql.X = ...`) | No |
| Cached datasets / UDFs / broadcast vars | No |

Because dbt-fabricspark materializations always write permanent Delta /
MLV objects, model-to-model `ref`s resolve correctly regardless of which
underlying session produced or consumes the table. Only macros that depend on
session-local state (temp views, in-session configs) could behave unexpectedly,
and the adapter ships no such macros today.
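The five-REPL packing rule boils down to a ceiling division. A sketch assuming Fabric fills one underlying session before opening the next; the real placement is Fabric's decision, and `underlying_sessions_needed` / `placement` are hypothetical helpers:

```python
import math

REPLS_PER_SESSION = 5  # Fabric's documented per-session REPL limit


def underlying_sessions_needed(threads: int, per_session: int = REPLS_PER_SESSION) -> int:
    """Number of underlying Livy sessions needed to host one REPL per thread."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return math.ceil(threads / per_session)


def placement(threads: int, per_session: int = REPLS_PER_SESSION) -> dict[int, list[int]]:
    """Assumed fill-in-order placement: thread i goes to session i // per_session."""
    layout: dict[int, list[int]] = {}
    for thread in range(threads):
        layout.setdefault(thread // per_session, []).append(thread)
    return layout
```

With `threads: 8` this gives two underlying sessions holding five and three REPLs, which is why the table above distinguishes state that is shared across underlying sessions from state that is not.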

Cost tradeoff: each additional underlying Livy session is a separate
Spark cluster billed for the duration of the run plus the
`spark.livy.session.idle.timeout` afterwards. Keep `threads ≤ 5` for the
cheapest profile; raise it only when the extra parallelism beats the
extra compute spend.
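A back-of-the-envelope estimate makes the tradeoff concrete. This hypothetical helper assumes every underlying session lives for the whole run and then idles for the full timeout; it counts session-minutes only and does not model pool size or capacity-unit rates:

```python
import math


def estimated_session_minutes(
    threads: int,
    run_minutes: float,
    idle_timeout_minutes: float,
    repls_per_session: int = 5,
) -> float:
    """Rough upper bound on billed underlying-session minutes for one run."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    # One underlying session per (up to) five REPLs, i.e. per five threads.
    sessions = math.ceil(threads / repls_per_session)
    # Each session is billed for the run plus the idle tail afterwards.
    return sessions * (run_minutes + idle_timeout_minutes)
```

For a 10-minute run with a 30-minute idle timeout, `threads: 5` gives 40 session-minutes and `threads: 8` gives 80 at equal run time; the two idle tails alone (2 × 30 = 60) already exceed the 5-thread total.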

The `high_concurrency` setting has no effect in local mode, because high-concurrency Livy is a Fabric-specific construct.

### Materialized Lake Views

[Materialized lake views](https://learn.microsoft.com/en-us/fabric/data-engineering/materialized-lake-views/overview-materialized-lake-view) are a Fabric-native construct that materializes a SQL query as a Delta table in your lakehouse, with automatic lineage-based refresh managed by Fabric.
2 changes: 1 addition & 1 deletion src/dbt/adapters/fabricspark/__version__.py
@@ -1 +1 @@
-version = "1.11.0"
+version = "1.12.0"