Commit 4ec2f1d

1 parent 77c9d7b commit 4ec2f1d

12 files changed

Lines changed: 2317 additions & 1144 deletions

README.md

Lines changed: 46 additions & 0 deletions
@@ -327,6 +327,7 @@ Each segment is independently backtick-quoted, so workspace names with spaces or
 | `reuse_session` | bool | `false` | Keep Livy sessions alive for reuse across runs |
 | `session_id_file` | string | `./livy-session-id.txt` | Path to file storing session ID for reuse |
 | `session_idle_timeout` | string | `30m` | Livy session idle timeout (e.g. `30m`, `1h`) |
+| `high_concurrency` | bool | `true` | Use the high-concurrency Livy API so each dbt thread gets its own REPL; see [High-concurrency Livy](#high-concurrency-livy) |
 | **Timeouts & Polling** | | | |
 | `connect_retries` | int | `1` | Number of connection retries |
 | `connect_timeout` | int | `10` | Connection timeout in seconds |
@@ -350,6 +351,51 @@ Each segment is independently backtick-quoted, so workspace names with spaces or
 | **Service Principal** | `SPN` | CI/CD and automation. Uses Azure AD app registration. | `client_id`, `tenant_id`, `client_secret` |
 | **Fabric Notebook** | `fabric_notebook` | Running dbt inside a Fabric notebook. Uses `notebookutils.credentials`. | None (runs in Fabric runtime) |
 
+### High-concurrency Livy
+
+By default the adapter uses Fabric's [high-concurrency Livy API](https://learn.microsoft.com/en-us/fabric/data-engineering/high-concurrency-livy)
+(`high_concurrency: true`). Each dbt thread acquires its own high-concurrency (HC) session, and therefore its own REPL, inside a single underlying Livy session
+shared via a deterministic `sessionTag` derived from `(workspaceid, lakehouseid)`. Statements from different REPLs execute in
+parallel inside the same Spark application, so increasing `threads` increases throughput.
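
One way to picture the deterministic tag: every thread and every dbt invocation that targets the same workspace and lakehouse computes the same string, so they all attach to the same underlying session. The sketch below is illustrative only. The function name `session_tag`, the `dbt-` prefix, and the SHA-256 hashing are assumptions, not the adapter's actual derivation.

```python
import hashlib


def session_tag(workspaceid: str, lakehouseid: str) -> str:
    """Hypothetical derivation: hash the (workspaceid, lakehouseid) pair.

    Assumption: the real adapter may build its tag differently. The only
    property taken from the README is that the tag is deterministic.
    """
    # Normalise whitespace and case so the same ID written differently
    # still maps to one tag.
    key = f"{workspaceid.strip().lower()}:{lakehouseid.strip().lower()}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Keep the tag short; 16 hex characters is an arbitrary choice here.
    return f"dbt-{digest[:16]}"
```

Because the tag depends only on its two inputs, separate processes need no coordination to agree on it.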
+
+When `reuse_session: true`, the underlying Livy session also stays warm between dbt invocations (until Fabric's
+`spark.livy.session.idle.timeout` elapses), so the next run skips Spark cold-start entirely.
+
+Set `high_concurrency: false` to fall back to the single-session-per-process mode, where one Livy session
+serves every thread and statements queue FIFO inside it. This is useful as an escape hatch
+when debugging problems with the high-concurrency API.
+
+Fabric packs up to **5 REPLs onto one underlying Livy session** (see the
+["Limits"](https://learn.microsoft.com/en-us/fabric/data-engineering/high-concurrency-livy#key-concepts)
+note in the Microsoft Learn HC Livy docs). With `threads > 5`, dbt still
+works correctly: Fabric simply spins up a second underlying Livy session
+to host the 6th REPL onwards, and the same `sessionTag` lets later
+acquisitions attach to whichever underlying session has room.
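
The packing rule above reduces to a ceiling division. The sketch below assumes Fabric fills each underlying session to the 5-REPL limit before opening another. The helper name `underlying_sessions` is hypothetical and not part of the adapter.

```python
import math

REPLS_PER_SESSION = 5  # limit quoted above from the Fabric HC Livy docs


def underlying_sessions(threads: int) -> int:
    """Estimate how many underlying Livy sessions `threads` REPLs need.

    Assumption: sessions are filled to capacity before a new one starts.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return math.ceil(threads / REPLS_PER_SESSION)
```

For example, `threads: 5` fits in one underlying session, while `threads: 6` needs two.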
+
+What that means in practice:
+
+| Property                                              | Shared across underlying sessions? |
+| ----------------------------------------------------- | ---------------------------------- |
+| OneLake Delta tables (dbt model outputs)              | Yes, same lakehouse storage        |
+| Catalog / metastore (`SELECT FROM <other_model>`)     | Yes, same Fabric catalog           |
+| Temp views (`CREATE TEMPORARY VIEW ...`)              | No, REPL/session-local             |
+| Session-level Spark configs (`SET spark.sql.X = ...`) | No                                 |
+| Cached datasets / UDFs / broadcast vars               | No                                 |
+
+Because dbt-fabricspark materializations always write permanent Delta or
+materialized lake view (MLV) objects, model-to-model `ref`s resolve correctly
+regardless of which underlying session produced or consumes the table.
+Macros that depend on session-local state (temp views, in-session configs)
+are the only ones that could behave unexpectedly; none ship with this
+adapter today.
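
For custom macros, a rough text scan can flag SQL that leans on session-local state before `threads` is raised past 5. This is a minimal sketch, not something the adapter provides. The names `SESSION_LOCAL_PATTERNS` and `session_local_features` are hypothetical, and the regular expressions only cover the obvious spellings of the constructs in the table above.

```python
import re

# Patterns for constructs that are local to one REPL or session.
SESSION_LOCAL_PATTERNS = {
    "temp view": re.compile(
        r"\bCREATE\s+(?:OR\s+REPLACE\s+)?TEMP(?:ORARY)?\s+VIEW\b",
        re.IGNORECASE,
    ),
    "session config": re.compile(
        r"^\s*SET\s+spark\.", re.IGNORECASE | re.MULTILINE
    ),
    "cached dataset": re.compile(r"\bCACHE\s+(?:LAZY\s+)?TABLE\b", re.IGNORECASE),
}


def session_local_features(sql: str) -> list[str]:
    """Return the names of session-local constructs found in `sql`."""
    return [
        name
        for name, pattern in SESSION_LOCAL_PATTERNS.items()
        if pattern.search(sql)
    ]
```

An empty result does not prove a macro is safe; it only means none of these patterns matched.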
+
+Cost tradeoff: each additional underlying Livy session is a separate
+Spark cluster billed for the duration of the run plus the
+`spark.livy.session.idle.timeout` period afterwards. Keep `threads ≤ 5` for the
+cheapest profile; raise it only when the extra parallelism is worth the
+extra compute spend.
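
The tradeoff can be estimated with simple arithmetic. The sketch below is a rough model, assuming every underlying session runs for the whole dbt invocation and then idles for the full timeout. The function name `billed_session_minutes` is hypothetical, and actual Fabric billing may differ.

```python
import math


def billed_session_minutes(
    threads: int,
    run_minutes: float,
    idle_timeout_minutes: float,
    repls_per_session: int = 5,
) -> float:
    """Rough estimate: sessions * (run duration + idle timeout).

    Assumption: all underlying sessions start with the run and each one
    stays alive for the full idle timeout afterwards.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    sessions = math.ceil(threads / repls_per_session)
    return sessions * (run_minutes + idle_timeout_minutes)
```

Under this model, a 20-minute run with a 30-minute idle timeout costs 50 session-minutes at `threads: 5` and 100 at `threads: 6` if the run length is unchanged, so the sixth thread has to shorten the run substantially to pay for itself.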
+
+High-concurrency mode has no effect in local mode, because it is a Fabric-specific construct.
+
 ### Materialized Lake Views
 
 [Materialized lake views](https://learn.microsoft.com/en-us/fabric/data-engineering/materialized-lake-views/overview-materialized-lake-view) are a Fabric-native construct that materializes a SQL query as a Delta table in your lakehouse, with automatic lineage-based refresh managed by Fabric.
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-version = "1.11.0"
+version = "1.12.0"
