Add high-concurrency Livy support for parallel statement execution #232

Merged: sdebruyn merged 12 commits into main from feat/hc-livy on May 17, 2026

Conversation

@sdebruyn (Owner) commented May 17, 2026

Summary

  • Add Fabric high-concurrency Livy API support so each dbt thread gets its own REPL inside a shared underlying Livy session — statements execute in parallel instead of queuing FIFO
  • HC is the only session mode — the legacy single-session path has been removed entirely
  • Deterministic sessionTag from (workspace_id, lakehouse_id) lets Fabric snap-attach REPLs onto a warm session across dbt invocations
  • HC session cleanup via dbt's connection manager close() path — no atexit handlers
  • Both adapters now use HC Livy: FabricSpark for all SQL execution, Fabric DW for Python model execution
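
The deterministic sessionTag mentioned in the summary could be derived along these lines. This is a minimal sketch, not the adapter's code: the function name `derive_session_tag`, the `dbt` prefix, the lowercase normalization, and the use of a truncated SHA-256 digest are all assumptions. The PR only states that the tag is a deterministic function of (workspace_id, lakehouse_id).

```python
import hashlib


def derive_session_tag(workspace_id: str, lakehouse_id: str, prefix: str = "dbt") -> str:
    """Return a stable tag for a (workspace, lakehouse) pair.

    Hypothetical helper: the same inputs always give the same tag, so a
    later dbt invocation can ask for a session with a matching tag.
    """
    # Normalize case so GUIDs typed in different casing map to one tag (assumption).
    key = f"{workspace_id.lower()}:{lakehouse_id.lower()}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:16]
    return f"{prefix}-{digest}"
```

Because the tag depends only on the two IDs, every dbt run against the same lakehouse would request the same tag, which is what allows a warm session to be reused across invocations.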

Based on upstream contribution: microsoft/dbt-fabricspark#186. Clean-room implementation adapted to our FabricApiClient + HighConcurrencyLivySession + FabricSparkCursor architecture.

Closes #231

Changed files

| File | Change |
| --- | --- |
| fabric_api_client.py | HC API methods only (acquire, poll, submit/get/cancel statement, delete); legacy single-session methods removed |
| fabric_hc_livy_session.py | New HighConcurrencyLivySession with REPL lifecycle, polling, and best-effort cleanup on failure |
| fabric_livy_session.py | Reduced to shared dataclasses (LivySessionResult, LivySubmissionResult); LivySession class removed |
| fabric_livy_helper.py | Switched to HighConcurrencyLivySession with thread-local storage for per-thread REPL isolation |
| fabricspark_connection_manager.py | Always creates HC session (no legacy fallback) |
| fabricspark_cursor.py | Routes cancel through session interface (cancel_statement) |
| fabricspark_connection.py | Typed for HighConcurrencyLivySession only |
| docs/lakehouse.md | HC architecture docs, Mermaid diagram, performance considerations |
| docs/comparison-dbt-fabricspark.md | Updated comparison table |

Test plan

  • Run FabricSpark integration tests (/test-de)
  • Run Fabric DW integration tests with Python models (/test-dw)
  • Verify threads > 5 creates multiple underlying sessions

🤖 Generated with Claude Code

Each dbt thread acquires its own REPL inside a shared underlying Livy
session via Fabric's HC Livy API, enabling true parallel execution
instead of FIFO queuing. Default on via `high_concurrency: true`.

Upstream contribution: microsoft/dbt-fabricspark#186

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 17, 2026 08:14
cloudflare-workers-and-pages (bot) commented May 17, 2026

Deploying dbt-fabric with Cloudflare Pages

Latest commit: fd02d95
Status: ✅  Deploy successful!
Preview URL: https://ed726286.dbt-fabric.pages.dev
Branch Preview URL: https://feat-hc-livy.dbt-fabric.pages.dev

Copilot AI left a comment

Pull request overview

Adds Fabric high-concurrency (HC) Livy support to the fabricspark adapter so each dbt thread can execute statements through its own REPL (parallel execution), with a credential flag to fall back to the legacy single-session behavior.

Changes:

  • Introduces HighConcurrencyLivySession and corresponding FabricApiClient HC endpoints (acquire/poll/submit/get/cancel/delete).
  • Switches FabricSparkConnectionManager to choose HC vs legacy LivySession based on high_concurrency in credentials (default True).
  • Updates cursor cancellation flow to use a session-level cancel_statement() and expands documentation for HC mode.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/dbt/adapters/fabricspark/fabricspark_cursor.py | Routes cancel through the session interface (cancel_statement) rather than directly via the API client. |
| src/dbt/adapters/fabricspark/fabricspark_credentials.py | Adds high_concurrency: bool = True configuration flag. |
| src/dbt/adapters/fabricspark/fabricspark_connection.py | Allows connections to hold either session type and closes HC sessions on connection close. |
| src/dbt/adapters/fabricspark/fabricspark_connection_manager.py | Selects HC vs legacy session implementation during connection open. |
| src/dbt/adapters/fabric/fabric_livy_session.py | Adds cancel_statement() to match the cursor’s new cancellation flow. |
| src/dbt/adapters/fabric/fabric_hc_livy_session.py | New HC session implementation (acquire/poll/submit/get/cancel/cleanup). |
| src/dbt/adapters/fabric/fabric_api_client.py | Adds HC Livy REST endpoints to support the new session implementation. |
| docs/lakehouse.md | Documents HC mode behavior, configuration, and threads > 5 implications. |
| docs/configuration.md | Documents the high_concurrency profile option and defaults. |
| docs/comparison-dbt-fabricspark.md | Updates feature comparison to include HC mode and lifecycle differences. |

Comment thread src/dbt/adapters/fabricspark/fabricspark_cursor.py
Comment thread src/dbt/adapters/fabricspark/fabricspark_cursor.py
Comment thread src/dbt/adapters/fabricspark/fabricspark_credentials.py
Comment thread src/dbt/adapters/fabric/fabric_hc_livy_session.py
Comment thread src/dbt/adapters/fabric/fabric_hc_livy_session.py
Comment thread src/dbt/adapters/fabric/fabric_hc_livy_session.py Outdated
Comment thread src/dbt/adapters/fabric/fabric_hc_livy_session.py
Comment thread src/dbt/adapters/fabric/fabric_api_client.py Outdated
sdebruyn and others added 5 commits May 17, 2026 10:42
HC mode passed all targeted integration tests (basic, validate connection,
concurrency). There is no reason to keep the singleton fallback — HC is
strictly better (parallel execution, warm session reuse). This removes the
`high_concurrency` config flag and simplifies all FabricSpark connection
code to use `HighConcurrencyLivySession` exclusively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ling

- Fix unit test asserting old cancel path (cancel_livy_statement →
  cancel_statement)
- Handle requests transport exceptions (ConnectionError, Timeout,
  ChunkedEncodingError, JSONDecodeError) in HC session acquire and poll,
  matching the resilience of the singleton LivySession
- Remove unused _ACQUIRING_STATES constant

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
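
The transport-error handling described in the commit above might be sketched as a polling loop that treats transient errors as "try again" rather than as failures. This is an illustration under stated assumptions, not the adapter's `_poll_until_idle`: the function signature, the injected `fetch_state` callable, and the set of failure states (taken from Apache Livy's session states) are hypothetical, and built-in exceptions stand in for the `requests` exceptions named in the commit.

```python
import time
from typing import Callable, Tuple, Type

# Assumed terminal failure states, borrowed from Apache Livy session states.
FAILURE_STATES = {"dead", "error", "killed"}


def poll_until_idle(
    fetch_state: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until the session reports 'idle', tolerating transient errors."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            state = fetch_state()
        except transient:
            # A dropped connection or slow response is not a session failure;
            # wait and poll again until the overall deadline expires.
            sleep(interval_s)
            continue
        if state == "idle":
            return
        if state in FAILURE_STATES:
            raise RuntimeError(f"session entered failure state: {state}")
        sleep(interval_s)
    raise TimeoutError(f"session not idle after {timeout_s} seconds")
```

In a real client, `transient` would presumably list the `requests` exception types from the commit message, and `fetch_state` would wrap the HTTP call that reads the session state.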
… docs

- Rename session_id → livy_session_id in HC API client methods to
  clarify these take the underlying Livy session ID, not the HC session ID
- Merge duplicate TimeoutError/Exception handlers in
  wait_and_get_statement_result (TimeoutError is a subclass of Exception)
- Remove stale "singleton Livy sessions" reference from comparison doc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The "1-5 minutes" claim was inaccurate — startup can sometimes
take just a few seconds. Replaced with generic phrasing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Fabric DW adapter's Python model execution now uses
HighConcurrencyLivySession instead of the old LivySession class.
This removes the last consumer of the legacy Livy session API,
so LivySession and all non-HC Livy methods in FabricApiClient
are deleted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Comment thread src/dbt/adapters/fabricspark/fabricspark_connection_manager.py
Comment thread src/dbt/adapters/fabric/fabric_livy_helper.py
Comment thread src/dbt/adapters/fabric/fabric_hc_livy_session.py Outdated
Comment thread docs/lakehouse.md Outdated
Comment thread src/dbt/adapters/fabric/fabric_hc_livy_session.py
sdebruyn and others added 5 commits May 17, 2026 11:25
…aths

- FabricLivyHelper: use thread-local storage instead of class-level
  singleton so each thread gets its own HC REPL
- HighConcurrencyLivySession: best-effort delete of HC session when
  _poll_until_idle fails or when re-acquiring after staleness
- Mermaid diagram: update API paths to match actual HC endpoints

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
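
The switch from a class-level singleton to thread-local storage can be illustrated with a small stand-alone sketch. `FakeReplSession`, `get_session`, and `collect_repl_ids` are hypothetical names used only for this example; the real helper would acquire a `HighConcurrencyLivySession` instead of the stand-in class.

```python
import threading


class FakeReplSession:
    """Stand-in for an HC REPL session (hypothetical, for illustration)."""

    _counter = 0
    _lock = threading.Lock()

    def __init__(self) -> None:
        # Give every instance a unique ID so distinct REPLs are distinguishable.
        with FakeReplSession._lock:
            FakeReplSession._counter += 1
            self.repl_id = FakeReplSession._counter


# Attributes set on this object are visible only to the thread that set them.
_local = threading.local()


def get_session() -> FakeReplSession:
    """Return this thread's session, creating it on first use."""
    session = getattr(_local, "session", None)
    if session is None:
        session = FakeReplSession()
        _local.session = session
    return session


def collect_repl_ids(n_threads: int) -> list:
    """Run n_threads workers and record (repl_id, reused_within_thread)."""
    results = []
    results_lock = threading.Lock()

    def worker() -> None:
        first = get_session()
        second = get_session()
        with results_lock:
            results.append((first.repl_id, first is second))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a class-level singleton, every worker would share one session and its statements would queue behind each other. With `threading.local`, each worker gets its own instance and reuses it on repeated calls within that thread.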
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The file only contains dataclasses (LivySessionResult,
LivySubmissionResult) — the old name was misleading. Also removes
unit tests for the deleted LivySession class and legacy
FabricApiClient session management methods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
29 tests covering: session tag derivation, logs URL, acquire with
retry/cleanup, polling (idle/timeout/fatal/transient), ensure-repl
re-acquire, SQL/Python statement dispatch, 404 dead-marking,
statement result parsing, close/cancel, and error resilience.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The synapsesql connector keeps JDBC connections to the Data Warehouse
open after df.write completes. These idle connections hold schema-level
locks (LCK_M_SCH_M) that block subsequent DDL in the same schema.

The GC must run as a separate Livy statement (fire-and-forget) after the
model code finishes, because running it in the same statement leaves the
JDBC objects in scope where GC cannot collect them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
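
The two-statement pattern from the commit above can be sketched as follows. All names here are hypothetical (`RecordingSession`, `run_statement`, `run_python_model`), and the PR does not show which GC call is submitted; `gc.collect()` is used purely as a placeholder for it.

```python
# Placeholder for the cleanup code; the actual GC call is not shown in the PR.
GC_SNIPPET = "import gc\ngc.collect()"


class RecordingSession:
    """Hypothetical stand-in that records submitted statements."""

    def __init__(self) -> None:
        self.submitted = []

    def run_statement(self, code: str, wait: bool = True) -> int:
        # A real session would submit to Livy and, if wait is True, poll for the result.
        self.submitted.append((code, wait))
        return len(self.submitted)


def run_python_model(session: RecordingSession, model_code: str) -> int:
    """Run the model, then trigger GC in a separate fire-and-forget statement."""
    # Statement 1: the model itself. Once it finishes, the objects it created
    # are no longer referenced by running code.
    statement_id = session.run_statement(model_code, wait=True)
    # Statement 2: GC on its own, so the JDBC objects are out of scope and can
    # be collected. The result is not awaited.
    session.run_statement(GC_SNIPPET, wait=False)
    return statement_id
```

Keeping the GC call out of the model's own statement matters because, as the commit notes, objects created in that statement are still in scope while it runs.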
@sdebruyn sdebruyn merged commit e25ee59 into main May 17, 2026
7 checks passed
@sdebruyn sdebruyn deleted the feat/hc-livy branch May 17, 2026 10:53

Development

Successfully merging this pull request may close these issues.

feat: add high-concurrency Livy support for parallel statement execution

2 participants