Add high-concurrency Livy support for parallel statement execution #232
Merged
Each dbt thread acquires its own REPL inside a shared underlying Livy session via Fabric's HC Livy API, enabling true parallel execution instead of FIFO queuing. Default on via `high_concurrency: true`.

Upstream contribution: microsoft/dbt-fabricspark#186

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deploying dbt-fabric with
| Latest commit: | fd02d95 |
|---|---|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://ed726286.dbt-fabric.pages.dev |
| Branch Preview URL: | https://feat-hc-livy.dbt-fabric.pages.dev |
Pull request overview
Adds Fabric high-concurrency (HC) Livy support to the fabricspark adapter so each dbt thread can execute statements through its own REPL (parallel execution), with a credential flag to fall back to the legacy single-session behavior.
Changes:
- Introduces `HighConcurrencyLivySession` and corresponding `FabricApiClient` HC endpoints (acquire/poll/submit/get/cancel/delete).
- Switches `FabricSparkConnectionManager` to choose HC vs legacy `LivySession` based on `high_concurrency` in credentials (default `True`).
- Updates cursor cancellation flow to use a session-level `cancel_statement()` and expands documentation for HC mode.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/dbt/adapters/fabricspark/fabricspark_cursor.py | Routes cancel through the session interface (cancel_statement) rather than directly via the API client. |
| src/dbt/adapters/fabricspark/fabricspark_credentials.py | Adds high_concurrency: bool = True configuration flag. |
| src/dbt/adapters/fabricspark/fabricspark_connection.py | Allows connections to hold either session type and closes HC sessions on connection close. |
| src/dbt/adapters/fabricspark/fabricspark_connection_manager.py | Selects HC vs legacy session implementation during connection open. |
| src/dbt/adapters/fabric/fabric_livy_session.py | Adds cancel_statement() to match the cursor’s new cancellation flow. |
| src/dbt/adapters/fabric/fabric_hc_livy_session.py | New HC session implementation (acquire/poll/submit/get/cancel/cleanup). |
| src/dbt/adapters/fabric/fabric_api_client.py | Adds HC Livy REST endpoints to support the new session implementation. |
| docs/lakehouse.md | Documents HC mode behavior, configuration, and threads > 5 implications. |
| docs/configuration.md | Documents the high_concurrency profile option and defaults. |
| docs/comparison-dbt-fabricspark.md | Updates feature comparison to include HC mode and lifecycle differences. |
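
The cursor change summarized in the table (cancelling through the session rather than the API client) can be illustrated with a minimal sketch. Only the method name `cancel_statement` comes from this PR; the `SupportsCancel` protocol, the `RecordingSession` stand-in, the `Cursor` class, and the integer statement ID are assumptions made for illustration, not the adapter's actual code.

```python
from typing import Optional, Protocol


class SupportsCancel(Protocol):
    """Hypothetical session interface: anything exposing cancel_statement()."""

    def cancel_statement(self, statement_id: int) -> None: ...


class RecordingSession:
    """Stand-in session that records which statements were cancelled."""

    def __init__(self) -> None:
        self.cancelled: list[int] = []

    def cancel_statement(self, statement_id: int) -> None:
        self.cancelled.append(statement_id)


class Cursor:
    """Illustrative cursor that delegates cancellation to its session."""

    def __init__(self, session: SupportsCancel) -> None:
        self._session = session
        self._statement_id: Optional[int] = None

    def execute(self, statement_id: int) -> None:
        # A real cursor would submit SQL and remember the returned ID.
        self._statement_id = statement_id

    def cancel(self) -> None:
        # No direct API-client call: the session decides how to cancel.
        if self._statement_id is not None:
            self._session.cancel_statement(self._statement_id)
            self._statement_id = None
```

Keeping the cursor unaware of the API client means any session type only needs to expose the same method for cancellation to keep working.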
HC mode passed all targeted integration tests (basic, validate connection, concurrency). There is no reason to keep the singleton fallback — HC is strictly better (parallel execution, warm session reuse). This removes the `high_concurrency` config flag and simplifies all FabricSpark connection code to use `HighConcurrencyLivySession` exclusively. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ling

- Fix unit test asserting old cancel path (`cancel_livy_statement` → `cancel_statement`)
- Handle requests transport exceptions (`ConnectionError`, `Timeout`, `ChunkedEncodingError`, `JSONDecodeError`) in HC session acquire and poll, matching the resilience of the singleton `LivySession`
- Remove unused `_ACQUIRING_STATES` constant

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
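
As a rough illustration of the resilience described in this commit, the sketch below polls a state callback and retries on transient transport errors instead of failing. It is not the adapter's code: the function name `poll_until_idle`, the state strings (`"idle"`, `"dead"`, `"error"`), and the timing defaults are assumptions, and built-in `ConnectionError`/`TimeoutError` stand in for the `requests` exceptions so the example needs no third-party imports.

```python
import time
from typing import Callable

# Stand-ins for the requests transport exceptions named in the commit;
# built-in types are used here so the sketch has no third-party imports.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


def poll_until_idle(
    get_state: Callable[[], str],
    timeout_s: float = 5.0,
    interval_s: float = 0.01,
) -> str:
    """Poll get_state() until it returns "idle", tolerating transient errors."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            state = get_state()
        except TRANSIENT_ERRORS:
            # Transport hiccup: wait and retry instead of failing the run.
            time.sleep(interval_s)
            continue
        if state == "idle":
            return state
        if state in ("dead", "error"):
            # Assumed fatal states: retrying cannot help, so stop early.
            raise RuntimeError(f"session entered fatal state: {state}")
        time.sleep(interval_s)
    raise TimeoutError("session did not become idle before the deadline")
```

The deadline bounds the total wait, so a long run of transient failures still ends in a timeout rather than an unbounded loop.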
… docs

- Rename `session_id` → `livy_session_id` in HC API client methods to clarify these take the underlying Livy session ID, not the HC session ID
- Merge duplicate `TimeoutError`/`Exception` handlers in `wait_and_get_statement_result` (`TimeoutError` is a subclass of `Exception`)
- Remove stale "singleton Livy sessions" reference from comparison doc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
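
The reasoning behind merging the handlers can be checked directly in plain Python. The helper name `classify` below is purely illustrative and does not appear in the PR.

```python
def classify(exc: BaseException) -> str:
    """Return the type name of exc as seen by a single `except Exception` handler."""
    try:
        raise exc
    except Exception as caught:
        # TimeoutError lands here too, so a separate handler with the
        # same body would be redundant.
        return type(caught).__name__


print(issubclass(TimeoutError, Exception))  # True
print(classify(TimeoutError("poll timed out")))  # TimeoutError
print(classify(ValueError("bad payload")))  # ValueError
```

A dedicated `except TimeoutError` clause is only worth keeping when its body differs from the general handler's.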
The "1-5 minutes" claim was inaccurate — startup can sometimes take just a few seconds. Replaced with generic phrasing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Fabric DW adapter's Python model execution now uses HighConcurrencyLivySession instead of the old LivySession class. This removes the last consumer of the legacy Livy session API, so LivySession and all non-HC Livy methods in FabricApiClient are deleted. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…aths

- `FabricLivyHelper`: use thread-local storage instead of class-level singleton so each thread gets its own HC REPL
- `HighConcurrencyLivySession`: best-effort delete of HC session when `_poll_until_idle` fails or when re-acquiring after staleness
- Mermaid diagram: update API paths to match actual HC endpoints

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
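
The thread-local approach from the first bullet can be sketched with the standard library alone. This is not `FabricLivyHelper` itself: `get_repl_id`, `worker`, and the counter that stands in for an HC acquire call are hypothetical names chosen for the example.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()       # per-thread attribute storage
_repl_ids = itertools.count(1)   # stand-in for IDs returned by an HC acquire call
_id_lock = threading.Lock()


def get_repl_id() -> int:
    """Return this thread's REPL ID, acquiring one on first use."""
    if not hasattr(_local, "repl_id"):
        with _id_lock:
            _local.repl_id = next(_repl_ids)
    return _local.repl_id


def worker(barrier: threading.Barrier) -> tuple[int, int]:
    """Look up the REPL ID twice, with all workers alive at the same time."""
    first = get_repl_id()
    barrier.wait(timeout=5)  # forces each task onto a distinct pool thread
    return first, get_repl_id()


if __name__ == "__main__":
    barrier = threading.Barrier(3)
    with ThreadPoolExecutor(max_workers=3) as pool:
        print(list(pool.map(worker, [barrier] * 3)))
```

With a class-level singleton, all three workers would share one ID. With `threading.local`, each thread sees a stable ID of its own, which is the isolation property the commit describes.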
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The file only contains dataclasses (LivySessionResult, LivySubmissionResult) — the old name was misleading. Also removes unit tests for the deleted LivySession class and legacy FabricApiClient session management methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
29 tests covering: session tag derivation, logs URL, acquire with retry/cleanup, polling (idle/timeout/fatal/transient), ensure-repl re-acquire, SQL/Python statement dispatch, 404 dead-marking, statement result parsing, close/cancel, and error resilience. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The synapsesql connector keeps JDBC connections to the Data Warehouse open after df.write completes. These idle connections hold schema-level locks (LCK_M_SCH_M) that block subsequent DDL in the same schema. The GC must run as a separate Livy statement (fire-and-forget) after the model code finishes, because running it in the same statement leaves the JDBC objects in scope where GC cannot collect them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This reverts commit 3444723.
Summary

- … `sessionTag` from `(workspace_id, lakehouse_id)` lets Fabric snap-attach REPLs onto a warm session across dbt invocations
- … `close()` path, no `atexit` handlers

Based on upstream contribution: microsoft/dbt-fabricspark#186. Clean-room implementation adapted to our `FabricApiClient` + `HighConcurrencyLivySession` + `FabricSparkCursor` architecture.

Closes #231
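
The summary says the `sessionTag` is derived from `(workspace_id, lakehouse_id)` but does not show how. One plausible way to get a tag that is stable across dbt invocations is to hash the pair, as sketched below. The function name `derive_session_tag`, the `dbt-` prefix, the lowercasing, and the use of SHA-256 are all assumptions for illustration, not the PR's actual derivation.

```python
import hashlib


def derive_session_tag(workspace_id: str, lakehouse_id: str) -> str:
    """Hypothetical: build a stable tag from the (workspace, lakehouse) pair."""
    # Lowercasing is an assumption so that differently cased GUIDs map to one tag.
    key = f"{workspace_id.lower()}:{lakehouse_id.lower()}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:16]
    return f"dbt-{digest}"
```

Because the tag depends only on its two inputs, separate dbt runs against the same workspace and lakehouse would present the same tag, which is the property that allows attaching to an already warm session.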
Changed files

| File | Description |
|---|---|
| `fabric_api_client.py` | … |
| `fabric_hc_livy_session.py` | … `HighConcurrencyLivySession` with REPL lifecycle, polling, and best-effort cleanup on failure |
| `fabric_livy_session.py` | … `LivySessionResult`, `LivySubmissionResult`); `LivySession` class removed |
| `fabric_livy_helper.py` | … `HighConcurrencyLivySession` with thread-local storage for per-thread REPL isolation |
| `fabricspark_connection_manager.py` | … |
| `fabricspark_cursor.py` | … `cancel_statement`) |
| `fabricspark_connection.py` | … `HighConcurrencyLivySession` only |
| `docs/lakehouse.md` | … |
| `docs/comparison-dbt-fabricspark.md` | … |

Test plan

- … `/test-de`)
- … `/test-dw`)
- … `threads > 5` creates multiple underlying sessions

🤖 Generated with Claude Code