fix: omit spark.livy.session.idle.timeout by default to keep Fabric starter-pool acceleration (#184)#189
Merged
Merged
Conversation
… starter-pool acceleration The adapter unconditionally injected `spark.livy.session.idle.timeout` into every Livy session `conf` because `session_idle_timeout` defaulted to "30m". Fabric treats that key as session-immutable, so its presence — even when the value matched the pool's own default — emitted `FallbackReasons: UserSparkConfigMismatch` and forced an on-demand cluster cold start (~3 min vs ~40 s on a warm starter pool). The same fallback was firing on every CI run of this repo today (visible in `tests/functional/fixtures/ws2_seed/logs/dbt.log`). Flip the credential default to `None` so the key is dropped from the acquire payload unless the user explicitly opts in. Existing profiles that set `session_idle_timeout` keep their previous behavior with the same starter-pool trade-off, now documented in the README config table. - credentials.py: default `session_idle_timeout` `"30m"` → `None`. - README: config-table row + profile example + a contributor-friendly bug-bashing section at the top. - Issue templates: cross-link the contributor guide for repros. - CHANGELOG: new v1.12.2 bug-fix entry. - Unit tests: cover default-off + explicit-opt-in for both the HC (`_build_acquire_payload`) and singleton (`_create_fabric_session`) paths, plus a credential-level default check. Fixes #184 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why this change is needed
Fixes #184.
How the bug was discovered
@karelrappo reported that every fresh Livy session created by
dbt-fabricsparkfalls back from a Fabric starter pool to an on-demand cluster, adding
roughly 3 min of cold-start to every cold run. Fabric returns
FallbackReasons: UserSparkConfigMismatchand explicitly namesspark.livy.session.idle.timeoutas a session-immutable SparkConf that isincompatible with the pool.
Repro confirmed
The bug was already firing on every CI run of this repo — visible in our own
fixture logs at
tests/functional/fixtures/ws2_seed/logs/dbt.log(e.g.lines 56, 355, 515):
@karelrappo verified that setting
session_idle_timeout: ""drops the keyand restores starter-pool match: ~196 s acquire → ~43 s acquire.
How
FabricSparkCredentials.session_idle_timeoutpreviously defaulted to"30m",which always made the
if credentials.session_idle_timeout:injectionguards inside
concurrent_livy._build_acquire_payloadandsingleton_livy._create_fabric_sessionadd the key to the sessionconf.Flipping the default to
Nonekeeps both guards in place but makes thekey absent unless the user explicitly opts in.
Considerations
session_idle_timeoutexplicitly continue to behave as before, with the same on-demand fallback
trade-off — now documented in the README config table.
expose a top-level/non-SparkConf way to set per-session idle timeout. The
fix is purely client-side suppression.
a single default flip fixes both. Unit tests cover both paths plus a
credential-level default check.
"60m"because the test suite genuinelyneeds long-lived sessions; that intentional override still trips the
fallback in CI as it does today, which is the accepted trade-off.
Alternatives considered
"30m"default and emit a warning: doesn't fix anyone'scold-start; preserves the today-broken behavior.
current Fabric Livy API docs; idle timeout is a pool/workspace setting.
Test
Microsoft Employee contributors
Full
npx nx run dbt-fabricspark:testran green locally on this branch(unit + local-e2e + functional, both
no_schemaandwith_schemaincluding cross-workspace):
New unit tests in this PR:
tests/unit/test_credentials.py::test_credentials_session_idle_timeout_defaults_to_nonetests/unit/test_concurrent_livy.py::TestBuildAcquirePayloadIdleTimeout(4 cases)tests/unit/test_livysession.py::TestCreateFabricSessionIdleTimeout(4 cases)covering: default omits key (HC + singleton), empty string omits key,
explicit value injects key,
environmentIdinjection is independent.