From b4feb35f2512f13318125fd96428cbac2932288b Mon Sep 17 00:00:00 2001 From: Raki Rahman Date: Sun, 17 May 2026 20:58:06 +0000 Subject: [PATCH 1/2] fix: omit `spark.livy.session.idle.timeout` by default to keep Fabric starter-pool acceleration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The adapter unconditionally injected `spark.livy.session.idle.timeout` into every Livy session `conf` because `session_idle_timeout` defaulted to "30m". Fabric treats that key as session-immutable, so its presence — even when the value matched the pool's own default — emitted `FallbackReasons: UserSparkConfigMismatch` and forced an on-demand cluster cold start (~3 min vs ~40 s on a warm starter pool). The same fallback was firing on every CI run of this repo today (visible in `tests/functional/fixtures/ws2_seed/logs/dbt.log`). Flip the credential default to `None` so the key is dropped from the acquire payload unless the user explicitly opts in. Existing profiles that set `session_idle_timeout` keep their previous behavior with the same starter-pool trade-off, now documented in the README config table. - credentials.py: default `session_idle_timeout` `"30m"` → `None`. - README: config-table row + profile example + a contributor-friendly bug-bashing section at the top. - Issue templates: cross-link the contributor guide for repros. - CHANGELOG: new v1.12.2 bug-fix entry. - Unit tests: cover default-off + explicit-opt-in for both the HC (`_build_acquire_payload`) and singleton (`_create_fabric_session`) paths, plus a credential-level default check. Fixes #184 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/ISSUE_TEMPLATE/bug-report.yml | 19 ++++ .github/ISSUE_TEMPLATE/regression-report.yml | 20 ++++ CHANGELOG.md | 6 ++ README.md | 98 ++++++++++++-------- src/dbt/adapters/fabricspark/__version__.py | 2 +- src/dbt/adapters/fabricspark/credentials.py | 8 +- tests/unit/test_concurrent_livy.py | 43 +++++++++ tests/unit/test_credentials.py | 18 ++++ tests/unit/test_livysession.py | 57 +++++++++++- 9 files changed, 228 insertions(+), 43 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/bug-report.yml b/.github/ISSUE_TEMPLATE/bug-report.yml index 8709490f..096c55df 100644 --- a/.github/ISSUE_TEMPLATE/bug-report.yml +++ b/.github/ISSUE_TEMPLATE/bug-report.yml @@ -3,6 +3,25 @@ description: Report a bug or an issue you've found title: "[Bug] " labels: ["type:bug", "triage:product"] body: +- type: markdown + attributes: + value: | + ## Issues and bug bashing + + > ⚠️ Here's how you can get your issue triaged and fixed ASAP + + In the age of AI, we should be innovating and shipping daily. + So in this adapter, we try to fix bugs and ship features fast - and keep a high bar of test coverage. + + Once you open an issue, please - try to be **as descriptive as possible** to give our human/AI maintainers the necessary details to reproduce the issue. + + For example - if you can - create a dummy repro dbt project - so the maintainer can reproduce your problem ASAP - see an example a well-written GitHub issue [here](https://github.com/microsoft/dbt-fabricspark/issues/172#issuecomment-4426907640). + + Once we have a fix identified, to gain more confidence, we might ask you install the adapter right from a PR branch to ensure the repro is gone, like so: + + ```bash + pip install git+https://github.com/microsoft/dbt-fabricspark.git@dev/somebranch/172 + ``` - type: markdown attributes: value: Thanks for taking the time to fill out this bug report! diff --git a/.github/ISSUE_TEMPLATE/regression-report.yml b/.github/ISSUE_TEMPLATE/regression-report.yml index 0f919467..dc023d0c 100644 --- a/.github/ISSUE_TEMPLATE/regression-report.yml +++ b/.github/ISSUE_TEMPLATE/regression-report.yml @@ -4,6 +4,26 @@ description: Report a regression you've observed in a newer version title: "[Regression] <title>" labels: ["type:regression", "triage:product"] body: +- type: markdown + attributes: + value: | + ## Issues, bug-bashing, help us help you + + > ⚠️ Here's how you can get your issue triaged and fixed ASAP + + In the age of AI, we should be innovating and shipping **high-quality** software daily. + + So in this adapter, we try to fix bugs and ship features fast - and keep an extremely high bar for test coverage in CI before PRs merge to `main`. + + Once you open an issue, please - try to be **as descriptive as possible** to give our human/AI maintainers the necessary details to reproduce the issue rapidly. + + For example - if you can - create a dummy repro dbt project in your GitHub account - so the maintainers can reproduce your problem ASAP - see an example a well-written GitHub issue [here](https://github.com/microsoft/dbt-fabricspark/issues/172#issuecomment-4426907640). + + If the issue is complex - once we have a fix identified, to gain more confidence, we might ask you install the adapter right from a PR branch to ensure the repro is gone in your setup as well, like so: + + ```bash + pip install git+https://github.com/microsoft/dbt-fabricspark.git@dev/somebranch/123 + ``` - type: markdown attributes: value: Thanks for taking the time to fill out this regression report! diff --git a/CHANGELOG.md b/CHANGELOG.md index 128f7523..ed7a6209 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,11 @@ # Changelog +## v1.12.2 + +### Bug Fixes + +- Fixed every fresh Livy session being forced onto an on-demand Spark cluster instead of a Fabric starter pool. `session_idle_timeout` no longer defaults to `"30m"`; the adapter now omits `spark.livy.session.idle.timeout` from the session `conf` unless the user explicitly sets a value. Fabric treats that key as session-immutable, so its mere presence (even matching the pool's own default) emitted `FallbackReasons: UserSparkConfigMismatch` and added ~3 min of cold-start per cold session (vs ~40 s on a warm starter pool). Existing profiles that set `session_idle_timeout` explicitly continue to behave as before, with the same starter-pool trade-off (#184) + ## v1.12.1 ### Bug Fixes diff --git a/README.md b/README.md index ed5c5a78..090f261a 100644 --- a/README.md +++ b/README.md @@ -67,6 +67,24 @@ pip install dbt-fabricspark[cli] > **Note:** The `azure-cli` is an optional dependency is only required for the `CLI` authentication mode. Service Principal (`SPN`) and Fabric Notebook (`fabric_notebook`) authentication modes do not need it. +## Issues, bug-bashing, help us help you + +> ⚠️ Here's how you can get your issue triaged and fixed ASAP + +In the age of AI, we should be innovating and shipping **high-quality** software daily. + +So in this adapter, we try to fix bugs and ship features fast - and keep an extremely high bar for test coverage in CI before PRs merge to `main`. + +Once you open an issue, please - try to be **as descriptive as possible** to give our human/AI maintainers the necessary details to reproduce the issue rapidly. + +For example - if you can - create a dummy repro dbt project in your GitHub account - so the maintainers can reproduce your problem ASAP - see an example a well-written GitHub issue [here](https://github.com/microsoft/dbt-fabricspark/issues/172#issuecomment-4426907640). + +If the issue is complex - once we have a fix identified, to gain more confidence, we might ask you install the adapter right from a PR branch to ensure the repro is gone in your setup as well, like so: + +```bash +pip install git+https://github.com/microsoft/dbt-fabricspark.git@dev/somebranch/123 +``` + ## Configuration Use a Livy endpoint to connect to Apache Spark in Microsoft Fabric. Configure your `profiles.yml` to connect via Livy endpoints. @@ -109,7 +127,9 @@ fabric-spark-test: # Session management reuse_session: true - session_idle_timeout: "30m" + # session_idle_timeout: "30m" # Opt-in only. Setting this triggers + # Fabric to bypass starter pools and + # cold-start an on-demand cluster. # session_id_file: ./livy-session-id.txt # Default path # Timeouts @@ -304,44 +324,44 @@ Each segment is independently backtick-quoted, so workspace names with spaces or ### Configuration Reference -| Option | Type | Default | Description | -| ----------------------- | ------ | ------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- | -| `type` | string | — | Must be `fabricspark` | -| `method` | string | `livy` | Connection method | -| `endpoint` | string | `https://api.fabric.microsoft.com/v1` | Fabric API endpoint URL | -| `workspaceid` | string | — | Fabric workspace UUID | -| `lakehouseid` | string | — | Lakehouse UUID | -| `lakehouse` | string | — | Lakehouse name | -| `schema` | string | — | Schema name. Must equal `lakehouse` for non-schema lakehouses, must differ from `lakehouse` for schema-enabled (e.g., `dbo`) | -| `threads` | int | `1` | Number of threads for parallel execution | -| **Authentication** | | | | -| `authentication` | string | `CLI` | Auth method: `CLI`, `SPN`, or `fabric_notebook` | -| `client_id` | string | — | Service principal client ID (SPN only) | -| `tenant_id` | string | — | Azure AD tenant ID (SPN only) | -| `client_secret` | string | — | Service principal secret (SPN only) | -| `accessToken` | string | — | Direct access token (optional) | -| **Environment** | | | | -| `environmentId` | string | — | Fabric Environment ID for Spark configuration | -| `spark_config` | dict | `{}` | Spark session configuration (must include `name` key) | -| **Session Management** | | | | -| `reuse_session` | bool | `false` | Keep Livy sessions alive for reuse across runs | -| `session_id_file` | string | `./livy-session-id.txt` | Path to file storing session ID for reuse | -| `session_idle_timeout` | string | `30m` | Livy session idle timeout (e.g. `30m`, `1h`) | -| `high_concurrency` | bool | `true` | Use high-concurrency Livy API so each dbt thread gets its own REPL — see [High-concurrency Livy](#high-concurrency-livy) | -| **Timeouts & Polling** | | | | -| `connect_retries` | int | `1` | Number of connection retries | -| `connect_timeout` | int | `10` | Connection timeout in seconds | -| `http_timeout` | int | `120` | Seconds per HTTP request to Fabric API | -| `session_start_timeout` | int | `600` | Max seconds to wait for session start | -| `statement_timeout` | int | `3600` | Max seconds to wait for statement result | -| `poll_wait` | int | `10` | Seconds between session start polls | -| `poll_statement_wait` | int | `5` | Seconds between statement result polls | -| **Other** | | | | -| `retry_all` | bool | `false` | Retry all operations on failure | -| `create_shortcuts` | bool | `false` | Enable Fabric shortcut creation | -| `shortcuts_json_str` | string | — | JSON string defining shortcuts | -| `livy_mode` | string | `fabric` | `fabric` for Fabric cloud, `local` for local Livy | -| `livy_url` | string | `http://localhost:8998` | Local Livy URL (local mode only) | +| Option | Type | Default | Description | +| ----------------------- | ------ | ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `type` | string | — | Must be `fabricspark` | +| `method` | string | `livy` | Connection method | +| `endpoint` | string | `https://api.fabric.microsoft.com/v1` | Fabric API endpoint URL | +| `workspaceid` | string | — | Fabric workspace UUID | +| `lakehouseid` | string | — | Lakehouse UUID | +| `lakehouse` | string | — | Lakehouse name | +| `schema` | string | — | Schema name. Must equal `lakehouse` for non-schema lakehouses, must differ from `lakehouse` for schema-enabled (e.g., `dbo`) | +| `threads` | int | `1` | Number of threads for parallel execution | +| **Authentication** | | | | +| `authentication` | string | `CLI` | Auth method: `CLI`, `SPN`, or `fabric_notebook` | +| `client_id` | string | — | Service principal client ID (SPN only) | +| `tenant_id` | string | — | Azure AD tenant ID (SPN only) | +| `client_secret` | string | — | Service principal secret (SPN only) | +| `accessToken` | string | — | Direct access token (optional) | +| **Environment** | | | | +| `environmentId` | string | — | Fabric Environment ID for Spark configuration | +| `spark_config` | dict | `{}` | Spark session configuration (must include `name` key) | +| **Session Management** | | | | +| `reuse_session` | bool | `false` | Keep Livy sessions alive for reuse across runs | +| `session_id_file` | string | `./livy-session-id.txt` | Path to file storing session ID for reuse | +| `session_idle_timeout` | string | — | Optional Livy session idle timeout (e.g. `30m`, `1h`). Leave unset to keep [Fabric starter-pool](https://learn.microsoft.com/en-us/fabric/data-engineering/configure-starter-pools) acceleration; setting a value injects `spark.livy.session.idle.timeout` into the session `conf`, which Fabric treats as session-immutable and falls back to an on-demand cluster. | +| `high_concurrency` | bool | `true` | Use high-concurrency Livy API so each dbt thread gets its own REPL — see [High-concurrency Livy](#high-concurrency-livy) | +| **Timeouts & Polling** | | | | +| `connect_retries` | int | `1` | Number of connection retries | +| `connect_timeout` | int | `10` | Connection timeout in seconds | +| `http_timeout` | int | `120` | Seconds per HTTP request to Fabric API | +| `session_start_timeout` | int | `600` | Max seconds to wait for session start | +| `statement_timeout` | int | `3600` | Max seconds to wait for statement result | +| `poll_wait` | int | `10` | Seconds between session start polls | +| `poll_statement_wait` | int | `5` | Seconds between statement result polls | +| **Other** | | | | +| `retry_all` | bool | `false` | Retry all operations on failure | +| `create_shortcuts` | bool | `false` | Enable Fabric shortcut creation | +| `shortcuts_json_str` | string | — | JSON string defining shortcuts | +| `livy_mode` | string | `fabric` | `fabric` for Fabric cloud, `local` for local Livy | +| `livy_url` | string | `http://localhost:8998` | Local Livy URL (local mode only) | ### Authentication Modes diff --git a/src/dbt/adapters/fabricspark/__version__.py b/src/dbt/adapters/fabricspark/__version__.py index 1eeb96f2..db0ff816 100644 --- a/src/dbt/adapters/fabricspark/__version__.py +++ b/src/dbt/adapters/fabricspark/__version__.py @@ -1 +1 @@ -version = "1.12.1" +version = "1.12.2" diff --git a/src/dbt/adapters/fabricspark/credentials.py b/src/dbt/adapters/fabricspark/credentials.py index 259ecfb3..150f1401 100644 --- a/src/dbt/adapters/fabricspark/credentials.py +++ b/src/dbt/adapters/fabricspark/credentials.py @@ -66,7 +66,13 @@ class FabricSparkCredentials(Credentials): environmentId: Optional[str] = None session_id_file: Optional[str] = None reuse_session: bool = False # When True, Fabric sessions are kept alive and reused across runs - session_idle_timeout: str = "30m" # Livy session idle timeout (e.g. "30m", "1h") + # Default ``None`` so the adapter does NOT send + # ``spark.livy.session.idle.timeout`` in the session ``conf``. Fabric + # treats that key as session-immutable, so its presence (even when the + # value matches the pool default) disqualifies starter-pool matching and + # forces on-demand cold start. Set explicitly (e.g. ``"30m"``) only when + # the trade-off is acceptable. + session_idle_timeout: Optional[str] = None # High-concurrency Livy. When True (default), each dbt thread acquires # its own REPL inside a single underlying Livy session shared via a diff --git a/tests/unit/test_concurrent_livy.py b/tests/unit/test_concurrent_livy.py index 5562792c..07bb4059 100644 --- a/tests/unit/test_concurrent_livy.py +++ b/tests/unit/test_concurrent_livy.py @@ -409,3 +409,46 @@ def test_execute_strips_trailing_semicolon(self): with patch.object(HighConcurrencyCursor, "execute") as mock_exec: wrapper.execute("SELECT 1;") mock_exec.assert_called_once_with("SELECT 1") + + +# --------------------------------------------------------------------------- # +# _build_acquire_payload — session_idle_timeout injection # +# --------------------------------------------------------------------------- # + + +class TestBuildAcquirePayloadIdleTimeout: + """Guard rails for the starter-pool fallback bug. + + Fabric treats ``spark.livy.session.idle.timeout`` as a session-immutable + SparkConf; its mere presence in the acquire ``conf`` disqualifies + starter-pool matching. The adapter must therefore omit the key unless + the user has explicitly opted in by setting ``session_idle_timeout``. + """ + + def test_default_credentials_omit_idle_timeout(self): + creds = _make_creds() + hc = HighConcurrencySession(creds, creds.spark_config) + payload = hc._build_acquire_payload() + assert "spark.livy.session.idle.timeout" not in payload.get("conf", {}) + + def test_empty_string_idle_timeout_omits_key(self): + creds = _make_creds(session_idle_timeout="") + hc = HighConcurrencySession(creds, creds.spark_config) + payload = hc._build_acquire_payload() + assert "spark.livy.session.idle.timeout" not in payload.get("conf", {}) + + def test_explicit_idle_timeout_injects_key(self): + creds = _make_creds(session_idle_timeout="45m") + hc = HighConcurrencySession(creds, creds.spark_config) + payload = hc._build_acquire_payload() + assert payload["conf"]["spark.livy.session.idle.timeout"] == "45m" + + def test_environment_id_still_injects_when_idle_timeout_omitted(self): + creds = _make_creds(environmentId="11111111-2222-3333-4444-555555555555") + hc = HighConcurrencySession(creds, creds.spark_config) + payload = hc._build_acquire_payload() + assert ( + payload["conf"]["spark.fabric.environment.id"] + == "11111111-2222-3333-4444-555555555555" + ) + assert "spark.livy.session.idle.timeout" not in payload["conf"] diff --git a/tests/unit/test_credentials.py b/tests/unit/test_credentials.py index 9978db59..f3cf729d 100644 --- a/tests/unit/test_credentials.py +++ b/tests/unit/test_credentials.py @@ -161,6 +161,24 @@ def test_credentials_type() -> None: assert credentials.type == "fabricspark" +def test_credentials_session_idle_timeout_defaults_to_none() -> None: + """Default ``session_idle_timeout`` must be falsy. + + A truthy default would inject ``spark.livy.session.idle.timeout`` into + the Livy session ``conf``, which Fabric treats as session-immutable — + disqualifying starter-pool matching and forcing on-demand cold start. + """ + credentials = FabricSparkCredentials( + method="livy", + authentication="CLI", + lakehouse="tests", + workspaceid="1de8390c-9aca-4790-bee8-72049109c0f4", + lakehouseid="8c5bc260-bc3a-4898-9ada-01e433d461ba", + spark_config={"name": "test-session"}, + ) + assert not credentials.session_idle_timeout + + def test_credentials_database_defaults_to_lakehouse() -> None: """Test that database is always derived from lakehouse name.""" credentials = FabricSparkCredentials( diff --git a/tests/unit/test_livysession.py b/tests/unit/test_livysession.py index 242c321e..087fccbf 100644 --- a/tests/unit/test_livysession.py +++ b/tests/unit/test_livysession.py @@ -490,9 +490,9 @@ def test_custom_session_file_path(self): assert credentials.resolved_session_id_file == "/custom/path/my-session.txt" -def _make_fabric_credentials(reuse_session=False, session_id_file=None): +def _make_fabric_credentials(reuse_session=False, session_id_file=None, **overrides): """Helper to create Fabric mode credentials for tests.""" - return FabricSparkCredentials( + base = dict( method="livy", livy_mode="fabric", authentication="CLI", @@ -504,6 +504,8 @@ def _make_fabric_credentials(reuse_session=False, session_id_file=None): reuse_session=reuse_session, session_id_file=session_id_file, ) + base.update(overrides) + return FabricSparkCredentials(**base) class TestFabricSessionReuseMode: @@ -672,6 +674,57 @@ def test_connect_fabric_routes_to_fresh_when_flag_not_set(self, mock_headers): mock_reuse.assert_not_called() +class TestCreateFabricSessionIdleTimeout: + """Guard rails for the starter-pool fallback bug on the singleton path. + + Fabric treats ``spark.livy.session.idle.timeout`` as a session-immutable + SparkConf, so its mere presence in the POST ``/sessions`` ``conf`` + disqualifies starter-pool matching and forces an on-demand cold start. + """ + + def setup_method(self): + LivySessionManager.livy_global_session = None + + def _capture_spark_config(self, credentials): + captured: dict = {} + + def _capture(spark_config): + captured["spark_config"] = spark_config + + with patch( + "dbt.adapters.fabricspark.livysession.LivySession.create_session", + side_effect=_capture, + ): + LivySessionManager._create_fabric_session(credentials, {"name": "test-session"}) + return captured["spark_config"] + + def test_default_credentials_omit_idle_timeout(self): + credentials = _make_fabric_credentials() + spark_config = self._capture_spark_config(credentials) + assert "spark.livy.session.idle.timeout" not in spark_config.get("conf", {}) + + def test_empty_string_idle_timeout_omits_key(self): + credentials = _make_fabric_credentials(session_idle_timeout="") + spark_config = self._capture_spark_config(credentials) + assert "spark.livy.session.idle.timeout" not in spark_config.get("conf", {}) + + def test_explicit_idle_timeout_injects_key(self): + credentials = _make_fabric_credentials(session_idle_timeout="45m") + spark_config = self._capture_spark_config(credentials) + assert spark_config["conf"]["spark.livy.session.idle.timeout"] == "45m" + + def test_environment_id_still_injects_when_idle_timeout_omitted(self): + credentials = _make_fabric_credentials( + environmentId="11111111-2222-3333-4444-555555555555" + ) + spark_config = self._capture_spark_config(credentials) + assert ( + spark_config["conf"]["spark.fabric.environment.id"] + == "11111111-2222-3333-4444-555555555555" + ) + assert "spark.livy.session.idle.timeout" not in spark_config["conf"] + + class TestFixBinding: """Tests for LivySessionConnectionWrapper._fix_binding.""" From 2db9ca75ce61c0059eb6d525b33cfc78009f9eee Mon Sep 17 00:00:00 2001 From: Raki Rahman <mdrakiburrahman@gmail.com> Date: Sun, 17 May 2026 21:02:47 +0000 Subject: [PATCH 2/2] Proofread --- CONTRIBUTING.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2ae965ac..b809a75b 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -62,10 +62,10 @@ Therefore, if the tests pass locally, they are highly likely to pass in CI as we 1. Install recommended developer tooling (optional): - ```bash - curl -fsSL https://gh.io/copilot-install | bash - $HOME/.local/bin/copilot --yolo - ``` + ```bash + curl -fsSL https://gh.io/copilot-install | bash + $HOME/.local/bin/copilot --yolo + ``` 1. Login to github and ensure to authorize `Microsoft` if you're an employee (optional):