Add remote Spark test orchestration framework by sdebruyn · Pull Request #208 · sdebruyn/dbt-fabric

sdebruyn · 2026-05-16T16:23:11Z

Summary

Adds a native --remote pytest flag that transparently delegates FabricSpark test execution to a Spark Job Definition on Fabric infrastructure
Adds Mode B (mounted lakehouse) support that redirects test artifacts to OneLake-accessible paths
The developer experience is simply pytest --de --remote -k "TestFoo" — no separate CLI needed

How it works

1. Local pytest collects tests normally
2. pytest_runtestloop hook intercepts when --remote is set
3. Syncs project to lakehouse via azcopy
4. Triggers a Spark Job Definition with forwarded pytest args + --junitxml
5. Polls for completion, downloads results
6. Parses junitxml and reports results back to local pytest session
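
The control flow of the steps above can be sketched roughly as follows. This is an illustrative outline only, not the code in this PR: `RemoteSteps`, `run_remote`, the status names in `TERMINAL_STATES`, and the default `results.xml` file name are all hypothetical, and the real side effects (azcopy, Fabric REST calls) are injected as plain callables so the sequencing can be read on its own.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

# Assumed terminal status names; the states reported by Fabric may differ.
TERMINAL_STATES = {"Completed", "Failed", "Cancelled"}


@dataclass
class RemoteSteps:
    """Hypothetical bundle of the side-effecting steps, injected as callables."""

    sync_project: Callable[[], None]          # e.g. an azcopy sync wrapper
    submit_job: Callable[[List[str]], str]    # returns a job instance id
    get_status: Callable[[str], str]          # returns a status string
    download_results: Callable[[str], str]    # returns junitxml text


def run_remote(
    steps: RemoteSteps,
    pytest_args: List[str],
    junit_name: str = "results.xml",
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Sync, submit, poll until a terminal state, then return the junitxml text."""
    steps.sync_project()
    # Forward the local pytest args and ask the remote run for a junitxml report.
    job_id = steps.submit_job([*pytest_args, f"--junitxml={junit_name}"])

    deadline = time.monotonic() + timeout
    while True:
        status = steps.get_status(job_id)
        if status in TERMINAL_STATES:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(poll_interval)

    if status != "Completed":
        raise RuntimeError(f"job {job_id} ended with status {status!r}")
    return steps.download_results(job_id)
```

Injecting the steps keeps the loop testable without a Fabric workspace; a real pytest_runtestloop hook would call something like this and then feed the returned XML to the result reporter.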

New files

tests/spark_remote/conftest_plugin.py — pytest_runtestloop hook implementation
tests/spark_remote/result_reporter.py — junitxml → TestReport mapping
tests/spark_remote/orchestrator.py — coordinates sync + job client
tests/spark_remote/spark_job_client.py — Fabric REST API wrapper
tests/spark_remote/sync.py — azcopy sync wrapper
tests/spark_remote/spark_entry_point.py — runs inside Spark job

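To illustrate the junitxml mapping that result_reporter.py is described as doing, here is a minimal sketch using only the standard library. It is an assumption about the general shape, not the PR's implementation: `parse_outcomes` is a hypothetical name, results are keyed by the raw `(classname, name)` pair rather than a reconstructed pytest nodeid, and failures and errors are both folded into "failed".

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def parse_outcomes(xml_text: str) -> Dict[Tuple[str, str], str]:
    """Map each <testcase> to 'passed', 'failed', or 'skipped'."""
    root = ET.fromstring(xml_text)
    outcomes: Dict[Tuple[str, str], str] = {}
    # iter() finds testcases whether the root is <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        key = (case.get("classname", ""), case.get("name", ""))
        if case.find("failure") is not None or case.find("error") is not None:
            outcomes[key] = "failed"
        elif case.find("skipped") is not None:
            outcomes[key] = "skipped"
        else:
            # A testcase with no child result element passed.
            outcomes[key] = "passed"
    return outcomes
```

Turning `(classname, name)` back into a nodeid such as `tests/test_a.py::TestFoo::test_ok` is a separate step that this sketch leaves out.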
Test plan

Verify pytest --de still works without --remote (no regression)
Verify --remote without --de exits with error
Test Mode B dry-run with FABRIC_TEST_SPARK_EXEC_MODE=mounted
End-to-end test with real Fabric workspace

Closes #207

🤖 Generated with Claude Code

cloudflare-workers-and-pages · 2026-05-16T16:23:35Z

Deploying dbt-fabric with Cloudflare Pages

Latest commit:	`10ea1df`
Status:	✅ Deploy successful!
Preview URL:	https://ba8cf70a.dbt-fabric.pages.dev
Branch Preview URL:	https://feat-spark-remote-tests.dbt-fabric.pages.dev

Copilot

Pull request overview

Adds a pytest-based remote execution framework for FabricSpark tests, delegating --remote --de runs to a Fabric Spark Job Definition and mapping remote JUnit results back into the local pytest session.

Changes:

Adds remote orchestration, Spark job API, azcopy sync, Spark entry point, and JUnit result reporting modules.
Adds --remote pytest option plus remote/mounted artifact path handling in shared test fixtures.
Adds unit coverage for JUnit result parsing and updates local ignore/sample env files.

Reviewed changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `tests/conftest.py` | Registers `--remote`, delegates test loop, and redirects test artifact fixtures for remote/mounted execution. |
| `tests/spark_remote/conftest_plugin.py` | Implements the pytest runtest loop interception and remote pytest arg construction. |
| `tests/spark_remote/orchestrator.py` | Coordinates prerequisite checks, project sync, Spark job submission, polling, and result download. |
| `tests/spark_remote/spark_job_client.py` | Wraps Fabric REST API calls for Spark Job Definition creation/execution. |
| `tests/spark_remote/spark_entry_point.py` | Runs inside Fabric Spark to install dependencies and execute pytest. |
| `tests/spark_remote/sync.py` | Generates remote support files and syncs project/artifacts through azcopy. |
| `tests/spark_remote/result_reporter.py` | Parses JUnit XML and emits local pytest reports for remote results. |
| `tests/spark_remote/__init__.py` | Adds package marker for remote test helpers. |
| `tests/unit/test_result_reporter.py` | Adds unit tests for nodeid reconstruction and JUnit parsing. |
| `test.env.sample` | Documents optional remote Spark test environment variables. |
| `.gitignore` | Ignores generated remote requirements, env, artifact, and result files. |

Introduces a pytest-native `--remote` flag that transparently delegates FabricSpark test execution to a Spark Job Definition on Fabric. The developer runs `pytest --de --remote -k "TestFoo"` and gets normal pytest output — the plugin handles sync, job submission, polling, and result reporting via junitxml. Also adds Mode B (mounted lakehouse) support via FABRIC_TEST_SPARK_EXEC_MODE env var, which redirects test artifacts to OneLake-accessible paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Require all env vars (workspace_id, workspace_name, lakehouse_id)
- Use shlex.join for proper arg quoting in Spark job submission
- Handle FileNotFoundError when checking for az CLI
- Support absolute continuationUri URLs in pagination
- Remove unused _FORWARDED_OPTIONS constant
- Fix error message to reference correct env var name
- Add unit tests for junitxml parsing and nodeid reconstruction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
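
The shlex.join point in this commit matters because a `-k` expression can contain spaces. A small sketch of the idea, assuming the forwarded args are flattened into one command-line string for the job (`remote_command` is a hypothetical helper, not a function from this PR):

```python
import shlex
from typing import List


def remote_command(pytest_args: List[str]) -> str:
    """Join args into one shell-safe string; shlex.split reverses it exactly."""
    return shlex.join(pytest_args)


args = ["--de", "-k", "TestFoo and not slow", "--junitxml=results.xml"]
command_line = remote_command(args)
print(command_line)  # --de -k 'TestFoo and not slow' --junitxml=results.xml

# A naive " ".join(args) would split the -k expression into three tokens.
assert shlex.split(command_line) == args
assert shlex.split(" ".join(args)) != args
```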

The pytest_runtestloop guard already requires --de and rejects --remote without it, so --dw can never be active in remote mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Both flags are DW-only and don't exist for FabricSpark/DE tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

OneLake DFS supports GUIDs directly, so we don't need to pass human-readable names through the orchestrator and sync layers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Replace `import py` with explicit `from py.path import local as LocalPath` (py is a pytest-provided shim, not a separate package)
- Forward positional test paths to remote pytest args so `pytest --de --remote tests/...::test_name` works correctly
- Clean stale artifacts before running to prevent reporting old results
- Filter auth secrets from test.env.remote (only non-sensitive vars synced)
- Use FabricTokenProvider instead of hardcoded AzureCliCredential, supporting all configured auth methods (CLI, SP, workload identity, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Instead of requiring explicit IDs via env vars, the orchestrator now builds FabricCredentials from the same env vars as conftest (reusing _auth_kwargs_from_env) and uses FabricApiClient.get_workspace_id() / get_lakehouse_id() to resolve names to IDs via the Fabric API. This means FABRIC_TEST_WORKSPACE_NAME + FABRIC_TEST_LAKEHOUSE_NAME are sufficient — no separate FABRIC_TEST_REMOTE_LAKEHOUSE_ID needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The remote orchestrator is a DE context — use FabricSparkCredentials where database = lakehouse name (matching how FabricSpark profiles work) instead of FabricCredentials with a separate lakehouse field. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

SparkJobClient now takes a FabricApiClient instance and delegates all HTTP calls to it, gaining 429 retry and consistent error handling. Removes duplicated auth headers, base URL constant, and request methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Each --remote invocation generates a unique run ID (uuid4 hex[:8]) and uses it to namespace OneLake paths (dbt-remote-runs/{run_id}/project/ and .../artifacts/) and the Spark Job Definition name. This prevents concurrent runs from overwriting each other's project files or results. The run ID is passed as the first CLI argument to the Spark entry point, which uses it to locate its project and artifacts directories. Per-run Spark Job Definitions are cleaned up after completion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Project files are now synced to dbt-remote-runs/projects/{worktree_key}/ where worktree_key is a stable hash of the project root path. This enables incremental sync (only changed files are transferred) and allows the Spark Job Definition to be reused across runs from the same worktree. Artifacts remain isolated per run at dbt-remote-runs/artifacts/{run_id}/ so concurrent runs never overwrite each other's results. Multiple AI agents working in separate worktrees can now run --remote simultaneously with both fast incremental syncs and full result isolation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
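
The path layout described in the two commits above (a stable per-worktree project path plus a unique per-run artifacts path) could look something like this. The hash function, the 12-character truncation, and the helper names are assumptions for illustration; only the `uuid4 hex[:8]` run ID and the `dbt-remote-runs/projects/...` and `dbt-remote-runs/artifacts/...` prefixes come from the commit messages.

```python
import hashlib
import uuid
from pathlib import Path, PurePosixPath
from typing import Tuple

REMOTE_ROOT = PurePosixPath("dbt-remote-runs")


def worktree_key(project_root: Path) -> str:
    """Stable key for a worktree: same path in, same key out (assumed: sha256, 12 chars)."""
    resolved = str(project_root.resolve())
    return hashlib.sha256(resolved.encode("utf-8")).hexdigest()[:12]


def new_run_id() -> str:
    """Short unique ID per --remote invocation, as described in the commit message."""
    return uuid.uuid4().hex[:8]


def remote_paths(project_root: Path, run_id: str) -> Tuple[PurePosixPath, PurePosixPath]:
    """Return (project_dir, artifacts_dir) relative to the lakehouse files area."""
    project_dir = REMOTE_ROOT / "projects" / worktree_key(project_root)
    artifacts_dir = REMOTE_ROOT / "artifacts" / run_id
    return project_dir, artifacts_dir
```

Because the project directory depends only on the worktree path, repeated runs hit the same remote folder and an incremental sync has something to diff against, while each run still writes results under its own artifacts folder.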

…ases

- Expose public methods (api_get, api_post, api_patch, api_delete, base_url) on FabricApiClient so external callers don't access private members
- Keep private methods as thin wrappers for backward compatibility
- Validate Location header in run_on_demand (raise on empty/unparseable)
- Skip livy_session_lifecycle fixture when --remote is used
- Allow --remote without spark extra installed locally
- Restore FABRIC_TEST_BASE_API_URI/POWERBI_BASE_API_URI env var support
- Reduce repetition in _report_item_result
- Fix ruff lint warnings (SIM103, SIM102, C416)

Refs: #247 (artifact cleanup tracking issue)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
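
As an illustration of the Location header validation mentioned in this commit, the sketch below extracts a job instance ID and raises on empty or unparseable values. It assumes the header ends in `.../jobs/instances/{id}`; that URL shape and the function name `job_instance_id_from_location` are assumptions here, not taken from the PR's code.

```python
from typing import Optional
from urllib.parse import urlparse


def job_instance_id_from_location(location: Optional[str]) -> str:
    """Return the trailing job instance ID, or raise ValueError if it cannot be found."""
    if not location or not location.strip():
        raise ValueError("run-on-demand response had no Location header")

    path = urlparse(location.strip()).path
    segments = [segment for segment in path.split("/") if segment]
    # Assumed shape: .../jobs/instances/<job_instance_id>
    if len(segments) < 2 or segments[-2] != "instances":
        raise ValueError(f"cannot parse job instance ID from Location: {location!r}")
    return segments[-1]
```

Failing fast here avoids polling a malformed URL for the full timeout when the submission response was not what the client expected.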

Copilot

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 4 comments.

- Add --no-emit-project to uv export (avoids stale -e . in requirements)
- Include lakehouse_id in Spark Job Definition name (prevents stale job reuse when targeting a different lakehouse)
- Treat items missing from junitxml as skipped rather than errors (fixes -x/maxfail causing false failures for unexecuted tests)
- Forward --with-python and --with-grants to remote pytest invocation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
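
The "missing means skipped" rule from this commit can be pictured as a small reconciliation step between the locally collected tests and the remote report. With `-x` or `--maxfail`, the remote run stops early, so later tests never appear in the junitxml. The function name and the string outcomes below are hypothetical:

```python
from typing import Dict, Iterable


def reconcile_outcomes(
    collected_nodeids: Iterable[str],
    remote_outcomes: Dict[str, str],
) -> Dict[str, str]:
    """Give every locally collected test an outcome; absent ones count as skipped."""
    return {
        nodeid: remote_outcomes.get(nodeid, "skipped")
        for nodeid in collected_nodeids
    }


collected = ["tests/test_a.py::test_one", "tests/test_a.py::test_two", "tests/test_a.py::test_three"]
# Remote run used -x: it stopped after the first failure, so test_three never ran.
remote = {"tests/test_a.py::test_one": "passed", "tests/test_a.py::test_two": "failed"}
final = reconcile_outcomes(collected, remote)
```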

Replace _api_get/_api_post/_api_patch/_api_delete with their public equivalents (api_get/api_post/api_patch/api_delete) throughout. No functional change — just eliminates the redundant indirection layer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use the junitparser library instead of raw xml.etree.ElementTree for parsing junitxml test results. This gives us a typed API (TestCase, Failure, Error, Skipped) and eliminates manual XML element traversal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sdebruyn force-pushed the feat/spark-remote-tests branch 2 times, most recently from 2dc580d to 2d0af9c Compare May 16, 2026 18:49

sdebruyn changed the title ~~Remote Spark test orchestration framework~~ Add remote Spark test orchestration framework May 16, 2026

sdebruyn mentioned this pull request May 16, 2026

Add remote Spark test orchestration framework #211

Closed

4 tasks

sdebruyn requested a review from Copilot May 16, 2026 20:11

Copilot started reviewing on behalf of sdebruyn May 16, 2026 20:11 View session

Copilot AI reviewed May 16, 2026

Comment thread tests/conftest.py Outdated

Comment thread tests/spark_remote/conftest_plugin.py

Comment thread tests/spark_remote/spark_entry_point.py Outdated

Comment thread tests/spark_remote/sync.py

sdebruyn force-pushed the feat/spark-remote-tests branch from b6bdd51 to 239c43c Compare May 16, 2026 23:47

sdebruyn and others added 9 commits May 17, 2026 15:34

Remove dead --dw forwarding from remote args builder

cd44db5

The pytest_runtestloop guard already requires --de and rejects --remote without it, so --dw can never be active in remote mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove --with-grants and --with-python from remote args

3cbce96

Both flags are DW-only and don't exist for FabricSpark/DE tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use only IDs for OneLake URLs, drop workspace/lakehouse names

eb03e22

OneLake DFS supports GUIDs directly, so we don't need to pass human-readable names through the orchestrator and sync layers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Document remote Spark test execution in CONTRIBUTING.md

10ea1df

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sdebruyn force-pushed the feat/spark-remote-tests branch from 239c43c to 10ea1df Compare May 17, 2026 13:38

sdebruyn and others added 4 commits May 17, 2026 15:56

Add docstrings to all spark_remote functions and classes

ec532b9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sdebruyn mentioned this pull request May 17, 2026

Remote test artifacts accumulate without cleanup #247

Closed

sdebruyn requested a review from Copilot May 17, 2026 15:03

Copilot started reviewing on behalf of sdebruyn May 17, 2026 15:03 View session

Copilot AI reviewed May 17, 2026

Comment thread tests/spark_remote/sync.py Outdated

Comment thread tests/spark_remote/orchestrator.py Outdated

Comment thread tests/spark_remote/conftest_plugin.py

Comment thread tests/spark_remote/conftest_plugin.py

sdebruyn and others added 2 commits May 17, 2026 17:20

sdebruyn wants to merge 17 commits into main from feat/spark-remote-tests
