Skip to content

feat(engine/sink)!: require relative paths in sink output#2123

Open
bnimit wants to merge 22 commits into
mainfrom
chore/sink-relative-paths
Open

feat(engine/sink)!: require relative paths in sink output#2123
bnimit wants to merge 22 commits into
mainfrom
chore/sink-relative-paths

Conversation

@bnimit
Copy link
Copy Markdown
Contributor

@bnimit bnimit commented Jun 1, 2026

Summary

Sink output parameters now require a relative path. The engine joins it internally against the per-job artifact directory (sandbox_root). Workflow authors no longer need to write Url(env["workerArtifactPath"]) / x.

This is a breaking change for any workflow author using the old absolute-URI pattern. See the CHANGELOG entry for the migration steps.

Closes #2117.
Follow-up to #2108 (which delivered the security half — the sandbox check).

What changed

  • SinkOutput::from_path rewritten to validate + join against ctx.sandbox_root
  • 16+ new unit tests covering the new validation chain (empty, whitespace, ./.., ://, leading //~, traversal, resolves-to-root, migration-keyword)
  • 2 new integration tests (relative-path success + absolute-path failure with migration hint)
  • 87 fixture YAML files migrated from legacy Url(env["workerArtifactPath"])/x and file::join_path(env.get("workerArtifactPath"), x) patterns
  • FeatureWriter processor updated to use SinkOutput::from_path (was using Uri::from_str directly)
  • 4 test harnesses switched from Runner::run (unsandboxed) to Runner::run_with_sandbox_root(tempdir()) to support relative paths in tests
  • engine/AGENTS.md "Sandbox & sink writes" section updated
  • CHANGELOG entry with migration instructions
  • Workspace version bumped

Migration for workflow authors

Old:

output:
  type: flowExpr
  value: |
    Url(env["workerArtifactPath"]) / "out.gpkg"

New (static):

output:
  type: string
  value: "out.gpkg"

New (dynamic):

output:
  type: flowExpr
  value: |
    "out_" + attributes["k"] + ".gpkg"

Workflows still using the old pattern fail at runtime with an error mentioning workerArtifactPath — search logs for that keyword.

Test plan

  • cargo make test-rs green (945 passed, 0 failed)
  • cargo make test-qc green (135 passed, 0 failed)
  • cargo make clippy clean
  • cargo make format no-op (fmt committed)
  • Manual: grep env["workerArtifactPath"] and env.get("workerArtifactPath") across YAML fixtures — zero sink-output uses remain

bnimit added 9 commits June 1, 2026 11:02
Pre-implementation TDD: write the test suite for the new from_path
semantics (relative-only input joined against ctx.sandbox_root). All 16
new tests fail against the current chokepoint that still accepts
absolute URIs. Task 3 will rewrite from_path to make them pass.
BREAKING: SinkOutput::from_path now accepts only relative paths and
joins them internally against ctx.sandbox_root. Absolute URIs (any
'scheme://...' form), leading '/' / '~', empty strings, whitespace
padding, literal '.' / '..', and paths that resolve to the artifact
directory itself are rejected with explicit error messages.

The URI-scheme rejection mentions workerArtifactPath by name so
customers upgrading can find the migration path from logs.

FeatureWriter is also updated to use SinkOutput::from_path so it
inherits the new validation chain (it previously used Uri::from_str
directly which would silently join relative paths against CWD).

gs:// and ram:// root tests use a lenient match: if the default
StorageResolver lacks those backends the test accepts a
"failed to resolve storage" error (storage step), but any earlier
validation or join failure is still a test failure.

join_with_dotdot_now_rejected_by_sandbox is updated to use '../..'
(two levels) so the traversal genuinely escapes sandbox_root.

This commit alone fails integration tests because existing fixtures
produce absolute URIs. The next commit migrates the 86 fixtures.

Part of #2117
The .local-specs/ directory holds local-only design + planning artifacts
that should not be tracked in git.
Strip the Url(env["workerArtifactPath"]) / prefix from sink output
expressions across all affected fixture files. Also strip the equivalent
file::join_path(env.get("workerArtifactPath"), x) wrapper, including
single-line complex expressions, multi-line let-variable patterns, and
YAML single-quoted value: forms. Literal-string outputs are promoted
from flowExpr type to string type where applicable. Empty
'with: { workerArtifactPath: null }' and bare 'workerArtifactPath:'
workflow declarations are removed.

Also update test harnesses to use Runner::run_with_sandbox_root so
that relative sink paths resolve against the tempdir rather than
the read-only filesystem root (file:///):

- engine/runtime/tests/helper.rs
- engine/runtime/tests/logging_test/logging_helper.rs
- engine/runtime/tests/testcase/geometry/neighbor_finder.rs
- engine/testing/workflow-tests/src/main.rs

Fix the logging/08 fixture's expected line number after the
workerArtifactPath: null removal shifted the YAML line count by one.

Fix tests/fixture/workflow/file/writer/geopackage.yaml: hardcoded
ram:///output_test.gpkg absolute URI → relative output_test.gpkg.

Functionally equivalent — the engine joins the relative path against
ctx.sandbox_root, which the worker/CLI already populates from the same
workerArtifactPath value.

Part of #2117.
- relative_output_writes_under_sandbox_root: confirms a workflow with
  output:{ type:string, value:'foo' } writes the file at
  <sandbox_root>/foo end-to-end.
- absolute_output_fails_with_migration_hint: confirms a workflow using
  the legacy Url(env["workerArtifactPath"])/x pattern fails at the
  chokepoint with an error that names workerArtifactPath (the search
  anchor customers will hit in logs).

Part of #2117.
Reflect the new strict-relative API: workflow authors provide a relative
path; the engine joins against ctx.sandbox_root. The same YAML is
portable across storage backends (file, gs, ram, s3) because the
sandbox_root scheme is decided by the worker/CLI, not the workflow.

Part of #2117.
Documents the breaking change in sink output parameters and the
migration path from the legacy Url(env["workerArtifactPath"])/x and
file::join_path(env.get("workerArtifactPath"), x) forms.

Part of #2117.
Copilot AI review requested due to automatic review settings June 1, 2026 17:56
@bnimit bnimit requested review from a team, asrcpq, n4to4 and shunski as code owners June 1, 2026 17:56
@github-actions github-actions Bot added the engine label Jun 1, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a breaking change to sink output handling: sink outputs must now be strict-relative paths that the engine joins against the per-job sandbox_root, improving workflow ergonomics and keeping sandbox enforcement centralized in SinkOutput::from_path.

Changes:

  • Reworked SinkOutput::from_path to validate strict-relative paths, join against ctx.sandbox_root, and sandbox-check via ensure_under.
  • Updated FeatureWriter and multiple test harnesses/fixtures to use relative outputs and to run with an explicit sandbox_root.
  • Added/updated integration fixtures and changelog/docs to reflect the migration away from workerArtifactPath-prefixed absolute URIs.

Reviewed changes

Copilot reviewed 100 out of 105 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
engine/runtime/action-sink/src/output.rs Implements strict-relative sink output validation + join against sandbox_root, with expanded unit tests.
engine/runtime/action-processor/src/feature/writer.rs Routes FeatureWriter buffering through SinkOutput::from_path and re-validates sandbox before writes.
engine/runtime/tests/helper.rs Updates test runner to use run_with_sandbox_root and adds execute_expect_err.
engine/runtime/tests/logging_test/logging_helper.rs Runs logging fixtures with an explicit sandbox_root.
engine/runtime/tests/testcase/geometry/neighbor_finder.rs Switches to relative output + runs workflow with sandbox root.
engine/testing/workflow-tests/src/main.rs Switches workflow-tests harness to run_with_sandbox_root and constructs sandbox_root from tempdir.
engine/runtime/tests/testcase/file/writer/relative_output.rs Adds integration tests for relative output success and absolute output failure.
engine/runtime/tests/testcase/file/writer.rs Registers the new relative_output testcase module.
engine/runtime/tests/fixture/workflow/file/writer/relative_output.yaml New fixture using a relative sink output.
engine/runtime/tests/fixture/workflow/file/writer/absolute_output_fails.yaml New fixture that should fail due to absolute output URI.
engine/runtime/tests/fixture/workflow/file/writer/geopackage.yaml Migrates GeoPackageWriter output to a relative path.
engine/runtime/tests/fixture/workflow/file/writer/czml_timeseries.yaml Migrates output expression to no longer depend on workerArtifactPath.
engine/runtime/tests/fixture/workflow/file/reader/czml_timeseries.yaml Migrates FeatureWriter output expression to no longer depend on workerArtifactPath.
engine/runtime/tests/fixture/workflow/obj-reader.yml Migrates FeatureWriter output (currently incorrectly set to a non-literal expression; see comments).
engine/runtime/tests/fixture/workflow/obj-reader-detailed.yml Migrates FeatureWriter outputs (currently incorrectly set to non-literal expressions; see comments).
engine/runtime/tests/fixture/testdata/logging/01_basic_sequential_processing/workflow.yml Removes workerArtifactPath input and migrates sink outputs to relative paths.
engine/runtime/tests/fixture/testdata/logging/02_parallel_processing/workflow.yml Same: removes workerArtifactPath input and migrates sink outputs to relative paths.
engine/runtime/tests/fixture/testdata/logging/03_duplicate_node_names/workflow.yml Same migration to relative outputs.
engine/runtime/tests/fixture/testdata/logging/04_factory_error/workflow.yml Same migration to relative outputs.
engine/runtime/tests/fixture/testdata/logging/05_source_error/workflow.yml Same migration to relative outputs.
engine/runtime/tests/fixture/testdata/logging/06_processor_error/workflow.yml Same migration to relative outputs.
engine/runtime/tests/fixture/testdata/logging/07_sink_error/workflow.yml Removes workerArtifactPath input (sink failure fixture).
engine/runtime/tests/fixture/testdata/logging/08_workflow_definition_error/workflow.yml Removes workerArtifactPath input and migrates output to relative.
engine/runtime/tests/fixture/testdata/logging/08_workflow_definition_error/user-facing.log Updates expected log line/column due to fixture edits.
engine/runtime/tests/fixture/testdata/logging/09_orphaned_sink/workflow.yml Removes workerArtifactPath input and migrates sink outputs to relative.
engine/runtime/examples/fixture/workflow/solar-radiation/workflow.yml Migrates many outputs to relative paths and updates Code type usage for sink outputs.
engine/runtime/examples/fixture/workflow/solar-radiation/solar-radiation.yml Same migration to relative sink outputs.
engine/runtime/examples/fixture/workflow/solar-radiation/adequate_place_judgement.yml Same migration to relative sink outputs.
engine/runtime/examples/fixture/workflow/examples/citygml3-reader/workflow.yml Migrates Cesium3DTilesWriter output to type: string relative output.
engine/runtime/examples/fixture/workflow/examples/citygml-roundtrip/workflow.yml Migrates outputs away from workerArtifactPath prefixing.
engine/runtime/examples/fixture/workflow/datetime_converter/basic_conversions.yml Migrates sink outputs to relative paths.
engine/runtime/examples/fixture/workflow/datetime_converter/custom_format.yml Migrates sink outputs to relative paths.
engine/runtime/examples/fixture/workflow/datetime_converter/error_handling.yml Migrates sink outputs to relative paths.
engine/runtime/examples/fixture/workflow/executor-testing/feature-write/diamond.yml Migrates sink outputs to relative paths and removes workerArtifactPath input.
engine/runtime/examples/fixture/workflow/executor-testing/feature-write/nested-subgraph.yml Same migration.
engine/runtime/examples/fixture/workflow/executor-testing/feature-write/subgraph.yml Same migration.
engine/runtime/examples/fixture/workflow/executor-testing/feature-write/two-copies.yml Same migration.
engine/runtime/examples/fixture/workflow/quality-check/plateau4/... Bulk migration of plateau QC fixtures to relative sink outputs and removal of workerArtifactPath inputs.
engine/runtime/examples/fixture/workflow/data-convert/plateau4/... Bulk migration of plateau data-convert fixtures to relative sink outputs and removal of workerArtifactPath inputs.
engine/runtime/examples/fixture/graphs/plateau4/quality-check/01-01-common.yml Updates graph fixture sink outputs to relative paths.
engine/runtime/examples/fixture/graphs/plateau4/quality-check/01-02-common.yml Updates graph fixture sink outputs to relative paths.
engine/worker/workflow/cms/plateau4/quality-check/bldg/template/workflow.yml Migrates CMS template sink output to relative.
engine/worker/workflow/cms/plateau4/data-convert/bldg/template/workflow.yml Migrates CMS template sink outputs/compress outputs to relative.
engine/AGENTS.md Documents new relative-output behavior and rejection rules.
CHANGELOG.md Adds breaking-change notice + migration instructions.
engine/Cargo.toml Bumps workspace version.
engine/Cargo.lock Updates locked workspace crate versions accordingly.
.gitignore Ignores .local-specs/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread engine/runtime/tests/fixture/workflow/obj-reader.yml
Comment thread engine/runtime/tests/fixture/workflow/obj-reader-detailed.yml
Comment thread engine/runtime/tests/fixture/workflow/obj-reader-detailed.yml
Comment thread engine/runtime/tests/fixture/workflow/obj-reader-detailed.yml
Comment thread engine/runtime/tests/helper.rs
Comment thread engine/runtime/tests/testcase/file/writer/relative_output.rs
bnimit added 2 commits June 2, 2026 07:55
- Wrap bare FeatureWriter output values in double-quotes so Rhai
  evaluates them as string literals rather than property accesses
  (obj-reader.yml, obj-reader-detailed.yml).
- Change execute_expect_err to use e.to_string() (Display) instead
  of format!("{e:?}") (Debug) for more stable error messages.
- Strengthen relative_output_writes_under_sandbox_root: verify the
  output file actually exists under sandbox_root, not just that
  execution succeeded.

Part of #2117.
Run cargo make generate-examples-cms-workflow so the assembled
workflow YAMLs match the regenerator's output after the
relative-path migration in #2117.

Part of #2117.
@bnimit bnimit closed this Jun 2, 2026
@bnimit bnimit reopened this Jun 2, 2026
bnimit added 5 commits June 2, 2026 08:37
Bump workspace version to 0.0.377 (main caught up to 0.0.376).
Re-run cargo make generate-examples-cms-workflow to refresh assembled
plateau workflow YAMLs against the merged source pieces.
Resolve CHANGELOG conflict by keeping both Unreleased (this PR's
breaking change) and 0.1.0-alpha.18 (released by main while PR was open).

Part of #2117.
…utput

CI failure on PR #2123: cesium3dtiles and mvt sinks were calling
Uri::from_str(path) directly on the evaluated sink output expression.
After the strict-relative migration, that expression now produces a
bare filename like 'bldg_lod1' — Uri::from_str then joins it with CWD
(not sandbox_root), producing 'file:///<cwd>/bldg_lod1'. The new
SinkOutput::from_path chokepoint correctly rejects this as an absolute
URI.

Replace the Uri::from_str pre-parse with SinkOutput::from_path so the
relative path is validated and joined against ctx.sandbox_root before
being stored as the buffer key (same pattern Task 3 applied to
FeatureWriter).

Add SinkOutput::from_resolved_uri for the pipeline stages (tile_writing_stage,
geometry_slicing_stage) that receive an already-resolved buffer key URI
and need to acquire a SinkOutput handle without re-running the
strict-relative checks.

Part of #2117.
CI failure on PR #2123 (run 26798730587): plateau-tiles-test/runner.rs
was the only test harness left using Runner::run (the legacy file:///
sentinel). After the strict-relative migration, relative sink paths
joined against file:/// resolved to the filesystem root, producing
'PermissionDenied' errors when writing to /bldg_lod1.

Mirrors the Task 5 fix applied to helper.rs, logging_helper.rs,
neighbor_finder.rs, and workflow-tests/src/main.rs — use the
workerArtifactPath value (already computed as flow_dir) as sandbox_root.

Part of #2117.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 105 out of 110 changed files in this pull request and generated 1 comment.

Comment thread engine/runtime/action-processor/src/feature/writer.rs Outdated
Address Copilot/code-review concern that SinkOutput::from_path was
being called per-feature in three sinks (FeatureWriter, cesium3dtiles,
mvt) only to extract the validated URI as a buffer key. That pattern
forced storage_resolver.resolve() into the hot path for every feature.

Split the chokepoint into two clear phases:

  ensure_relative_path(ctx, path) -> Uri
    └── validate + join + ensure_under (no Storage acquisition)

  SinkOutput::from_path(ctx, path) -> SinkOutput
    └── ensure_relative_path + from_resolved_uri (full chokepoint)

The three sinks now call ensure_relative_path per feature (cheap) and
acquire Storage once at flush/write time via from_resolved_uri (which
was added during the cesium/mvt fix in this PR).

The validation behavior is identical — existing from_path tests
exercise the same code path because from_path now just composes the
two phases.

Part of #2117.
}
output: |
file::join_path(env.get("workerArtifactPath"), "summary_common2-" + env.get("__value")["package"] + ".json")
"summary_common2-" + env.get("__value")["package"] + ".json"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe if this is FlowExpr, output needs to be updated to have codeType = FlowExpr to work. Would like @asrcpq to confirm though

bnimit added 5 commits June 2, 2026 13:48
Restores the chokepoint guarantee weakened by the ensure_relative_path
split. After that refactor, from_resolved_uri was publicly exposed as
a way to construct a SinkOutput from a Uri without validation — a
future contributor could call it with any URI and bypass the sandbox.

Tighten to pub(crate). External callers (outside the action-sink crate)
have exactly two ways to interact with SinkOutput:

  - SinkOutput::from_path(ctx, &str): full chokepoint
    (validate + acquire + writeable handle)
  - ensure_relative_path(ctx, &str): validation only
    (returns a Uri, no write capability)

The split-phase optimization (ensure_relative_path then
from_resolved_uri) is now available only to sinks inside action-sink
that legitimately need to skip a second validation in their hot path
(cesium3dtiles, mvt). All current callers of from_resolved_uri are
inside action-sink, so this is a zero-functional-change tightening.

Part of #2117.
…, resolver)

Adopt external reviewer's proposed design: collapse from_path,
from_resolved_uri, ensure_relative_path, and join into a single
SinkOutput::new entrypoint. Validation, joining, and storage
acquisition all happen internally.

Buffering sinks (cesium3dtiles, mvt, FeatureWriter, csv, json, excel,
xml) key their buffers by the relative path String and call
SinkOutput::new once per output file at flush time. Storage resolver
caches keep this fast.

Grouped sinks (geojson, czml) compose sub-paths as strings
("{base}/{hash}.geojson") and call SinkOutput::new for each group.
Shapefile and GLTF pipelines compose tile/sidecar paths as strings.

A lightweight sandbox::ensure_valid_relative_path helper provides
syntactic early validation for buffering sinks (full sandbox check
still occurs at SinkOutput::new flush time).

Bug fixes the reviewer also caught:
- gltf: was calling Uri::from_str on the workflow's relative output
  at build time, which silently joined with CWD, producing an absolute
  URI that SinkOutput::new would have rejected at runtime. Now stores
  the relative path as String and calls SinkOutput::new at finish time.
- citygml texture sidecar: was composing absolute dst URIs from the
  parent of the resolved GML output (containing "://"), which
  SinkOutput::new rejects. Now computes the texture dst as a relative
  path under sandbox_root by stripping the sandbox_root prefix from
  the GML output URI, then composing "parent/stem_appearance/file.png".

Net effect: one constructor, one validation site, no API for bypassing
validation. The chokepoint guarantee is restored to its PR1/PR2 shape
— stronger than the pub(crate) tightening it replaces.

Part of #2117.
Reflect the SinkOutput::new(sandbox_root, path, resolver) chokepoint
that replaced from_path/from_resolved_uri/ensure_relative_path/join.
Note that buffering sinks key by relative-path String and compose
nested paths via string formatting, and that
sandbox::ensure_valid_relative_path provides optional fail-fast
validation at intake.

Part of #2117.
Address the remaining piece of the external reviewer's design: drop
the manual relative-path-checker that buffering sinks were calling at
intake. Validation now happens exclusively inside SinkOutput::new at
flush time. Bad paths still get rejected — just one validation site
instead of two.

Changes:
- Delete sandbox::ensure_valid_relative_path
- Drop the early-validate calls in cesium3dtiles/sink.rs (2 sites),
  mvt/sink.rs (2 sites), and FeatureWriter writer.rs (1 site)
- Revert sandbox module from pub to crate-private (the public surface
  is only ensure_under and SandboxError via lib.rs re-exports)
- Update engine/AGENTS.md to reflect the single-validation-site design

Part of #2117.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(engine/sink): sinks should accept relative paths and join internally (follow-up to #2108)

3 participants