feat(engine/sink)!: require relative paths in sink output#2123
Conversation
Pre-implementation TDD: write the test suite for the new from_path semantics (relative-only input joined against ctx.sandbox_root). All 16 new tests fail against the current chokepoint that still accepts absolute URIs. Task 3 will rewrite from_path to make them pass.
BREAKING: SinkOutput::from_path now accepts only relative paths and joins them internally against ctx.sandbox_root. Absolute URIs (any 'scheme://...' form), leading '/' / '~', empty strings, whitespace padding, literal '.' / '..', and paths that resolve to the artifact directory itself are rejected with explicit error messages. The URI-scheme rejection mentions workerArtifactPath by name so customers upgrading can find the migration path from logs. FeatureWriter is also updated to use SinkOutput::from_path so it inherits the new validation chain (it previously used Uri::from_str directly which would silently join relative paths against CWD). gs:// and ram:// root tests use a lenient match: if the default StorageResolver lacks those backends the test accepts a "failed to resolve storage" error (storage step), but any earlier validation or join failure is still a test failure. join_with_dotdot_now_rejected_by_sandbox is updated to use '../..' (two levels) so the traversal genuinely escapes sandbox_root. This commit alone fails integration tests because existing fixtures produce absolute URIs. The next commit migrates the 86 fixtures. Part of #2117
The .local-specs/ directory holds local-only design + planning artifacts that should not be tracked in git.
Strip the Url(env["workerArtifactPath"]) / prefix from sink output
expressions across all affected fixture files. Also strip the equivalent
file::join_path(env.get("workerArtifactPath"), x) wrapper, including
single-line complex expressions, multi-line let-variable patterns, and
YAML single-quoted value: forms. Literal-string outputs are promoted
from flowExpr type to string type where applicable. Empty
'with: { workerArtifactPath: null }' and bare 'workerArtifactPath:'
workflow declarations are removed.
Also update test harnesses to use Runner::run_with_sandbox_root so
that relative sink paths resolve against the tempdir rather than
the read-only filesystem root (file:///):
- engine/runtime/tests/helper.rs
- engine/runtime/tests/logging_test/logging_helper.rs
- engine/runtime/tests/testcase/geometry/neighbor_finder.rs
- engine/testing/workflow-tests/src/main.rs
Fix the logging/08 fixture's expected line number after the
workerArtifactPath: null removal shifted the YAML line count by one.
Fix tests/fixture/workflow/file/writer/geopackage.yaml: hardcoded
ram:///output_test.gpkg absolute URI → relative output_test.gpkg.
Functionally equivalent — the engine joins the relative path against
ctx.sandbox_root, which the worker/CLI already populates from the same
workerArtifactPath value.
Part of #2117.
- relative_output_writes_under_sandbox_root: confirms a workflow with
output:{ type:string, value:'foo' } writes the file at
<sandbox_root>/foo end-to-end.
- absolute_output_fails_with_migration_hint: confirms a workflow using
the legacy Url(env["workerArtifactPath"])/x pattern fails at the
chokepoint with an error that names workerArtifactPath (the search
anchor customers will hit in logs).
Part of #2117.
Reflect the new strict-relative API: workflow authors provide a relative path; the engine joins against ctx.sandbox_root. The same YAML is portable across storage backends (file, gs, ram, s3) because the sandbox_root scheme is decided by the worker/CLI, not the workflow. Part of #2117.
Documents the breaking change in sink output parameters and the
migration path from the legacy Url(env["workerArtifactPath"])/x and
file::join_path(env.get("workerArtifactPath"), x) forms.
Part of #2117.
There was a problem hiding this comment.
Pull request overview
This PR introduces a breaking change to sink output handling: sink outputs must now be strict-relative paths that the engine joins against the per-job sandbox_root, improving workflow ergonomics and keeping sandbox enforcement centralized in SinkOutput::from_path.
Changes:
- Reworked
SinkOutput::from_pathto validate strict-relative paths, join againstctx.sandbox_root, and sandbox-check viaensure_under. - Updated
FeatureWriterand multiple test harnesses/fixtures to use relative outputs and to run with an explicitsandbox_root. - Added/updated integration fixtures and changelog/docs to reflect the migration away from
workerArtifactPath-prefixed absolute URIs.
Reviewed changes
Copilot reviewed 100 out of 105 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| engine/runtime/action-sink/src/output.rs | Implements strict-relative sink output validation + join against sandbox_root, with expanded unit tests. |
| engine/runtime/action-processor/src/feature/writer.rs | Routes FeatureWriter buffering through SinkOutput::from_path and re-validates sandbox before writes. |
| engine/runtime/tests/helper.rs | Updates test runner to use run_with_sandbox_root and adds execute_expect_err. |
| engine/runtime/tests/logging_test/logging_helper.rs | Runs logging fixtures with an explicit sandbox_root. |
| engine/runtime/tests/testcase/geometry/neighbor_finder.rs | Switches to relative output + runs workflow with sandbox root. |
| engine/testing/workflow-tests/src/main.rs | Switches workflow-tests harness to run_with_sandbox_root and constructs sandbox_root from tempdir. |
| engine/runtime/tests/testcase/file/writer/relative_output.rs | Adds integration tests for relative output success and absolute output failure. |
| engine/runtime/tests/testcase/file/writer.rs | Registers the new relative_output testcase module. |
| engine/runtime/tests/fixture/workflow/file/writer/relative_output.yaml | New fixture using a relative sink output. |
| engine/runtime/tests/fixture/workflow/file/writer/absolute_output_fails.yaml | New fixture that should fail due to absolute output URI. |
| engine/runtime/tests/fixture/workflow/file/writer/geopackage.yaml | Migrates GeoPackageWriter output to a relative path. |
| engine/runtime/tests/fixture/workflow/file/writer/czml_timeseries.yaml | Migrates output expression to no longer depend on workerArtifactPath. |
| engine/runtime/tests/fixture/workflow/file/reader/czml_timeseries.yaml | Migrates FeatureWriter output expression to no longer depend on workerArtifactPath. |
| engine/runtime/tests/fixture/workflow/obj-reader.yml | Migrates FeatureWriter output (currently incorrectly set to a non-literal expression; see comments). |
| engine/runtime/tests/fixture/workflow/obj-reader-detailed.yml | Migrates FeatureWriter outputs (currently incorrectly set to non-literal expressions; see comments). |
| engine/runtime/tests/fixture/testdata/logging/01_basic_sequential_processing/workflow.yml | Removes workerArtifactPath input and migrates sink outputs to relative paths. |
| engine/runtime/tests/fixture/testdata/logging/02_parallel_processing/workflow.yml | Same: removes workerArtifactPath input and migrates sink outputs to relative paths. |
| engine/runtime/tests/fixture/testdata/logging/03_duplicate_node_names/workflow.yml | Same migration to relative outputs. |
| engine/runtime/tests/fixture/testdata/logging/04_factory_error/workflow.yml | Same migration to relative outputs. |
| engine/runtime/tests/fixture/testdata/logging/05_source_error/workflow.yml | Same migration to relative outputs. |
| engine/runtime/tests/fixture/testdata/logging/06_processor_error/workflow.yml | Same migration to relative outputs. |
| engine/runtime/tests/fixture/testdata/logging/07_sink_error/workflow.yml | Removes workerArtifactPath input (sink failure fixture). |
| engine/runtime/tests/fixture/testdata/logging/08_workflow_definition_error/workflow.yml | Removes workerArtifactPath input and migrates output to relative. |
| engine/runtime/tests/fixture/testdata/logging/08_workflow_definition_error/user-facing.log | Updates expected log line/column due to fixture edits. |
| engine/runtime/tests/fixture/testdata/logging/09_orphaned_sink/workflow.yml | Removes workerArtifactPath input and migrates sink outputs to relative. |
| engine/runtime/examples/fixture/workflow/solar-radiation/workflow.yml | Migrates many outputs to relative paths and updates Code type usage for sink outputs. |
| engine/runtime/examples/fixture/workflow/solar-radiation/solar-radiation.yml | Same migration to relative sink outputs. |
| engine/runtime/examples/fixture/workflow/solar-radiation/adequate_place_judgement.yml | Same migration to relative sink outputs. |
| engine/runtime/examples/fixture/workflow/examples/citygml3-reader/workflow.yml | Migrates Cesium3DTilesWriter output to type: string relative output. |
| engine/runtime/examples/fixture/workflow/examples/citygml-roundtrip/workflow.yml | Migrates outputs away from workerArtifactPath prefixing. |
| engine/runtime/examples/fixture/workflow/datetime_converter/basic_conversions.yml | Migrates sink outputs to relative paths. |
| engine/runtime/examples/fixture/workflow/datetime_converter/custom_format.yml | Migrates sink outputs to relative paths. |
| engine/runtime/examples/fixture/workflow/datetime_converter/error_handling.yml | Migrates sink outputs to relative paths. |
| engine/runtime/examples/fixture/workflow/executor-testing/feature-write/diamond.yml | Migrates sink outputs to relative paths and removes workerArtifactPath input. |
| engine/runtime/examples/fixture/workflow/executor-testing/feature-write/nested-subgraph.yml | Same migration. |
| engine/runtime/examples/fixture/workflow/executor-testing/feature-write/subgraph.yml | Same migration. |
| engine/runtime/examples/fixture/workflow/executor-testing/feature-write/two-copies.yml | Same migration. |
| engine/runtime/examples/fixture/workflow/quality-check/plateau4/... | Bulk migration of plateau QC fixtures to relative sink outputs and removal of workerArtifactPath inputs. |
| engine/runtime/examples/fixture/workflow/data-convert/plateau4/... | Bulk migration of plateau data-convert fixtures to relative sink outputs and removal of workerArtifactPath inputs. |
| engine/runtime/examples/fixture/graphs/plateau4/quality-check/01-01-common.yml | Updates graph fixture sink outputs to relative paths. |
| engine/runtime/examples/fixture/graphs/plateau4/quality-check/01-02-common.yml | Updates graph fixture sink outputs to relative paths. |
| engine/worker/workflow/cms/plateau4/quality-check/bldg/template/workflow.yml | Migrates CMS template sink output to relative. |
| engine/worker/workflow/cms/plateau4/data-convert/bldg/template/workflow.yml | Migrates CMS template sink outputs/compress outputs to relative. |
| engine/AGENTS.md | Documents new relative-output behavior and rejection rules. |
| CHANGELOG.md | Adds breaking-change notice + migration instructions. |
| engine/Cargo.toml | Bumps workspace version. |
| engine/Cargo.lock | Updates locked workspace crate versions accordingly. |
| .gitignore | Ignores .local-specs/. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Wrap bare FeatureWriter output values in double-quotes so Rhai
evaluates them as string literals rather than property accesses
(obj-reader.yml, obj-reader-detailed.yml).
- Change execute_expect_err to use e.to_string() (Display) instead
of format!("{e:?}") (Debug) for more stable error messages.
- Strengthen relative_output_writes_under_sandbox_root: verify the
output file actually exists under sandbox_root, not just that
execution succeeded.
Part of #2117.
Bump workspace version to 0.0.377 (main caught up to 0.0.376). Re-run cargo make generate-examples-cms-workflow to refresh assembled plateau workflow YAMLs against the merged source pieces. Resolve CHANGELOG conflict by keeping both Unreleased (this PR's breaking change) and 0.1.0-alpha.18 (released by main while PR was open). Part of #2117.
…utput CI failure on PR #2123: cesium3dtiles and mvt sinks were calling Uri::from_str(path) directly on the evaluated sink output expression. After the strict-relative migration, that expression now produces a bare filename like 'bldg_lod1' — Uri::from_str then joins it with CWD (not sandbox_root), producing 'file:///<cwd>/bldg_lod1'. The new SinkOutput::from_path chokepoint correctly rejects this as an absolute URI. Replace the Uri::from_str pre-parse with SinkOutput::from_path so the relative path is validated and joined against ctx.sandbox_root before being stored as the buffer key (same pattern Task 3 applied to FeatureWriter). Add SinkOutput::from_resolved_uri for the pipeline stages (tile_writing_stage, geometry_slicing_stage) that receive an already-resolved buffer key URI and need to acquire a SinkOutput handle without re-running the strict-relative checks. Part of #2117.
CI failure on PR #2123 (run 26798730587): plateau-tiles-test/runner.rs was the only test harness left using Runner::run (the legacy file:/// sentinel). After the strict-relative migration, relative sink paths joined against file:/// resolved to the filesystem root, producing 'PermissionDenied' errors when writing to /bldg_lod1. Mirrors the Task 5 fix applied to helper.rs, logging_helper.rs, neighbor_finder.rs, and workflow-tests/src/main.rs — use the workerArtifactPath value (already computed as flow_dir) as sandbox_root. Part of #2117.
Restores the chokepoint guarantee weakened by the ensure_relative_path
split. After that refactor, from_resolved_uri was publicly exposed as
a way to construct a SinkOutput from a Uri without validation — a
future contributor could call it with any URI and bypass the sandbox.
Tighten to pub(crate). External callers (outside the action-sink crate)
have exactly two ways to interact with SinkOutput:
- SinkOutput::from_path(ctx, &str): full chokepoint
(validate + acquire + writeable handle)
- ensure_relative_path(ctx, &str): validation only
(returns a Uri, no write capability)
The split-phase optimization (ensure_relative_path then
from_resolved_uri) is now available only to sinks inside action-sink
that legitimately need to skip a second validation in their hot path
(cesium3dtiles, mvt). All current callers of from_resolved_uri are
inside action-sink, so this is a zero-functional-change tightening.
Part of #2117.
…, resolver)
Adopt external reviewer's proposed design: collapse from_path,
from_resolved_uri, ensure_relative_path, and join into a single
SinkOutput::new entrypoint. Validation, joining, and storage
acquisition all happen internally.
Buffering sinks (cesium3dtiles, mvt, FeatureWriter, csv, json, excel,
xml) key their buffers by the relative path String and call
SinkOutput::new once per output file at flush time. Storage resolver
caches keep this fast.
Grouped sinks (geojson, czml) compose sub-paths as strings
("{base}/{hash}.geojson") and call SinkOutput::new for each group.
Shapefile and GLTF pipelines compose tile/sidecar paths as strings.
A lightweight sandbox::ensure_valid_relative_path helper provides
syntactic early validation for buffering sinks (full sandbox check
still occurs at SinkOutput::new flush time).
Bug fixes the reviewer also caught:
- gltf: was calling Uri::from_str on the workflow's relative output
at build time, which silently joined with CWD, producing an absolute
URI that SinkOutput::new would have rejected at runtime. Now stores
the relative path as String and calls SinkOutput::new at finish time.
- citygml texture sidecar: was composing absolute dst URIs from the
parent of the resolved GML output (containing "://"), which
SinkOutput::new rejects. Now computes the texture dst as a relative
path under sandbox_root by stripping the sandbox_root prefix from
the GML output URI, then composing "parent/stem_appearance/file.png".
Net effect: one constructor, one validation site, no API for bypassing
validation. The chokepoint guarantee is restored to its PR1/PR2 shape
— stronger than the pub(crate) tightening it replaces.
Part of #2117.
Reflect the SinkOutput::new(sandbox_root, path, resolver) chokepoint that replaced from_path/from_resolved_uri/ensure_relative_path/join. Note that buffering sinks key by relative-path String and compose nested paths via string formatting, and that sandbox::ensure_valid_relative_path provides optional fail-fast validation at intake. Part of #2117.
Address the remaining piece of the external reviewer's design: drop the manual relative-path-checker that buffering sinks were calling at intake. Validation now happens exclusively inside SinkOutput::new at flush time. Bad paths still get rejected — just one validation site instead of two. Changes: - Delete sandbox::ensure_valid_relative_path - Drop the early-validate calls in cesium3dtiles/sink.rs (2 sites), mvt/sink.rs (2 sites), and FeatureWriter writer.rs (1 site) - Revert sandbox module from pub to crate-private (the public surface is only ensure_under and SandboxError via lib.rs re-exports) - Update engine/AGENTS.md to reflect the single-validation-site design Part of #2117.
…lative paths PR #2133 (merged from main into this branch) added new PLATEAU6 quality-check workflow fixtures that still use the absolute-URI pattern (Url(env["workerArtifactPath"]) / x or file::join_path). Our migration ran before that merge, so these two new files were not touched. Strip the wrappers using the same one-shot script that handled the original 87 fixtures. After this, the 7 plateau6 quality-check tests that were failing on CI (test_quality_check_plateau6_01_01_common_l01 through z_common_00_no_error) pass locally. Part of #2117.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 120 out of 125 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
engine/runtime/examples/fixture/workflow/quality-check/plateau4/06-fld/workflow.yml:350
- The
outputexpression still referencesbase_path, butbase_pathis no longer defined (it was removed when migrating away fromworkerArtifactPath). This will fail at runtime whenFolderis empty.
Address Copilot review on engine/AGENTS.md:80. The original wording implied all sinks work against any sandbox_root scheme. In practice the chokepoint (SinkOutput::new + storage-resolver) is scheme-agnostic, but several sinks bypass the storage abstraction with direct std::fs calls (Shapefile, GLTF atlas, Cesium3DTiles atlas + zip-cleanup, MVT zip-cleanup, GeoPackage temp staging) and require a file:// root in practice. Replace the portability claim with a more accurate split: the chokepoint is portable; individual sinks vary. Closing the gap is a future hardening task (wrap remaining std::fs calls behind the storage backend). Part of #2117.
External reviewer noted that engine/AGENTS.md should hold general project context, not implementation-level enforcement details. The section had accumulated through PR1/PR2/PR3 and grew into a detailed-and-partially-contradictory implementation spec. Remove it. The substantive content already lives where it belongs: - engine/runtime/action-sink/src/output.rs — rustdoc on SinkOutput::new documents the validation rules, the chokepoint guarantee, and the workerArtifactPath migration hint - engine/runtime/storage/src/storage_sync.rs — doc comment on Storage::put_sync warns its users against direct calls from sinks AGENTS.md stays focused on onboarding/architecture; per-API specifics live next to the APIs. Part of #2117.
There was a problem hiding this comment.
Approved with minor issues.
For filtering, there might be other corner cases like NUL character, URL vs path compatibility, etc. But since this PR's original motivation was simplification instead of sandboxing, it should be sufficient. We probably need to have a safe_join for Url class if everything is fixed to be url-based.
Both cesium3dtiles/pipeline.rs and citygml.rs were stripping the sandbox_root prefix from the resolved output URI to derive a relative sub-path. If the strip failed (output_path did not start with sandbox_root), they silently fell back to using just the file_name — which would silently collide across groups and corrupt data (e.g. group A's textures overwriting group B's at appearance/foo.png). That fallback is wrong: SinkOutput::new produced output_path by joining sandbox_root with a relative path, so the prefix MUST be present. Replace the silent fallback with a hard Err so any upstream break surfaces loudly instead. Also drop a duplicate inline comment in citygml.rs flagged in the same review. Surfaced by external reviewer on PR #2123. Part of #2117.
|
Sorry, mistakenly merged some other commits into it, reverted. |
Revert the two non-code additions from this PR. The CHANGELOG block and the .local-specs/ gitignore line are not required for the sink-relative-paths change to function. Keeping the PR focused on engine code. Refs #2117
| let base_path = env.get("workerArtifactPath"); | ||
| let folder = env.get("__value").Folder; | ||
| if folder == () || folder == "" { | ||
| return file::join_path(base_path, "06_浸水想定区域_fld.csv"); |
There was a problem hiding this comment.
Regression. Some constructed paths are not updated. Here base_path is already removed so this branch will trigger error when hit.
- Implement experimental function/closure and environment frame chaining in FlowExpr. Closures currently weakly captures parent env. - Implement `itertools.map` and `itertools.sorted` functions (module and function names are currently unstable). - Add `let` keyword for explicit variable rebind. Assignment updates ancestor frames upwards if variable exists, fallback to definition. - Consolidate reearth-flow-expr error types by using an `UNSET_POS` sentinel. - Builtins are now shared as an immutable toplevel frame to avoid reinitialization every evaluation. - JSON converter creates a "batch feature" with `__features` attribute a list of concatenated features to replace the original variable injection of `__features` - Finish workflow migrations partially done in #2123 (continue #2143) - Fix a spurious error log introduced in AttributeMapper migration. Maybe silent skip is not optimal, but it is not in the scope of this PR so preserving main's behavior. - Add list unpacking like `[x,y] = [1,2]` to improve code readability. - Fix wrong sink error type `ExcelWriterFactory` -> `ExcelWriter` introduced in #1519. - Remove EvalString error variant in FlowExpr crate. It is incorrect to add flow-specific error variants in reearth-flow-expr crate. - Add compiled params in citygml, czml, geopackage sink. Previously code compilation is done in `finish()` stage, but syntax error report should be done in `build()` stage. - Implement string/bool cast purely in reearth-flow related crates. Tighten FeatureFilter's truthy condition: now the expression must be strictly evaluated to a bool type or error will be reported. String is more lenient, accepting numeric values as well. - Update workflows, they use the previously lenient truthy cast with `Regex.find()` but now triggers error. - Separate AST depth limit and call depth limit since closure's callback is not routed by the AST-level entrypoint `eval_inner`.
Summary
Sink
outputparameters now require a relative path. The engine joins it internally against the per-job artifact directory (sandbox_root). Workflow authors no longer need to writeUrl(env["workerArtifactPath"]) / x.This is a breaking change for any workflow author using the old absolute-URI pattern. See the CHANGELOG entry for the migration steps.
Closes #2117.
Follow-up to #2108 (which delivered the security half — the sandbox check).
What changed
SinkOutput::from_pathrewritten to validate + join againstctx.sandbox_root./..,://, leading//~, traversal, resolves-to-root, migration-keyword)Url(env["workerArtifactPath"])/xandfile::join_path(env.get("workerArtifactPath"), x)patternsSinkOutput::from_path(was usingUri::from_strdirectly)Runner::run(unsandboxed) toRunner::run_with_sandbox_root(tempdir())to support relative paths in testsengine/AGENTS.md"Sandbox & sink writes" section updatedMigration for workflow authors
Old:
New (static):
New (dynamic):
Workflows still using the old pattern fail at runtime with an error mentioning
workerArtifactPath— search logs for that keyword.Test plan
cargo make test-rsgreen (945 passed, 0 failed)cargo make test-qcgreen (135 passed, 0 failed)cargo make clippycleancargo make formatno-op (fmt committed)env["workerArtifactPath"]andenv.get("workerArtifactPath")across YAML fixtures — zero sink-output uses remain