feat(engine): expose per-node attribute schemas via 'schema' command#2134
Open
pyshx wants to merge 30 commits into
Open
feat(engine): expose per-node attribute schemas via 'schema' command#2134pyshx wants to merge 30 commits into
pyshx wants to merge 30 commits into
Conversation
…) via closed-schema seam
Contributor
There was a problem hiding this comment.
Pull request overview
Adds an engine-level schema CLI command that reports per-node (and per-port) attribute schemas for a workflow, seeded by light source sampling and propagated through the DAG via per-action transfer functions.
Changes:
- Introduces a new attribute-schema model (
AttrSchema,AttrType, presence lattice, JSON DTOs) inreearth-flow-types. - Adds runtime source sampling (
schema_sample) plus DAG propagation/inference (schema_infer), including cycle detection. - Adds
reearth-flow schemaCLI subcommand and implements schema transfer functions + tests for several processors.
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| engine/testing/workflow-tests/Cargo.toml | Version bump for workflow test crate. |
| engine/testing/plateau-tiles-test/Cargo.toml | Version bump for plateau tiles test crate. |
| engine/runtime/types/src/lib.rs | Exposes the new attr_schema module. |
| engine/runtime/types/src/attr_schema.rs | Adds schema data model + JSON report DTOs + unit tests. |
| engine/runtime/runtime/tests/schema_sample.rs | Adds integration tests for source sampling via a real source factory. |
| engine/runtime/runtime/src/schema_sample.rs | Implements bounded source execution + attribute unioning into a schema. |
| engine/runtime/runtime/src/schema_infer.rs | Implements static propagation + sampling-seeded inference across the DAG. |
| engine/runtime/runtime/src/node.rs | Adds infer_output_schema hooks to factory traits + small default behavior test. |
| engine/runtime/runtime/src/lib.rs | Exports schema_infer and schema_sample modules. |
| engine/runtime/runtime/src/errors.rs | Adds SchemaInferenceCycle execution error. |
| engine/runtime/runtime/Cargo.toml | Adds dependencies needed by schema sampling/tests (e.g. indexmap, tempfile). |
| engine/runtime/action-processor/src/feature/filter.rs | Adds schema inference for FeatureFilter. |
| engine/runtime/action-processor/src/attribute/statistics_calculator.rs | Adds schema inference for StatisticsCalculator + tests. |
| engine/runtime/action-processor/src/attribute/mapper.rs | Adds schema inference for AttributeMapper + tests. |
| engine/runtime/action-processor/src/attribute/manager.rs | Adds schema inference for AttributeManager + tests. |
| engine/runtime/action-processor/src/attribute/datetime_converter.rs | Adds schema inference for DateTimeConverter + tests. |
| engine/plateau-gis-quality-checker/src-tauri/Cargo.toml | Version bump for Tauri app crate. |
| engine/cli/src/schema.rs | Adds the schema CLI command implementation + end-to-end test. |
| engine/cli/src/main.rs | Wires the new schema module. |
| engine/cli/src/cli.rs | Registers the schema subcommand and dispatch. |
| engine/cli/Cargo.toml | Adds needed deps/dev-deps for the new command/tests. |
| engine/Cargo.toml | Engine version bump. |
| engine/Cargo.lock | Lockfile updates reflecting new versions/dependencies. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+109
to
+130
| fn infer_output_schema( | ||
| &self, | ||
| inputs: &HashMap<Port, reearth_flow_types::attr_schema::AttrSchema>, | ||
| _with: &Option<HashMap<String, Value>>, | ||
| ) -> Option<HashMap<Port, reearth_flow_types::attr_schema::AttrSchema>> { | ||
| use reearth_flow_types::attr_schema::AttrSchema; | ||
|
|
||
| // FeatureFilter routes whole features by expression; it never modifies | ||
| // attributes. So each statically-declared output port carries the input | ||
| // schema unchanged (identity). | ||
| let input = inputs | ||
| .get(&DEFAULT_PORT.clone()) | ||
| .cloned() | ||
| .unwrap_or_else(AttrSchema::open); | ||
|
|
||
| let map = self | ||
| .get_output_ports() | ||
| .into_iter() | ||
| .map(|port| (port, input.clone())) | ||
| .collect(); | ||
| Some(map) | ||
| } |
Comment on lines
+93
to
+108
| match op.method { | ||
| // Create/Convert both set the attribute to an expression-derived value, | ||
| // whose type we can't analyze statically -> Unknown, Always present. | ||
| Method::Create | Method::Convert => { | ||
| out.insert(attr, AttrField::always(AttrType::Unknown)); | ||
| } | ||
| // Rename's destination name is an expression -> not statically knowable. | ||
| // Drop the source key and mark the schema open (an unknown-named attr appears). | ||
| Method::Rename => { | ||
| out.fields.shift_remove(&attr); | ||
| out.open = true; | ||
| } | ||
| Method::Remove => { | ||
| out.fields.shift_remove(&attr); | ||
| } | ||
| } |
Comment on lines
+360
to
+363
| assert_eq!( | ||
| schema.fields.get(&Attribute::new("foo".to_string())), | ||
| Some(&AttrField::always(AttrType::Unknown)) | ||
| ); |
Comment on lines
+423
to
+425
| assert!(!schema.fields.contains_key(&Attribute::new("a".to_string()))); | ||
| assert!(schema.open); | ||
| } |
Comment on lines
+86
to
+92
| let joined = join_all_inputs(&inputs); | ||
| factory | ||
| .output_ports() | ||
| .into_iter() | ||
| .map(|p| (p, joined.clone())) | ||
| .collect() | ||
| } |
Comment on lines
+153
to
+159
| let joined = join_all_inputs(&inputs); | ||
| factory | ||
| .output_ports() | ||
| .into_iter() | ||
| .map(|p| (p, joined.clone())) | ||
| .collect() | ||
| } |
Comment on lines
+102
to
+106
| async fn read_features( | ||
| mut source: Box<dyn crate::node::Source>, | ||
| ctx: NodeContext, | ||
| sample_size: usize, | ||
| ) -> Result<Vec<Feature>, String> { |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overview
Adds a static
schemaCLI command to the engine that exposes the available feature attributes at each node of a workflow, so the UI/user can know what attributes exist and target them in expressions (e.g.attributes["myAttribute"]). Source readers are sampled (a light, bounded read of their dataset) to discover real attributes; downstream transforms (AttributeManager, etc.) propagate on top. Output is JSON for a server/UI to consume.What I've done
reearth-flow-types):AttrSchema(ordered fields +openflag),AttrType,AttrField(ty+Presence::{Always,Maybe}), and ajoinlattice for multi-edge fan-in. Plus a serdeSchemaReportJSON DTO (orderedfieldsarray,version, per-nodenote).runtime/runtime/src/schema_sample.rs): runs a source reader briefly against a bounded channel, unions the first N features (default 10;--sample-size 0= all) into a closed, typedAttrSchema. Per-source failures degrade toopen+ anote; never panics. No processors/sinks run, no sink writes.schema_infer.rs):infer_with_samplingseeds source nodes from samples, then propagates per-port schemas through the DAG in topological order (cycle-detected). Transfer functions forAttributeManager,AttributeMapper,StatisticsCalculator,DateTimeConverter,FeatureFilter; unknown processors pass through.schemaCLI command:reearth-flow schema --workflow <path|-> [--var K=V] [--sample-size N]→ prints{ version, sampleSize, nodes: { id: { name, ports: { port: { open, fields[] } }, note? } } }to stdout (logs to stderr). Supports!includeexpansion and--var.What I haven't done
build/checkvalidator (command,referenced_input_attributes,Diagnostic/Severity,AttrRef) — it wasn't independently useful; the reusable schema engine was retained.schemaJSON is their contract (separate specs to follow).Unknownfor now (Rhai is being replaced). Sampling bounds features processed, not always bytes read (whole-file readers still read the file).How I tested
AttrSchemajoin lattice, the serde DTO JSON shape,union_featuresedge cases (Maybe presence, type-conflict→Unknown, first-seen ordering), each transfer function.schema_sampleagainst a real GeoJSON read (tempfile); theschemacommand end-to-end onGeoJsonReader → AttributeManager(remove)— asserts the reader exposes real keys and the removed key is absent downstream while others survive.quality-check/bldgworkflow (87 nodes,!includes): emits valid JSON, no panic.cargo make check,format --check,clippy -D warnings,format-taplo --check,check-schema(0 drift),check-generate-examples-cms-workflow(0 drift),test-rs— all green. Engine version bumped to0.0.377.Screenshot
Which point I want you to review particularly
schema_sample.rs): running a source reader on a current-thread tokio runtime over a bounded mpsc channel, then dropping the receiver to stop it. No-panic + no-deadlock are the key invariants (reviewed).open+notefallback semantics: sources without a resolvable dataset (or expression-driven sources) reportopen: true+ a note rather than failing — so the editor flow degrades gracefully.Memo
Engine foundation only.
schemais additive and does not affectrun/dot. Branch based onorigin/main.