Skip to content

feat(orchestra): typed ArtifactManifest + flat per-flow run-root bundle#3343

Open
proksh wants to merge 39 commits into
feat/orchestra-artifacts-refactorfrom
feat/orchestra-artifact-manifest
Open

feat(orchestra): typed ArtifactManifest + flat per-flow run-root bundle#3343
proksh wants to merge 39 commits into
feat/orchestra-artifacts-refactorfrom
feat/orchestra-artifact-manifest

Conversation

@proksh

@proksh proksh commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Stacked on #3282. Makes Maestro core own a typed, self-documenting artifact manifest, and restructures the CLI's debug output into one per-flow folder per run — so local and cloud emit the same shape and every consumer reads one model instead of re-deriving artifact types from a folder of files.

What this adds

1. Shared model (maestro-orchestra-models) — ArtifactManifest, ArtifactEntry, ArtifactKind, ArtifactFormat, and ArtifactFiles (canonical relative paths). Plain data classes, no serialization annotations — tolerance is the consumer's choice.

2. Self-documenting manifest.json — it carries a $schema pointing at a stable, versioned URL: the hand-written schema served straight from this repo's main branch via GitHub raw (…/mobile-dev-inc/Maestro/main/maestro-orchestra-models/.../manifest.v1.schema.json). The filename carries the schema's major version (schemaVersion), so a future structural change ships as manifest.v2.schema.json beside it and the v1 URL keeps resolving for every manifest already in the wild. The URL is the schema's permanent identity, so the manifest stays self-describing even after it's copied or uploaded away from its run folder — with no per-run schema file to bundle and no hosting to stand up (it's just the in-repo file). Within a version, pointing at main keeps the URL constant while its content tracks additive changes; that's safe because the model ignores unknown fields and a build test fails if any ArtifactKind/ArtifactFormat is missing from the schema, so docs can't drift from code. Trade-off: resolving it needs network — the offline-bundled copy was intentionally dropped. Note: the URL 404s until this work lands on main.

3. Core produces the bundle (maestro-orchestra) — ArtifactsGenerator (an internal OrchestraListener) writes the per-flow bundle into artifactsDir and builds the manifest, exposed on Orchestra.FlowResult.artifactManifest. The run root itself is the zippable bundle — everything core makes sits directly under it, with no intermediate artifacts/ folder:

  • commands.json (per-command metadata) and logs/maestro.log at the run root. Serialized errors are slim — message + debugMessage only; hierarchies and stack traces live in their own artifacts.
  • Typed per-command metadata.artifacts — each command entry in commands.json lists {type, path} objects (run-root-relative; type is an ArtifactKind name), so consumers find per-command artifacts without depending on the folder layout. The manifest stays folder-level — no per-file entries. Empty lists are omitted. Plumbing: a new OrchestraListener.onCommandArtifact(kind, relativePath) hook, dispatched only when a bundle is being produced.
  • Self-describing kindsTAKE_SCREENSHOT (takeScreenshot output → takeScreenshot/), START_SCREEN_RECORDING (startRecording output → startRecording/), SCREENSHOT (step screenshots → screenshots/step-<N>.png), SCREEN_RECORDING (full-run recording), SCREEN_HIERARCHY (per-step hierarchy JSON). No metadata.source tags on these — the kind says it; source remains only where it carries real information (e.g. DEVICE_LOG).
  • Per-step view hierarchy, always on — every executed command writes screen-hierarchy/step-<N>.json and references it from its artifacts; the giant inline hierarchy blob is gone from commands.json (demo run: 246 KB → 4 KB).
  • Unified screenshots — one folder, one naming scheme: with captureStepScreenshots on (worker), every executed command gets screenshots/step-<N>.png; with it off (local CLI), only the failed command does. No special screenshot-❌-*.png convention — the failure shot is the FAILED command's SCREENSHOT artifact in commands.json.
  • Flag-gated capture (off by default; the worker turns them on): captureStepScreenshots (above) and captureScreenRecordingscreen-recording.mp4.
  • When there's no artifact dir (e.g. --continuous), takeScreenshot/startRecording write relative to CWD, as before.

4. CLI writes one folder per flow (maestro-cli) — each flow's output folder is resolved upfront and passed as the single artifactsDir; Orchestra writes the whole bundle straight into it. Single and suite runs both produce <output>/<ts>/<flow>/ (with -shard-N / -N suffixes for shards / name collisions). Replaces the old flat session layout and its temp-staging + copy step.

Layout

<output>/<ts>/<flow>/          # the run root IS the "artifacts" zip = everything core makes
  manifest.json
  commands.json
  logs/
    maestro.log
    …device logs, crash/ANR    # worker/cloud only (see #3346)
  takeScreenshot/…             # takeScreenshot command output
  startRecording/…             # startRecording command output
  screenshots/step-N.png       # step screenshots (all steps when flag on; failed step only when off)
  screen-hierarchy/step-N.json # per-step view hierarchy (always on)
  screen-recording.mp4         # full-run recording (flag-gated: worker on, CLI off)

The manifest uses paths relative to its own folder, so local and cloud manifests look the same.

Behavior change to flag

--test-output-dir, when set, is this artifact folder — it now holds the full bundle (incl. manifest.json), not just screenshots. Its help text is updated. The default location (~/.maestro/tests/<ts>/) is unchanged.

How this was tested

  • Model tests — manifest round-trips, ignores unknown fields, and a build check fails if any artifact kind is missing from the schema.
  • Orchestra tests — manifest entries carry the right dedicated kinds with run-root-relative paths and no redundant source tags; folders register as collections with a file count; step screenshots/recording appear per their flags; hierarchy files are written per executed command (not for skipped), attributed, and commands.json has no inline hierarchy; typed artifacts attribute to the right command (incl. nested/composite dedup) and the key is omitted when empty; serialized errors carry only message + debugMessage; manifest.json carries the stable, versioned $schema URL and no schema file is bundled.
  • Local e2e — ran the built CLI against an iOS simulator (default and worker-flag modes): the takeScreenshot entry carries {"type": "TAKE_SCREENSHOT", "path": "takeScreenshot/checkout.png"}, every executed command references its screen-hierarchy/step-N.json, the failed command references screenshots/step-4.png, all paths resolve, and commands.json shrank 246 KB → 4 KB.
  • CLI tests — per-flow folder naming: slashes sanitized, shard suffix, and same-name collisions.

@linear-code

linear-code Bot commented Jun 9, 2026

Copy link
Copy Markdown

MA-4055

proksh added 2 commits June 9, 2026 22:29
…format

Extract the per-type debug-message handling into printFlowError (a single
when over the MaestroException subtypes) so runSingle reads top-to-bottom,
and restore the original multi-line formatting of aiOutput/updatedEnv that
this branch had collapsed for no reason. Behavior unchanged.
- Remove the cli 'no clobber' test: with different flow names each lands in
  its own folder trivially; same-name clobber is already pinned by the
  numeric-suffix disambiguation test.
- Merge the two null-artifactsDir ArtifactsGenerator tests into one that
  asserts both no files on disk and an empty manifest.
@proksh proksh marked this pull request as ready for review June 9, 2026 17:49
@proksh proksh force-pushed the feat/orchestra-artifact-manifest branch from 4513cc9 to 0655561 Compare June 9, 2026 17:54
proksh added 5 commits June 10, 2026 11:35
… Schema

Agents (and humans) reading a run's manifest.json had no in-band way to
learn what each field or ArtifactKind means. Make the manifest
self-describing:

  - Add a hand-written JSON Schema (manifest.schema.json) describing
    ArtifactManifest, with a description on every field and per-ArtifactKind
    docs via the oneOf/const pattern. Lifted from the model's KDoc.
  - ArtifactsGenerator.onFlowEnd now bundles the schema next to manifest.json
    in each run dir and writes a leading `$schema` pointing at it, so the
    manifest resolves its own schema offline.
  - Centralize both writes in TestOutputWriter (saveManifest injects $schema;
    saveManifestSchema copies the classpath resource).

Drift guard: ArtifactManifestSchemaTest fails the build if any
ArtifactKind/ArtifactFormat value is missing from the schema, so the
hand-written doc can't silently fall out of sync with the enums.

The on-disk manifest now carries `$schema`, an unknown property to the
typed model. No production code deserializes the manifest; the one test
that did used a strict mapper and now decodes tolerantly, matching the
model's documented contract (tolerance is the reader's choice).
proksh added 3 commits June 10, 2026 17:26
Resolve the per-flow folder upfront and pass it as Orchestra's single
artifactsDir; Orchestra writes the bundle + screenshots/ + recordings/ +
manifest straight there. Drops the temp staging dir, copyBundleToFlowDir,
the separate screenshotsDir param, mediaRoot, and the now-vestigial
testOutputDir plumbing. Continuous mode keeps no bundle (artifactsDir null
-> takeScreenshot to CWD).
…shots + full recording (#3348)

feat(orchestra): nest bundle under artifacts/, flag-gated step shots + full recording

Restructure the per-run artifact bundle so everything core writes lives under
an artifacts/ folder (zipped as one unit), while the two separately-served
outputs — per-step screenshots/ and the full-run screen-recording.mp4 — sit at
the run root.

- Rename takeScreenshot output screenshots/ -> artifacts/takeScreenshot/ and
  startRecording output recordings/ -> artifacts/startRecording/ so folders
  match the command that writes them.
- Move commands.json, maestro.log (now under logs/), and the failure screenshot
  under artifacts/.
- Add Orchestra flags captureStepScreenshots / captureScreenRecording (default
  off, so the local CLI bundle is unchanged; the worker turns them on). When on,
  ArtifactsGenerator writes screenshots/step-<seq>.png after each non-failed
  command and records the whole run to screen-recording.mp4.
- Manifest entries use the new relative paths and carry metadata.source to tell
  same-kind entries apart (failure/take_screenshot/step, start_recording/full_run).
- Update assertScreenshot reference lookup and the schema prose accordingly.
proksh added 2 commits June 11, 2026 12:35
…#3349)

Replace the per-run bundled manifest.schema.json with a stable identity: each
manifest.json now sets $schema to the hand-written schema served from this
repo's main branch via GitHub raw —
https://raw.githubusercontent.com/mobile-dev-inc/Maestro/main/maestro-orchestra-models/src/main/resources/maestro/orchestra/manifest.schema.json

A fixed branch keeps the URL constant while its content tracks the latest
schema, which is safe because the model tolerates unknown fields and a test
blocks undocumented artifact kinds. No extra hosting infrastructure is needed,
and the manifest stays self-describing even after it is moved away from its run
folder.

- Drop saveManifestSchema and the per-run schema-file copy; saveManifest writes
  the URL directly.
- Set the schema's own $id to the same URL.
- Keep the in-repo schema resource as the source of truth (it is the file the
  URL serves) and for the schema-coverage test.

Offline resolution is intentionally dropped.
…r' into feat/orchestra-artifact-manifest

# Conflicts:
#	maestro-cli/src/main/java/maestro/cli/report/TestDebugReporter.kt
#	maestro-cli/src/main/java/maestro/cli/runner/TestSuiteInteractor.kt
#	maestro-orchestra/src/main/java/maestro/orchestra/Orchestra.kt
#	maestro-orchestra/src/main/kotlin/maestro/orchestra/debug/ArtifactsGenerator.kt
@proksh proksh changed the title feat: ArtifactsGenerator emits a typed ArtifactManifest feat(orchestra): typed ArtifactManifest + per-flow artifacts/ bundle Jun 11, 2026
proksh added 2 commits June 11, 2026 18:32
…st schema as v1

Drop the intermediate artifacts/ folder — the run root is now itself the
zippable bundle, so commands.json, logs/ (maestro.log + device logs +
crash/ANR), takeScreenshot/, startRecording/, the failure screenshot,
screenshots/, and screen-recording.mp4 all sit directly under it next to
manifest.json.

Version the hand-written schema as manifest.v1.schema.json (file, $id, the
$schema URL embedded in every manifest, and the classpath resource), so a
future structural change ships as manifest.vN beside it while the v1 URL keeps
resolving for manifests already in the wild.
@proksh proksh changed the title feat(orchestra): typed ArtifactManifest + per-flow artifacts/ bundle feat(orchestra): typed ArtifactManifest + flat per-flow run-root bundle Jun 11, 2026
@steviec

steviec commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

@proksh I worked with Claude to generate this. I'll attach the conversation so you can see how I got there. Please note that this is a first pass conversation, but it immediately picked up on the architectural problems I noticed in this PR on my first scan. If core is going to own the artifacts, it needs to really own them, and not just find them.

The "artifact policy" idea is interesting, but it might not be necessary. If we are only in two modes: record everything vs record a lighter subset, that might be better as a single boolean flag.


Artifact manifest: record, don't scan

The RFC call is right — one typed manifest owned by Orchestra, identical local and cloud. The implementation undermines it: it derives the manifest by scanning the run-root folders at flow end instead of recording artifacts as they're produced. Most of the issues below follow from that.

Problems

  • Three registration paths, no single source of truth: a listener event (only the two command outputs), direct in-generator appends, and an end-of-run disk scan. The folder-level manifest and per-command list can disagree.
  • The scan can't see what core didn't hardcode — pushed/user files and anything a non-core producer adds are invisible.
  • No single "register an artifact" operation, so the manifest is core-sealed; absorbing device logs required re-implementing their capture in core (feat(orchestra): device logs + crash/ANR under logs/ #3346). Doesn't generalize.
  • Capture gating is three inconsistent rules (booleans for screenshots/recording, always-on for device logs), none client-selectable.
  • ArtifactFiles paths are a constant shared between writers and the scanner — that shared agreement is the drift surface.

Fix: a single collector that owns path allocation and the record. Producers go through it; the manifest is its records, the per-command list the same records grouped by command. Two verbs: allocate(kind, producer) → path for what we write (recorded atomically — can't write into the bundle unrecorded), adopt(kind, file, producer) for what's collected from outside (crash dumps, log streams). producer is a field, so worker- vs core-produced is data, not a second code path. Capture becomes one requested-artifact policy. ArtifactFiles becomes the collector's private layout, not a shared contract.

Layering (open decision): core manifest sealed/immutable at flow end; backend wraps it in a run envelope (run id, signed URLs, post-hoc cloud artifacts) that's a superset of the same entry type. Avoids a mutable manifest and the "when is it final" ambiguity.

PR impact: keep the model types, schema, driver seam, and flat layout. Replace the disk scan with collector accumulation; route every writer through allocate/adopt and delete the self-append/scan paths; swap the capture booleans for a policy; add producer to ArtifactEntry without building a separate worker-append path. Net less code than today.


Chat:
2026-06-12-133345-local-command-caveatcaveat-the-messages-below.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants