Skip to content

feat(orchestra): add custom commands with typed arguments#3339

Draft
sidferreira wants to merge 11 commits into
mobile-dev-inc:mainfrom
sidferreira:creates-sub-commands-structure
Draft

feat(orchestra): add custom commands with typed arguments#3339
sidferreira wants to merge 11 commits into
mobile-dev-inc:mainfrom
sidferreira:creates-sub-commands-structure

Conversation

@sidferreira

@sidferreira sidferreira commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Proposed changes

Adds support for declaring custom commands directly in YAML flows. A flow file can declare itself as a reusable, first-class command via a command: + arguments: header in its config block; callers invoke it like a built-in instead of going through runFlow: + env:.

Example — declare:

# subflows/greet.yaml
appId: com.example
command: greet
arguments:
  - { name: who,   type: string,  required: true }
  - { name: count, type: number,  default: 1 }
---
- inputText: "Hello \${who}"

Example — call:

- greet:
    who: world

Highlights:

  • Workspace mode auto-registers any flow with a command: header.
  • Single-file mode (maestro test foo.yaml) auto-discovers commands from <flow parent>/subflows/.
  • Arguments support string / number / boolean with required and default.
  • Parse-time validation: unknown keys, missing required, type mismatch.
  • Custom-command bodies cannot invoke other custom commands (no nesting); enforced explicitly during the workspace pre-pass.
  • A custom-command call desugars at parse time into a RunFlowCommand with a prepended DefineVariablesCommandzero changes to Orchestra's dispatcher.

A cross-cutting fix is bundled: TestCommand.makeChunkPlans was reconstructing ExecutionPlan and silently dropping the discovered registry — even single-file runs go through chunking, so the registry was being lost between plan() and the run. This had no prior visible effect (the registry was always empty before this change) but had to be fixed for the feature to function end-to-end.

Testing

  • New unit tests in YamlCommandReaderTest (header parsing, call expansion, all four validation error types, suggestion-list inclusion).
  • New planner tests in WorkspaceExecutionPlannerTest (workspace-mode discovery, single-file subflows/ discovery, command-definition files excluded from runnable flows, nested-custom-command rejection, malformed-command-file is skipped during discovery).
  • New end-to-end test in maestro-test/IntegrationTest (Case 100) — drives Orchestra with a fake driver and asserts the executed sequence.
  • All pre-existing tests continue to pass (./gradlew :maestro-orchestra-models:test :maestro-orchestra:test :maestro-test:test).
  • Manual end-to-end against an Android emulator — full reproduction steps + sample fixtures: feat(orchestra): add custom commands with typed arguments #3339 (comment)
  • Self code-review (Kotlin best practices) — findings + resolutions: feat(orchestra): add custom commands with typed arguments #3339 (comment)

Does this need e2e tests? — The feature is parser/orchestra-layer and is covered by integration tests under maestro-test. No demo-app change needed.

Issues fixed

N/A

Allow flow files to declare themselves as reusable, first-class commands via
a `command:` + `arguments:` header in the YAML config block. Callers invoke
them like built-ins (e.g. `- greet: { who: "world" }`) instead of going
through `runFlow:` + `env:`.

- Workspace mode auto-registers any flow with a `command:` header
- Single-file mode (`maestro test foo.yaml`) auto-discovers commands from
  `<flow parent>/subflows/`
- Arguments support string/number/boolean with required/default
- Validation happens at parse time (unknown keys, missing required, type
  mismatch)
- Custom-command bodies cannot invoke other custom commands (no nesting),
  enforced explicitly during workspace pre-pass
- A custom-command call desugars at parse time into a `RunFlowCommand` with
  a prepended `DefineVariablesCommand` — no `Orchestra` dispatcher changes
  needed

The key cross-cutting fix is in `TestCommand.makeChunkPlans`, which used to
reconstruct `ExecutionPlan` and silently drop the discovered registry — even
single-file runs went through chunking, so the registry was being thrown
away between `plan()` and the actual run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sidferreira

sidferreira commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Self code-review findings (Kotlin best practices + code quality)

Pre-merge findings I'd like reviewers to weigh in on. None are blocking compile/tests, but some are worth addressing before merge. Counts: 0 Critical / 4 Important / 6 Minor.

Update: All 10 findings have been addressed in follow-up commits (91c5fcf1..618192e0). Each fix is a separate, individually-verifiable commit so the history is bisectable.

Important

  1. Files.walk not closedWorkspaceExecutionPlanner.kt (in discoverCustomCommandsFromSubflowsDir and the workspace pre-pass .maestro/commands/ walk). The returned Stream is backed by an open directory handle; leaks an FD per planner invocation. Wrap with .use { ... }. (The same pattern appears in WorkspaceUtils.kt already, so this is a pre-existing convention worth correcting.)
    Fixed in 91c5fcf1.

  2. catch (_: Throwable) is too broadWorkspaceExecutionPlanner.discoverCustomCommands and the nesting-detection loop. Catches OutOfMemoryError, StackOverflowError, CancellationException. Also: a malformed command-definition file is silently dropped from the registry, so callers downstream get a confusing "Invalid Command: did you mean…" suggestion instead of the real syntax error. Narrow to Exception (or specifically IOException / JsonProcessingException / ValidationError) and at minimum log a DEBUG entry when swallowing.
    Fixed in c621af6d — narrowed to Exception, added DEBUG log, added regression test (022_malformed_command_file).

  3. Nesting detection is fragileWorkspaceExecutionPlanner compares RunFlowCommand.sourceDescription (a String) against CustomCommandDef.sourceFile.toString(). Works today because both go through .absolute().toString(), but any future normalization (.normalize(), symlink resolution) silently breaks nesting detection. Recommend adding a typed customCommandName: String? field to RunFlowCommand populated by buildCustomCommandRunFlow, then checking that field instead of string-matching.
    Fixed in 1c758a9a — added typed customCommandName to RunFlowCommand, nesting detection now uses direct map lookup by name.

  4. appId = url ?: _appId ?: "" sentinelYamlConfig.kt. Command-definition files (which legitimately have no app target) end up with an empty-string appId in MaestroConfig. Fine if all consumers use isNotBlank(); latent bug if anyone treats it as a literal bundle ID. Worth either making appId nullable or adding a comment documenting the sentinel.
    Fixed in 618192e0 — branch the init: runnable flows keep the original _appId!! assertion; command-definition files take the explicit "" fallback with a comment explaining why it's safe. Preserves the pre-feature invariant without changing the public type.

Minor

  1. validateAndCoerceArgscall.args.keys - knownNames allocates an extra Set; filterNot { it in knownNames } reads better.
    Fixed in 325f1451.

  2. ✅ Number coercion duplicates the toDoubleOrNull()-for-side-effect pattern in both YamlFlowArgument.coerceDefault and YamlFluentCommand.coerceArg. Extract a requireNumeric(str): String helper.
    Fixed in 0747e9d3.

  3. ✅ Boolean coercion already computes asString; small .also { require(...) } polish opportunity.
    Fixed in 46cf8ba9 — extracted requireBoolean helper to mirror requireNumeric.

  4. TestCommand.kt:559, TestRunner.kt:54, TestSuiteInteractor.kt:161 — fully-qualified maestro.orchestra.CustomCommandDef in signatures. Should be imported at file top per Kotlin convention.
    Fixed in eda09fcb.

  5. MaestroFlowParser.kt:563-566builtInCommandNames (Set) AND isBuiltInCommand(name) are both exposed. Only the latter is used externally — drop one.
    Fixed in 9dc12002 — dropped the unused builtInCommandNames set.

  6. YamlConfig.toCustomCommandArguments() is public but only called from toCommand(...) in the same file — make it internal or private.
    Fixed in b8314570 — narrowed to internal (planner calls it cross-file).

What's well-done

  • @JsonIgnore customCommand + INTERNAL_FLUENT_COMMAND_FIELDS exclusion cleanly extends the existing _sourceInfo pattern to keep the field out of YAML deserialization while still allowing callBy to set it.
  • Token-consumption in parseCustomCommandCall correctly mirrors the built-in command path's END_OBJECT assertion.
  • Argument coercion uses exhaustive when over ArgumentType — no fallthrough risk.

All findings resolved. Each fix is in its own commit (see git log a07efea3..HEAD). Full regression (:maestro-orchestra-models:test :maestro-orchestra:test :maestro-test:test) passes, plus Android-emulator E2E (maestro test samples/screenshot-command-android.yaml) verified after every commit.

sidferreira and others added 10 commits June 4, 2026 17:00
…nner

Files.walk returns a Stream backed by an open directory handle that must
be explicitly closed to release the FD. Wrap with .use {} to release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Set subtraction (call.args.keys - knownNames) allocates a fresh Set even
in the common (empty) case. filterNot reads more naturally and avoids
the allocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
YamlFlowArgument.coerceDefault and YamlFluentCommand.coerceArg both
validate a string as numeric via toDoubleOrNull-or-throw and return the
same string back. Extracted into a shared top-level helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the requireNumeric extraction: both YamlFlowArgument.coerceDefault
and YamlFluentCommand.coerceArg validate true/false the same way. Pull
the logic into a shared top-level helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…line

Three CLI files declared customCommands: Map<String, maestro.orchestra.CustomCommandDef>
with the FQN inline. Import once at the top of each file per Kotlin
convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Only isBuiltInCommand() is called externally. Inline the lookup against
the existing private builtInCommands list and remove the public Set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ternal

Called from YamlConfig.toCommand() and WorkspaceExecutionPlanner — both
in the same module. Drop public visibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…scovery

Replace catch (_: Throwable) with catch (e: Exception) in both
discovery and nesting-check loops. Throwable was catching VM errors
(OutOfMemoryError, StackOverflowError). Add DEBUG-level logging so a
silently-skipped file is recoverable from logs.

Adds a fixture+test (022_malformed_command_file) that confirms a
malformed YAML in a subflows/ subdir is skipped without crashing the
planner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…etection

Replace the implicit String contract (RunFlowCommand.sourceDescription ==
CustomCommandDef.sourceFile.toString()) with an explicit RunFlowCommand
field, populated by buildCustomCommandRunFlow when expanding a custom-
command call. Nesting detection then becomes a direct map lookup by name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ion files

Before custom commands, init was 'appId = url ?: _appId!!' — the bang-bang
was safe because the guard above enforced at least one of url/_appId was
non-null. Adding command-definition files relaxed that guard, which forced
a unconditional '?: ""' fallback and dropped the assertion for all flows.

Branch on isCommandDefinition instead: runnable flows keep the original
_appId!! assertion; command-definition files take the explicit fallback
path with a comment explaining why "" is safe (those files are excluded
from flowsToRun and their RunFlowCommand has config = null).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sidferreira

sidferreira commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Manual reproduction — takeScreenshotInFormat custom command

This is the end-to-end example I used while developing/verifying the feature.

Setup

  1. Build & install the CLI from this branch:

    ./gradlew :maestro-cli:installDist -x installMcpViewerDeps -x buildMcpViewer
    rm -rf ~/.maestro/bin ~/.maestro/lib
    cp -r ./maestro-cli/build/install/maestro/bin ~/.maestro/bin
    cp -r ./maestro-cli/build/install/maestro/lib ~/.maestro/lib
    export PATH="$HOME/.maestro/bin:$PATH"
    maestro --version
  2. Download the standard samples (provides the Wikipedia APK + the conventional subflows/ folder):

    maestro download-samples
    cd samples
  3. Start an Android emulator and verify it's attached:

    adb devices
    # Expected: emulator-XXXX  device
  4. Install the Wikipedia app (target appId org.wikipedia):

    adb install wikipedia.apk

Add the two files

subflows/takeScreenshotInFormat.yaml — the custom command definition:

command: takeScreenshotInFormat
arguments:
  - { name: platform, type: string, required: true }
---
- takeScreenshot: "screenshot-from-command--${platform}"

screenshot-command-android.yaml — the caller flow (alongside the existing samples):

appId: org.wikipedia
tags:
  - android
---
- launchApp
- takeScreenshotInFormat:
    platform: android

(iOS variant: appId: org.wikimedia.wikipedia + tags: [ios] — works against an iOS Simulator with the Wikipedia app installed.)

Run

rm -f screenshot-from-command--android.png
maestro test screenshot-command-android.yaml
ls -la screenshot-from-command--android.png      # ~20–100 KB

Expected flow output

 > Flow screenshot-command-android
Launch app "org.wikipedia"... COMPLETED
Run /path/to/samples/subflows/takeScreenshotInFormat.yaml...
  Take screenshot screenshot-from-command--${platform}... COMPLETED
Run /path/to/samples/subflows/takeScreenshotInFormat.yaml... COMPLETED

Note screenshot-from-command--${platform} is shown unevaluated in the step log (it's the raw path: value before interpolation), but the actual file on disk has platform substituted: screenshot-from-command--android.png.

Negative-path sanity check

Drop the platform: line from screenshot-command-android.yaml and re-run — parsing fails at the call site with:

Missing required argument 'platform' for command 'takeScreenshotInFormat'

Filename note

The screenshot writes as .png regardless of the path: value. Orchestra.takeScreenshotCommand unconditionally appends .png — that's existing Maestro behavior, intentionally not touched by this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant