docs(perps/agentic): eval-ref vocabulary rename, HUD propagation, flow system #27775

Merged: abretonc7s merged 35 commits into main from feat/perps/recipe-flows-0321-2009 on Mar 23, 2026

@abretonc7s (Contributor) commented Mar 21, 2026

Description

Adds a deterministic recipe system for agentic validation of the perps feature. Recipes are JSON files that describe step-by-step app interactions (navigate, press, type, assert) — executed against a running app via CDP. They replace manual testing with reproducible, token-efficient automation.

Why recipes?

For agents doing PR review or bug fixing: An agent can generate a recipe that opens a position, sets TP/SL, closes it — and record the execution as video evidence. One JSON file replaces dozens of CDP tool calls, saving inference tokens and producing consistent results.

For developers working with agents on features (recipe-driven development): Write the recipe first — define expected navigation, interactions, and assertions — then implement the feature until the recipe passes. Like TDD but for agent workflows: define the expected output, iterate until green, get a fully reproducible validation artifact. Existing Gherkin scenarios from PRs can be directly migrated to recipes with 1:1 step mapping.

Composability via flow_ref: Common operations (open position, create TP/SL, close position) are packaged as reusable flows. A recipe composes flows instead of repeating raw steps — 4 lines of flow_ref replace 40 lines of navigate/press/wait.
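To make the composition idea concrete, here is a minimal JavaScript sketch of how a runner could expand flow_ref steps into a flat step list. It is not the logic in validate-recipe.sh: the in-memory registry, the `expandSteps` helper, the `ExampleMarketView` / `ExampleMarketRoute` targets, and the parent/child ID prefixing are all assumptions made for illustration.

```javascript
// Hypothetical in-memory registry; the real flows are JSON files on disk.
const flows = {
  'trade-open-market': {
    steps: [
      { id: 'nav-market', action: 'navigate', target: 'ExampleMarketView' },
      { id: 'wait-market', action: 'wait_for', route: 'ExampleMarketRoute' },
    ],
  },
};

// Recursively replace each flow_ref step with the steps of the referenced flow.
// Child IDs are prefixed with the parent step ID so they stay unique.
function expandSteps(steps, registry, seen = []) {
  return steps.flatMap((step) => {
    if (step.action !== 'flow_ref') return [step];
    if (seen.includes(step.ref)) {
      throw new Error(`Circular flow_ref: ${[...seen, step.ref].join(' -> ')}`);
    }
    const flow = registry[step.ref];
    if (!flow) throw new Error(`Unknown flow_ref: ${step.ref}`);
    return expandSteps(flow.steps, registry, [...seen, step.ref]).map((child) => ({
      ...child,
      id: `${step.id}/${child.id}`,
    }));
  });
}

const recipeSteps = [
  { id: 'nav-home', action: 'navigate', target: 'WalletView' },
  { id: 'open-long-btc', action: 'flow_ref', ref: 'trade-open-market' },
];
const flat = expandSteps(recipeSteps, flows);
```

In this sketch the two-step recipe expands to three executable steps, and an unknown or circular reference fails before anything runs against the app.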

What's in this PR

  • 12 parameterized flows: trade-open-market, trade-close-position, tpsl-create, tpsl-edit, order-limit-place, order-limit-cancel, position-add-margin, market-discovery, market-watchlist, activity-view, select-account, setup-testnet
  • validate-recipe.sh — recipe runner with flow_ref composition, wait_for condition polling, HUD overlay, pre-condition checks, {{param}} templating
  • wait_for action — condition-based polling replacing all blind wait steps (route, testID visibility, or arbitrary expression)
  • testID additions — keypad keys, tab bars, order view buttons, market header — enabling recipe-driven navigation without coordinates
  • AgentStepHud: __DEV__-only overlay showing current step ID during recipe execution
  • Team directory structure (teams/<team>/flows/, evals/, recipes/, pre-conditions.js)
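The wait_for action in the list above replaces fixed delays with condition polling. A rough JavaScript sketch of that pattern follows; the `waitFor` name, the `intervalMs` option, and the default values are assumptions, and the actual runner evaluates its conditions over CDP rather than through a local callback.

```javascript
// Poll `check` until it returns a truthy value or the timeout elapses.
// `check` may be synchronous or return a promise.
async function waitFor(check, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`wait_for timed out after ${timeoutMs} ms`);
    }
    // Sleep briefly before the next poll instead of busy-waiting.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Compared with a blind wait, the step finishes as soon as the condition holds and fails with a clear error when it never does.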

Lifecycle recipe (used for validation video)

This recipe composes 4 flows to validate a complete trade lifecycle — the exact JSON that validate-recipe.sh executes (teams/perps/recipes/full-trade-lifecycle.json):

{
  "title": "Full BTC trade lifecycle — mainnet start, testnet switch, open, TP/SL, close",
  "validate": {
    "runtime": {
      "steps": [
        { "id": "nav-wallet-home", "action": "navigate", "target": "WalletView" },
        { "id": "switch-mainnet", "action": "toggle_testnet", "enabled": false },
        { "id": "wait-mainnet", "action": "wait_for",
          "expression": "JSON.stringify({isTestnet:Engine.context.PerpsController.state.isTestnet})",
          "assert": { "operator": "eq", "field": "isTestnet", "value": false } },
        { "id": "nav-perps-home", "action": "navigate", "target": "PerpsHomeView" },
        { "id": "wait-perps-home", "action": "wait_for", "route": "PerpsMarketListView" },
        { "id": "ensure-testnet", "action": "flow_ref", "ref": "setup-testnet" },
        { "id": "verify-provider", "action": "eval_ref", "ref": "providers",
          "assert": { "operator": "contains", "value": "hyperliquid" } },
        { "id": "open-long-btc", "action": "flow_ref", "ref": "trade-open-market",
          "params": { "symbol": "BTC", "side": "long", "usdAmount": "10", "leverage": "2" } },
        { "id": "wait-position", "action": "wait_for", "timeout_ms": 10000,
          "expression": "...",
          "assert": { "operator": "eq", "field": "found", "value": true } },
        { "id": "create-tpsl", "action": "flow_ref", "ref": "tpsl-create",
          "params": { "symbol": "BTC" } },
        { "id": "close-position", "action": "flow_ref", "ref": "trade-close-position",
          "params": { "symbol": "BTC" } },
        { "id": "wait-closed", "action": "wait_for", "timeout_ms": 10000,
          "expression": "...",
          "assert": { "operator": "eq", "field": "found", "value": false } }
      ]
    }
  }
}

Result: 14/14 steps pass, composing 4 nested flows (34 total steps across flows).
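The assert objects in the recipe pair an operator with an optional field and an expected value. The sketch below shows one way such an assertion could be evaluated in JavaScript. It covers only the two operators that appear in this PR (eq and contains); the `evaluateAssert` name and the handling of string versus array results are assumptions, and the shared lib/assert.js may behave differently.

```javascript
// Evaluate a recipe-style assertion against a step result.
// `rawResult` may be a JSON string (as produced by JSON.stringify in an
// expression) or an already parsed value.
function evaluateAssert(rawResult, assertion) {
  let actual = rawResult;
  if (assertion.field !== undefined) {
    const parsed = typeof rawResult === 'string' ? JSON.parse(rawResult) : rawResult;
    actual = parsed[assertion.field];
  }
  switch (assertion.operator) {
    case 'eq':
      return actual === assertion.value;
    case 'contains':
      // Works for substring checks and for array membership.
      return (typeof actual === 'string' || Array.isArray(actual))
        && actual.includes(assertion.value);
    default:
      throw new Error(`Unsupported operator: ${assertion.operator}`);
  }
}

const mainnetOk = evaluateAssert(
  '{"isTestnet":false}',
  { operator: 'eq', field: 'isTestnet', value: false },
);
```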

Risk profile

Shell scripts + JSON schemas — dev-only tooling, not shipped in production builds. React component changes limited to adding testID props (no logic changes). AgentStepHud.tsx gated behind __DEV__.

Changelog

CHANGELOG entry: null

Related issues

Internal perps agentic tooling — no external ticket.

Manual testing steps

Feature: Agentic recipe validation

  Scenario: Full trade lifecycle via recipe runner
    Given Metro and CDP bridge are connected to a running app
    And wallet is unlocked with testnet-capable account
    When developer runs: bash scripts/perps/agentic/validate-recipe.sh scripts/perps/agentic/teams/perps/recipes/full-trade-lifecycle.json
    Then recipe switches to testnet, opens BTC long, creates TP/SL, closes position
    And all 14 top-level steps pass (composing 4 nested flow_refs)
    And HUD overlay shows current step ID on screen during execution

  Scenario: Dry-run validates JSON without app
    When developer runs with --dry-run flag
    Then all steps are parsed and resolved without executing against the app

Screenshots/Recordings

Before

N/A — new tooling, no prior state.

After

full-trade-lifecycle.mp4

Pre-merge author checklist

Pre-merge reviewer checklist

  • I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
  • I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence such as recordings and or screenshots.

Note

Low Risk
Mostly adds/renames testID props across Keypad, Tabs, and Perps UI plus a __DEV__-only HUD overlay; behavior changes are minimal and largely test/automation-facing.

Overview
Improves agentic/E2E automation hooks by adding stable testIDs across shared UI (Keypad, Tabs) and many Perps surfaces (order rows, TP/SL inputs, limit price sheet, adjust margin actions, transactions tabs, and various touchable rows/buttons), including fixing a mislabeled TAKE_PROFIT_BUTTON selector.

Adds a dev-only AgentStepHud overlay to App (rendered only in __DEV__) to display the currently executing agent step, and updates Perps tests/snapshots to match the new identifiers. Also updates CODEOWNERS for scripts/perps/agentic/teams/perps/ and ignores local .task/ working directories.

Written by Cursor Bugbot for commit a568618. This will update automatically on new commits.

- Add testIDs to PerpsTPSLView inputs (tp/sl price fields)
- Add testIDs to PerpsOrderTypeBottomSheet options (market/limit)
- Add testIDs to PerpsAdjustMarginActionSheet options (add/reduce)
- Add testIDs to PerpsOrderDetailsView cancel button
- Add testIDs to PerpsCompactOrderRow (first-row stable ID)
- Extend TabsList/TabsBar to support per-tab tabTestID prop
- Add tabTestID to PerpsMarketTabs tab props

Fix all 10 flow files to use correct testIDs and navigation params:
- activity-view, market-discovery, market-watchlist
- order-limit-cancel (via compact-row → order-details → cancel)
- order-limit-place, position-add-margin
- tpsl-create, tpsl-edit
- trade-close-position, trade-open-market

Remove trailing screenshot steps from all flows — screenshots are
for PR artifacts, not repeated agentic calls (token waste).

Add testIDs to the 4 filter tabs in PerpsTransactionsView
(trades/orders/funding/deposits) and update the activity-view
agentic flow to accept a {{tab}} param so any sub-tab can be
targeted by recipe runners.
@github-actions
Contributor

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

@abretonc7s abretonc7s marked this pull request as ready for review March 21, 2026 13:43
@abretonc7s abretonc7s requested review from a team as code owners March 21, 2026 13:43
Comment thread docs/perps/perps-agentic-feedback-loop.md Outdated
Comment thread app/components/UI/Perps/Perps.testIds.ts
@github-actions github-actions Bot added the risk-medium Moderate testing recommended · Possible bug introduction risk label Mar 21, 2026
@github-actions github-actions Bot added risk-low Low testing needed · Low bug introduction risk and removed risk-medium Moderate testing recommended · Possible bug introduction risk labels Mar 21, 2026
Comment thread scripts/perps/agentic/teams/perps/flows/activity-view.json
@github-actions github-actions Bot added risk-low Low testing needed · Low bug introduction risk and removed risk-low Low testing needed · Low bug introduction risk labels Mar 21, 2026
Comment thread app/components/UI/Perps/Views/PerpsTransactionsView/PerpsTransactionsView.tsx Outdated
…chlist UI flow

- Add FAVORITE_BUTTON testID to PerpsMarketHeaderSelectorsIDs and apply
  both FAVORITE_BUTTON and MORE_BUTTON testIDs to the TouchableOpacity
  elements in PerpsMarketHeader.tsx (previously untestable via CDP)
- Rewrite market-watchlist.json flow to use perps-market-header-favorite-button,
  navigate directly to market detail (remove redundant PerpsTrendingView nav),
  and use {{symbol|BTC}} default syntax for consistency
- Add Recipes vs Flows concept section to flows/README.md
- Add testIDs to Keypad digit/dot buttons for type_keypad flow action support,
  update snapshot accordingly
- Improve order-limit-place and trade-open-market flows with param defaults
  and better step descriptions
- Expand validate-recipe.sh with dry-run support and flow_ref resolution
- Update perps-agentic-feedback-loop.md docs
… watchlist flow pre-condition

PerpsMarketDetailsView was using MORE_BUTTON testID on the star/watchlist
toggle button — a semantic mismatch that caused the market-watchlist flow
press step to target the wrong element. Change to FAVORITE_BUTTON so the
testID matches intent and the flow can locate it reliably.

Add an assert-based pre-condition step to market-watchlist.json that fails
fast when the symbol is already in the watchlist, preventing the toggle
from silently removing instead of adding.

Validated: 9/9 steps pass against live app.
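The reason for the pre-condition is that a toggle is not idempotent. A small JavaScript sketch of the hazard and the guard, using a hypothetical array-backed watchlist and hypothetical helper names rather than the app's real state:

```javascript
// Toggling flips membership: pressing the star on a saved symbol removes it.
function toggleWatchlist(watchlist, symbol) {
  return watchlist.includes(symbol)
    ? watchlist.filter((item) => item !== symbol)
    : [...watchlist, symbol];
}

// Guarded variant: assert the starting state before toggling, so an
// "add" flow can never silently turn into a "remove".
function addToWatchlist(watchlist, symbol) {
  if (watchlist.includes(symbol)) {
    throw new Error(`Pre-condition failed: ${symbol} is already in the watchlist`);
  }
  return toggleWatchlist(watchlist, symbol);
}
```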
@github-actions github-actions Bot added risk-medium Moderate testing recommended · Possible bug introduction risk and removed risk-low Low testing needed · Low bug introduction risk labels Mar 22, 2026
…ewSelectorsIDs

- activity-view.json: change {{tab}} to {{tab|trades}} so the flow runs
  correctly when invoked directly without params
- README + perps-agentic-feedback-loop.md: document tab param with default
- PerpsTransactionsView: import and use PerpsTransactionsViewSelectorsIDs
  constants instead of inline template strings to eliminate drift risk
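The `{{tab|trades}}` syntax reads as "use the tab param, or trades if none is given". A minimal JavaScript sketch of that substitution is shown here. The `renderTemplate` helper, the `example-tab-` prefix, and the choice to throw on a missing param without a default are assumptions; the runner's actual behavior for missing params is not stated in this PR.

```javascript
// Replace {{name}} and {{name|default}} placeholders in a string.
function renderTemplate(text, params = {}) {
  return text.replace(
    /\{\{\s*([\w-]+)\s*(?:\|([^}]*))?\}\}/g,
    (match, name, fallback) => {
      if (Object.prototype.hasOwnProperty.call(params, name)) {
        return String(params[name]);
      }
      if (fallback !== undefined) return fallback;
      // Assumed behavior: a placeholder with no value and no default is an error.
      throw new Error(`Missing required param: ${name}`);
    },
  );
}

// Invoked directly, without params, the default applies.
const rendered = renderTemplate('example-tab-{{tab|trades}}');
// An explicit param overrides the default.
const overridden = renderTemplate('example-tab-{{tab|trades}}', { tab: 'orders' });
```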
… directory

Move flows, recipes, and snippets into the per-team directory alongside
pre-conditions so CODEOWNERS can be set at teams/<team>/ with no
cross-team file edits required.

Changes:
- flows/perps/*.json → teams/perps/flows/
- recipes/perps/{core,setup}.json → teams/perps/recipes/
- recipes/perps.json → teams/perps/snippets.json
- Rename teams/wallet/ → teams/mobile-platform/ (matches CODEOWNERS)
- Delete now-empty flows/ and recipes/ root dirs

Validators updated:
- validate-flow-schema.js: auto-discovers from teams/*/flows/
- validate-recipe.sh: integrates pre-condition checks, adds --hud flag
- cdp-bridge.js recipe command: reads from teams/<team>/snippets.json
  and teams/<team>/recipes/<file>.json (external API unchanged)

Infrastructure added:
- lib/assert.js, lib/registry.js: shared pre-condition evaluation
- validate-pre-conditions.js, validate-flow-schema.js: offline validators
- teams/README.md: updated contribution guide

CODEOWNERS: add explicit teams/perps/ entry for @MetaMask/perps

Add a dev-only HUD that renders the current recipe step name as an
overlay during agentic recipe execution. Wired via a callback registry
so the HUD component stays decoupled from AgenticService.

- AgentStepHud.tsx: floating overlay showing step id + description
- AgenticService.ts: showStep/hideStep bridge methods + callback registry
- App.tsx: mount <AgentStepHud /> in DEV builds only
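The callback-registry wiring described in this commit can be sketched in plain JavaScript as follows. Only the showStep and hideStep names come from the PR; `createStepRegistry`, `subscribe`, and the shape of the step object are assumptions, and the real AgenticService.ts and AgentStepHud.tsx are not reproduced here.

```javascript
// Minimal callback registry: the service publishes step changes and any
// listener (such as a HUD component) subscribes, so neither imports the other.
function createStepRegistry() {
  const listeners = new Set();
  let current = null;
  const notify = () => listeners.forEach((listener) => listener(current));
  return {
    // Register a listener; it receives the current state immediately.
    // Returns an unsubscribe function, e.g. for a component's unmount cleanup.
    subscribe(listener) {
      listeners.add(listener);
      listener(current);
      return () => listeners.delete(listener);
    },
    showStep(id, description) {
      current = { id, description };
      notify();
    },
    hideStep() {
      current = null;
      notify();
    },
  };
}

// Usage: a stand-in "HUD" that just remembers the latest step.
const hud = createStepRegistry();
let visibleStep = null;
const stopListening = hud.subscribe((step) => { visibleStep = step; });
hud.showStep('open-long-btc', 'Open a BTC long position');
```

A React component would typically call subscribe inside an effect and store the step in local state, which keeps the overlay free of any direct dependency on the service.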
@github-actions github-actions Bot added risk-medium Moderate testing recommended · Possible bug introduction risk and removed risk-high Extensive testing required · High bug introduction risk risk-medium Moderate testing recommended · Possible bug introduction risk labels Mar 23, 2026
@abretonc7s abretonc7s removed the skip-sonar-cloud Only used for bypassing sonar cloud when failures are not relevant to the changes. label Mar 23, 2026
tommasini previously approved these changes Mar 23, 2026
@abretonc7s abretonc7s enabled auto-merge March 23, 2026 13:46
Comment thread app/component-library/components-temp/Tabs/TabsBar/TabsBar.types.ts Outdated
Renames the `tabTestID` prop to `testID` across TabsBar, TabsList types
and components, and PerpsMarketTabs consumer. Also adds missing
`tab.testID` fallback in the non-scrollable TabsBar branch.
@abretonc7s abretonc7s dismissed stale reviews from michalconsensys and tommasini via a568618 March 23, 2026 15:06
@github-actions github-actions Bot added risk-medium Moderate testing recommended · Possible bug introduction risk and removed risk-medium Moderate testing recommended · Possible bug introduction risk labels Mar 23, 2026
@github-actions
Contributor

🔍 Smart E2E Test Selection

  • Selected E2E tags: SmokePerps, SmokeWalletPlatform, SmokeRamps, SmokeTrade, SmokeConfirmations, SmokePredictions
  • Selected Performance tags: None (no tests recommended)
  • Risk Level: medium
  • AI Confidence: 85%

E2E Test Selection:

The PR introduces the agentic testing framework infrastructure with the following key changes:

  1. AgenticService + AgentStepHud (DEV-only): New showStep, hideStep, findFiberByTestId methods added to the __AGENTIC__ bridge, plus a new AgentStepHud overlay component. Both are strictly guarded by __DEV__ checks - no production impact. Added to App.tsx root with {__DEV__ && <AgentStepHud />}.

  2. Perps testID additions (primary change): Extensive testID additions across Perps views and components:

    • PerpsOrderView: LEVERAGE_ROW, LIMIT_PRICE_ROW, KEYPAD_25_PCT, KEYPAD_50_PCT, KEYPAD_MAX, KEYPAD_DONE
    • PerpsTPSLView: TAKE_PROFIT_PRICE_INPUT, STOP_LOSS_PRICE_INPUT
    • PerpsAdjustMarginView: CONFIRM_BUTTON, DONE_BUTTON
    • PerpsMarketHeader: FAVORITE_BUTTON (also fixed MORE_BUTTON assignment)
    • PerpsLimitPriceBottomSheet: PRICE_DISPLAY, CONFIRM_BUTTON
    • PerpsOrderTypeBottomSheet: MARKET_OPTION, LIMIT_OPTION
    • PerpsAdjustMarginActionSheet: ADD_MARGIN_OPTION, REDUCE_MARGIN_OPTION
    • PerpsTransactionsView: TAB_TRADES, TAB_ORDERS, TAB_FUNDING, TAB_DEPOSITS
    • PerpsOrderDetailsView: CANCEL_BUTTON
    • PerpsMarketDetailsView: Uses FAVORITE_BUTTON (was incorrectly using MORE_BUTTON)
    • PerpsMarketTabs: POSITION_TAB, ORDERS_TAB, STATISTICS_TAB
    • BUG FIX: TAKE_PROFIT_BUTTON was incorrectly set to 'perps-order-view-stop-loss-button', now corrected to 'perps-order-view-take-profit-button'

  3. Keypad testID additions: Added testIDs to all keypad buttons (1-9, 0, dot). The Keypad component is shared across Ramp, Earn, Bridge, Perps, Confirmations, and Predictions flows.

  4. TabsBar/TabsList: Added optional testID prop to TabItem interface - backward compatible, used by PerpsMarketTabs.

  5. Scripts/docs: New agentic scripts and documentation - no app code impact.

Tag selection rationale:

  • SmokePerps: Primary target - extensive Perps component changes including testID fixes that could affect existing E2E tests referencing old testIDs. The TAKE_PROFIT_BUTTON fix is a breaking change for any test using the old incorrect testID.
  • SmokeWalletPlatform: Perps is a section inside Trending tab; TabsBar changes affect shared tab infrastructure.
  • SmokeTrade: Perps is accessed via TradeWalletActions; Keypad used in Bridge/Swap flows.
  • SmokeConfirmations: Required when selecting SmokeTrade; Keypad used in deposit-keyboard and edit-amount-keyboard in confirmations.
  • SmokeRamps: Keypad is used in Ramp BuildQuote views (both Aggregator and Deposit).
  • SmokePredictions: Keypad used in PredictKeypad; Predictions is in Trending tab (SmokeWalletPlatform dependency).

Performance Test Selection:
The changes are primarily additive testID additions to existing components and a DEV-only overlay (AgentStepHud). No changes to rendering logic, data fetching, state management, or component structure that would impact performance. The testID props are lightweight string attributes with negligible performance impact. The AgentStepHud is strictly DEV-only and never renders in production builds. No performance tests are warranted.

@cursor cursor Bot (Contributor) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread app/components/Nav/App/App.tsx
@georgewrmarshall (Contributor) left a comment

LGTM! We should definitely have testID props for all of our components.

@github-actions
Contributor

E2E Fixture Validation — Schema is up to date
17 value mismatches detected (expected — fixture represents an existing user).

@abretonc7s abretonc7s added this pull request to the merge queue Mar 23, 2026
Merged via the queue into main with commit 4c5ffae Mar 23, 2026
124 of 161 checks passed
@abretonc7s abretonc7s deleted the feat/perps/recipe-flows-0321-2009 branch March 23, 2026 16:16
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 23, 2026
@metamaskbot metamaskbot added the release-7.72.0 Issue or pull request that will be included in release 7.72.0 label Mar 23, 2026

Labels

release-7.72.0 Issue or pull request that will be included in release 7.72.0 risk-medium Moderate testing recommended · Possible bug introduction risk size-XL team-perps Perps team type-feature

7 participants