
feat: add cross-harness A/B testing skill and checker#26989

Merged
bfullam merged 13 commits into main from swaps-ab-testing-skill on Mar 6, 2026

Conversation

@bfullam bfullam commented Mar 4, 2026

Description

This PR introduces a standard for implementing A/B tests across agent harnesses.

Changes included:

  • Added a canonical cross-harness A/B testing skill at .agents/skills/ab-testing-implementation/.
  • Updated A/B test guidance to make docs/ab-testing.md the SSOT for both humans and agents.
  • Added thin command wrappers for Cursor and Claude (/create-ab-test) that delegate to the canonical instructions.
  • Added and documented a standalone compliance script for agent/dev usage:
    • .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh
  • Updated AGENTS.md and docs/ab-testing.md to point to the canonical skill entrypoint.

Design choice for v1:

  • Encourage agents to run the compliance checker in-flow.
  • Do not require checker execution in CI yet.

Changelog

CHANGELOG entry: null

Related issues

Fixes: N/A

Manual testing steps

Feature: A/B testing skill standard for agents

  Scenario: Agent guidance and checker behavior are available and valid
    Given the branch with this PR checked out

    When opening docs/ab-testing.md and the /create-ab-test wrappers
    Then the canonical skill path and wrapper entrypoints are present

    When running bash .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh --staged
    Then the script inspects staged files, or falls back to working-tree changes if nothing is staged

    When running the same command with no staged or working-tree changes
    Then the script exits successfully with an explicit no-op message
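
The selection behavior described in the scenario above (staged files first, then working-tree changes, then an explicit no-op) can be sketched roughly as follows. This is an illustration only, not the checker's actual code: `planScan`, `ScanPlan`, and the message strings are hypothetical names, and the sketch assumes the two file lists come from something like `git diff --cached --name-only` and `git diff --name-only`.

```typescript
// Hypothetical sketch of the staged -> working-tree -> no-op fallback.
// None of these names come from the real script.
type ScanSource = "staged" | "working-tree" | "none";

interface ScanPlan {
  source: ScanSource;
  files: string[];
  message: string;
}

function planScan(stagedFiles: string[], workingTreeFiles: string[]): ScanPlan {
  // Staged changes take priority when any exist.
  if (stagedFiles.length > 0) {
    return {
      source: "staged",
      files: stagedFiles,
      message: `Scanning ${stagedFiles.length} staged file(s).`,
    };
  }
  // Otherwise fall back to unstaged working-tree changes.
  if (workingTreeFiles.length > 0) {
    return {
      source: "working-tree",
      files: workingTreeFiles,
      message: `Nothing staged; scanning ${workingTreeFiles.length} working-tree file(s).`,
    };
  }
  // Nothing to inspect: report a no-op so the caller can exit successfully.
  return {
    source: "none",
    files: [],
    message: "No staged or working-tree changes; nothing to check.",
  };
}
```

A caller would treat the `"none"` plan as a successful run and print its message, which matches the no-op case in the last step of the scenario.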

Screenshots/Recordings

Not applicable (docs/tooling/script changes only).

Before

N/A

After

N/A

Pre-merge author checklist

Pre-merge reviewer checklist

  • I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
  • I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence such as recordings and/or screenshots.

Note

Low Risk
Low risk: changes are documentation/agent harness tooling plus a standalone compliance script with unit tests, with no impact on app runtime behavior. Main risk is false positives/negatives in the checker’s diff-based heuristics.

Overview
Adds a canonical, cross-harness A/B testing “skill” entrypoint (Codex/Claude/Cursor) that points agents to docs/ab-testing.md as the SSOT, plus new /create-ab-test command shims for Claude and Cursor.

Introduces check-ab-testing-compliance.sh, a standalone diff-scanner that fails on new ab_tests payload additions, malformed active_ab_tests items, and inline useABTest calls missing a control variant, and warns on flag naming and risky analytics wiring without test updates.
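
To make the diff-scanning idea concrete, here is a minimal sketch of two of the failure rules applied to added lines of a unified diff. The regular expressions, rule names, and the `scanAddedLines` function are assumptions made for illustration; the real script is bash and its actual patterns may differ.

```typescript
// Illustrative heuristics only; the patterns and rule names are assumptions,
// not the real checker's rules.
type Severity = "fail" | "warn";

interface Finding {
  severity: Severity;
  rule: string;
  line: string;
}

// Matches `ab_tests` as a standalone key, but not as part of `active_ab_tests`.
const LEGACY_AB_TESTS = /(^|[^A-Za-z0-9_])ab_tests\b/;
// Matches a useABTest call that passes an inline object literal.
const INLINE_USE_AB_TEST = /useABTest\s*\([^)]*\{/;

function scanAddedLines(diff: string): Finding[] {
  const findings: Finding[] = [];
  for (const raw of diff.split("\n")) {
    // Only consider lines the diff adds; skip the "+++ b/file" header.
    if (!raw.startsWith("+") || raw.startsWith("+++")) continue;
    const line = raw.slice(1);
    if (LEGACY_AB_TESTS.test(line)) {
      findings.push({ severity: "fail", rule: "new-ab_tests-payload", line });
    }
    // An inline variants object with no `control` key is flagged.
    if (INLINE_USE_AB_TEST.test(line) && !/\bcontrol\b/.test(line)) {
      findings.push({ severity: "fail", rule: "inline-useABTest-missing-control", line });
    }
  }
  return findings;
}
```

Line-based matching like this is cheap to run in-flow, but it is also where the false positives and negatives mentioned in the risk note would come from, for example when a `useABTest` call spans several lines.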

Expands docs/ab-testing.md and AGENTS.md to document the SSOT workflow, config-module pattern, risk-based testing guidance, and the compliance-check command, and adds Jest coverage for the checker under tests/scripts/.

Written by Cursor Bugbot for commit 524fa21. This will update automatically on new commits.

github-actions Bot commented Mar 4, 2026

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

@metamaskbot metamaskbot added the team-swaps-and-bridge label Mar 4, 2026
@github-actions github-actions Bot added the size-L label Mar 4, 2026
@bfullam bfullam changed the title from "Document A/B test config module pattern" to "chore: add cross-harness A/B testing skill and checker" Mar 4, 2026
@bfullam bfullam changed the title from "chore: add cross-harness A/B testing skill and checker" to "feat: add cross-harness A/B testing skill and checker" Mar 4, 2026
Comment thread .ai/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh Outdated
Copilot AI left a comment

Pull request overview

Introduces a canonical, cross-harness “A/B testing implementation” skill and a standalone compliance checker script to standardize how agents/devs implement and validate A/B tests (use useABTest, emit active_ab_tests, avoid new ab_tests payload additions).

Changes:

  • Adds .ai/skills/ab-testing-implementation/ with skill entrypoint, playbook reference, and an OpenAI agent descriptor.
  • Adds /create-ab-test command wrappers for Cursor and Claude that point to the canonical skill/playbook.
  • Adds a bash compliance checker for staged/worktree (fallback) or explicit-file scans and updates docs (docs/ab-testing.md, AGENTS.md) to reference the new entrypoints.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| docs/ab-testing.md | Adds canonical skill entrypoints, compliance check invocation, and a recommended config-module pattern for A/B tests. |
| AGENTS.md | Documents the canonical A/B testing skill and harness entrypoints for agent usage. |
| .cursor/commands/create-ab-test.md | Adds Cursor wrapper command delegating to the canonical playbook/skill. |
| .claude/commands/create-ab-test.md | Adds Claude wrapper command delegating to the canonical playbook/skill. |
| .ai/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh | Adds the compliance checker that scans diffs/added lines for policy violations and warnings. |
| .ai/skills/ab-testing-implementation/references/ab-testing-playbook.md | Adds the canonical playbook (workflow, analytics rules, compliance command, output contract). |
| .ai/skills/ab-testing-implementation/agents/openai.yaml | Adds an OpenAI agent descriptor pointing at $ab-testing-implementation. |
| .ai/skills/ab-testing-implementation/SKILL.md | Adds the canonical skill entrypoint describing required workflow and response contract. |

Comment thread .agents/skills/ab-testing-implementation/agents/openai.yaml
racitores previously approved these changes Mar 5, 2026
@racitores racitores left a comment

great job!

@bfullam bfullam enabled auto-merge March 6, 2026 11:32
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh Outdated
@NicolasMassart NicolasMassart added the team-mobile-platform, area-devex, Code Impact - Low, and skip-e2e labels Mar 6, 2026
racitores previously approved these changes Mar 6, 2026
@racitores racitores left a comment

LGTM

@bfullam bfullam added this pull request to the merge queue Mar 6, 2026
@bfullam bfullam removed this pull request from the merge queue due to a manual request Mar 6, 2026
@NicolasMassart NicolasMassart left a comment

Ideally, every time we write code, we should have a unit test.
Even when it's a script for a skill.
How do we know it behaves as expected otherwise?

github-actions Bot commented Mar 6, 2026

🔍 Smart E2E Test Selection

  • Selected E2E tags: None (no tests recommended)
  • Selected Performance tags: None (no tests recommended)
  • Risk Level: low
  • AI Confidence: 95%

E2E Test Selection:
This PR adds A/B testing implementation tooling for AI agents, consisting entirely of:

  1. Documentation files: Agent skill definitions (.agents/, .claude/, .cursor/), AGENTS.md updates, and docs/ab-testing.md updates with best practices and agent execution standards.

  2. Developer tooling: A bash script (check-ab-testing-compliance.sh) for checking A/B testing compliance in code changes.

  3. Unit tests: Jest tests (tests/scripts/check-ab-testing-compliance.test.ts) that test the compliance checker script by creating temporary git repositories and running the script against them.

None of these changes affect:

  • App source code in app/ directory
  • UI components or user-facing functionality
  • Controllers, Engine, or core architecture
  • E2E test infrastructure (page objects, fixtures, test specs)
  • Build configuration (metro, babel, detox)
  • Any runtime behavior of the MetaMask Mobile app

The test file is a unit test for a developer tool script, not an E2E test or test infrastructure change. It runs independently using Jest and does not interact with the app or Detox framework.

No E2E tests are needed because there are no app code changes to validate.

Performance Test Selection:
No performance tests needed. All changes are documentation, developer tooling (bash script), and unit tests for the tooling. There are no changes to app source code, UI components, data loading, state management, or any runtime behavior that could impact app performance.

@bfullam bfullam enabled auto-merge March 6, 2026 13:49
sonarqubecloud Bot commented Mar 6, 2026

@bfullam bfullam added this pull request to the merge queue Mar 6, 2026
Merged via the queue into main with commit 1fa11e5 Mar 6, 2026
59 checks passed
@bfullam bfullam deleted the swaps-ab-testing-skill branch March 6, 2026 14:32
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 6, 2026
@metamaskbot metamaskbot added the release-7.70.0 label Mar 6, 2026

Labels

  • area-devex: Issues and PRs focused on developer experience
  • Code Impact - Low: Minor code change that can be safely applied to the codebase
  • release-7.70.0: Issue or pull request that will be included in release 7.70.0
  • size-L
  • skip-e2e: skip E2E test jobs
  • team-mobile-platform: Mobile Platform team
  • team-swaps-and-bridge: Swaps and Bridge team

5 participants