feat: add cross-harness A/B testing skill and checker by bfullam · Pull Request #26989 · MetaMask/metamask-mobile

bfullam · 2026-03-04T12:59:16Z

Description

This PR introduces a standard for implementing A/B tests across agent harnesses.

Changes included:

Added a canonical cross-harness A/B testing skill at .agents/skills/ab-testing-implementation/.
Updated A/B test guidance to make docs/ab-testing.md the single source of truth (SSOT) for both humans and agents.
Added thin command wrappers for Cursor and Claude (/create-ab-test) that delegate to the canonical instructions.
Added and documented a standalone compliance script for agent/dev usage:
- .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh
Updated AGENTS.md and docs/ab-testing.md to point to the canonical skill entrypoint.

Design choice for v1:

Encourage agents to run the compliance checker in-flow.
Do not require checker execution in CI yet.

Changelog

CHANGELOG entry: null

Related issues

Fixes: N/A

Manual testing steps

Feature: A/B testing skill standard for agents

  Scenario: Agent guidance and checker behavior are available and valid
    Given the branch with this PR checked out

    When opening docs/ab-testing.md and the /create-ab-test wrappers
    Then the canonical skill path and wrapper entrypoints are present

    When running bash .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh --staged
    Then the script inspects staged files, or falls back to working-tree changes if nothing is staged

    When running the same command with no staged or working-tree changes
    Then the script exits successfully with an explicit no-op message
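
The scenario above implies a simple selection order: prefer staged files, fall back to working-tree changes, and report a no-op when both are empty. The following is a minimal TypeScript sketch of that decision only. The real checker is a bash script, and the function name, types, message strings, and the two input lists (which would come from git in practice) are hypothetical.

```typescript
// Hypothetical model of the checker's file selection; the real script is bash.
type ScanSource = "staged" | "working-tree" | "none";

interface ScanPlan {
  source: ScanSource;
  files: string[];
  message: string; // illustrative wording, not the script's actual output
}

// Decide which changed files to inspect, mirroring the manual testing steps:
// staged files first, then working-tree changes, otherwise an explicit no-op.
function planScan(staged: string[], workingTree: string[]): ScanPlan {
  if (staged.length > 0) {
    return {
      source: "staged",
      files: staged,
      message: `Scanning ${staged.length} staged file(s)`,
    };
  }
  if (workingTree.length > 0) {
    return {
      source: "working-tree",
      files: workingTree,
      message: `Nothing staged; scanning ${workingTree.length} working-tree file(s)`,
    };
  }
  return { source: "none", files: [], message: "No changes to scan; nothing to do" };
}

// Example: nothing staged, one modified file in the working tree.
const examplePlan = planScan([], ["app/components/Example.tsx"]);
console.log(examplePlan.source); // prints "working-tree"
```

Under this model the no-op case is still a success, matching the last step of the scenario, which expects a successful exit with an explicit message rather than an error.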

Screenshots/Recordings

Not applicable (docs/tooling/script changes only).

Before

N/A

After

N/A

Pre-merge author checklist

I've followed MetaMask Contributor Docs and MetaMask Mobile Coding Standards.
I've completed the PR template to the best of my ability
I've included tests if applicable
I've documented my code using JSDoc format if applicable
I've applied the right labels on the PR (see labeling guidelines). Not required for external contributors.

Pre-merge reviewer checklist

I've manually tested the PR (e.g. pull and build branch, run the app, test code being changed).
I confirm that this PR addresses all acceptance criteria described in the ticket it closes and includes the necessary testing evidence such as recordings and/or screenshots.

Note

Low Risk
Low risk: changes are documentation/agent harness tooling plus a standalone compliance script with unit tests, with no impact on app runtime behavior. Main risk is false positives/negatives in the checker’s diff-based heuristics.

Overview
Adds a canonical, cross-harness A/B testing “skill” entrypoint (Codex/Claude/Cursor) that points agents to docs/ab-testing.md as the SSOT, plus new /create-ab-test command shims for Claude and Cursor.

Introduces check-ab-testing-compliance.sh, a standalone diff-scanner that fails on new ab_tests payload additions, malformed active_ab_tests items, and inline useABTest calls missing a control variant, and warns on flag naming and risky analytics wiring without test updates.
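
To make the failure rules above more concrete, here is a rough TypeScript sketch of two of them applied to the added lines of a unified diff. It is not the script's implementation: the real checker is bash, its actual patterns are not shown in this PR text, and the assumed inline call shape `useABTest('flag', { control: ..., ... })` is a guess for illustration. The `active_ab_tests` item check and the warnings are omitted because their expected formats are not described here.

```typescript
// Hypothetical sketch of diff-based checks; the patterns are assumptions.
interface Finding {
  rule: "new-ab-tests-payload" | "inline-useABTest-missing-control";
  line: string;
}

// Keep only lines added by a unified diff, dropping "+++ b/file" headers.
function addedLines(diff: string): string[] {
  return diff
    .split("\n")
    .filter((l) => l.startsWith("+") && !l.startsWith("+++ "))
    .map((l) => l.slice(1));
}

function findFailures(diff: string): Finding[] {
  const findings: Finding[] = [];
  for (const line of addedLines(diff)) {
    // "_" is a word character, so \b prevents a match inside "active_ab_tests".
    if (/\bab_tests\b/.test(line)) {
      findings.push({ rule: "new-ab-tests-payload", line });
    }
    // Assumed inline shape: useABTest('flag', { control: ..., treatment: ... }).
    if (/\buseABTest\s*\(.*\{/.test(line) && !/\bcontrol\b/.test(line)) {
      findings.push({ rule: "inline-useABTest-missing-control", line });
    }
  }
  return findings;
}

// Made-up sample diff; trackEvent and the flag name are placeholders.
const sampleDiff = [
  "+++ b/app/example.ts",
  "+  trackEvent({ ab_tests: { swapsFlag: variant } });",
  "+  trackEvent({ active_ab_tests: activeTests });",
  "+  const { variant } = useABTest('swapsFlag', { treatment: 'B' });",
].join("\n");

console.log(findFailures(sampleDiff).map((f) => f.rule));
```

Because the checks are line-based, a call split across several lines could escape or falsely trigger the second rule, which is the kind of false positive or negative the risk note above refers to.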

Expands docs/ab-testing.md and AGENTS.md to document the SSOT workflow, config-module pattern, risk-based testing guidance, and the compliance-check command, and adds Jest coverage for the checker under tests/scripts/.

Written by Cursor Bugbot for commit 524fa21.

github-actions · 2026-03-04T12:59:27Z

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

Copilot

Pull request overview

Introduces a canonical, cross-harness “A/B testing implementation” skill and a standalone compliance checker script to standardize how agents/devs implement and validate A/B tests (use useABTest, emit active_ab_tests, avoid new ab_tests payload additions).

Changes:

Adds .ai/skills/ab-testing-implementation/ with skill entrypoint, playbook reference, and an OpenAI agent descriptor.
Adds /create-ab-test command wrappers for Cursor and Claude that point to the canonical skill/playbook.
Adds a bash compliance checker for staged/worktree (fallback) or explicit-file scans and updates docs (docs/ab-testing.md, AGENTS.md) to reference the new entrypoints.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| docs/ab-testing.md | Adds canonical skill entrypoints, compliance check invocation, and a recommended config-module pattern for A/B tests. |
| AGENTS.md | Documents the canonical A/B testing skill and harness entrypoints for agent usage. |
| .cursor/commands/create-ab-test.md | Adds Cursor wrapper command delegating to the canonical playbook/skill. |
| .claude/commands/create-ab-test.md | Adds Claude wrapper command delegating to the canonical playbook/skill. |
| .ai/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh | Adds the compliance checker that scans diffs/added lines for policy violations and warnings. |
| .ai/skills/ab-testing-implementation/references/ab-testing-playbook.md | Adds the canonical playbook (workflow, analytics rules, compliance command, output contract). |
| .ai/skills/ab-testing-implementation/agents/openai.yaml | Adds an OpenAI agent descriptor pointing at `$ab-testing-implementation`. |
| .ai/skills/ab-testing-implementation/SKILL.md | Adds the canonical skill entrypoint describing required workflow and response contract. |

racitores

great job!

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

racitores

LGTM

NicolasMassart

Ideally, every time we write code, we should have a unit test.
Even when it's a script for a skill.
How do we know it behaves as expected otherwise?

github-actions · 2026-03-06T13:48:08Z

🔍 Smart E2E Test Selection

Selected E2E tags: None (no tests recommended)
Selected Performance tags: None (no tests recommended)
Risk Level: low
AI Confidence: 95%

🤖 AI reasoning details

E2E Test Selection:
This PR adds A/B testing implementation tooling for AI agents, consisting entirely of:

Documentation files: Agent skill definitions (.agents/, .claude/, .cursor/), AGENTS.md updates, and docs/ab-testing.md updates with best practices and agent execution standards.
Developer tooling: A bash script (check-ab-testing-compliance.sh) for checking A/B testing compliance in code changes.
Unit tests: Jest tests (tests/scripts/check-ab-testing-compliance.test.ts) that test the compliance checker script by creating temporary git repositories and running the script against them.

None of these changes affect:

App source code in app/ directory
UI components or user-facing functionality
Controllers, Engine, or core architecture
E2E test infrastructure (page objects, fixtures, test specs)
Build configuration (metro, babel, detox)
Any runtime behavior of the MetaMask Mobile app

The test file is a unit test for a developer tool script, not an E2E test or test infrastructure change. It runs independently using Jest and does not interact with the app or Detox framework.

No E2E tests are needed because there are no app code changes to validate.

Performance Test Selection:
No performance tests needed. All changes are documentation, developer tooling (bash script), and unit tests for the tooling. There are no changes to app source code, UI components, data loading, state management, or any runtime behavior that could impact app performance.

sonarqubecloud · 2026-03-06T14:11:28Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

bfullam added 2 commits March 4, 2026 09:49

chore: update docs

83a34d5

feat: add ab testing agent skill and harness wrappers

5a7ef3d

metamaskbot added the team-swaps-and-bridge label Mar 4, 2026

github-actions Bot added the size-L label Mar 4, 2026

cursor Bot reviewed Mar 4, 2026

Comment thread .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh

bfullam added 2 commits March 4, 2026 14:33

Implement A/B testing skill standard

7086d6b

feat: account for fresh branches in ab test compliance check

87bed98

bfullam changed the title ~~Document A/B test config module pattern~~ chore: add cross-harness A/B testing skill and checker Mar 4, 2026

cursor Bot reviewed Mar 4, 2026

Comment thread .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh

bfullam changed the title ~~chore: add cross-harness A/B testing skill and checker~~ feat: add cross-harness A/B testing skill and checker Mar 4, 2026

Harden AB test compliance check

e9bb49d

cursor Bot reviewed Mar 4, 2026

Comment thread .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh

bfullam added 2 commits March 4, 2026 16:31

Clarify AB test testing rule

a8d10a7

Clarify A/B test testing rules

fc6fbda

cursor Bot reviewed Mar 4, 2026

Comment thread .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh

Fix AB test compliance checker

4b95ea4

cursor Bot reviewed Mar 4, 2026

Comment thread .ai/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh Outdated

Align AB test compliance checks

bc3522f

NicolasMassart requested review from NicolasMassart and Copilot March 4, 2026 18:17

Copilot started reviewing on behalf of NicolasMassart March 4, 2026 18:18

Copilot AI reviewed Mar 4, 2026

Comment thread .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh

Comment thread .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh

chore: use ab-testing.md as SSOT and migrate to .agents/

b5dc111

cursor Bot reviewed Mar 5, 2026

Comment thread .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh

Update ab testing compliance doc

f317482

racitores requested changes Mar 5, 2026

Comment thread .agents/skills/ab-testing-implementation/agents/openai.yaml

racitores previously approved these changes Mar 5, 2026

Add Claude AB test skill entry

5ad3c06

bfullam dismissed racitores’s stale review via 5ad3c06 March 6, 2026 11:27

bfullam enabled auto-merge March 6, 2026 11:32

cursor Bot reviewed Mar 6, 2026

Comment thread .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh Outdated

NicolasMassart added the team-mobile-platform, area-devex, Code Impact - Low, and skip-e2e labels Mar 6, 2026

racitores previously approved these changes Mar 6, 2026

bfullam added this pull request to the merge queue Mar 6, 2026

bfullam removed this pull request from the merge queue due to a manual request Mar 6, 2026

NicolasMassart reviewed Mar 6, 2026

test: add compliance checker unit tests

524fa21

bfullam dismissed racitores’s stale review via 524fa21 March 6, 2026 13:46

bfullam enabled auto-merge March 6, 2026 13:49

NicolasMassart approved these changes Mar 6, 2026

bfullam added this pull request to the merge queue Mar 6, 2026

Merged via the queue into main with commit 1fa11e5 Mar 6, 2026
59 checks passed

bfullam deleted the swaps-ab-testing-skill branch March 6, 2026 14:32

github-actions Bot locked and limited conversation to collaborators Mar 6, 2026

metamaskbot added the release-7.70.0 label Mar 6, 2026
