feat: add cross-harness A/B testing skill and checker #26989
CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.
Pull request overview
Introduces a canonical, cross-harness “A/B testing implementation” skill and a standalone compliance checker script to standardize how agents/devs implement and validate A/B tests (use useABTest, emit active_ab_tests, avoid new ab_tests payload additions).
Changes:
- Adds .ai/skills/ab-testing-implementation/ with skill entrypoint, playbook reference, and an OpenAI agent descriptor.
- Adds /create-ab-test command wrappers for Cursor and Claude that point to the canonical skill/playbook.
- Adds a bash compliance checker for staged/worktree (fallback) or explicit-file scans and updates docs (docs/ab-testing.md, AGENTS.md) to reference the new entrypoints.
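The scan-target selection described in the last bullet (explicit files, otherwise staged changes, otherwise a worktree fallback) can be sketched as follows. This is only an illustration in TypeScript; the actual checker is a bash script, and `planScan`, `ScanMode`, and `ScanPlan` are hypothetical names that do not appear in the PR.

```typescript
// Hypothetical sketch: the real checker is a bash script and may decide differently.
type ScanMode = "explicit" | "staged" | "worktree";

interface ScanPlan {
  mode: ScanMode;
  files: string[];
}

// Explicit files win; otherwise prefer staged changes and fall back to
// the worktree when nothing is staged.
function planScan(
  explicitFiles: string[],
  stagedFiles: string[],
  worktreeFiles: string[],
): ScanPlan {
  if (explicitFiles.length > 0) {
    return { mode: "explicit", files: explicitFiles };
  }
  if (stagedFiles.length > 0) {
    return { mode: "staged", files: stagedFiles };
  }
  return { mode: "worktree", files: worktreeFiles };
}
```

Keeping the decision in a pure function like this would make the fallback order easy to unit test without touching a git repository.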
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/ab-testing.md | Adds canonical skill entrypoints, compliance check invocation, and a recommended config-module pattern for A/B tests. |
| AGENTS.md | Documents the canonical A/B testing skill and harness entrypoints for agent usage. |
| .cursor/commands/create-ab-test.md | Adds Cursor wrapper command delegating to the canonical playbook/skill. |
| .claude/commands/create-ab-test.md | Adds Claude wrapper command delegating to the canonical playbook/skill. |
| .ai/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh | Adds the compliance checker that scans diffs/added lines for policy violations and warnings. |
| .ai/skills/ab-testing-implementation/references/ab-testing-playbook.md | Adds the canonical playbook (workflow, analytics rules, compliance command, output contract). |
| .ai/skills/ab-testing-implementation/agents/openai.yaml | Adds an OpenAI agent descriptor pointing at $ab-testing-implementation. |
| .ai/skills/ab-testing-implementation/SKILL.md | Adds the canonical skill entrypoint describing required workflow and response contract. |
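As a rough illustration of the config-module pattern mentioned for docs/ab-testing.md, the sketch below keeps one experiment's flag key and variants (including `control`) in a single module and derives the `active_ab_tests` entry from it. Everything here is an assumption: the `{ key, value }` item shape, the flag key, the variant payloads, and the helper names are hypothetical and are not taken from the PR or from the real `useABTest` API.

```typescript
// Hypothetical config module for a single experiment.
// The flag key and variant payloads are made up for illustration.
const EXAMPLE_AB_TEST = {
  flagKey: "exampleButtonColorAbTest",
  variants: {
    control: { buttonColor: "blue" },
    treatment: { buttonColor: "green" },
  },
} as const;

// Assumed shape of one active_ab_tests item; the real contract lives in docs/ab-testing.md.
interface ActiveABTest {
  key: string;
  value: string;
}

// Build the analytics entry for the variant a user was assigned.
function toActiveABTest(flagKey: string, variantName: string): ActiveABTest {
  return { key: flagKey, value: variantName };
}

// Runtime guard: an item is well formed when both fields are non-empty strings.
function isWellFormedActiveABTest(item: unknown): item is ActiveABTest {
  if (typeof item !== "object" || item === null) {
    return false;
  }
  const candidate = item as Record<string, unknown>;
  return (
    typeof candidate.key === "string" &&
    candidate.key.length > 0 &&
    typeof candidate.value === "string" &&
    candidate.value.length > 0
  );
}
```

Defining the variants once in a module, rather than inline at each call site, is what lets a checker verify that a `control` variant always exists.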
Cursor Bugbot has reviewed your changes and found 1 potential issue.
NicolasMassart left a comment:
Ideally, every time we write code, we should have a unit test.
Even when it's a script for a skill.
How do we know it behaves as expected otherwise?
🔍 Smart E2E Test Selection
E2E Test Selection:
None of these changes affect:
The test file is a unit test for a developer tool script, not an E2E test or test infrastructure change. It runs independently using Jest and does not interact with the app or Detox framework. No E2E tests are needed because there are no app code changes to validate.
Performance Test Selection:



Description
This PR introduces a standard for implementing A/B tests across agent harnesses.
Changes included:
- .agents/skills/ab-testing-implementation/ as the canonical skill.
- docs/ab-testing.md as the SSOT for both humans and agents.
- Harness command wrappers (/create-ab-test) that delegate to the canonical instructions.
- .agents/skills/ab-testing-implementation/scripts/check-ab-testing-compliance.sh as the compliance checker.
- AGENTS.md and docs/ab-testing.md updated to point to the canonical skill entrypoint.
Design choice for v1:
Changelog
CHANGELOG entry: null
Related issues
Fixes: N/A
Manual testing steps
Screenshots/Recordings
Not applicable (docs/tooling/script changes only).
Before
N/A
After
N/A
Pre-merge author checklist
Pre-merge reviewer checklist
Note
Low Risk
Low risk: changes are documentation/agent harness tooling plus a standalone compliance script with unit tests, with no impact on app runtime behavior. Main risk is false positives/negatives in the checker’s diff-based heuristics.
Overview
Adds a canonical, cross-harness A/B testing “skill” entrypoint (Codex/Claude/Cursor) that points agents to docs/ab-testing.md as the SSOT, plus new /create-ab-test command shims for Claude and Cursor.
Introduces check-ab-testing-compliance.sh, a standalone diff-scanner that fails on new ab_tests payload additions, malformed active_ab_tests items, and inline useABTest calls missing a control variant, and warns on flag naming and risky analytics wiring without test updates.
Expands docs/ab-testing.md and AGENTS.md to document the SSOT workflow, config-module pattern, risk-based testing guidance, and the compliance-check command, and adds Jest coverage for the checker under tests/scripts/.
Written by Cursor Bugbot for commit 524fa21.
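To make the diff-scanning idea concrete, here is a minimal TypeScript sketch of two of the failure rules named above: a newly added `ab_tests` key, and a single-line inline `useABTest` call with no `control` variant. The regexes, rule names, and the `Finding` type are assumptions made for illustration; the real bash script's heuristics are not shown in this PR page and may well differ.

```typescript
// Hypothetical re-creation of two checker rules; not the actual script's logic.
interface Finding {
  severity: "error" | "warning";
  rule: string;
  line: string;
}

// Keep only lines added by a unified diff, skipping the "+++ b/file" header.
function addedLines(diff: string): string[] {
  return diff
    .split("\n")
    .filter((l) => l.startsWith("+") && !l.startsWith("+++"))
    .map((l) => l.slice(1));
}

function checkAddedLines(lines: string[]): Finding[] {
  const findings: Finding[] = [];
  for (const line of lines) {
    // A bare `ab_tests` key. The leading group keeps `active_ab_tests` from matching.
    if (/(^|[^A-Za-z0-9_])ab_tests["']?\s*:/.test(line)) {
      findings.push({ severity: "error", rule: "no-new-ab_tests-payload", line });
    }
    // A single-line useABTest(...) call with an inline object literal
    // whose arguments never mention `control`.
    const call = line.match(/useABTest\(([^)]*)\)/);
    if (call && call[1].includes("{") && !/\bcontrol\b/.test(call[1])) {
      findings.push({ severity: "error", rule: "inline-useABTest-missing-control", line });
    }
  }
  return findings;
}
```

Because the rules only see added lines one at a time, multi-line calls and renamed keys would slip through, which is the kind of false negative the risk note above warns about.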