Commit afb03e0

Automated drift remediation pipeline (#41)
## Summary

- Add `scripts/drift-report-collector.ts` that runs drift tests, parses vitest JSON output, and produces structured `drift-report.json` with provider-to-file mappings
- Add `scripts/fix-drift.ts` that reads the drift report, constructs a prompt, invokes Claude Code CLI to auto-fix builders, and creates a PR or GitHub issue
- Add `.github/workflows/fix-drift.yml` that triggers on drift test failure (or manual dispatch) to run the full collect→fix→verify→PR pipeline
- Update `.github/workflows/test-drift.yml` to use the collector script and upload drift report artifacts
- Add `serializeDiffsAsJSON()` export to `schema.ts` for structured drift serialization
- Add `@types/node` and `tsx` dev dependencies for running TypeScript scripts in CI
- Document the automated workflow in `DRIFT.md` and `CLAUDE.md`

## Test plan

- [x] All 533 existing tests pass
- [ ] Manually trigger `fix-drift.yml` via `workflow_dispatch` to verify end-to-end
- [ ] Verify the `drift-report.json` artifact appears on test-drift workflow runs
- [ ] Introduce intentional drift (e.g., remove `refusal: null`) and verify the pipeline detects and fixes it

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2 parents 5e47385 + 4b6f190 commit afb03e0
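
The collector described in the summary can be pictured roughly as follows. This is only a sketch, not the code in `scripts/drift-report-collector.ts`: the `DriftEntry` shape, the `PROVIDER_FILES` mapping, and its placeholder paths are all hypothetical, and the input type covers only the subset of vitest's Jest-compatible JSON reporter output that the sketch reads.

```typescript
// Assumed subset of vitest's Jest-compatible JSON reporter output.
interface AssertionResult {
  fullName: string;
  status: string; // e.g. "passed" or "failed"
  failureMessages: string[];
}

interface VitestJsonReport {
  testResults: { name: string; assertionResults: AssertionResult[] }[];
}

// Hypothetical report entry; the real drift-report.json schema is not shown here.
interface DriftEntry {
  provider: string;
  test: string;
  builderFile: string;
  messages: string[];
}

// Hypothetical provider-to-file mapping; these paths are placeholders.
const PROVIDER_FILES: Record<string, string> = {
  openai: "src/example-openai-builder.ts",
  anthropic: "src/example-anthropic-builder.ts",
  gemini: "src/example-gemini-builder.ts",
};

// Keep only failed assertions and attach the builder file a fixer should look at.
function collectDrift(report: VitestJsonReport): DriftEntry[] {
  const entries: DriftEntry[] = [];
  for (const file of report.testResults) {
    for (const assertion of file.assertionResults) {
      if (assertion.status !== "failed") continue;
      const name = assertion.fullName.toLowerCase();
      const provider =
        Object.keys(PROVIDER_FILES).find((p) => name.includes(p)) ?? "unknown";
      entries.push({
        provider,
        test: assertion.fullName,
        builderFile: PROVIDER_FILES[provider] ?? "",
        messages: assertion.failureMessages,
      });
    }
  }
  return entries;
}
```

Matching the provider by test name is the simplest possible heuristic; the actual script may derive the mapping differently.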

17 files changed

Lines changed: 2697 additions & 64 deletions

.gitattributes

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@
 *.mp4 filter=lfs diff=lfs merge=lfs -text
 *.webm filter=lfs diff=lfs merge=lfs -text
 *.svg filter=lfs diff=lfs merge=lfs -text
+docs/favicon.svg !filter !diff !merge

.github/workflows/fix-drift.yml

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+name: Fix Drift
+on:
+  workflow_dispatch:
+  workflow_run:
+    workflows: ["Drift Tests"]
+    types: [completed]
+    branches: [main]
+
+concurrency:
+  group: drift-fix
+  cancel-in-progress: false
+
+jobs:
+  fix:
+    if: >-
+      github.event_name == 'workflow_dispatch' ||
+      github.event.workflow_run.conclusion == 'failure'
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    permissions:
+      contents: write
+      pull-requests: write
+      issues: write
+    steps:
+      - uses: actions/checkout@v4
+      - uses: pnpm/action-setup@v4
+      - uses: actions/setup-node@v4
+        with:
+          node-version: 22
+          cache: pnpm
+      - run: pnpm install --frozen-lockfile
+
+      # Step 0: Configure git identity and create fix branch
+      - name: Configure git
+        run: |
+          git config user.name "llmock-drift-bot"
+          git config user.email "drift-bot@copilotkit.ai"
+          git checkout -B fix/drift-$(date +%Y-%m-%d)-${{ github.run_id }}
+
+      # Step 1: Detect drift and produce report
+      - name: Collect drift report
+        id: detect
+        run: |
+          set +e
+          npx tsx scripts/drift-report-collector.ts
+          EXIT_CODE=$?
+          set -e
+          echo "exit_code=$EXIT_CODE" >> $GITHUB_OUTPUT
+          if [ "$EXIT_CODE" -eq 2 ]; then
+            : # critical drift found, continue
+          elif [ "$EXIT_CODE" -ne 0 ]; then
+            echo "::error::Collector script crashed with exit code $EXIT_CODE"
+            exit $EXIT_CODE
+          fi
+        env:
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
+
+      # Always upload the report as an artifact
+      - name: Upload drift report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: drift-report
+          path: drift-report.json
+          if-no-files-found: warn
+          retention-days: 30
+
+      # Step 2: Exit if no critical drift
+      - name: Check for critical diffs
+        id: check
+        env:
+          DETECT_EXIT_CODE: ${{ steps.detect.outputs.exit_code }}
+        run: |
+          if [ "$DETECT_EXIT_CODE" = "2" ]; then
+            echo "skip=false" >> $GITHUB_OUTPUT
+            echo "Critical drift detected"
+          else
+            echo "skip=true" >> $GITHUB_OUTPUT
+            echo "No critical drift detected (exit code: $DETECT_EXIT_CODE) — skipping fix"
+          fi
+
+      # Step 3: Invoke Claude Code to fix
+      - name: Auto-fix drift
+        if: steps.check.outputs.skip != 'true'
+        run: npx tsx scripts/fix-drift.ts
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
+
+      # Upload Claude Code output for debugging
+      - name: Upload Claude Code logs
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: claude-code-output
+          path: claude-code-output.log
+          if-no-files-found: warn
+          retention-days: 30
+
+      # Step 4: Verify fix independently
+      - name: Verify conformance
+        if: steps.check.outputs.skip != 'true'
+        run: pnpm test
+
+      - name: Verify drift resolved
+        if: steps.check.outputs.skip != 'true'
+        run: pnpm test:drift
+        env:
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
+
+      # Step 5: Create PR on success
+      - name: Create PR
+        if: success() && steps.check.outputs.skip != 'true'
+        run: npx tsx scripts/fix-drift.ts --create-pr
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
+      # Step 6: Open issue on failure
+      - name: Create issue on failure
+        if: failure() && steps.check.outputs.skip != 'true'
+        run: npx tsx scripts/fix-drift.ts --create-issue
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
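
To make the step gating in the workflow above easier to follow, here is a small TypeScript sketch of two pieces of its logic: the fix branch name and which closing step runs. It is one reading of the `if:` expressions, not code from the repository. The names `fixBranchName` and `finalStep` are hypothetical, the branch date is assumed to be UTC (the workflow uses the runner's `date`), and the crash case assumes that a skipped "Check for critical diffs" step leaves `skip` unset so that `failure() && skip != 'true'` holds.

```typescript
type FinalStep = "none" | "create-pr" | "create-issue";

// Mirrors `fix/drift-$(date +%Y-%m-%d)-${{ github.run_id }}`, assuming a UTC clock.
function fixBranchName(now: Date, runId: string): string {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD
  return `fix/drift-${day}-${runId}`;
}

// Decide which closing invocation of fix-drift.ts the job ends with.
function finalStep(collectorExit: number, fixAndVerifyOk: boolean): FinalStep {
  // Exit 0: the check step sets skip=true, so every later fix step is skipped.
  if (collectorExit === 0) return "none";
  // Any code other than 2: the collect step fails the job before skip is set,
  // so (under the assumption above) the failure branch opens an issue.
  if (collectorExit !== 2) return "create-issue";
  // Exit 2: critical drift. A PR needs the fix and both verify steps to pass.
  return fixAndVerifyOk ? "create-pr" : "create-issue";
}
```

For example, `finalStep(2, true)` yields `"create-pr"`, while `finalStep(2, false)` yields `"create-issue"`.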

.github/workflows/test-drift.yml

Lines changed: 29 additions & 1 deletion
@@ -6,6 +6,7 @@ on:
 jobs:
   drift:
     runs-on: ubuntu-latest
+    timeout-minutes: 15
     steps:
       - uses: actions/checkout@v4
       - uses: pnpm/action-setup@v4
@@ -14,8 +15,35 @@ jobs:
           node-version: 22
           cache: pnpm
       - run: pnpm install --frozen-lockfile
-      - run: pnpm test:drift
+
+      - name: Run drift tests
+        id: drift
+        run: |
+          set +e
+          npx tsx scripts/drift-report-collector.ts
+          EXIT_CODE=$?
+          set -e
+          echo "exit_code=$EXIT_CODE" >> $GITHUB_OUTPUT
+          if [ "$EXIT_CODE" -eq 2 ]; then
+            : # critical drift found, continue
+          elif [ "$EXIT_CODE" -ne 0 ]; then
+            echo "::error::Collector script crashed with exit code $EXIT_CODE"
+            exit $EXIT_CODE
+          fi
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
           GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
+
+      - name: Upload drift report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: drift-report
+          path: drift-report.json
+          if-no-files-found: warn
+          retention-days: 30
+
+      - name: Fail if critical drift detected
+        if: steps.drift.outputs.exit_code == '2'
+        run: exit 1
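
The shell logic in this workflow relies on an exit-code convention for the collector: `0` means no critical drift, `2` means critical drift, and anything else is treated as a crash. A minimal sketch of that convention follows. The type and function names are hypothetical, and the convention is inferred from the workflow's shell branches rather than from the script itself.

```typescript
type CollectorStatus = "clean" | "critical-drift" | "crash";

// Classify the collector's exit code the way the workflow's shell branches do.
function classifyCollectorExit(code: number): CollectorStatus {
  if (code === 0) return "clean";
  if (code === 2) return "critical-drift";
  return "crash";
}

// Exit code the drift job ends with, given the collector's exit code.
// Critical drift is deferred to the final `exit 1` step so that the
// report upload (which runs with `if: always()`) happens first.
function driftJobExitCode(code: number): number {
  switch (classifyCollectorExit(code)) {
    case "clean":
      return 0;
    case "critical-drift":
      return 1;
    case "crash":
      return code;
  }
}
```

Keeping exit code `2` distinct from a crash is what lets the workflow upload a report for real drift while still surfacing collector failures with their original code.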

CLAUDE.md

Lines changed: 9 additions & 0 deletions
@@ -34,6 +34,15 @@ entire repo, not just staged files.
 - When adding features or fixing bugs, add or update tests
 - Run `pnpm test` before pushing
 
+## Drift Remediation
+
+Automated drift remediation lives in `scripts/`:
+
+- `scripts/drift-report-collector.ts` — runs drift tests, produces `drift-report.json`
+- `scripts/fix-drift.ts` — reads drift report, invokes Claude Code to fix builders, creates PR or issue
+
+See `DRIFT.md` for full documentation and `.github/workflows/fix-drift.yml` for the CI workflow.
+
 ## Commit Messages
 
 - This repo enforces conventional commit prefixes via commitlint: `fix:`, `feat:`, `docs:`, `test:`, `chore:`, `refactor:`, etc.

DRIFT.md

Lines changed: 25 additions & 2 deletions
@@ -106,7 +106,7 @@ When a model is deprecated:
 
 ## WebSocket Drift Coverage
 
-In addition to the 19 existing drift tests (16 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover llmock's WS protocols:
+In addition to the 19 existing drift tests (16 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover llmock's WS protocols (4 verified + 2 canary = 6 WS tests):
 
 | Protocol | Text | Tool Call | Real Endpoint | Status |
 | ------------------- | ---- | --------- | ------------------------------------------------------------------- | ---------- |
@@ -138,6 +138,29 @@ Drift tests run on a schedule:
 
 See `.github/workflows/test-drift.yml`.
 
+## Automated Drift Remediation
+
+When the daily drift test detects critical diffs on the `main` branch, the `fix-drift.yml` workflow runs automatically:
+
+1. **Collect** — `scripts/drift-report-collector.ts` runs drift tests and produces a structured `drift-report.json`
+2. **Fix** — `scripts/fix-drift.ts` (default mode) constructs a prompt from the report and invokes Claude Code to fix the builders
+3. **Verify** — Independent `pnpm test` and `pnpm test:drift` steps confirm the fix works
+4. **PR** — `scripts/fix-drift.ts --create-pr` stages and commits the changes, bumps the version, and opens a pull request
+5. **Issue** (on failure) — `scripts/fix-drift.ts --create-issue` opens a GitHub issue with the drift report and Claude Code output
+
+Steps 2 and 4/5 are separate invocations of `fix-drift.ts` with different modes.
+
+### Artifacts
+
+Both workflows upload artifacts:
+
+- `drift-report.json` — structured drift data (retained 30 days)
+- `claude-code-output.log` — Claude Code's reasoning and tool calls (fix workflow only)
+
+### Manual trigger
+
+The fix workflow also supports `workflow_dispatch` for manual runs.
+
 ## Cost
 
-~25 API calls per run (16 HTTP response-shape + 3 model listing + 4 WS + 2 canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-4o-mini-realtime-preview`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.15/week at daily cadence. When Gemini Live text-capable models become available, this will increase to 6 WS calls.
+~25 API calls per run (16 HTTP response-shape + 3 model listing + 6 WS including canaries) using the cheapest available models (`gpt-4o-mini`, `gpt-4o-mini-realtime-preview`, `claude-haiku-4-5-20251001`, `gemini-2.5-flash`) with 10-100 max tokens each. Under $0.15/week at daily cadence. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 4 to 6.
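
The call count and weekly budget in the updated Cost paragraph can be checked with a little arithmetic. The sketch below uses only the figures quoted in that paragraph. The constant names are hypothetical, and the per-call figure is an upper bound implied by the stated budget, not a measured price.

```typescript
// Figures quoted in the Cost section of DRIFT.md.
const CALLS_PER_RUN = {
  httpResponseShape: 16,
  modelListing: 3,
  webSocketIncludingCanaries: 6,
};
const RUNS_PER_WEEK = 7; // daily cadence
const WEEKLY_BUDGET_USD = 0.15; // "Under $0.15/week"

// 16 + 3 + 6 = 25 calls per run.
const totalCallsPerRun = Object.values(CALLS_PER_RUN).reduce((sum, n) => sum + n, 0);

// 25 calls x 7 runs = 175 calls per week.
const callsPerWeek = totalCallsPerRun * RUNS_PER_WEEK;

// Average cost per call implied by the budget, as an upper bound.
const maxAverageCostPerCallUsd = WEEKLY_BUDGET_USD / callsPerWeek;

console.log({ totalCallsPerRun, callsPerWeek, maxAverageCostPerCallUsd });
```

At 175 calls per week, staying under the stated budget implies an average of less than a tenth of a cent per call.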

docs/favicon.svg

Lines changed: 3 additions & 30 deletions

package.json

Lines changed: 2 additions & 0 deletions
@@ -63,7 +63,9 @@
     "typescript-eslint": "^8.35.1",
     "@anthropic-ai/sdk": "^0.78.0",
     "@google/generative-ai": "^0.24.0",
+    "@types/node": "^22.0.0",
     "openai": "^4.0.0",
+    "tsx": "^4.19.0",
     "vitest": "^3.2.1"
   }
 }
