Target Workflow: test-coverage-improver.md
Source report: #2233
Estimated cost per run: $0.00 (billing not yet surfaced — tokens tracked)
Total tokens per run: ~2,009K
Cache hit rate: ~49%
LLM turns (requests): 46
Duration: 7.9 min
Current Configuration
| Setting | Value |
|---|---|
| Tools loaded | github: toolsets: [default] (~22 tools), bash (6 patterns) |
| Network groups | github |
| Pre-agent steps | ❌ None |
| Post-agent steps | ❌ None |
| Prompt size | 6,109 chars |
| Input/output ratio | 1,996K input : 13K output |
| Behavior profile | exploratory, broad tool usage, read_only actuation |
Token breakdown:
- Total input tokens: 1,995,616
- Total output tokens: 13,106
- Cache read tokens: 1,934,570 (49% efficiency)
- Requests: 46
- Average tokens/request: ~43K
The agent is read-only (0 write actions) yet spends 46 requests exploring the codebase and running npm ci, the build, and coverage. The install, build, and coverage steps are fully deterministic and could be pre-computed.
Recommendations
1. Pre-compute npm ci, npm run build, and npm run test:coverage in steps:
Estimated savings: ~600–800K tokens/run (~30–40%)
The agent runs the full build + test pipeline itself (at least 10–15 requests just handling shell output from npm ci → npm run build → npm run test:coverage). These are fully deterministic. Using steps: pre-runs them and passes results as structured context.
    steps:
      - name: Install dependencies
        run: npm ci
      - name: Build
        run: npm run build
      - name: Run coverage
        run: npm run test:coverage -- --json --outputFile=/tmp/coverage-out.json 2>&1 | tail -5
        id: coverage
      - name: Read coverage summary
        run: cat coverage/coverage-summary.json
        id: coverage-summary
Then reference ${{ steps.coverage-summary.output }} in the prompt instead of asking the agent to run coverage itself. This eliminates ~10–15 LLM turns of build churn and the agent gets a clean, pre-parsed coverage JSON instead of raw terminal output.
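To illustrate what "pre-parsed" context could look like, here is a minimal sketch that condenses the summary into one line. It assumes coverage/coverage-summary.json follows the Istanbul json-summary layout (a `total` entry whose `lines`, `statements`, `functions`, and `branches` objects each carry a `pct` field). The function name `formatTotals` and the sample numbers are hypothetical.

```javascript
// Condense an Istanbul-style coverage summary into one line for the prompt.
// Assumption: `summary.total` holds { lines, statements, functions, branches },
// each with a numeric `pct` field (the json-summary reporter layout).
function formatTotals(summary) {
  const metrics = ['statements', 'branches', 'functions', 'lines'];
  return metrics
    .map((name) => `${name}: ${summary.total[name].pct}%`)
    .join(', ');
}

// Made-up numbers, for illustration only.
const exampleSummary = {
  total: {
    lines: { total: 200, covered: 150, skipped: 0, pct: 75 },
    statements: { total: 220, covered: 165, skipped: 0, pct: 75 },
    functions: { total: 40, covered: 28, skipped: 0, pct: 70 },
    branches: { total: 80, covered: 48, skipped: 0, pct: 60 },
  },
};

console.log(formatTotals(exampleSummary));
```

A one-line total like this costs a few dozen tokens per request, whereas the full per-file JSON is re-sent on every turn once it enters the context.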
2. Restrict github: toolsets from [default] to [repos, pull_requests]
Estimated savings: ~200–300K tokens/run (~10–15%)
toolsets: [default] loads all ~22 GitHub tools. This workflow only reads repository contents and searches for open PRs. Restrict to the minimal toolsets:
    tools:
      github:
        toolsets: [repos, pull_requests]
repos provides file reading and search; pull_requests provides PR listing for the Phase 0 duplicate check. This removes roughly 10–12 unused tool schemas (~500–700 tokens each) from every one of the 46 requests.
Estimated: ~12 tools removed × 600 tokens × 46 requests = ~330K tokens saved at the upper end; with 10 tools removed the same calculation gives ~276K.
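The estimate is a simple product, so it is easy to re-run with other assumptions. A small sketch follows; the function name `schemaTokensSaved` is hypothetical, and the tool counts and per-schema sizes are the report's own rough figures.

```javascript
// Tokens spent re-sending tool schemas over a whole run.
// Every request carries every loaded tool schema, so the saving scales with
// (tools removed) x (tokens per schema) x (requests per run).
function schemaTokensSaved(toolsRemoved, tokensPerSchema, requests) {
  return toolsRemoved * tokensPerSchema * requests;
}

const requests = 46; // from the report

console.log(schemaTokensSaved(12, 600, requests)); // 331200
console.log(schemaTokensSaved(10, 500, requests)); // 230000
console.log(schemaTokensSaved(10, 700, requests)); // 322000
```

Because the request count is a multiplier, any recommendation that reduces LLM turns also shrinks this overhead.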
3. Replace broad bash patterns with targeted allowlist
Estimated savings: ~100–200K tokens/run (~5–10%)
The current bash allowlist (cat:*, ls:*, head:*, tail:*) lets the agent freely explore the entire repository. This encourages broad, unfocused exploration (the behavior_fingerprint confirms tool_breadth: broad). Tighten to specific paths:
    tools:
      bash:
        - "npm run build"
        - "npm run test"
        - "npm run test:coverage"
        - "cat:src/*.test.ts"
        - "cat:src/*.ts"
        - "cat:tests/**"
        - "cat:coverage/coverage-summary.json"
        - "cat:jest.config.js"
        - "ls:src"
        - "ls:tests"
        - "head:*"
Removing cat:* globally prevents the agent from reading unrelated files (node_modules contents, lock files, etc.) and forces it to stay focused on the source tree.
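To reason about which commands a path-scoped list would admit, the sketch below models the allowlist as a simple glob check. The matching semantics here are an assumption made for illustration, not the tool's documented behavior: `cmd:glob` is taken to allow `cmd <path>` when the path matches, `*` stays within one path segment, `**` crosses segments, and any other entry must equal the whole command line. `globToRegExp` and `isAllowed` are hypothetical helpers.

```javascript
// Hypothetical matcher for allowlist entries such as "cat:src/*.ts".
// These semantics are assumed for illustration only.
function globToRegExp(glob) {
  // Escape regex metacharacters except "*", which is handled below.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  const pattern = escaped
    .replace(/\*\*/g, '\u0000') // placeholder so "**" survives the next step
    .replace(/\*/g, '[^/]*')    // "*" matches within one path segment
    .replace(/\u0000/g, '.*');  // "**" matches across segments
  return new RegExp(`^${pattern}$`);
}

function isAllowed(commandLine, allowlist) {
  return allowlist.some((entry) => {
    const colon = entry.indexOf(':');
    const prefix = colon === -1 ? '' : entry.slice(0, colon);
    // Entries such as "npm run test:coverage" contain a colon but are plain
    // commands, so only a single-word prefix counts as "cmd:glob".
    if (colon === -1 || prefix.includes(' ')) {
      return commandLine === entry;
    }
    const [cmd, ...args] = commandLine.split(' ');
    return cmd === prefix && args.length === 1 &&
      globToRegExp(entry.slice(colon + 1)).test(args[0]);
  });
}

const allowlist = ['npm run test:coverage', 'cat:src/*.ts', 'cat:tests/**', 'ls:src'];

console.log(isAllowed('cat src/index.ts', allowlist));            // true
console.log(isAllowed('cat node_modules/x/index.js', allowlist)); // false
```

Under these assumed rules, `cat:src/*.ts` would not cover nested directories such as src/nested/a.ts, so a `**` pattern may be needed if the source tree is not flat.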
4. Pre-load COVERAGE_SUMMARY.md and key file list in the prompt via steps:
Estimated savings: ~100–150K tokens/run (~5–7%)
The prompt says "Check COVERAGE_SUMMARY.md for current coverage metrics" — the agent reads this itself. Use a step to inject the current coverage summary directly:
    steps:
      - name: Read coverage summary
        run: cat COVERAGE_SUMMARY.md
        id: coverage-md
      - name: List source files needing tests
        run: |
          echo "Files < 80% coverage:"
          cat coverage/coverage-summary.json | node -e "
            const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8'));
            Object.entries(d).filter(([k,v])=>k!=='total'&&v.statements.pct<80)
              .forEach(([k,v])=>console.log(k,v.statements.pct+'%'))
          "
        id: low-coverage
Then add to the prompt:
    ## Current Coverage Status
    ${{ steps.coverage-md.output }}
    ## Files Below 80% Coverage
    ${{ steps.low-coverage.output }}
This means the agent skips the "read COVERAGE_SUMMARY.md → parse it → think about it" multi-request cycle.
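The inline filter can also be written as a standalone function, which is easier to test than a `node -e` one-liner. This is a sketch under the same assumption as the step itself: the summary has a `total` key plus one key per file, each with `statements.pct`. The name `lowCoverageFiles`, the file paths, and the percentages are hypothetical.

```javascript
// List files whose statement coverage is below a threshold, lowest first.
// Assumption: `summary` follows the Istanbul json-summary layout, with a
// "total" key plus one key per file path.
function lowCoverageFiles(summary, threshold = 80) {
  return Object.entries(summary)
    .filter(([file, data]) => file !== 'total' && data.statements.pct < threshold)
    .sort(([, a], [, b]) => a.statements.pct - b.statements.pct)
    .map(([file, data]) => `${file} ${data.statements.pct}%`);
}

// Hypothetical file names and percentages, for illustration only.
const sample = {
  total: { statements: { pct: 78.5 } },
  'src/parser.ts': { statements: { pct: 91.2 } },
  'src/cache.ts': { statements: { pct: 64 } },
  'src/cli.ts': { statements: { pct: 42.5 } },
};

console.log(lowCoverageFiles(sample).join('\n'));
```

Sorting lowest-first is a design choice rather than something the report specifies; it puts the weakest files at the top of the injected list.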
5. Improve cache efficiency with stable prefix ordering
Estimated savings: ~150–200K tokens/run (~7–10%) through better cache reuse
Current cache efficiency is 49%. For Anthropic prefix caching to work well, the most stable content must come first. Restructure the prompt so the static system context (Repository Context, guidelines, test quality criteria) comes before any dynamic content.
Current order:
- System description (stable ✅)
- Coverage baseline table — changes every run ❌
- Task phases
Better order:
- System description (stable ✅)
- Guidelines + test quality criteria (stable ✅)
- "Do Not" rules (stable ✅)
- Coverage baseline table (dynamic, at the end) ✅
Moving the dynamic "Current Coverage Baseline" table to the end of the prompt (or injecting it via steps) would keep the stable ~4K-char prefix identical from run to run, potentially raising cache efficiency from 49% to roughly 65–70%.
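The effect of ordering on the shared prefix can be shown with a toy model. In the sketch below, characters stand in for tokens and the section strings are placeholders rather than the real prompt; `buildPrompt` and `commonPrefixLength` are hypothetical helpers.

```javascript
// Assemble a prompt from ordered sections, then measure how much of the
// prompt two runs share as a common prefix.
function buildPrompt(sections) {
  return sections.join('\n\n');
}

function commonPrefixLength(a, b) {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i += 1;
  return i;
}

// Placeholder section text; the real prompt sections are not reproduced here.
const description = 'System description';
const guidelines = 'Guidelines and test quality criteria';
const doNot = '"Do Not" rules';
const baseline = (pct) => `Coverage baseline: ${pct}%`;

// Current order: the per-run table sits right after the first section.
const oldA = buildPrompt([description, baseline(78.5), guidelines, doNot]);
const oldB = buildPrompt([description, baseline(79.1), guidelines, doNot]);

// Better order: stable sections first, per-run table last.
const newA = buildPrompt([description, guidelines, doNot, baseline(78.5)]);
const newB = buildPrompt([description, guidelines, doNot, baseline(79.1)]);

console.log('shared prefix, current order:', commonPrefixLength(oldA, oldB));
console.log('shared prefix, better order: ', commonPrefixLength(newA, newB));
```

In this model the shared prefix ends at the first character that differs between runs, so everything placed after the dynamic table is excluded from it regardless of how stable that later text is.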
Expected Impact
| Metric | Current | Projected | Savings |
|---|---|---|---|
| Total tokens/run | 2,009K | ~900–1,100K | ~45–55% |
| LLM turns/run | 46 | ~20–25 | -45% |
| Session time | 7.9 min | ~4–5 min | ~40% |
| Cache hit rate | 49% | ~65–70% | +16–21pp |
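The Savings column can be re-derived from the Current and Projected columns. A short sketch, with `reductionPct` as a hypothetical helper and all inputs taken from the table:

```javascript
// Percentage reduction from a current value to a projected value,
// rounded to the nearest whole percent.
function reductionPct(current, projected) {
  return Math.round((1 - projected / current) * 100);
}

// Tokens per run, in thousands.
console.log(reductionPct(2009, 1100), reductionPct(2009, 900)); // 45 55
// LLM turns per run.
console.log(reductionPct(46, 25), reductionPct(46, 20));        // 46 57
// Session time in minutes.
console.log(reductionPct(7.9, 5), reductionPct(7.9, 4));        // 37 49
```

By this calculation the token range matches the table, while the turn and session-time figures in the table sit at or near the conservative end of their ranges.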
Implementation Checklist
- Add a steps: section with npm ci, npm run build, npm run test:coverage pre-computation
- Pass coverage/coverage-summary.json output into prompt context via steps: output
- Change github: toolsets: [default] → toolsets: [repos, pull_requests]
- Narrow bash: allowlist from cat:*/ls:* to path-scoped patterns
- Recompile with gh aw compile .github/workflows/test-coverage-improver.md
Generated by Daily Copilot Token Optimization Advisor · ● 262.9K · ◷