Skip to content

fix(agent): deploy_change pushes branch + opens PR (no more ghost-deploy)#179

Merged
costajohnt merged 3 commits intomainfrom
fix/agent-deploy-push-not-local-merge
Apr 26, 2026
Merged

fix(agent): deploy_change pushes branch + opens PR (no more ghost-deploy)#179
costajohnt merged 3 commits intomainfrom
fix/agent-deploy-push-not-local-merge

Conversation

@costajohnt
Copy link
Copy Markdown
Owner

Summary

The strategy-agent end-to-end audit caught a critical bug that hadn't fired yet because no agent deploy has succeeded (24 trades < 50-trade safety floor, every weekly run has been "no-deploy diagnostic"). The first time the agent tries to deploy:

agent_writes.deploy_change did:

  1. git checkout -b agent/change-<timestamp>
  2. git add <files> && git commit
  3. git checkout main
  4. git merge agent/change-<timestamp> --no-fflocal only — never git push
  5. git branch -d agent/change-<timestamp>

The merge commit lived only on the runner's checkout. When the runner shuts down it's discarded.

Meanwhile the workflow's "Open PR for agent code changes" step (PR #160) checks git diff --cached --quiet — empty, because the agent already committed locally. The step exits deployed=false. No PR is opened. No notification fires. The agent's changelog records status=deployed with a commit_sha pointing at a discarded local merge commit.

Same bug class as the --quiet flag from PR-A2 — never exercised because no deploy has succeeded yet.

Fix

  • scripts/agent_writes.deploy_change: after commit on the feature branch, git push origin <branch> --force-with-lease + gh pr create. On gh pr create failure (re-run same day → PR already exists), recover URL via gh pr view. Persist PR URL to data/agent/last_pr_url.txt for the workflow. Do NOT merge to local main — main is protected; the PR is the deploy gate.
  • .github/workflows/strategy-review.yml:
    • "Run Strategy Agent" step gets GH_TOKEN so the agent's gh CLI calls authenticate.
    • "Open PR for agent code changes" → renamed to "Surface agent deploy PR"; just reads the URL from the agent's persisted file, sets deployed=true|false outputs.
  • Tests (tests/test_agent_writes.py): 3 new in TestDeployChange:
    • test_successful_deploy updated to verify pr_url in result + changelog
    • test_deploy_writes_pr_url_file_for_workflow — file artifact for workflow surfacing
    • test_pr_create_failure_recovers_via_gh_pr_view — re-run-same-day case
    • test_deploy_push_failure_aborts — protected-branch / network-error case
    • Obsolete test_merge_failure removed (no merge step anymore)

Why not auto-merge

main requires 4 status checks (test, lint, type-check, secrets-scan). PR #160 deliberately switched the agent from direct push to PR-based deploys for exactly this reason — bypassing CI to auto-merge would re-introduce the "agent commits unreviewed code to main without CI" risk. Agent's job ends at PR opened; operator (or a future auto-merge rule) reviews and merges.

Verified

$ uv run python -m pytest tests/ -q
2820 passed, 45 deselected, 1 warning in 155.82s

🤖 Generated with Claude Code

costajohnt and others added 3 commits April 25, 2026 20:13
…loy)

Strategy-agent end-to-end audit (post-PR-A1..A12) caught a critical bug
that hadn't fired yet because no agent deploy has succeeded since the
agent was built (24 trades < 50-trade safety floor, so all weekly runs
have been "no-deploy diagnostic"). The first time it tries to deploy:

  agent_writes.deploy_change does:
    1. git checkout -b agent/change-<timestamp>
    2. git add <files> && git commit
    3. git checkout main
    4. git merge agent/change-<timestamp> --no-ff   ← LOCAL only
    5. git branch -d agent/change-<timestamp>

  Step 4's merge commit lives only on the runner's local checkout of
  main. There is NO `git push`. When the runner shuts down, the merge
  commit is discarded.

  Meanwhile the workflow's "Open PR for agent code changes" step
  (added in PR #160) does:
    git add <agent-modifiable-files>
    git diff --cached --quiet  → exit deployed=false if no staged diff

  Because the agent already committed and merged locally, there are no
  staged changes. The workflow exits deployed=false. No PR is opened.
  No notification fires. The agent's changelog records status=deployed
  with a commit_sha that points at the discarded local merge commit.

  Net effect: ghost deploy. Changelog says deployed; live code is
  unchanged. compute_measured_impact 5 days later compares equal-state
  windows, reports no effect. The agent may re-attempt the same change
  on the next eligible cycle (after 14d cooling), perpetually
  ghost-deploying.

Fix: deploy_change does the push and opens the PR itself.
  - After commit on the feature branch, `git push origin <branch>
    --force-with-lease`.
  - `gh pr create` to open the PR; on failure (re-run same day; PR
    already exists), recover URL via `gh pr view`.
  - Persist PR URL to data/agent/last_pr_url.txt for the workflow's
    Telegram step to pick up.
  - DO NOT merge to local main. main is protected — the PR is the
    deploy gate, CI runs on it, merging requires the operator (or an
    auto-merge rule).

Workflow change: the "Open PR for agent code changes" step is renamed
to "Surface agent deploy PR" and rewritten to read the PR URL the
agent persisted. It no longer does its own push or pr-create — that
responsibility now lives entirely in agent_writes.deploy_change. The
"Run Strategy Agent" step gets GH_TOKEN env so the agent's gh CLI
calls authenticate.

Tests: 3 new in TestDeployChange covering the new push+PR path —
successful PR creation, gh pr create fails (PR already exists)
recovery via gh pr view, git push failure aborts cleanly.

The obsolete test_merge_failure was removed (no merge step anymore).

Verified locally: 2820 tests pass (was 2818; net +2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer caught a critical data-flow bug in the workflow: the agent's
`data/agent/last_pr_url.txt` was being wiped by the immediately-prior
"Commit agent state to trading-data branch" step. That step does a
git checkout main → trading-data → main round-trip; the closing
`git checkout main` removes anything under data/agent/ from the
working tree (data/ is gitignored on main; tracked on trading-data).

So the surface step would always find the file missing and exit
deployed=false. Net effect: deploy reaches origin (the actual ghost-
deploy fix works), but the Telegram surface notification is silently
dropped.

Fix: write the URL to $RUNNER_TEMP/agent-pr-url.txt instead of
data/agent/. RUNNER_TEMP is per-run on GHA so there's no cross-run
staleness; falls back to /tmp locally. The workflow's surface step
reads from $RUNNER_TEMP first.

Also added a WARNING log when both `gh pr create` and `gh pr view`
return failure (pr_url ends up empty string) — branch is pushed but
the surface notification will be skipped, so post-mortems can find it.

Test renamed test_deploy_writes_pr_url_file_for_workflow ->
test_deploy_writes_pr_url_file_to_runner_temp and updated to set
RUNNER_TEMP via monkeypatch + assert the file is NOT under data/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bandit's B108 hardcoded_tmp_directory flags `/tmp` literals.
tempfile.gettempdir() returns the platform tempdir (still /tmp on
Linux, but discovered properly), avoiding the static-string surface
that bandit's heuristic catches.

The actual prod path on GHA is $RUNNER_TEMP — only the local-dev
fallback changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@costajohnt costajohnt merged commit 652e9b2 into main Apr 26, 2026
7 checks passed
@costajohnt costajohnt deleted the fix/agent-deploy-push-not-local-merge branch April 26, 2026 03:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant