
Add testbot: AI-powered test generation and review response#788

Open
jiaenren wants to merge 6 commits into main from jiaenr/testbot-simplify

Conversation


@jiaenren jiaenren commented Apr 2, 2026

Summary

Add testbot, an automated pipeline that identifies low-coverage code via Codecov, generates tests using Claude Code, validates them, and opens PRs for human review. Includes a review-response module that addresses inline PR feedback via /testbot.

Issue #760

Architecture

Test Generation (testbot.yaml)

Codecov API → coverage_targets.py → Claude Code CLI → guardrails → create_pr.py
                                      |         ↑
                                      └─────────┘ (agent retries on test failures)
| Stage | Component | Description |
| --- | --- | --- |
| Coverage analysis | coverage_targets.py | Fetches Codecov report, selects lowest-coverage files |
| Test generation | Claude Code CLI | Reads source, writes test files and BUILD entries, runs tests, iterates on failures |
| Guardrails | guardrails.py | Filters out any non-test file changes made by Claude |
| PR creation | create_pr.py | Creates branch, commits test files, pushes, opens PR with ai-generated label |
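The selection stage can be sketched roughly as follows. This is illustrative only: the report field names (`files`, `totals`, `misses`) are assumptions about the Codecov response shape, not the actual coverage_targets.py code.

```python
def select_targets(report: dict, max_targets: int = 1, max_uncovered: int = 300) -> list:
    """Pick the lowest-coverage files from a Codecov-style report dict."""
    candidates = []
    for entry in report.get("files", []):
        totals = entry.get("totals", {})
        coverage = totals.get("coverage", 100.0)
        uncovered = totals.get("misses", 0)
        if max_uncovered and uncovered > max_uncovered:
            continue  # honor the uncovered-lines cap (0 disables it)
        candidates.append((coverage, entry["name"]))
    candidates.sort()  # lowest coverage first
    return [name for _, name in candidates[:max_targets]]

report = {"files": [
    {"name": "a.py", "totals": {"coverage": 10.0, "misses": 50}},
    {"name": "b.py", "totals": {"coverage": 90.0, "misses": 5}},
    {"name": "c.py", "totals": {"coverage": 5.0, "misses": 400}},  # over the cap
]}
print(select_targets(report, max_targets=2))  # -> ['a.py', 'b.py']
```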

Claude Code is sandboxed: it can only read files, edit test files, and run test commands. It cannot run git, gh, or modify source code — all git and GitHub operations are in deterministic harness scripts.
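The guardrail half of that boundary might look like this minimal sketch; the test-file patterns and function names here are assumptions, not the real guardrails.py:

```python
import subprocess

def is_test_file(path: str) -> bool:
    """Assumed convention: files under a tests/ directory or named test_*.py / *.test.ts."""
    name = path.rsplit("/", 1)[-1]
    return "/tests/" in f"/{path}" or name.startswith("test_") or name.endswith(".test.ts")

def keep_only_test_changes(changed: list) -> list:
    """Revert edits to non-test files via git; return the files that survive."""
    non_test = [p for p in changed if not is_test_file(p)]
    if non_test:
        subprocess.run(["git", "checkout", "--", *non_test], check=False)
    return [p for p in changed if is_test_file(p)]
```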

Review Response (testbot-respond.yaml)

/testbot comment → respond.py
  ├─ fetch all thread comments (GraphQL)
  ├─ filter: trigger phrase, author, dedup
  ├─ Claude Code CLI: read files, apply fix, run tests
  ├─ guardrails: filter non-test changes
  ├─ respond.py: git commit + push
  ├─ structured reply via --json-schema
  ├─ post inline reply to each thread
  └─ resolve addressed threads (GraphQL)
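A rough sketch of the filter step above (the thread dict shape and the bot login are hypothetical, not the actual respond.py data model):

```python
TRIGGER = "/testbot"
BOT_LOGIN = "testbot"  # hypothetical bot account name

def is_actionable(thread: dict) -> bool:
    """True if an unresolved thread carries a /testbot trigger still awaiting the bot."""
    if thread.get("isResolved"):
        return False
    comments = thread.get("comments", [])
    # The trigger must be the first text of some comment in the thread.
    if not any(c["body"].lstrip().startswith(TRIGGER) for c in comments):
        return False
    # Dedup: if the bot spoke last, it is waiting on human follow-up.
    return not (comments and comments[-1]["author"] == BOT_LOGIN)
```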
| Feature | Description |
| --- | --- |
| Trigger | Comment starting with /testbot on inline review threads of ai-generated PRs |
| Thread context | Full conversation history (all nested comments) passed to Claude |
| Structured output | --json-schema returns per-comment replies with resolve verdict and commit message |
| Thread resolution | Resolved via GraphQL mutation after fix is applied |
| Safety | Repo-member-only access, test-file-only guardrail, crash recovery, push retry |
| Dedup | Skips threads where the bot already replied and is awaiting human follow-up |
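Thread resolution relies on GitHub's resolveReviewThread GraphQL mutation. A sketch of how a harness script might invoke it through gh (the helper name is illustrative; respond.py's actual wiring may differ):

```python
import subprocess

RESOLVE_MUTATION = """
mutation($threadId: ID!) {
  resolveReviewThread(input: {threadId: $threadId}) {
    thread { isResolved }
  }
}
"""

def resolve_thread(thread_id: str) -> None:
    # gh supplies authentication; -f binds the $threadId variable.
    subprocess.run(
        ["gh", "api", "graphql",
         "-f", f"query={RESOLVE_MUTATION}",
         "-f", f"threadId={thread_id}"],
        check=True,
    )
```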

Security Boundary

| Operation | Claude Code | Harness scripts |
| --- | --- | --- |
| Read source files | Yes | |
| Write/edit test files | Yes | |
| Run bazel test / pnpm test | Yes | |
| Run git commands | No | create_pr.py, respond.py |
| Run gh commands | No | create_pr.py, respond.py |
| Filter non-test changes | | guardrails.py |

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Claude Code CLI over raw LLM API | Natively reads/writes files, runs tests, iterates on failures — no response parsing or custom retry logic |
| Harness scripts for git/GitHub | Claude Code cannot run git or gh — prevents prompt injection from modifying version control or creating unauthorized PRs |
| Test-file-only guardrail | guardrails.py discards any non-test changes, enforced at commit time regardless of what Claude does |
| NVIDIA LLM Gateway | Routes through inference-api.nvidia.com using existing NVIDIA_NIM_KEY |
| Codecov API for coverage data | Seconds vs 15+ min bazel coverage; leverages existing CI uploads |
| --json-schema for respond replies | Structured per-comment replies with resolve verdicts; 3-tier fallback |
| /testbot trigger (must be first text in comment) | Opt-in — prevents bot-to-bot loops, responds only when asked |
| Full thread history in prompt | Nested reply context so follow-up requests are understood |
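The 3-tier fallback for --json-schema replies might look like the sketch below; the exact tiers in respond.py may differ, and this is only an assumption based on the description above:

```python
import json
import re

def _coerce(data: object):
    """Normalize parsed JSON into a list of reply dicts, or None if unusable."""
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        return data.get("replies", [])
    return None

def parse_replies(raw: str) -> list:
    # Tier 1: the whole output is valid JSON.
    try:
        replies = _coerce(json.loads(raw))
        if replies is not None:
            return replies
    except json.JSONDecodeError:
        pass
    # Tier 2: JSON embedded in surrounding prose or a fenced block.
    match = re.search(r"\[.*\]|\{.*\}", raw, re.DOTALL)
    if match:
        try:
            replies = _coerce(json.loads(match.group(0)))
            if replies is not None:
                return replies
        except json.JSONDecodeError:
            pass
    # Tier 3: treat the raw text as one plain reply and never auto-resolve.
    return [{"reply": raw.strip(), "resolve": False}]
```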

Configurable Parameters

testbot.yaml (dispatch inputs)

| Input | Default | Description |
| --- | --- | --- |
| max_targets | 1 | Files to target per run |
| max_uncovered | 300 | Uncovered lines cap per target (0 = no cap) |
| max_turns | 50 | Claude Code agent turns |
| timeout_minutes | 30 | Workflow timeout |
| model | aws/anthropic/claude-opus-4-5 | LLM model on NVIDIA gateway |
| dry_run | false | Generate without creating PR |

testbot-respond.yaml (env vars)

| Env Var | Default | Description |
| --- | --- | --- |
| TESTBOT_MAX_TURNS | 50 | Claude Code agent turns |
| TESTBOT_MAX_RESPONSES | 10 | Max threads to address per trigger |
| ANTHROPIC_MODEL | aws/anthropic/claude-opus-4-5 | LLM model |
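Reading these with their defaults is straightforward; a sketch, not the actual respond.py code:

```python
import os

MAX_TURNS = int(os.environ.get("TESTBOT_MAX_TURNS", "50"))
MAX_RESPONSES = int(os.environ.get("TESTBOT_MAX_RESPONSES", "10"))
MODEL = os.environ.get("ANTHROPIC_MODEL", "aws/anthropic/claude-opus-4-5")
```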

Testing

Files

src/scripts/testbot/
├── coverage_targets.py    # Codecov API → select low-coverage targets
├── create_pr.py           # Branch, commit, push, open PR
├── guardrails.py          # Test-file-only filter, shared by all scripts
├── respond.py             # Review response: Claude Code CLI + GraphQL
├── TESTBOT_PROMPT.md      # Quality rules and conventions for Claude Code
├── README.md              # Usage instructions
└── tests/                 # 101 unit tests (Bazel + pylint + mypy)
    ├── test_coverage_targets.py
    ├── test_create_pr.py
    ├── test_guardrails.py
    └── test_respond.py

.github/workflows/
├── testbot.yaml           # Scheduled test generation
└── testbot-respond.yaml   # /testbot review response

Checklist

  • I am familiar with the Contributing Guidelines
  • New or existing tests cover these changes
  • The documentation is up to date with these changes

Summary by CodeRabbit

Release Notes

  • New Features

    • Added weekly automated test generation workflow (schedule and parameters customizable)
    • On-demand test generation via GitHub Actions workflow dispatch
    • Automated inline responses to /testbot comments in pull request reviews
    • Automatic PR creation with generated test file changes
  • Documentation

    • Added testbot system documentation and setup guidance

@jiaenren jiaenren requested a review from a team as a code owner April 2, 2026 01:39

coderabbitai bot commented Apr 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Introduces a comprehensive "Testbot" system comprising two GitHub Actions workflows, five Python modules, supporting Bazel and test infrastructure, and documentation. The system enables AI-powered test generation from coverage gaps and inline review-comment responses via Claude Code CLI integration with GitHub PR automation and safety guardrails.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| GitHub Actions Workflows | .github/workflows/testbot.yaml, .github/workflows/testbot-respond.yaml | New workflows: one runs on weekday cron or manual dispatch to select low-coverage files and trigger Claude Code for test generation; the other triggers on pull_request_review_comment to respond to /testbot inline review triggers with fixes and tests. |
| Test Generation Core | src/scripts/testbot/coverage_targets.py, src/scripts/testbot/create_pr.py, src/scripts/testbot/guardrails.py | New modules: coverage_targets.py fetches Codecov reports and selects targets; create_pr.py creates and pushes test PRs; guardrails.py filters changes to test files and Python BUILD entries only. |
| Review Response | src/scripts/testbot/respond.py | New 555-line module that queries GitHub GraphQL for actionable review threads, deduplicates and filters by trigger phrase, pipes thread histories to Claude Code, parses structured replies with fallback strategies, commits/pushes changes with retry logic, and posts inline replies with thread resolution. |
| Documentation & Metadata | src/scripts/testbot/TESTBOT_PROMPT.md, src/scripts/testbot/README.md, src/scripts/testbot/__init__.py, src/scripts/testbot/BUILD | New prompt protocol defining language-specific test discovery and guardrails (no git/gh commands, test-only modifications); README documenting workflows and security boundaries; Bazel build target and module init file. |
| Test Suite | src/scripts/testbot/tests/BUILD, src/scripts/testbot/tests/test_coverage_targets.py, src/scripts/testbot/tests/test_guardrails.py, src/scripts/testbot/tests/test_create_pr.py, src/scripts/testbot/tests/test_respond.py | Comprehensive test coverage: test_coverage_targets.py validates Codecov integration and range/capping logic; test_guardrails.py tests file classification and git operations; test_create_pr.py validates PR existence checks; test_respond.py tests thread filtering, reply parsing (3-tier fallback), and prompt building. |
| Documentation Update | AGENTS.md | Added Testbot section referencing scripts/testbot/README.md and listing test scopes in the Tests matrix. |

Sequence Diagram(s)

sequenceDiagram
    participant Cron as Cron Schedule
    participant GHA as GitHub Actions<br/>(testbot.yaml)
    participant Codecov as Codecov API
    participant CoverageScript as coverage_targets.py
    participant Claude as Claude Code CLI
    participant BazelPnpm as Bazel/pnpm<br/>(Test Runners)
    participant CreatePR as create_pr.py
    participant GitHub as GitHub<br/>(PR Create)

    Cron->>GHA: Trigger on weekday cron
    GHA->>CoverageScript: Run coverage_targets.py<br/>(select low-coverage files)
    CoverageScript->>Codecov: Fetch coverage report
    Codecov-->>CoverageScript: line_coverage data
    CoverageScript-->>GHA: targets.txt<br/>(filtered candidates)
    GHA->>Claude: npx `@anthropic-ai/claude-code`<br/>(generate/update tests)<br/>with targets & prompt
    Claude->>BazelPnpm: Generate test code<br/>run bazel test/pnpm test
    BazelPnpm-->>Claude: test results
    Claude->>Claude: Iterate on failures<br/>(up to 3 retries)
    Claude-->>GHA: Generated/updated files
    GHA->>CreatePR: python -m testbot.create_pr
    CreatePR->>GitHub: gh pr create<br/>(ai-generated label)
    GitHub-->>CreatePR: PR created/confirmed
    CreatePR-->>GHA: PR URL logged
sequenceDiagram
    participant User as Reviewer
    participant GHA as GitHub Actions<br/>(testbot-respond.yaml)
    participant GitHub as GitHub GraphQL<br/>(fetch threads)
    participant RespondPy as respond.py<br/>(filtering & orchestration)
    participant Claude as Claude Code CLI<br/>(fix generation)
    participant BazelPnpm as Bazel/pnpm<br/>(Test Runners)
    participant GH_CLI as gh CLI<br/>(post/resolve)

    User->>GitHub: Comment `/testbot` on<br/>inline review thread
    GitHub-->>GHA: pull_request_review_comment trigger
    GHA->>GitHub: Fetch all review threads<br/>(full histories via GraphQL)
    GitHub-->>RespondPy: threads with comments
    RespondPy->>RespondPy: Filter actionable threads:<br/>unresolved, has `/testbot` trigger,<br/>not already bot-handled,<br/>respects max_responses
    RespondPy->>Claude: Single prompt with all<br/>thread histories & trigger context
    Claude->>BazelPnpm: Generate fixes & run tests
    BazelPnpm-->>Claude: test results
    Claude->>Claude: Iterate on failures<br/>(up to max_turns)
    Claude-->>RespondPy: Structured replies<br/>(per-thread fixes, resolve flag)
    RespondPy->>RespondPy: Parse JSON with<br/>fallback strategy
    RespondPy->>RespondPy: Commit & push changes<br/>(with retry on failure)
    RespondPy->>GH_CLI: Post reply to each<br/>review comment (gh)
    GH_CLI->>GitHub: POST inline reply
    RespondPy->>GH_CLI: Resolve thread<br/>(if resolve=true & push OK)
    GH_CLI->>GitHub: Mutate thread to RESOLVED
    GitHub-->>User: Inline reply posted,<br/>thread resolved

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The review requires careful validation of: complex GraphQL/subprocess orchestration in respond.py (555 lines), multi-stage Codecov/range computation logic, security guardrails enforcing test-only modifications, API integration patterns with fallback strategies, and coordination across workflows, CLI invocations, and git operations. The well-structured test suite aids review but does not eliminate the need to verify logic density and API/concurrency boundaries.

Poem

🐰 Testbot bounds through coverage gaps so wide,
Claude's code quills craft tests with gentle pride,
Review threads sprouted—responses in flight,
Guardrails held firm, no git in sight,
From Codecov's whisper to PR's delight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.75%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped because CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: adding testbot, an AI-powered system for test generation and review response. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


codecov bot commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.93%. Comparing base (c8bf5bc) to head (0e5aaa1).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #788      +/-   ##
==========================================
+ Coverage   42.68%   42.93%   +0.25%     
==========================================
  Files         203      203              
  Lines       26803    26880      +77     
  Branches     7573     7591      +18     
==========================================
+ Hits        11440    11542     +102     
+ Misses      15253    15229      -24     
+ Partials      110      109       -1     
| Flag | Coverage | Δ |
| --- | --- | --- |
| backend | 45.21% <ø> | +0.27% ⬆️ |

Flags with carried forward coverage won't be shown.
see 24 files with indirect coverage changes


@jiaenren jiaenren marked this pull request as draft April 2, 2026 01:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (7)
src/scripts/testbot/TESTBOT_PROMPT.md (2)

87-89: CLI output testing pattern may not work as expected.

mock_print.call_args_list returns a list of call objects, not strings. Joining str(c) will produce call('text') representations rather than the actual printed content.

A more accurate pattern:

- `output = " ".join(str(c) for c in mock_print.call_args_list)`
+ `output = " ".join(str(args[0]) for args, kwargs in mock_print.call_args_list)`

Or alternatively: "".join(call.args[0] for call in mock_print.call_args_list if call.args)
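A runnable demonstration of the difference (note that extracting `args[0]` recovers only the first positional argument of each call):

```python
from unittest import mock

with mock.patch("builtins.print") as mock_print:
    print("hello", "world")
    print()  # a call with no positional args

# str(call) yields the call repr, not the printed text:
joined_reprs = " ".join(str(c) for c in mock_print.call_args_list)
# -> "call('hello', 'world') call()"

# Extracting the first positional arg recovers actual output text:
output = " ".join(c.args[0] for c in mock_print.call_args_list if c.args)
# -> "hello"
```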

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/TESTBOT_PROMPT.md` around lines 87 - 89, The current test
builds output by joining str(c) for c in mock_print.call_args_list which yields
"call(...)" strings; instead iterate mock_print.call_args_list and extract the
first positional arg from each call (call.args[0]) when present, join those
pieces (e.g., with "" or " ") into the output variable, and then use
self.assertIn("expected", output); update references to
builtins.print/mock_print.call_args_list and the output variable accordingly and
guard against calls with no positional args.

62-65: Consider checking for existing PR before generating tests, not just before creating PR.

The prompt instructs Claude to check for existing ai-generated PRs only after successfully generating tests. This means compute resources are spent on test generation even when a PR won't be created. Consider adding guidance to check for open PRs at the start of the process.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/TESTBOT_PROMPT.md` around lines 62 - 65, The prompt
currently runs the ai-generated PR check only after generating tests; update the
flow in TESTBOT_PROMPT.md so the existing-PR check (using the gh pr list --label
ai-generated --state open --json number --jq "length" command) happens at the
start of the process, before any test generation step, and change any
instructions referencing "check before creating PR" to "check before generating
tests and before creating PR" so the bot can exit early if an open ai-generated
PR exists.
.github/workflows/testbot.yaml (3)

113-120: Shell quoting concern with $TARGETS expansion.

If TARGETS contains special characters or newlines, the current inline expansion could break. Consider using a heredoc or --input flag if available.

However, since coverage_targets.py produces well-structured markdown output with controlled characters, this is likely safe in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/testbot.yaml around lines 113 - 120, The inline expansion
of $TARGETS in the npx `@anthropic-ai/claude-code` command can break if it
contains newlines or special chars; update the invocation used around
--print/--model "$ANTHROPIC_MODEL" to pass the prompt and Coverage targets via a
safe mechanism (e.g., a heredoc feeding stdin, or using the tool's --input flag
if available) and read in src/scripts/testbot/TESTBOT_PROMPT.md plus the
contents of $TARGETS into that input stream so special characters/newlines are
preserved.

97-107: Add step ID for referencing output in subsequent steps.

If you implement the target validation suggested above, you'll need a step ID here.

       - name: Select coverage targets
+        id: select-targets
         run: |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/testbot.yaml around lines 97 - 107, Add an id to the
"Select coverage targets" step (e.g., id: select_coverage_targets) so later
steps can reference its outputs, and expose the selected targets as a step
output named (for example) targets by writing the contents of
"$RUNNER_TEMP/targets.txt" to the Actions outputs (use the GITHUB_OUTPUT
mechanism to set the output key "targets"). Ensure the step block (the run
script under the step with name "Select coverage targets") is updated to set
that output after creating the targets file.

109-120: Consider validating targets before invoking Claude Code.

If coverage_targets.py returns "No coverage targets found." (the empty case), the workflow still invokes Claude Code, consuming API credits for no useful work.

Proposed fix
       - name: Generate tests
-        if: inputs.dry_run != true
+        if: inputs.dry_run != true && !contains(steps.select-targets.outputs.targets, 'No coverage targets found')
         run: |

Or add a validation step:

      - name: Check targets exist
        id: check-targets
        run: |
          if grep -q "No coverage targets found" "$RUNNER_TEMP/targets.txt"; then
            echo "skip=true" >> "$GITHUB_OUTPUT"
          else
            echo "skip=false" >> "$GITHUB_OUTPUT"
          fi

      - name: Generate tests
        if: inputs.dry_run != true && steps.check-targets.outputs.skip != 'true'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/testbot.yaml around lines 109 - 120, Add a validation step
that checks the generated targets file for the "No coverage targets found"
sentinel and prevents invoking Claude Code when empty: create a step with id
check-targets (name "Check targets exist") that inspects
"$RUNNER_TEMP/targets.txt" (e.g. grep -q "No coverage targets found" ...) and
writes skip=true/false to $GITHUB_OUTPUT, then change the existing "Generate
tests" step's if condition to include steps.check-targets.outputs.skip != 'true'
(i.e. if: inputs.dry_run != true && steps.check-targets.outputs.skip != 'true')
so the Claude Code run is skipped when there are no targets.
.github/workflows/testbot-respond.yaml (1)

39-44: Consider adding repository input for cross-fork safety.

When checking out a PR branch by ref name only, if a PR comes from a fork with the same branch name as one in the base repo, the checkout could be ambiguous. Adding the repository explicitly ensures the correct branch is checked out.

       - name: Checkout PR branch
         uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
         with:
           ref: ${{ github.event.pull_request.head.ref }}
+          repository: ${{ github.event.pull_request.head.repo.full_name }}
           fetch-depth: 0
           token: ${{ secrets.SVC_OSMO_CI_TOKEN }}

However, since this workflow only runs on PRs with the ai-generated label (which are created by the bot in the same repo), this is low risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/testbot-respond.yaml around lines 39 - 44, The checkout
step using actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 should
include the repository input to avoid ambiguity when checking out by ref; add
repository: ${{ github.event.pull_request.head.repo.full_name }} alongside the
existing ref, fetch-depth, and token inputs so the Checkout PR branch step
explicitly targets the PR head repository.
src/scripts/testbot/coverage_targets.py (1)

223-228: Add error handling for missing/malformed totals in the Codecov response.

If the API response lacks totals, files, or coverage keys, this will raise a KeyError. Consider defensive checks.

Proposed fix
     report = fetch_codecov_report(token)
+    totals = report.get("totals")
+    if not totals:
+        logger.error("Codecov response missing 'totals' field")
+        sys.exit(1)
     logger.info(
         "Codecov: %d files, %.1f%% overall coverage",
-        report["totals"]["files"],
-        report["totals"]["coverage"],
+        totals.get("files", 0),
+        totals.get("coverage", 0.0),
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/coverage_targets.py` around lines 223 - 228, The logger
call assumes report["totals"]["files"] and ["coverage"] exist; add defensive
validation after fetch_codecov_report(token) to ensure report is a dict and
contains totals with files and coverage keys (e.g., check isinstance(report,
dict) and "totals" in report and all(k in report["totals"] for k in
("files","coverage"))), and if missing or malformed, log a clear warning via
logger.warning (including the raw report or a short sanitized excerpt) and fall
back to safe defaults (e.g., files=0, coverage=0.0) or skip the info log path;
update the code around fetch_codecov_report and the logger.info call accordingly
to use the validated values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/testbot-respond.yaml:
- Around line 73-88: The workflow uses anthropics/claude-code-action@v1 and
wrongly passes max_turns, allowed_tools, and custom_instructions as top-level
inputs; move these into the claude_args input instead by removing the
max_turns/allowed_tools/custom_instructions keys and adding a claude_args
multi-line value that sets the equivalent CLI flags (e.g., --max-turns,
--allowedTools, and --append-system-prompt) with the same values and text,
preserving ANTHROPIC_BASE_URL, ANTHROPIC_MODEL, and GH_TOKEN env entries.

In `@src/scripts/testbot/coverage_targets.py`:
- Line 218: The assignment uses an inline __import__("os") which violates the
rule that imports must be top-level; add a top-level import os (placed
immediately after the module docstring) and change the token assignment in
coverage_targets.py (the token = ... line) to use
os.environ.get("CODECOV_TOKEN", "") instead of
__import__("os").environ.get(...), ensuring no other inline imports remain in
the file.

---

Nitpick comments:
In @.github/workflows/testbot-respond.yaml:
- Around line 39-44: The checkout step using
actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 should include the
repository input to avoid ambiguity when checking out by ref; add repository:
${{ github.event.pull_request.head.repo.full_name }} alongside the existing ref,
fetch-depth, and token inputs so the Checkout PR branch step explicitly targets
the PR head repository.

In @.github/workflows/testbot.yaml:
- Around line 113-120: The inline expansion of $TARGETS in the npx
`@anthropic-ai/claude-code` command can break if it contains newlines or special
chars; update the invocation used around --print/--model "$ANTHROPIC_MODEL" to
pass the prompt and Coverage targets via a safe mechanism (e.g., a heredoc
feeding stdin, or using the tool's --input flag if available) and read in
src/scripts/testbot/TESTBOT_PROMPT.md plus the contents of $TARGETS into that
input stream so special characters/newlines are preserved.
- Around line 97-107: Add an id to the "Select coverage targets" step (e.g., id:
select_coverage_targets) so later steps can reference its outputs, and expose
the selected targets as a step output named (for example) targets by writing the
contents of "$RUNNER_TEMP/targets.txt" to the Actions outputs (use the
GITHUB_OUTPUT mechanism to set the output key "targets"). Ensure the step block
(the run script under the step with name "Select coverage targets") is updated
to set that output after creating the targets file.
- Around line 109-120: Add a validation step that checks the generated targets
file for the "No coverage targets found" sentinel and prevents invoking Claude
Code when empty: create a step with id check-targets (name "Check targets
exist") that inspects "$RUNNER_TEMP/targets.txt" (e.g. grep -q "No coverage
targets found" ...) and writes skip=true/false to $GITHUB_OUTPUT, then change
the existing "Generate tests" step's if condition to include
steps.check-targets.outputs.skip != 'true' (i.e. if: inputs.dry_run != true &&
steps.check-targets.outputs.skip != 'true') so the Claude Code run is skipped
when there are no targets.

In `@src/scripts/testbot/coverage_targets.py`:
- Around line 223-228: The logger call assumes report["totals"]["files"] and
["coverage"] exist; add defensive validation after fetch_codecov_report(token)
to ensure report is a dict and contains totals with files and coverage keys
(e.g., check isinstance(report, dict) and "totals" in report and all(k in
report["totals"] for k in ("files","coverage"))), and if missing or malformed,
log a clear warning via logger.warning (including the raw report or a short
sanitized excerpt) and fall back to safe defaults (e.g., files=0, coverage=0.0)
or skip the info log path; update the code around fetch_codecov_report and the
logger.info call accordingly to use the validated values.

In `@src/scripts/testbot/TESTBOT_PROMPT.md`:
- Around line 87-89: The current test builds output by joining str(c) for c in
mock_print.call_args_list which yields "call(...)" strings; instead iterate
mock_print.call_args_list and extract the first positional arg from each call
(call.args[0]) when present, join those pieces (e.g., with "" or " ") into the
output variable, and then use self.assertIn("expected", output); update
references to builtins.print/mock_print.call_args_list and the output variable
accordingly and guard against calls with no positional args.
- Around line 62-65: The prompt currently runs the ai-generated PR check only
after generating tests; update the flow in TESTBOT_PROMPT.md so the existing-PR
check (using the gh pr list --label ai-generated --state open --json number --jq
"length" command) happens at the start of the process, before any test
generation step, and change any instructions referencing "check before creating
PR" to "check before generating tests and before creating PR" so the bot can
exit early if an open ai-generated PR exists.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 47acc588-3a9b-4aaf-8dbe-63d1426894ac

📥 Commits

Reviewing files that changed from the base of the PR and between c8bf5bc and 9413d5f.

📒 Files selected for processing (4)
  • .github/workflows/testbot-respond.yaml
  • .github/workflows/testbot.yaml
  • src/scripts/testbot/TESTBOT_PROMPT.md
  • src/scripts/testbot/coverage_targets.py


github-actions bot commented Apr 2, 2026

📖 Docs preview: https://d3in15bfzp49i0.cloudfront.net/788/index.html


jiaenren commented Apr 3, 2026

@coderabbitai resolve


jiaenren commented Apr 3, 2026

@coderabbitai full review


coderabbitai bot commented Apr 3, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (5)
src/scripts/testbot/tests/BUILD (1)

17-19: Remove duplicate load statement.

The osmo_py_test macro is loaded twice on lines 17 and 19. Remove one of them.

Proposed fix
 load("//bzl:py.bzl", "osmo_py_test")
 
-load("//bzl:py.bzl", "osmo_py_test")
-
 osmo_py_test(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/tests/BUILD` around lines 17 - 19, The BUILD contains a
duplicate load of the osmo_py_test macro; remove one of the identical
load("//bzl:py.bzl", "osmo_py_test") statements so the macro is only loaded once
(keep a single load invocation referencing osmo_py_test and delete the redundant
one).
src/scripts/testbot/README.md (1)

80-85: Add language specifier to fenced code block.

The static analysis tool flagged this code block for missing a language identifier. Since these are example comment texts, use text as the language.

Proposed fix
-```
+```text
 /testbot rename test methods to follow test_<behavior>_<condition> convention
 /testbot add edge case tests for empty input
 /testbot remove the redundant tests for preset labels
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/README.md` around lines 80 - 85, The fenced code block in
README.md is missing a language specifier; update the triple-backtick that opens
the example comment block to include the language "text" (change ``` to ```text)
for the block containing "/testbot rename test methods...", ensuring the closing
triple-backticks remain unchanged so the static analysis warning is resolved.
src/scripts/testbot/guardrails.py (1)

66-66: Prefer unpacking over list concatenation.

Use iterable unpacking for cleaner syntax.

Proposed fix
-        subprocess.run(["git", "checkout", "--"] + non_test_files, check=False)
+        subprocess.run(["git", "checkout", "--", *non_test_files], check=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/guardrails.py` at line 66, The subprocess.run call
currently concatenates lists to pass arguments to git (subprocess.run(["git",
"checkout", "--"] + non_test_files, check=False)); change this to use iterable
unpacking so the args are constructed as ["git", "checkout", "--",
*non_test_files] while preserving the check=False flag and the non_test_files
variable as the unpacked iterable.
src/scripts/testbot/create_pr.py (1)

80-80: Prefer unpacking over list concatenation.

Use iterable unpacking for cleaner syntax.

Proposed fix
-    run(["git", "add"] + changed_files)
+    run(["git", "add", *changed_files])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/create_pr.py` at line 80, Replace the list concatenation
in the subprocess call with iterable unpacking: instead of constructing the args
with ["git", "add"] + changed_files, pass the arguments to run using unpacking
so the call uses ["git", "add", *changed_files] (locate the run(...) call and
the changed_files variable in create_pr.py).
src/scripts/testbot/respond.py (1)

360-363: Consider preserving the .claude/ directory during git checkout as well.

The discard_changes function preserves .claude/ during git clean (line 363) but git checkout -- . (line 362) will still revert any tracked changes in that directory. If .claude/ contains tracked files that shouldn't be reverted, you may want to exclude it from the checkout as well. If .claude/ only contains untracked files, this is fine as-is.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/respond.py` around lines 360 - 363, The current
discard_changes() runs "git checkout -- ." which will revert tracked changes
inside .claude/; change it so the checkout excludes that directory (e.g., build
the list of tracked paths excluding ^.claude/ and pass them to git checkout --,
or run: git checkout -- $(git ls-files | grep -v '^.claude/' | xargs) while
handling the empty-list case), update the subprocess.run call in discard_changes
to execute that exclusion command (or equivalent using Python's subprocess + git
ls-files output) so tracked files under .claude/ are not reverted while keeping
the existing git clean --exclude for untracked files.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/scripts/testbot/coverage_targets.py`:
- Around line 145-148: Replace the two-part comparison that checks
MAX_FILE_LINES > 0 and total_lines > MAX_FILE_LINES with a chained comparison to
satisfy pylint R1716: change the condition using the symbols MAX_FILE_LINES and
total_lines to a single chained expression (e.g., 0 < MAX_FILE_LINES <
total_lines) and keep the surrounding logic (the continue) intact so behavior is
unchanged.
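The chained-comparison form can be sketched as follows (MAX_FILE_LINES is the symbol the prompt names; the value here is illustrative):

```python
MAX_FILE_LINES = 500  # illustrative cap; 0 disables the check entirely

def exceeds_cap(total_lines: int) -> bool:
    # pylint R1716: `0 < MAX_FILE_LINES < total_lines` replaces
    # `MAX_FILE_LINES > 0 and total_lines > MAX_FILE_LINES`.
    return 0 < MAX_FILE_LINES < total_lines
```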

In `@src/scripts/testbot/create_pr.py`:
- Around line 104-114: The subprocess.run call that creates the PR (the
invocation assigned to variable result) should include an explicit check=False
parameter since you manually inspect result.returncode; update the
subprocess.run([...], capture_output=True, text=True) call in create_pr.py to
pass check=False so the function won't raise CalledProcessError and the existing
manual error handling (logger.error and sys.exit) remains valid.
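The explicit-check pattern, sketched with an illustrative command standing in for the actual gh invocation:

```python
import subprocess
import sys

# check=False satisfies pylint W1510 and documents that a non-zero exit
# will not raise CalledProcessError; the return code is inspected manually.
result = subprocess.run(
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    capture_output=True,
    text=True,
    check=False,
)
if result.returncode != 0:
    # In create_pr.py this branch would call logger.error(...) and sys.exit(...).
    print(f"command failed with exit code {result.returncode}")
```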

In `@src/scripts/testbot/guardrails.py`:
- Around line 35-46: The subprocess.run calls that create diff_result and
untracked_result (the two calls invoking ["git", "diff", "--name-only"] and
["git", "ls-files", "--others", "--exclude-standard"]) must explicitly pass
check=False; update both calls to include check=False so pylint stops warning
while keeping the manual returncode checks and existing logger.error handling.

In `@src/scripts/testbot/respond.py`:
- Around line 103-109: The run_gh function uses subprocess.run without
explicitly setting the check argument, triggering pylint W1510; update the
subprocess.run call inside run_gh to include check=False (keeping the existing
capture_output and text parameters) so the return code continues to be handled
manually and the linter warning is resolved.
- Around line 374-386: The subprocess.run call invoking ["git", "push"] in the
retry loop currently omits an explicit check argument which triggers pylint
W1510; update the subprocess.run invocation inside the for loop (the call in
respond.py that executes git push) to pass check=False explicitly so the code
continues to manually inspect result.returncode and avoid the linter warning
while preserving the existing retry logic and subsequent subprocess.run(["git",
"pull", "--rebase"], check=False) behavior.
- Around line 305-316: pylint flags subprocess.run usage in respond.py (within
the block that assigns result) because check is not explicitly set; update the
subprocess.run call (the one that assigns result) to include check=False so the
call won't raise on non-zero exits while you continue to manually inspect
result.returncode, keeping the existing TimeoutExpired handling and subsequent
return-code check in place.

---

Nitpick comments:
In `@src/scripts/testbot/create_pr.py`:
- Line 80: Replace the list concatenation in the subprocess call with iterable
unpacking: instead of constructing the args with ["git", "add"] + changed_files,
pass the arguments to run using unpacking so the call uses ["git", "add",
*changed_files] (locate the run(...) call and the changed_files variable in
create_pr.py).

In `@src/scripts/testbot/guardrails.py`:
- Line 66: The subprocess.run call currently concatenates lists to pass
arguments to git (subprocess.run(["git", "checkout", "--"] + non_test_files,
check=False)); change this to use iterable unpacking so the args are constructed
as ["git", "checkout", "--", *non_test_files] while preserving the check=False
flag and the non_test_files variable as the unpacked iterable.
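Both unpacking nitpicks reduce to the same RUF005 rewrite, sketched here (the file names are placeholders):

```python
non_test_files = ["src/app.py", "README.md"]  # placeholder paths

# List concatenation (flagged by Ruff RUF005):
argv_concat = ["git", "checkout", "--"] + non_test_files
# Iterable unpacking (preferred):
argv_unpack = ["git", "checkout", "--", *non_test_files]

assert argv_concat == argv_unpack  # identical argv, cleaner construction
```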

In `@src/scripts/testbot/README.md`:
- Around line 80-85: The fenced code block in README.md is missing a language
specifier; update the triple-backtick that opens the example comment block to
include the language "text" (change ``` to ```text) for the block containing
"/testbot rename test methods...", ensuring the closing triple-backticks remain
unchanged so the static analysis warning is resolved.

In `@src/scripts/testbot/respond.py`:
- Around line 360-363: The current discard_changes() runs "git checkout -- ."
which will revert tracked changes inside .claude/; change it so the checkout
excludes that directory (e.g., build the list of tracked paths excluding
^.claude/ and pass them to git checkout --, or run: git checkout -- $(git
ls-files | grep -v '^.claude/' | xargs) while handling the empty-list case),
update the subprocess.run call in discard_changes to execute that exclusion
command (or equivalent using Python's subprocess + git ls-files output) so
tracked files under .claude/ are not reverted while keeping the existing git
clean --exclude for untracked files.

In `@src/scripts/testbot/tests/BUILD`:
- Around line 17-19: The BUILD contains a duplicate load of the osmo_py_test
macro; remove one of the identical load("//bzl:py.bzl", "osmo_py_test")
statements so the macro is only loaded once (keep a single load invocation
referencing osmo_py_test and delete the redundant one).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 968f6c1d-02ab-43a0-889b-e35c53156d6e

📥 Commits

Reviewing files that changed from the base of the PR and between 9413d5f and 38a0e81.

📒 Files selected for processing (16)
  • .github/workflows/testbot-respond.yaml
  • .github/workflows/testbot.yaml
  • src/scripts/testbot/BUILD
  • src/scripts/testbot/README.md
  • src/scripts/testbot/TESTBOT_PROMPT.md
  • src/scripts/testbot/__init__.py
  • src/scripts/testbot/coverage_targets.py
  • src/scripts/testbot/create_pr.py
  • src/scripts/testbot/guardrails.py
  • src/scripts/testbot/respond.py
  • src/scripts/testbot/tests/BUILD
  • src/scripts/testbot/tests/__init__.py
  • src/scripts/testbot/tests/test_coverage_targets.py
  • src/scripts/testbot/tests/test_create_pr.py
  • src/scripts/testbot/tests/test_guardrails.py
  • src/scripts/testbot/tests/test_respond.py
✅ Files skipped from review due to trivial changes (4)
  • src/scripts/testbot/tests/__init__.py
  • src/scripts/testbot/__init__.py
  • src/scripts/testbot/BUILD
  • src/scripts/testbot/TESTBOT_PROMPT.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/testbot-respond.yaml

@coderabbitai

coderabbitai bot commented Apr 3, 2026

✅ Actions performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
src/scripts/testbot/TESTBOT_PROMPT.md (1)

61-63: Clarify the mock_print assertion pattern.

The pattern " ".join(str(c) for c in mock_print.call_args_list) converts call objects to strings, producing output like call('text') rather than just 'text'. A clearer pattern would extract the actual arguments:

output = " ".join(str(args[0]) for args, kwargs in mock_print.call_args_list)

Or for all positional args:

output = " ".join(" ".join(str(a) for a in args) for args, kwargs in mock_print.call_args_list)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/TESTBOT_PROMPT.md` around lines 61 - 63, The current test
pattern builds output by stringifying call objects from
mock_print.call_args_list which yields entries like "call('text')"; change the
aggregation to extract the actual positional arguments from each call (e.g., use
args, kwargs = call and join args or use args[0] if only single-arg prints) when
constructing the output variable used in assertions (mock_print, call_args_list,
output); then assert with self.assertIn("expected", output).
.github/workflows/testbot-respond.yaml (1)

67-73: Consider pinning bazel-contrib/setup-bazel to an immutable SHA digest.

Other actions in this workflow use SHA-pinned versions for reproducibility, and the bazel-contrib/setup-bazel@4fd964a13a440a8aeb0be47350db2fc640f19ca8 reference is already SHA-pinned, which is good. However, adding a version comment (e.g., # v0.x.x) next to the pin would improve clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/testbot-respond.yaml around lines 67 - 73, The workflow
uses bazel-contrib/setup-bazel pinned to SHA
"4fd964a13a440a8aeb0be47350db2fc640f19ca8"; update the action entry to keep the
SHA pin but add a human-readable version comment (e.g., "# v0.x.x") next to the
uses line for clarity and reproducibility so reviewers can see the intended
release version while retaining the immutable SHA; locate the uses string
"bazel-contrib/setup-bazel@4fd964a13a440a8aeb0be47350db2fc640f19ca8" and append
the version comment on the same line or immediately above it.
src/scripts/testbot/tests/BUILD (1)

17-19: Remove duplicate load statement.

Line 19 duplicates the load statement on line 17. While Bazel tolerates duplicate loads, this should be cleaned up for clarity.

🧹 Remove duplicate load
 load("//bzl:py.bzl", "osmo_py_test")

-load("//bzl:py.bzl", "osmo_py_test")
-
 osmo_py_test(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/tests/BUILD` around lines 17 - 19, Remove the duplicate
load statement by keeping a single occurrence of load("//bzl:py.bzl",
"osmo_py_test") and deleting the repeated identical line; ensure only one
load("//bzl:py.bzl", "osmo_py_test") remains in the BUILD file to avoid
redundancy.
src/scripts/testbot/README.md (1)

80-84: Add a language specifier to the fenced code block.

The code block showing example /testbot commands lacks a language specifier. Consider adding text or bash for consistency with other code blocks in the document.

Proposed fix
-```
+```text
 /testbot rename test methods to follow test_<behavior>_<condition> convention
 /testbot add edge case tests for empty input
 /testbot remove the redundant tests for preset labels
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/README.md` around lines 80 - 84, The fenced code block in
the README showing example /testbot commands is missing a language specifier;
update that fenced block (the triple-backtick block containing the lines
starting with "/testbot rename", "/testbot add", "/testbot remove") to include a
language tag such as text or bash (e.g., ```text) so it matches other code
blocks and ensures correct syntax highlighting and rendering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/scripts/testbot/BUILD`:
- Around line 19-24: Add the new testbot module to the Codebase Structure
section in AGENTS.md: update AGENTS.md to list the "testbot" module (matching
the BUILD target name testbot in src/scripts/testbot/BUILD) under the src/**
modules, include a short description like "testbot: test automation utilities"
and any relevant path or visibility note so the documentation and code change
remain synchronized.

In `@src/scripts/testbot/README.md`:
- Around line 116-135: Update AGENTS.md’s "Codebase Structure" to include the
new testbot component: add an entry for src/scripts/testbot describing its
purpose (automatic test generation and PR response via Claude Code), list the
key modules (coverage_targets.py, create_pr.py, guardrails.py, respond.py,
TESTBOT_PROMPT.md and tests/), and explain how it integrates with the build
system by referencing the GitHub Actions workflows (testbot.yaml and
testbot-respond.yaml) that schedule test generation and trigger review
responses; ensure the language mirrors existing style and placement used for
other major components in AGENTS.md.

---

Nitpick comments:
In @.github/workflows/testbot-respond.yaml:
- Around line 67-73: The workflow uses bazel-contrib/setup-bazel pinned to SHA
"4fd964a13a440a8aeb0be47350db2fc640f19ca8"; update the action entry to keep the
SHA pin but add a human-readable version comment (e.g., "# v0.x.x") next to the
uses line for clarity and reproducibility so reviewers can see the intended
release version while retaining the immutable SHA; locate the uses string
"bazel-contrib/setup-bazel@4fd964a13a440a8aeb0be47350db2fc640f19ca8" and append
the version comment on the same line or immediately above it.

In `@src/scripts/testbot/README.md`:
- Around line 80-84: The fenced code block in the README showing example
/testbot commands is missing a language specifier; update that fenced block (the
triple-backtick block containing the lines starting with "/testbot rename",
"/testbot add", "/testbot remove") to include a language tag such as text or
bash (e.g., ```text) so it matches other code blocks and ensures correct syntax
highlighting and rendering.

In `@src/scripts/testbot/TESTBOT_PROMPT.md`:
- Around line 61-63: The current test pattern builds output by stringifying call
objects from mock_print.call_args_list which yields entries like "call('text')";
change the aggregation to extract the actual positional arguments from each call
(e.g., use args, kwargs = call and join args or use args[0] if only single-arg
prints) when constructing the output variable used in assertions (mock_print,
call_args_list, output); then assert with self.assertIn("expected", output).

In `@src/scripts/testbot/tests/BUILD`:
- Around line 17-19: Remove the duplicate load statement by keeping a single
occurrence of load("//bzl:py.bzl", "osmo_py_test") and deleting the repeated
identical line; ensure only one load("//bzl:py.bzl", "osmo_py_test") remains in
the BUILD file to avoid redundancy.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3e9dbacc-b545-4833-89ba-d6fcb0e42a14

📥 Commits

Reviewing files that changed from the base of the PR and between 762ea59 and 28d2c73.

📒 Files selected for processing (16)
  • .github/workflows/testbot-respond.yaml
  • .github/workflows/testbot.yaml
  • src/scripts/testbot/BUILD
  • src/scripts/testbot/README.md
  • src/scripts/testbot/TESTBOT_PROMPT.md
  • src/scripts/testbot/__init__.py
  • src/scripts/testbot/coverage_targets.py
  • src/scripts/testbot/create_pr.py
  • src/scripts/testbot/guardrails.py
  • src/scripts/testbot/respond.py
  • src/scripts/testbot/tests/BUILD
  • src/scripts/testbot/tests/__init__.py
  • src/scripts/testbot/tests/test_coverage_targets.py
  • src/scripts/testbot/tests/test_create_pr.py
  • src/scripts/testbot/tests/test_guardrails.py
  • src/scripts/testbot/tests/test_respond.py

jiaenren and others added 4 commits April 2, 2026 20:00
Add coverage_targets.py to fetch Codecov data and select low-coverage
files, TESTBOT_PROMPT.md with quality rules for Claude Code, and a
GitHub Actions workflow that runs Claude Code CLI to generate tests
and create_pr.py to commit and open PRs.

Security: Claude Code is sandboxed to read/edit test files and run
test commands only. guardrails.py discards any non-test file changes.
All git/gh operations are in deterministic harness scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add respond.py that fetches review threads via GraphQL, delegates
fixes to Claude Code CLI with --json-schema for structured replies,
posts inline replies, and resolves threads.

Features: /testbot trigger phrase, full thread history context,
repo-member-only access, dedup (skip already-replied threads),
crash recovery, push retry, Claude-authored commit messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4 test targets with pylint and mypy checks:
- test_coverage_targets: _is_ignored, _lines_to_ranges, _cap_ranges,
  select_targets, format_targets
- test_guardrails: is_test_file, get_changed_test_files (mocked git)
- test_respond: _has_trigger, filter_actionable (all skip reasons,
  nested replies, dedup), parse_replies (3-tier fallback), build_prompt,
  run_claude (success, failure, timeout, env vars)
- test_create_pr: has_open_testbot_pr (all branches)

Also updates imports to use full src.scripts.testbot path for Bazel
compatibility, matching the repo convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (1)
src/scripts/testbot/coverage_targets.py (1)

170-175: Use a typed dataclass for target entries instead of raw dictionaries.

entries and targets carry a stable multi-field structure (file_path, coverage_pct, uncovered_lines, uncovered_ranges). Converting to a dataclass improves type safety and prevents key drift.

♻️ Proposed refactor sketch
+from dataclasses import dataclass
+
+@dataclass(frozen=True)
+class CoverageTarget:
+    file_path: str
+    coverage_pct: float
+    uncovered_lines: int
+    uncovered_ranges: list[tuple[int, int]]
@@
-def select_targets(
+def select_targets(
     report: dict,
     max_targets: int,
     max_uncovered: int,
-) -> list[dict]:
+) -> list[CoverageTarget]:
@@
-    entries = []
+    entries: list[CoverageTarget] = []
@@
-        entries.append({
-            "file_path": file_path,
-            "coverage_pct": coverage_pct,
-            "uncovered_lines": uncovered_line_count,
-            "uncovered_ranges": uncovered_ranges,
-        })
+        entries.append(
+            CoverageTarget(
+                file_path=file_path,
+                coverage_pct=coverage_pct,
+                uncovered_lines=uncovered_line_count,
+                uncovered_ranges=uncovered_ranges,
+            )
+        )
@@
-    entries.sort(key=lambda entry: entry["coverage_pct"])
+    entries.sort(key=lambda entry: entry.coverage_pct)
@@
-def format_targets(targets: list[dict]) -> str:
+def format_targets(targets: list[CoverageTarget]) -> str:
@@
-        lines.append(f"## Target {i}: {target['file_path']}")
+        lines.append(f"## Target {i}: {target.file_path}")

Update all dictionary accesses in format_targets() and main() to use attribute access, and update test cases to construct CoverageTarget instances instead of dicts.

Per coding guidelines: "Prefer dataclasses over dictionaries when passing structured data with multiple fields in Python" and "Add type annotations where they improve code clarity."

Also applies to lines 181–199 (format_targets) and lines 243–245 (main).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/coverage_targets.py` around lines 170 - 175, Replace the
ad-hoc dicts used for coverage entries with a typed dataclass (e.g.,
CoverageTarget) and update usages to attribute access: define a dataclass
CoverageTarget(file_path: str, coverage_pct: float, uncovered_lines: int,
uncovered_ranges: List[Tuple[int,int]]), change places that append dicts to
append CoverageTarget(...) (the block that builds entries), update
format_targets() and main() to accept and reference target.file_path,
target.coverage_pct, target.uncovered_lines, target.uncovered_ranges instead of
dict keys, and add type annotations to function signatures and tests to
construct CoverageTarget instances rather than raw dicts so all callers and
tests compile cleanly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/testbot-respond.yaml:
- Around line 35-38: Update the respond job-level if condition so the job only
runs when the comment contains the trigger and the PR is from the same repo: use
a combined expression with contains(github.event.comment.body, '/testbot') AND
github.event.pull_request.head.repo.full_name == github.repository instead of
the current label-only check on the respond job; apply the same change to the
parallel job block referenced (lines ~80-92) so both jobs perform this preflight
gating before any checkout or secret injection.

In `@src/scripts/testbot/coverage_targets.py`:
- Around line 213-224: Validate parsed values for --max-targets and
--max-uncovered immediately after args = parser.parse_args(): check that
args.max_targets >= 0 and args.max_uncovered >= 0 (allow 0 for --max-uncovered
per the help meaning), and call parser.error("...") with a clear message if a
negative value is provided; reference the parser.add_argument entries for
"--max-targets" and "--max-uncovered" and the variables args.max_targets /
args.max_uncovered so the check is colocated with the existing argument parsing
(also apply the same non-negative check to the other parser.add_argument
occurrence handling these flags).
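The post-parse validation could look like this (the flag names come from the prompt; the defaults and the sample argv are illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--max-targets", type=int, default=3)
parser.add_argument("--max-uncovered", type=int, default=0)
args = parser.parse_args(["--max-targets", "2"])

# Reject negatives immediately after parsing; parser.error() prints the
# message to stderr and exits with status 2, like argparse's own errors.
if args.max_targets < 0:
    parser.error("--max-targets must be >= 0")
if args.max_uncovered < 0:
    parser.error("--max-uncovered must be >= 0")
```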
- Around line 109-115: The fetch currently only catches urllib.error.HTTPError;
extend the excepts around the urllib.request.urlopen/json.loads block to also
catch urllib.error.URLError (for DNS/timeouts) and json.JSONDecodeError (or
ValueError for older Pythons) so those failures are logged with context via
logger.error and then call sys.exit(1); reference the existing
urllib.request.urlopen call and the json.loads call and reuse the same
logging/exit pattern used for HTTPError to handle these additional error cases.
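A sketch of the broadened error handling (the function name and messages are illustrative, and the real module would use logger.error rather than print). Note that HTTPError subclasses URLError, so its except clause must come first:

```python
import json
import sys
import urllib.error
import urllib.request

def fetch_report(url: str) -> dict:
    """Fetch and parse a JSON coverage report, exiting on fetch or parse errors."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as exc:   # non-2xx responses
        print(f"Codecov API returned {exc.code}: {exc.reason}", file=sys.stderr)
    except urllib.error.URLError as exc:    # DNS failures, connection timeouts
        print(f"could not reach Codecov: {exc.reason}", file=sys.stderr)
    except json.JSONDecodeError as exc:     # malformed response body
        print(f"invalid JSON in response: {exc}", file=sys.stderr)
    sys.exit(1)
```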

In `@src/scripts/testbot/create_pr.py`:
- Around line 36-44: After calling run(...) for the "gh pr list" command, don't
treat a non-zero exit as "no open PRs" — check result.returncode (or call
result.check_returncode()) immediately and raise/abort with the command stderr
(and exit code) so failures don't get parsed as zero PRs; then only parse
int(result.stdout.strip()) > 0 as before and remove the ValueError fallback that
masks command failures (refer to the run(...) call, the result variable, and the
int(result.stdout.strip()) > 0 return).

In `@src/scripts/testbot/guardrails.py`:
- Around line 14-26: Remove the catch-all "BUILD" from TEST_FILE_PATTERNS and
change is_test_file(file_path: str) so BUILD files are only allowed when they
live inside an explicit test package (e.g., a "tests" or "test" directory).
Concretely: keep the existing basename+fnmatch check for normal patterns, but if
basename == "BUILD" return True only when any path segment in
file_path.split("/") equals "tests" or "test"; otherwise treat BUILD as
non-test. Update TEST_FILE_PATTERNS and the is_test_file function accordingly
(references: TEST_FILE_PATTERNS and is_test_file).
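The suggested rule, sketched below; the pattern list is illustrative, not the module's exact TEST_FILE_PATTERNS:

```python
import fnmatch
import os

TEST_FILE_PATTERNS = ["test_*.py", "*_test.py", "*_test.go", "*.test.ts", "*.test.tsx"]

def is_test_file(file_path: str) -> bool:
    basename = os.path.basename(file_path)
    if basename == "BUILD":
        # BUILD files pass only when they live inside an explicit test package.
        return any(part in ("test", "tests") for part in file_path.split("/"))
    return any(fnmatch.fnmatch(basename, pattern) for pattern in TEST_FILE_PATTERNS)
```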

In `@src/scripts/testbot/respond.py`:
- Around line 393-416: The reply_to_comment function currently always allows the
thread to be resolved even if the API post failed; change reply_to_comment to
return a boolean success flag and only return True when run_gh returns a zero
returncode and the posted body meets the inline structured schema
(validate/serialize message to the expected schema before calling run_gh).
Specifically, update reply_to_comment to call run_gh as before but on non-zero
result.returncode log the error and return False, and on success return True;
then update the caller (the code that resolves threads) to check
reply_to_comment's boolean and only call the thread-resolve logic when it
returns True. Ensure you keep using reply_comment_id and the same run_gh
invocation but gate resolution on the new success flag.
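The success-flag shape could be sketched like this (the gh endpoint and argument style are illustrative, and run_gh is injected as a parameter only to keep the sketch testable):

```python
import subprocess
from typing import Callable

def reply_to_comment(comment_id: str, body: str,
                     run_gh: Callable[..., subprocess.CompletedProcess]) -> bool:
    """Post an inline reply; return True only when the post succeeded.

    The caller resolves the review thread only on True, so a failed API
    post can no longer let the thread be resolved silently.
    """
    result = run_gh("api", "--method", "POST",
                    f"pulls/comments/{comment_id}/replies", "-f", f"body={body}")
    if result.returncode != 0:
        print(f"reply to comment {comment_id} failed: {result.stderr}")
        return False
    return True
```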
- Around line 269-282: Update the prompt text added in the lines.extend([...])
call inside respond.py to tighten the BUILD-file rule: replace the line
currently reading "5. Do NOT create git commits or branches. Do NOT modify
source code — only test files and BUILD files." with a stricter sentence such as
"5. Do NOT create git commits or branches. Only create/modify test files
(test_*.py, *_test.go, *.test.ts, *.test.tsx) and Python test-directory BUILD
files; do not edit source/config/other non-test files." Locate the string within
the lines.extend array and swap the sentence so the prompt enforces the intended
guardrail.
- Around line 336-345: The tier-2 JSON extraction currently returns
data["replies"] without validating shape; update the block that processes
result_text/claude_output to only return replies if data is a list, non-empty,
and each item is an object (e.g., isinstance(item, dict))—otherwise fall through
so main() doesn't crash when it later calls reply_data.get(...). Reference the
symbols result_text, claude_output, data, and logger.info("Parsed replies from
result text (tier 2)") and ensure the check rejects strings or wrong-shaped
values like {"replies":"done"}.

---

Nitpick comments:
In `@src/scripts/testbot/coverage_targets.py`:
- Around line 170-175: Replace the ad-hoc dicts used for coverage entries with a
typed dataclass (e.g., CoverageTarget) and update usages to attribute access:
define a dataclass CoverageTarget(file_path: str, coverage_pct: float,
uncovered_lines: int, uncovered_ranges: List[Tuple[int,int]]), change places
that append dicts to append CoverageTarget(...) (the block that builds entries),
update format_targets() and main() to accept and reference target.file_path,
target.coverage_pct, target.uncovered_lines, target.uncovered_ranges instead of
dict keys, and add type annotations to function signatures and tests to
construct CoverageTarget instances rather than raw dicts so all callers and
tests compile cleanly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d71b59f9-7102-4c0f-b09a-f98725d2d8f4

📥 Commits

Reviewing files that changed from the base of the PR and between 28d2c73 and 554fc1e.

📒 Files selected for processing (17)
  • .github/workflows/testbot-respond.yaml
  • .github/workflows/testbot.yaml
  • AGENTS.md
  • src/scripts/testbot/BUILD
  • src/scripts/testbot/README.md
  • src/scripts/testbot/TESTBOT_PROMPT.md
  • src/scripts/testbot/__init__.py
  • src/scripts/testbot/coverage_targets.py
  • src/scripts/testbot/create_pr.py
  • src/scripts/testbot/guardrails.py
  • src/scripts/testbot/respond.py
  • src/scripts/testbot/tests/BUILD
  • src/scripts/testbot/tests/__init__.py
  • src/scripts/testbot/tests/test_coverage_targets.py
  • src/scripts/testbot/tests/test_create_pr.py
  • src/scripts/testbot/tests/test_guardrails.py
  • src/scripts/testbot/tests/test_respond.py
✅ Files skipped from review due to trivial changes (10)
  • src/scripts/testbot/__init__.py
  • AGENTS.md
  • src/scripts/testbot/tests/__init__.py
  • src/scripts/testbot/BUILD
  • src/scripts/testbot/tests/BUILD
  • src/scripts/testbot/TESTBOT_PROMPT.md
  • src/scripts/testbot/README.md
  • src/scripts/testbot/tests/test_create_pr.py
  • src/scripts/testbot/tests/test_guardrails.py
  • .github/workflows/testbot.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/scripts/testbot/tests/test_coverage_targets.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
src/scripts/testbot/coverage_targets.py (2)

230-230: ⚠️ Potential issue | 🟡 Minor

Validate negative CLI bounds after argument parsing.

On Line 230, negative values for --max-targets / --max-uncovered are still accepted and lead to surprising slicing/capping behavior.

🔧 Proposed fix
     args = parser.parse_args()
+    if args.max_targets < 0:
+        parser.error("--max-targets must be >= 0")
+    if args.max_uncovered < 0:
+        parser.error("--max-uncovered must be >= 0")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/coverage_targets.py` at line 230, After parsing CLI args
(args = parser.parse_args()), validate that numeric bounds for --max-targets and
--max-uncovered are non-negative; if either args.max_targets or
args.max_uncovered is < 0 call parser.error (or raise a SystemExit with a clear
message) to fail fast. Locate the argument parser setup (parser) and the
post-parse usage of args in this module (e.g., main() or whichever function uses
args) and add the validation immediately after parser.parse_args() so invalid
negative values are rejected before any slicing/capping logic runs.

26-32: ⚠️ Potential issue | 🟠 Major

Expand ignore globs to exclude nested test directories.

Right now only src/tests/** is excluded. Files like src/service/core/tests/... can still be selected as “coverage targets,” which can cause testbot to generate tests for test code.

🔧 Proposed fix
 IGNORE_PATTERNS = [
     "src/tests/**",
+    "src/**/tests/**",
     "src/scripts/**",
     "bzl/**",
     "run/**",
     "deployments/**",
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scripts/testbot/coverage_targets.py` around lines 26 - 32, The
IGNORE_PATTERNS list currently only excludes top-level src/tests/**; update
IGNORE_PATTERNS to also exclude nested test directories by adding globs like
"**/tests/**", "**/test/**", and "**/__tests__/**" (or any other
project-specific test folder names) so files such as src/service/core/tests/...
are ignored; ensure these new patterns are added alongside the existing entries
in the IGNORE_PATTERNS variable.
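A minimal sketch of why both patterns are needed, assuming fnmatch-style matching (the module's actual matcher may differ). Since fnmatch's `*` also crosses `/`, the added "src/**/tests/**" glob catches arbitrarily nested test directories but not the top-level src/tests/, so both entries are kept:

```python
from fnmatch import fnmatch

# Patterns from the review diff; trimmed to the entries relevant here.
IGNORE_PATTERNS = [
    "src/tests/**",
    "src/**/tests/**",  # nested test dirs, e.g. src/service/core/tests/
    "src/scripts/**",
]

def is_ignored(path: str) -> bool:
    # fnmatch treats "*" like ".*" (it crosses "/"), so "**" behaves like "*".
    return any(fnmatch(path, pat) for pat in IGNORE_PATTERNS)
```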
🧹 Nitpick comments (5)
src/scripts/testbot/tests/test_respond.py (1)

278-284: Consider adding stderr to mock CompletedProcess for completeness.

Some test mocks omit stderr, which leaves the attribute defaulting to None; code under test that treats it as a string would then fail. While the current run_claude implementation only logs stderr on failure paths (which are tested with stderr=""), adding it consistently improves mock fidelity.

Proposed fix
     @patch("src.scripts.testbot.respond.subprocess.run")
     def test_successful_run_returns_parsed_json(self, mock_run):
         expected = {"structured_output": {"replies": []}, "result": "ok"}
         mock_run.return_value = subprocess.CompletedProcess(
-            [], 0, stdout=json.dumps(expected),
+            [], 0, stdout=json.dumps(expected), stderr="",
         )
         result = run_claude("test prompt")
         self.assertEqual(result, expected)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In src/scripts/testbot/tests/test_respond.py around lines 278-284: the test
test_successful_run_returns_parsed_json mocks subprocess.CompletedProcess
without stderr, leaving the attribute as None if run_claude ever reads it;
update the mock_return_value creation in that test to include stderr (e.g.,
stderr="") when constructing subprocess.CompletedProcess so the mocked object
matches the real CompletedProcess shape used by run_claude.
src/scripts/testbot/guardrails.py (1)

71-71: Consider using iterable unpacking for cleaner syntax.

Per Ruff RUF005, prefer unpacking over list concatenation.

Proposed fix
-        subprocess.run(["git", "checkout", "--"] + non_test_files, check=False)
+        subprocess.run(["git", "checkout", "--", *non_test_files], check=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In src/scripts/testbot/guardrails.py at line 71: replace the list concatenation
in the subprocess.run call with iterable unpacking: update the subprocess.run
invocation that currently builds the args with ["git", "checkout", "--"] +
non_test_files to use unpacking so the command becomes ["git", "checkout", "--",
*non_test_files] (referencing the subprocess.run call and the non_test_files
variable in guardrails.py).
src/scripts/testbot/respond.py (1)

105-105: Consider using iterable unpacking for cleaner syntax.

Per Ruff RUF005, prefer unpacking over list concatenation at lines 105 and 373.

Proposed fixes
 def run_gh(args: str) -> subprocess.CompletedProcess:
     """Run a gh CLI command and return the result."""
     return subprocess.run(
-        ["gh"] + shlex.split(args),
+        ["gh", *shlex.split(args)],
         capture_output=True,
         text=True,
         check=False,
     )
-    subprocess.run(["git", "add"] + files, check=True)
+    subprocess.run(["git", "add", *files], check=True)

Also applies to: 373-373

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In src/scripts/testbot/respond.py at line 105: replace the list-concatenation
expressions that build command args (the occurrences of ["gh"] +
shlex.split(args)) with iterable unpacking so the command list is constructed as
["gh", *shlex.split(args)]; update both occurrences (the one at the top-level
call and the similar one later) to use this unpacking syntax to satisfy Ruff
RUF005 and improve readability.
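Both spellings build the same argv; a quick check of the unpacked form with a hypothetical argument string:

```python
import shlex

# shlex.split handles quoted arguments safely; unpacking builds the final
# argv in one expression instead of concatenating two lists (RUF005).
args = 'pr list --label "ai-generated" --state open'
argv = ["gh", *shlex.split(args)]
```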
src/scripts/testbot/create_pr.py (1)

87-91: Minor inconsistency: direct subprocess.run vs run() helper.

Lines 88-91 use subprocess.run directly while nearby calls use the run() helper. This is functionally fine but slightly inconsistent. Consider using the helper for uniformity, or keep as-is since commit needs different options (input parameter isn't supported by the helper).

Also, per Ruff RUF005, prefer unpacking on line 87:

Proposed fix for unpacking
-    run(["git", "add"] + changed_files)
+    run(["git", "add", *changed_files])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In src/scripts/testbot/create_pr.py around lines 87-91: replace the
run(["git", "add"] + changed_files) call with run(["git", "add",
*changed_files]) to satisfy Ruff RUF005. Optionally route the
subprocess.run commit call through the same run() helper for uniformity,
but only if the helper supports the options the commit call needs (the
direct call exists because of those options); otherwise keep the direct
subprocess.run call and apply just the unpacking fix.
src/scripts/testbot/coverage_targets.py (1)

123-127: Use a dataclass for target records instead of untyped dicts.

select_targets/format_targets pass a multi-field record as dict, which makes schema drift easy. A small dataclass would improve safety and readability.

As per coding guidelines, "Prefer dataclasses over dictionaries when passing structured data with multiple fields in Python".

Also applies to: 176-181, 187-197

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In src/scripts/testbot/coverage_targets.py around lines 123-127: create a
small dataclass (e.g., TargetRecord) representing the multi-field record
currently passed as dict (fields: id/name/path, covered_lines, uncovered_lines,
coverage_pct, etc. as used by select_targets and format_targets), update
select_targets to build and return List[TargetRecord] instead of list[dict],
change the function signature and type hints accordingly, and update
format_targets to accept List[TargetRecord] and access attributes instead of
dict keys; then adjust the callers referenced in the review (the blocks around
the former dict construction/consumption at the noted sites) to construct and
consume TargetRecord instances rather than raw dicts so typing and schema are
enforced.
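A minimal sketch of the suggested dataclass; the field names here are guesses from the review comment, not the module's real schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TargetRecord:
    # Illustrative fields; the actual record shape lives in coverage_targets.py.
    path: str
    coverage_pct: float
    uncovered_lines: list[int]

def format_targets(targets: list[TargetRecord]) -> str:
    # Attribute access fails loudly on schema drift, unlike dict key typos.
    return "\n".join(f"{t.path}: {t.coverage_pct:.1f}%" for t in targets)
```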
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In src/scripts/testbot/coverage_targets.py:
- Line 110: Ruff flags the call to urllib.request.urlopen(request, timeout=60)
as S310; since the target URL is a fixed, trusted Codecov HTTPS API, add a
targeted suppression for Ruff S310 on the urlopen call (e.g., append "# noqa:
S310") and include a short inline rationale comment referencing that the request
URL is fixed/trusted (Codecov HTTPS API) so reviewers understand the exception;
locate the call to urllib.request.urlopen and apply the suppression and
rationale there.


ℹ️ Review info
⚙️ Run configuration

Configuration file: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f6dcd1bb-a269-4af9-9a2f-1f25d8a94d2f

📥 Commits

Reviewing files that changed from the base of the PR and between 554fc1e and 4375e3c.

📒 Files selected for processing (13)
  • .github/workflows/testbot-respond.yaml
  • .github/workflows/testbot.yaml
  • src/scripts/testbot/BUILD
  • src/scripts/testbot/__init__.py
  • src/scripts/testbot/coverage_targets.py
  • src/scripts/testbot/create_pr.py
  • src/scripts/testbot/guardrails.py
  • src/scripts/testbot/respond.py
  • src/scripts/testbot/tests/BUILD
  • src/scripts/testbot/tests/test_coverage_targets.py
  • src/scripts/testbot/tests/test_create_pr.py
  • src/scripts/testbot/tests/test_guardrails.py
  • src/scripts/testbot/tests/test_respond.py
✅ Files skipped from review due to trivial changes (6)
  • src/scripts/testbot/__init__.py
  • src/scripts/testbot/BUILD
  • src/scripts/testbot/tests/BUILD
  • .github/workflows/testbot-respond.yaml
  • .github/workflows/testbot.yaml
  • src/scripts/testbot/tests/test_guardrails.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/scripts/testbot/tests/test_create_pr.py

ecolternv previously approved these changes Apr 3, 2026
jiaenren and others added 2 commits April 3, 2026 13:36
- Add comment explaining DISABLE_PROMPT_CACHING=1 (NVIDIA gateway)
- Change copyright years from 2025-2026 to 2026
- Switch BUILD files from full Apache text to 2-line SPDX header
- Remove __init__.py files (not needed for Bazel or Python 3)
- Move env vars (MAX_RESPONSES, MAX_TURNS, MODEL) to CLI args
- Strengthen _is_ignored: catch all */tests/* dirs, test files, __init__.py
- Restrict BUILD whitelist to /tests/ directories only
- Handle URLError and JSONDecodeError from Codecov fetch
- Fail closed when gh pr list fails (prevent duplicate PRs)
- Validate tier-2 replies shape before returning
- Only resolve threads when inline reply post succeeds
- Fix dry_run to only skip PR creation, not test generation
- Only trigger on last non-bot human comment (prevent stale retrigger)
- Catch git add/commit failures in commit_and_push fallback path
- Normalize comment_id to int before lookup (handle string IDs)
- Add author association check: only OWNER/MEMBER/COLLABORATOR can trigger

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Tighten --allowedTools: Bash(bazel:*) → Bash(bazel test *),
  Bash(pnpm:*) → Bash(pnpm test *). Prevents bazel run, pnpm dlx,
  and other arbitrary command execution via prompt injection.
- Pin claude-code to @2.1.91 instead of @latest to prevent supply
  chain attacks via npm package compromise.
- Delete untracked non-test files (os.remove) instead of git checkout
  which silently no-ops on new files. Closes guardrail bypass where
  Claude could create malicious files that persist during bazel test.
- Sanitize commit messages: enforce testbot: prefix, strip git
  trailers (Signed-off-by, Co-authored-by), cap at 500 chars.
- Update TS test instruction to pnpm --dir (no cd needed).
- Filter has_open_testbot_pr by --author svc-osmo-ci so dev PRs
  manually labeled ai-generated don't block testbot runs.
- Add environment: testbot to both workflows. Requires GitHub
  environment configured with deployment branches main + testbot/*
  and a branch ruleset restricting testbot/* to maintain/admin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
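The commit-message sanitization described in the second commit could look roughly like this; the function name, trailer set, and placement of the 500-char cap are assumptions from the bullet, not the actual implementation:

```python
import re

# Trailer lines to strip; the real list may include more (e.g. Reviewed-by).
TRAILER_RE = re.compile(r"^(signed-off-by|co-authored-by):", re.IGNORECASE)

def sanitize_commit_message(message: str) -> str:
    # Drop git trailer lines anywhere in the message body.
    lines = [line for line in message.splitlines()
             if not TRAILER_RE.match(line.strip())]
    cleaned = "\n".join(lines).strip()
    # Enforce the testbot: prefix so testbot commits stay identifiable.
    if not cleaned.startswith("testbot:"):
        cleaned = f"testbot: {cleaned}"
    # Cap the message to bound its size.
    return cleaned[:500]
```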