Add testbot: AI-powered test generation and review response #788
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings. Use the commands below to manage reviews, or the checkboxes for quick actions.
📝 Walkthrough

Introduces a comprehensive "Testbot" system comprising two GitHub Actions workflows, five Python modules, supporting Bazel and test infrastructure, and documentation. The system enables AI-powered test generation from coverage gaps and inline review-comment responses via Claude Code CLI integration with GitHub PR automation and safety guardrails.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Cron as Cron Schedule
    participant GHA as GitHub Actions<br/>(testbot.yaml)
    participant Codecov as Codecov API
    participant CoverageScript as coverage_targets.py
    participant Claude as Claude Code CLI
    participant BazelPnpm as Bazel/pnpm<br/>(Test Runners)
    participant CreatePR as create_pr.py
    participant GitHub as GitHub<br/>(PR Create)
    Cron->>GHA: Trigger on weekday cron
    GHA->>CoverageScript: Run coverage_targets.py<br/>(select low-coverage files)
    CoverageScript->>Codecov: Fetch coverage report
    Codecov-->>CoverageScript: line_coverage data
    CoverageScript-->>GHA: targets.txt<br/>(filtered candidates)
    GHA->>Claude: npx @anthropic-ai/claude-code<br/>(generate/update tests)<br/>with targets & prompt
    Claude->>BazelPnpm: Generate test code<br/>run bazel test/pnpm test
    BazelPnpm-->>Claude: test results
    Claude->>Claude: Iterate on failures<br/>(up to 3 retries)
    Claude-->>GHA: Generated/updated files
    GHA->>CreatePR: python -m testbot.create_pr
    CreatePR->>GitHub: gh pr create<br/>(ai-generated label)
    GitHub-->>CreatePR: PR created/confirmed
    CreatePR-->>GHA: PR URL logged
```
```mermaid
sequenceDiagram
    participant User as Reviewer
    participant GHA as GitHub Actions<br/>(testbot-respond.yaml)
    participant GitHub as GitHub GraphQL<br/>(fetch threads)
    participant RespondPy as respond.py<br/>(filtering & orchestration)
    participant Claude as Claude Code CLI<br/>(fix generation)
    participant BazelPnpm as Bazel/pnpm<br/>(Test Runners)
    participant GH_CLI as gh CLI<br/>(post/resolve)
    User->>GitHub: Comment /testbot on<br/>inline review thread
    GitHub-->>GHA: pull_request_review_comment trigger
    GHA->>GitHub: Fetch all review threads<br/>(full histories via GraphQL)
    GitHub-->>RespondPy: threads with comments
    RespondPy->>RespondPy: Filter actionable threads:<br/>unresolved, has /testbot trigger,<br/>not already bot-handled,<br/>respects max_responses
    RespondPy->>Claude: Single prompt with all<br/>thread histories & trigger context
    Claude->>BazelPnpm: Generate fixes & run tests
    BazelPnpm-->>Claude: test results
    Claude->>Claude: Iterate on failures<br/>(up to max_turns)
    Claude-->>RespondPy: Structured replies<br/>(per-thread fixes, resolve flag)
    RespondPy->>RespondPy: Parse JSON with<br/>fallback strategy
    RespondPy->>RespondPy: Commit & push changes<br/>(with retry on failure)
    RespondPy->>GH_CLI: Post reply to each<br/>review comment (gh)
    GH_CLI->>GitHub: POST inline reply
    RespondPy->>GH_CLI: Resolve thread<br/>(if resolve=true & push OK)
    GH_CLI->>GitHub: Mutate thread to RESOLVED
    GitHub-->>User: Inline reply posted,<br/>thread resolved
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes. The review requires careful validation of complex GraphQL/subprocess orchestration.
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #788      +/-   ##
==========================================
+ Coverage   42.68%   42.93%   +0.25%
==========================================
  Files         203      203
  Lines       26803    26880      +77
  Branches     7573     7591      +18
==========================================
+ Hits        11440    11542     +102
+ Misses      15253    15229      -24
+ Partials      110      109       -1
```

Flags with carried forward coverage won't be shown. Click here to find out more.
Actionable comments posted: 2
🧹 Nitpick comments (7)
src/scripts/testbot/TESTBOT_PROMPT.md (2)
87-89: CLI output testing pattern may not work as expected.
`mock_print.call_args_list` returns a list of `call` objects, not strings. Joining `str(c)` will produce `call('text')` representations rather than the actual printed content. A more accurate pattern:

```diff
-output = " ".join(str(c) for c in mock_print.call_args_list)
+output = " ".join(str(args[0]) for args, kwargs in mock_print.call_args_list)
```

Or alternatively:

```python
output = "".join(call.args[0] for call in mock_print.call_args_list if call.args)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/TESTBOT_PROMPT.md` around lines 87 - 89, The current test builds output by joining str(c) for c in mock_print.call_args_list which yields "call(...)" strings; instead iterate mock_print.call_args_list and extract the first positional arg from each call (call.args[0]) when present, join those pieces (e.g., with "" or " ") into the output variable, and then use self.assertIn("expected", output); update references to builtins.print/mock_print.call_args_list and the output variable accordingly and guard against calls with no positional args.
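The difference between the two patterns can be verified with a small runnable sketch (the print calls here are illustrative; only `unittest.mock` from the standard library is assumed):

```python
from unittest import mock

# Patch builtins.print and make two calls, as a CLI test would.
with mock.patch("builtins.print") as mock_print:
    print("hello")
    print("world")

# Joining str(c) keeps the call(...) wrapper around each entry.
raw = " ".join(str(c) for c in mock_print.call_args_list)

# Extracting the first positional arg recovers the actual printed text.
output = " ".join(str(c.args[0]) for c in mock_print.call_args_list if c.args)
```

Here `raw` contains `call('hello')`-style strings, while `output` is the plain `"hello world"` that an `assertIn` on printed content actually needs.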
62-65: Consider checking for existing PR before generating tests, not just before creating PR.

The prompt instructs Claude to check for existing `ai-generated` PRs only after successfully generating tests. This means compute resources are spent on test generation even when a PR won't be created. Consider adding guidance to check for open PRs at the start of the process.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/TESTBOT_PROMPT.md` around lines 62 - 65, The prompt currently runs the ai-generated PR check only after generating tests; update the flow in TESTBOT_PROMPT.md so the existing-PR check (using the gh pr list --label ai-generated --state open --json number --jq "length" command) happens at the start of the process, before any test generation step, and change any instructions referencing "check before creating PR" to "check before generating tests and before creating PR" so the bot can exit early if an open ai-generated PR exists.

.github/workflows/testbot.yaml (3)
113-120: Shell quoting concern with `$TARGETS` expansion.

If `$TARGETS` contains special characters or newlines, the current inline expansion could break. Consider using a heredoc or `--input` flag if available. However, since `coverage_targets.py` produces well-structured markdown output with controlled characters, this is likely safe in practice.
Verify each finding against the current code and only fix it if needed. In @.github/workflows/testbot.yaml around lines 113 - 120, The inline expansion of $TARGETS in the npx `@anthropic-ai/claude-code` command can break if it contains newlines or special chars; update the invocation used around --print/--model "$ANTHROPIC_MODEL" to pass the prompt and Coverage targets via a safe mechanism (e.g., a heredoc feeding stdin, or using the tool's --input flag if available) and read in src/scripts/testbot/TESTBOT_PROMPT.md plus the contents of $TARGETS into that input stream so special characters/newlines are preserved.
97-107: Add step ID for referencing output in subsequent steps.

If you implement the target validation suggested above, you'll need a step ID here:

```diff
       - name: Select coverage targets
+        id: select-targets
         run: |
```
Verify each finding against the current code and only fix it if needed. In @.github/workflows/testbot.yaml around lines 97 - 107, Add an id to the "Select coverage targets" step (e.g., id: select_coverage_targets) so later steps can reference its outputs, and expose the selected targets as a step output named (for example) targets by writing the contents of "$RUNNER_TEMP/targets.txt" to the Actions outputs (use the GITHUB_OUTPUT mechanism to set the output key "targets"). Ensure the step block (the run script under the step with name "Select coverage targets") is updated to set that output after creating the targets file.
109-120: Consider validating targets before invoking Claude Code.

If `coverage_targets.py` returns "No coverage targets found." (the empty case), the workflow still invokes Claude Code, consuming API credits for no useful work.

Proposed fix:

```diff
       - name: Generate tests
-        if: inputs.dry_run != true
+        if: inputs.dry_run != true && !contains(steps.select-targets.outputs.targets, 'No coverage targets found')
         run: |
```

Or add a validation step:

```yaml
      - name: Check targets exist
        id: check-targets
        run: |
          if grep -q "No coverage targets found" "$RUNNER_TEMP/targets.txt"; then
            echo "skip=true" >> "$GITHUB_OUTPUT"
          else
            echo "skip=false" >> "$GITHUB_OUTPUT"
          fi

      - name: Generate tests
        if: inputs.dry_run != true && steps.check-targets.outputs.skip != 'true'
```
Verify each finding against the current code and only fix it if needed. In @.github/workflows/testbot.yaml around lines 109 - 120, Add a validation step that checks the generated targets file for the "No coverage targets found" sentinel and prevents invoking Claude Code when empty: create a step with id check-targets (name "Check targets exist") that inspects "$RUNNER_TEMP/targets.txt" (e.g. grep -q "No coverage targets found" ...) and writes skip=true/false to $GITHUB_OUTPUT, then change the existing "Generate tests" step's if condition to include steps.check-targets.outputs.skip != 'true' (i.e. if: inputs.dry_run != true && steps.check-targets.outputs.skip != 'true') so the Claude Code run is skipped when there are no targets.

.github/workflows/testbot-respond.yaml (1)
39-44: Consider adding `repository` input for cross-fork safety.

When checking out a PR branch by ref name only, if a PR comes from a fork with the same branch name as one in the base repo, the checkout could be ambiguous. Adding the repository explicitly ensures the correct branch is checked out.

```diff
       - name: Checkout PR branch
         uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
         with:
           ref: ${{ github.event.pull_request.head.ref }}
+          repository: ${{ github.event.pull_request.head.repo.full_name }}
           fetch-depth: 0
           token: ${{ secrets.SVC_OSMO_CI_TOKEN }}
```

However, since this workflow only runs on PRs with the `ai-generated` label (which are created by the bot in the same repo), this is low risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/testbot-respond.yaml around lines 39 - 44, The checkout step using actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 should include the repository input to avoid ambiguity when checking out by ref; add repository: ${{ github.event.pull_request.head.repo.full_name }} alongside the existing ref, fetch-depth, and token inputs so the Checkout PR branch step explicitly targets the PR head repository.

src/scripts/testbot/coverage_targets.py (1)
223-228: Add error handling for missing/malformed `totals` in the Codecov response.

If the API response lacks `totals`, `files`, or `coverage` keys, this will raise a `KeyError`. Consider defensive checks.

Proposed fix:

```diff
     report = fetch_codecov_report(token)
+    totals = report.get("totals")
+    if not totals:
+        logger.error("Codecov response missing 'totals' field")
+        sys.exit(1)
     logger.info(
         "Codecov: %d files, %.1f%% overall coverage",
-        report["totals"]["files"],
-        report["totals"]["coverage"],
+        totals.get("files", 0),
+        totals.get("coverage", 0.0),
     )
```
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/coverage_targets.py` around lines 223 - 228, The logger call assumes report["totals"]["files"] and ["coverage"] exist; add defensive validation after fetch_codecov_report(token) to ensure report is a dict and contains totals with files and coverage keys (e.g., check isinstance(report, dict) and "totals" in report and all(k in report["totals"] for k in ("files","coverage"))), and if missing or malformed, log a clear warning via logger.warning (including the raw report or a short sanitized excerpt) and fall back to safe defaults (e.g., files=0, coverage=0.0) or skip the info log path; update the code around fetch_codecov_report and the logger.info call accordingly to use the validated values.
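A minimal sketch of the defensive pattern this comment proposes; `summarize` is a hypothetical helper, and the `totals`/`files`/`coverage` keys mirror the Codecov fields named above:

```python
def summarize(report):
    """Return (files, coverage) from a Codecov-style report dict,
    tolerating a missing, None, or malformed 'totals' section."""
    if not isinstance(report, dict):
        # Non-dict payloads (None, error strings) fall back to safe defaults.
        return 0, 0.0
    totals = report.get("totals") or {}
    return totals.get("files", 0), totals.get("coverage", 0.0)
```

Compared with direct `report["totals"]["files"]` indexing, this never raises `KeyError`; a caller can log the defaults or skip the log line entirely when they come back zeroed.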
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/testbot-respond.yaml:
- Around line 73-88: The workflow uses anthropics/claude-code-action@v1 and
wrongly passes max_turns, allowed_tools, and custom_instructions as top-level
inputs; move these into the claude_args input instead by removing the
max_turns/allowed_tools/custom_instructions keys and adding a claude_args
multi-line value that sets the equivalent CLI flags (e.g., --max-turns,
--allowedTools, and --append-system-prompt) with the same values and text,
preserving ANTHROPIC_BASE_URL, ANTHROPIC_MODEL, and GH_TOKEN env entries.
In `@src/scripts/testbot/coverage_targets.py`:
- Line 218: The assignment uses an inline __import__("os") which violates the
rule that imports must be top-level; add a top-level import os (placed
immediately after the module docstring) and change the token assignment in
coverage_targets.py (the token = ... line) to use
os.environ.get("CODECOV_TOKEN", "") instead of
__import__("os").environ.get(...), ensuring no other inline imports remain in
the file.
---
Nitpick comments:
In @.github/workflows/testbot-respond.yaml:
- Around line 39-44: The checkout step using
actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 should include the
repository input to avoid ambiguity when checking out by ref; add repository:
${{ github.event.pull_request.head.repo.full_name }} alongside the existing ref,
fetch-depth, and token inputs so the Checkout PR branch step explicitly targets
the PR head repository.
In @.github/workflows/testbot.yaml:
- Around line 113-120: The inline expansion of $TARGETS in the npx
`@anthropic-ai/claude-code` command can break if it contains newlines or special
chars; update the invocation used around --print/--model "$ANTHROPIC_MODEL" to
pass the prompt and Coverage targets via a safe mechanism (e.g., a heredoc
feeding stdin, or using the tool's --input flag if available) and read in
src/scripts/testbot/TESTBOT_PROMPT.md plus the contents of $TARGETS into that
input stream so special characters/newlines are preserved.
- Around line 97-107: Add an id to the "Select coverage targets" step (e.g., id:
select_coverage_targets) so later steps can reference its outputs, and expose
the selected targets as a step output named (for example) targets by writing the
contents of "$RUNNER_TEMP/targets.txt" to the Actions outputs (use the
GITHUB_OUTPUT mechanism to set the output key "targets"). Ensure the step block
(the run script under the step with name "Select coverage targets") is updated
to set that output after creating the targets file.
- Around line 109-120: Add a validation step that checks the generated targets
file for the "No coverage targets found" sentinel and prevents invoking Claude
Code when empty: create a step with id check-targets (name "Check targets
exist") that inspects "$RUNNER_TEMP/targets.txt" (e.g. grep -q "No coverage
targets found" ...) and writes skip=true/false to $GITHUB_OUTPUT, then change
the existing "Generate tests" step's if condition to include
steps.check-targets.outputs.skip != 'true' (i.e. if: inputs.dry_run != true &&
steps.check-targets.outputs.skip != 'true') so the Claude Code run is skipped
when there are no targets.
In `@src/scripts/testbot/coverage_targets.py`:
- Around line 223-228: The logger call assumes report["totals"]["files"] and
["coverage"] exist; add defensive validation after fetch_codecov_report(token)
to ensure report is a dict and contains totals with files and coverage keys
(e.g., check isinstance(report, dict) and "totals" in report and all(k in
report["totals"] for k in ("files","coverage"))), and if missing or malformed,
log a clear warning via logger.warning (including the raw report or a short
sanitized excerpt) and fall back to safe defaults (e.g., files=0, coverage=0.0)
or skip the info log path; update the code around fetch_codecov_report and the
logger.info call accordingly to use the validated values.
In `@src/scripts/testbot/TESTBOT_PROMPT.md`:
- Around line 87-89: The current test builds output by joining str(c) for c in
mock_print.call_args_list which yields "call(...)" strings; instead iterate
mock_print.call_args_list and extract the first positional arg from each call
(call.args[0]) when present, join those pieces (e.g., with "" or " ") into the
output variable, and then use self.assertIn("expected", output); update
references to builtins.print/mock_print.call_args_list and the output variable
accordingly and guard against calls with no positional args.
- Around line 62-65: The prompt currently runs the ai-generated PR check only
after generating tests; update the flow in TESTBOT_PROMPT.md so the existing-PR
check (using the gh pr list --label ai-generated --state open --json number --jq
"length" command) happens at the start of the process, before any test
generation step, and change any instructions referencing "check before creating
PR" to "check before generating tests and before creating PR" so the bot can
exit early if an open ai-generated PR exists.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 47acc588-3a9b-4aaf-8dbe-63d1426894ac
📒 Files selected for processing (4)
- .github/workflows/testbot-respond.yaml
- .github/workflows/testbot.yaml
- src/scripts/testbot/TESTBOT_PROMPT.md
- src/scripts/testbot/coverage_targets.py
📖 Docs preview: https://d3in15bfzp49i0.cloudfront.net/788/index.html
Force-pushed: a6b44d5 → 89672b8
Force-pushed: a160158 → 7a3d3c8
Force-pushed: 7a3d3c8 → 26e852d
Force-pushed: 26e852d → 38a0e81
@coderabbitai resolve

@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 6
🧹 Nitpick comments (5)
src/scripts/testbot/tests/BUILD (1)
17-19: Remove duplicate `load` statement.

The `osmo_py_test` macro is loaded twice on lines 17 and 19. Remove one of them.

Proposed fix:

```diff
 load("//bzl:py.bzl", "osmo_py_test")
-load("//bzl:py.bzl", "osmo_py_test")
-
 osmo_py_test(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/tests/BUILD` around lines 17 - 19, The BUILD contains a duplicate load of the osmo_py_test macro; remove one of the identical load("//bzl:py.bzl", "osmo_py_test") statements so the macro is only loaded once (keep a single load invocation referencing osmo_py_test and delete the redundant one).

src/scripts/testbot/README.md (1)
80-85: Add language specifier to fenced code block.

The static analysis tool flagged this code block for missing a language identifier. Since these are example comment texts, use `text` as the language.

Proposed fix:

````diff
-```
+```text
 /testbot rename test methods to follow test_<behavior>_<condition> convention
 /testbot add edge case tests for empty input
 /testbot remove the redundant tests for preset labels
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/README.md` around lines 80 - 85, The fenced code block in README.md is missing a language specifier; update the triple-backtick that opens the example comment block to include the language "text" (change ``` to ```text) for the block containing "/testbot rename test methods...", ensuring the closing triple-backticks remain unchanged so the static analysis warning is resolved.

src/scripts/testbot/guardrails.py (1)
66-66: Prefer unpacking over list concatenation.

Use iterable unpacking for cleaner syntax.

Proposed fix:

```diff
-    subprocess.run(["git", "checkout", "--"] + non_test_files, check=False)
+    subprocess.run(["git", "checkout", "--", *non_test_files], check=False)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/guardrails.py` at line 66, The subprocess.run call currently concatenates lists to pass arguments to git (subprocess.run(["git", "checkout", "--"] + non_test_files, check=False)); change this to use iterable unpacking so the args are constructed as ["git", "checkout", "--", *non_test_files] while preserving the check=False flag and the non_test_files variable as the unpacked iterable.

src/scripts/testbot/create_pr.py (1)
80-80: Prefer unpacking over list concatenation.

Use iterable unpacking for cleaner syntax.

Proposed fix:

```diff
-    run(["git", "add"] + changed_files)
+    run(["git", "add", *changed_files])
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/create_pr.py` at line 80, Replace the list concatenation in the subprocess call with iterable unpacking: instead of constructing the args with ["git", "add"] + changed_files, pass the arguments to run using unpacking so the call uses ["git", "add", *changed_files] (locate the run(...) call and the changed_files variable in create_pr.py).

src/scripts/testbot/respond.py (1)
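Both unpacking suggestions reduce to the same argument list; a tiny sketch (file names hypothetical) shows the equivalence without needing git installed:

```python
changed_files = ["a.py", "b.py"]  # hypothetical file list

# Old style: list concatenation.
args_concat = ["git", "add"] + changed_files

# Suggested style: iterable unpacking, same result, cleaner syntax.
args_unpack = ["git", "add", *changed_files]
```

Unpacking also composes naturally when mixing scalars and iterables in one literal, which is why the linter prefers it over `+`.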
360-363: Consider preserving the `.claude/` directory during `git checkout` as well.

The `discard_changes` function preserves `.claude/` during `git clean` (line 363) but `git checkout -- .` (line 362) will still revert any tracked changes in that directory. If `.claude/` contains tracked files that shouldn't be reverted, you may want to exclude it from the checkout as well. If `.claude/` only contains untracked files, this is fine as-is.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/respond.py` around lines 360 - 363, The current discard_changes() runs "git checkout -- ." which will revert tracked changes inside .claude/; change it so the checkout excludes that directory (e.g., build the list of tracked paths excluding ^.claude/ and pass them to git checkout --, or run: git checkout -- $(git ls-files | grep -v '^.claude/' | xargs) while handling the empty-list case), update the subprocess.run call in discard_changes to execute that exclusion command (or equivalent using Python's subprocess + git ls-files output) so tracked files under .claude/ are not reverted while keeping the existing git clean --exclude for untracked files.
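One way to sketch the exclusion is a pure path filter fed by `git ls-files` output; `revert_targets` and the prefix handling below are illustrative, not the module's actual API:

```python
def revert_targets(tracked_paths, preserved_prefix=".claude/"):
    """Return tracked paths safe to pass to `git checkout --`,
    skipping anything under the preserved directory."""
    return [p for p in tracked_paths if not p.startswith(preserved_prefix)]
```

A caller would feed this the lines of `git ls-files` and only invoke `git checkout --` when the filtered list is non-empty, which also covers the empty-list edge case the comment mentions.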
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/scripts/testbot/coverage_targets.py`:
- Around line 145-148: Replace the two-part comparison that checks
MAX_FILE_LINES > 0 and total_lines > MAX_FILE_LINES with a chained comparison to
satisfy pylint R1716: change the condition using the symbols MAX_FILE_LINES and
total_lines to a single chained expression (e.g., 0 < MAX_FILE_LINES <
total_lines) and keep the surrounding logic (the continue) intact so behavior is
unchanged.
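The chained form can be checked in isolation; `exceeds_limit` and the sample limit are hypothetical stand-ins for the MAX_FILE_LINES logic described above:

```python
def exceeds_limit(total_lines, max_file_lines):
    # Chained comparison equivalent to:
    #   max_file_lines > 0 and total_lines > max_file_lines
    return 0 < max_file_lines < total_lines
```

A zero or negative limit disables the check entirely, matching the two-part original, while satisfying pylint R1716.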
In `@src/scripts/testbot/create_pr.py`:
- Around line 104-114: The subprocess.run call that creates the PR (the
invocation assigned to variable result) should include an explicit check=False
parameter since you manually inspect result.returncode; update the
subprocess.run([...], capture_output=True, text=True) call in create_pr.py to
pass check=False so the function won't raise CalledProcessError and the existing
manual error handling (logger.error and sys.exit) remains valid.
In `@src/scripts/testbot/guardrails.py`:
- Around line 35-46: The subprocess.run calls that create diff_result and
untracked_result (the two calls invoking ["git", "diff", "--name-only"] and
["git", "ls-files", "--others", "--exclude-standard"]) must explicitly pass
check=False; update both calls to include check=False so pylint stops warning
while keeping the manual returncode checks and existing logger.error handling.
In `@src/scripts/testbot/respond.py`:
- Around line 103-109: The run_gh function uses subprocess.run without
explicitly setting the check argument, triggering pylint W1510; update the
subprocess.run call inside run_gh to include check=False (keeping the existing
capture_output and text parameters) so the return code continues to be handled
manually and the linter warning is resolved.
- Around line 374-386: The subprocess.run call invoking ["git", "push"] in the
retry loop currently omits an explicit check argument which triggers pylint
W1510; update the subprocess.run invocation inside the for loop (the call in
respond.py that executes git push) to pass check=False explicitly so the code
continues to manually inspect result.returncode and avoid the linter warning
while preserving the existing retry logic and subsequent subprocess.run(["git",
"pull", "--rebase"], check=False) behavior.
- Around line 305-316: pylint flags subprocess.run usage in respond.py (within
the block that assigns result) because check is not explicitly set; update the
subprocess.run call (the one that assigns result) to include check=False so the
call won't raise on non-zero exits while you continue to manually inspect
result.returncode, keeping the existing TimeoutExpired handling and subsequent
return-code check in place.
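The `check=False` convention requested across these comments looks like this in a self-contained sketch (`run_quiet` is a hypothetical helper, not the module's `run_gh`):

```python
import subprocess
import sys


def run_quiet(args):
    """Run a command without raising on non-zero exit; the caller
    inspects returncode manually, which is what check=False signals."""
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.returncode


# A succeeding and a failing command; neither raises CalledProcessError.
rc_ok = run_quiet([sys.executable, "-c", "print('ok')"])
rc_fail = run_quiet([sys.executable, "-c", "import sys; sys.exit(3)"])
```

With `check=True` the second call would raise instead of returning 3, so making `check=False` explicit both documents the intent and silences pylint W1510.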
---
Nitpick comments:
In `@src/scripts/testbot/create_pr.py`:
- Line 80: Replace the list concatenation in the subprocess call with iterable
unpacking: instead of constructing the args with ["git", "add"] + changed_files,
pass the arguments to run using unpacking so the call uses ["git", "add",
*changed_files] (locate the run(...) call and the changed_files variable in
create_pr.py).
In `@src/scripts/testbot/guardrails.py`:
- Line 66: The subprocess.run call currently concatenates lists to pass
arguments to git (subprocess.run(["git", "checkout", "--"] + non_test_files,
check=False)); change this to use iterable unpacking so the args are constructed
as ["git", "checkout", "--", *non_test_files] while preserving the check=False
flag and the non_test_files variable as the unpacked iterable.
In `@src/scripts/testbot/README.md`:
- Around line 80-85: The fenced code block in README.md is missing a language
specifier; update the triple-backtick that opens the example comment block to
include the language "text" (change ``` to ```text) for the block containing
"/testbot rename test methods...", ensuring the closing triple-backticks remain
unchanged so the static analysis warning is resolved.
In `@src/scripts/testbot/respond.py`:
- Around line 360-363: The current discard_changes() runs "git checkout -- ."
which will revert tracked changes inside .claude/; change it so the checkout
excludes that directory (e.g., build the list of tracked paths excluding
^.claude/ and pass them to git checkout --, or run: git checkout -- $(git
ls-files | grep -v '^.claude/' | xargs) while handling the empty-list case),
update the subprocess.run call in discard_changes to execute that exclusion
command (or equivalent using Python's subprocess + git ls-files output) so
tracked files under .claude/ are not reverted while keeping the existing git
clean --exclude for untracked files.
In `@src/scripts/testbot/tests/BUILD`:
- Around line 17-19: The BUILD contains a duplicate load of the osmo_py_test
macro; remove one of the identical load("//bzl:py.bzl", "osmo_py_test")
statements so the macro is only loaded once (keep a single load invocation
referencing osmo_py_test and delete the redundant one).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 968f6c1d-02ab-43a0-889b-e35c53156d6e
📒 Files selected for processing (16)
- .github/workflows/testbot-respond.yaml
- .github/workflows/testbot.yaml
- src/scripts/testbot/BUILD
- src/scripts/testbot/README.md
- src/scripts/testbot/TESTBOT_PROMPT.md
- src/scripts/testbot/__init__.py
- src/scripts/testbot/coverage_targets.py
- src/scripts/testbot/create_pr.py
- src/scripts/testbot/guardrails.py
- src/scripts/testbot/respond.py
- src/scripts/testbot/tests/BUILD
- src/scripts/testbot/tests/__init__.py
- src/scripts/testbot/tests/test_coverage_targets.py
- src/scripts/testbot/tests/test_create_pr.py
- src/scripts/testbot/tests/test_guardrails.py
- src/scripts/testbot/tests/test_respond.py
✅ Files skipped from review due to trivial changes (4)
- src/scripts/testbot/tests/__init__.py
- src/scripts/testbot/__init__.py
- src/scripts/testbot/BUILD
- src/scripts/testbot/TESTBOT_PROMPT.md
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/testbot-respond.yaml
✅ Actions performed: Comments resolved. Approval is disabled.
Actionable comments posted: 2
🧹 Nitpick comments (4)
src/scripts/testbot/TESTBOT_PROMPT.md (1)
61-63: Clarify the mock_print assertion pattern.

The pattern `" ".join(str(c) for c in mock_print.call_args_list)` converts `call` objects to strings, producing output like `call('text')` rather than just `'text'`. A clearer pattern would extract the actual arguments:

```python
output = " ".join(str(args[0]) for args, kwargs in mock_print.call_args_list)
```

Or for all positional args:

```python
output = " ".join(" ".join(str(a) for a in args) for args, kwargs in mock_print.call_args_list)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/TESTBOT_PROMPT.md` around lines 61 - 63, The current test pattern builds output by stringifying call objects from mock_print.call_args_list which yields entries like "call('text')"; change the aggregation to extract the actual positional arguments from each call (e.g., use args, kwargs = call and join args or use args[0] if only single-arg prints) when constructing the output variable used in assertions (mock_print, call_args_list, output); then assert with self.assertIn("expected", output).

.github/workflows/testbot-respond.yaml (1)
67-73: Consider pinning `bazel-contrib/setup-bazel` to an immutable SHA digest.

Other actions in this workflow use SHA-pinned versions for reproducibility. The `bazel-contrib/setup-bazel@4fd964a13a440a8aeb0be47350db2fc640f19ca8` is already SHA-pinned, which is good. However, adding a version comment (e.g., `# v0.x.x`) would improve clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/testbot-respond.yaml around lines 67 - 73, The workflow uses bazel-contrib/setup-bazel pinned to SHA "4fd964a13a440a8aeb0be47350db2fc640f19ca8"; update the action entry to keep the SHA pin but add a human-readable version comment (e.g., "# v0.x.x") next to the uses line for clarity and reproducibility so reviewers can see the intended release version while retaining the immutable SHA; locate the uses string "bazel-contrib/setup-bazel@4fd964a13a440a8aeb0be47350db2fc640f19ca8" and append the version comment on the same line or immediately above it.

src/scripts/testbot/tests/BUILD (1)
17-19: Remove duplicate load statement.

Line 19 duplicates the load statement on line 17. While Bazel tolerates duplicate loads, this should be cleaned up for clarity.

🧹 Remove duplicate load:

```diff
 load("//bzl:py.bzl", "osmo_py_test")
-load("//bzl:py.bzl", "osmo_py_test")
-
 osmo_py_test(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/tests/BUILD` around lines 17 - 19, Remove the duplicate load statement by keeping a single occurrence of load("//bzl:py.bzl", "osmo_py_test") and deleting the repeated identical line; ensure only one load("//bzl:py.bzl", "osmo_py_test") remains in the BUILD file to avoid redundancy.

src/scripts/testbot/README.md (1)
80-84: Add a language specifier to the fenced code block.

The code block showing example `/testbot` commands lacks a language specifier. Consider adding `text` or `bash` for consistency with other code blocks in the document.

Proposed fix

````diff
-```
+```text
 /testbot rename test methods to follow test_<behavior>_<condition> convention
 /testbot add edge case tests for empty input
 /testbot remove the redundant tests for preset labels
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/README.md` around lines 80 - 84, The fenced code block in the README showing example /testbot commands is missing a language specifier; update that fenced block (the triple-backtick block containing the lines starting with "/testbot rename", "/testbot add", "/testbot remove") to include a language tag such as text or bash (e.g., ```text) so it matches other code blocks and ensures correct syntax highlighting and rendering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/scripts/testbot/BUILD`:
- Around line 19-24: Add the new testbot module to the Codebase Structure
section in AGENTS.md: update AGENTS.md to list the "testbot" module (matching
the BUILD target name testbot in src/scripts/testbot/BUILD) under the src/**
modules, include a short description like "testbot: test automation utilities"
and any relevant path or visibility note so the documentation and code change
remain synchronized.
In `@src/scripts/testbot/README.md`:
- Around line 116-135: Update AGENTS.md’s "Codebase Structure" to include the
new testbot component: add an entry for src/scripts/testbot describing its
purpose (automatic test generation and PR response via Claude Code), list the
key modules (coverage_targets.py, create_pr.py, guardrails.py, respond.py,
TESTBOT_PROMPT.md and tests/), and explain how it integrates with the build
system by referencing the GitHub Actions workflows (testbot.yaml and
testbot-respond.yaml) that schedule test generation and trigger review
responses; ensure the language mirrors existing style and placement used for
other major components in AGENTS.md.
---
Nitpick comments:
In @.github/workflows/testbot-respond.yaml:
- Around line 67-73: The workflow uses bazel-contrib/setup-bazel pinned to SHA
"4fd964a13a440a8aeb0be47350db2fc640f19ca8"; update the action entry to keep the
SHA pin but add a human-readable version comment (e.g., "# v0.x.x") next to the
uses line for clarity and reproducibility so reviewers can see the intended
release version while retaining the immutable SHA; locate the uses string
"bazel-contrib/setup-bazel@4fd964a13a440a8aeb0be47350db2fc640f19ca8" and append
the version comment on the same line or immediately above it.
In `@src/scripts/testbot/README.md`:
- Around line 80-84: The fenced code block in the README showing example
/testbot commands is missing a language specifier; update that fenced block (the
triple-backtick block containing the lines starting with "/testbot rename",
"/testbot add", "/testbot remove") to include a language tag such as text or
bash (e.g., ```text) so it matches other code blocks and ensures correct syntax
highlighting and rendering.
In `@src/scripts/testbot/TESTBOT_PROMPT.md`:
- Around line 61-63: The current test pattern builds output by stringifying call
objects from mock_print.call_args_list which yields entries like "call('text')";
change the aggregation to extract the actual positional arguments from each call
(e.g., use args, kwargs = call and join args or use args[0] if only single-arg
prints) when constructing the output variable used in assertions (mock_print,
call_args_list, output); then assert with self.assertIn("expected", output).
In `@src/scripts/testbot/tests/BUILD`:
- Around line 17-19: Remove the duplicate load statement by keeping a single
occurrence of load("//bzl:py.bzl", "osmo_py_test") and deleting the repeated
identical line; ensure only one load("//bzl:py.bzl", "osmo_py_test") remains in
the BUILD file to avoid redundancy.
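The `call_args_list` aggregation fix described in the TESTBOT_PROMPT.md item above can be sketched as follows; `mock_print` and the printed strings are hypothetical stand-ins for the pattern, not code from the repository:

```python
from unittest.mock import MagicMock

# Simulate a test double that recorded several print() calls.
mock_print = MagicMock()
mock_print("hello")
mock_print("world", "again")

# Fragile: str(call) yields entries like "call('hello')", so assertions
# match against the repr of the call object rather than the printed text.
fragile_output = "\n".join(str(c) for c in mock_print.call_args_list)

# Robust: extract the actual positional arguments from each recorded call.
output = "\n".join(
    " ".join(str(arg) for arg in args)
    for args, kwargs in mock_print.call_args_list
)

assert "hello" in output
assert "world again" in output
```

The robust form keeps `self.assertIn("expected", output)` assertions stable even if `unittest.mock` ever changes how call objects are rendered.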
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3e9dbacc-b545-4833-89ba-d6fcb0e42a14
📒 Files selected for processing (16)
- .github/workflows/testbot-respond.yaml
- .github/workflows/testbot.yaml
- src/scripts/testbot/BUILD
- src/scripts/testbot/README.md
- src/scripts/testbot/TESTBOT_PROMPT.md
- src/scripts/testbot/__init__.py
- src/scripts/testbot/coverage_targets.py
- src/scripts/testbot/create_pr.py
- src/scripts/testbot/guardrails.py
- src/scripts/testbot/respond.py
- src/scripts/testbot/tests/BUILD
- src/scripts/testbot/tests/__init__.py
- src/scripts/testbot/tests/test_coverage_targets.py
- src/scripts/testbot/tests/test_create_pr.py
- src/scripts/testbot/tests/test_guardrails.py
- src/scripts/testbot/tests/test_respond.py
Add coverage_targets.py to fetch Codecov data and select low-coverage files, TESTBOT_PROMPT.md with quality rules for Claude Code, and a GitHub Actions workflow that runs Claude Code CLI to generate tests and create_pr.py to commit and open PRs. Security: Claude Code is sandboxed to read/edit test files and run test commands only. guardrails.py discards any non-test file changes. All git/gh operations are in deterministic harness scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add respond.py that fetches review threads via GraphQL, delegates fixes to Claude Code CLI with --json-schema for structured replies, posts inline replies, and resolves threads. Features: /testbot trigger phrase, full thread history context, repo-member-only access, dedup (skip already-replied threads), crash recovery, push retry, Claude-authored commit messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4 test targets with pylint and mypy checks:
- test_coverage_targets: _is_ignored, _lines_to_ranges, _cap_ranges, select_targets, format_targets
- test_guardrails: is_test_file, get_changed_test_files (mocked git)
- test_respond: _has_trigger, filter_actionable (all skip reasons, nested replies, dedup), parse_replies (3-tier fallback), build_prompt, run_claude (success, failure, timeout, env vars)
- test_create_pr: has_open_testbot_pr (all branches)

Also updates imports to use full src.scripts.testbot path for Bazel compatibility, matching the repo convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 8
🧹 Nitpick comments (1)
src/scripts/testbot/coverage_targets.py (1)
170-175: Use a typed dataclass for target entries instead of raw dictionaries.

`entries` and `targets` carry a stable multi-field structure (file_path, coverage_pct, uncovered_lines, uncovered_ranges). Converting to a dataclass improves type safety and prevents key drift.

♻️ Proposed refactor sketch
```diff
+from dataclasses import dataclass
+
+@dataclass(frozen=True)
+class CoverageTarget:
+    file_path: str
+    coverage_pct: float
+    uncovered_lines: int
+    uncovered_ranges: list[tuple[int, int]]
@@
-def select_targets(
+def select_targets(
     report: dict,
     max_targets: int,
     max_uncovered: int,
-) -> list[dict]:
+) -> list[CoverageTarget]:
@@
-    entries = []
+    entries: list[CoverageTarget] = []
@@
-        entries.append({
-            "file_path": file_path,
-            "coverage_pct": coverage_pct,
-            "uncovered_lines": uncovered_line_count,
-            "uncovered_ranges": uncovered_ranges,
-        })
+        entries.append(
+            CoverageTarget(
+                file_path=file_path,
+                coverage_pct=coverage_pct,
+                uncovered_lines=uncovered_line_count,
+                uncovered_ranges=uncovered_ranges,
+            )
+        )
@@
-    entries.sort(key=lambda entry: entry["coverage_pct"])
+    entries.sort(key=lambda entry: entry.coverage_pct)
@@
-def format_targets(targets: list[dict]) -> str:
+def format_targets(targets: list[CoverageTarget]) -> str:
@@
-        lines.append(f"## Target {i}: {target['file_path']}")
+        lines.append(f"## Target {i}: {target.file_path}")
```

Update all dictionary accesses in `format_targets()` and `main()` to use attribute access, and update test cases to construct `CoverageTarget` instances instead of dicts.

Per coding guidelines: "Prefer dataclasses over dictionaries when passing structured data with multiple fields in Python" and "Add type annotations where they improve code clarity."
Also applies to lines 181–199 (`format_targets`) and lines 243–245 (`main`).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/coverage_targets.py` around lines 170 - 175, Replace the ad-hoc dicts used for coverage entries with a typed dataclass (e.g., CoverageTarget) and update usages to attribute access: define a dataclass CoverageTarget(file_path: str, coverage_pct: float, uncovered_lines: int, uncovered_ranges: List[Tuple[int,int]]), change places that append dicts to append CoverageTarget(...) (the block that builds entries), update format_targets() and main() to accept and reference target.file_path, target.coverage_pct, target.uncovered_lines, target.uncovered_ranges instead of dict keys, and add type annotations to function signatures and tests to construct CoverageTarget instances rather than raw dicts so all callers and tests compile cleanly.
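A minimal sketch of the suggested dataclass refactor, assuming the field names named in the review (`file_path`, `coverage_pct`, `uncovered_lines`, `uncovered_ranges`); the sample targets below are invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageTarget:
    """One low-coverage file selected for test generation."""
    file_path: str
    coverage_pct: float
    uncovered_lines: int
    uncovered_ranges: list[tuple[int, int]]


def format_targets(targets: list[CoverageTarget]) -> str:
    """Render targets with attribute access instead of dict keys."""
    lines = []
    for i, target in enumerate(targets, start=1):
        lines.append(f"## Target {i}: {target.file_path}")
        lines.append(f"Coverage: {target.coverage_pct:.1f}%")
    return "\n".join(lines)


# Sorting by attribute replaces entry["coverage_pct"] key lookups.
targets = sorted(
    [
        CoverageTarget("src/b.py", 72.5, 40, [(10, 20)]),
        CoverageTarget("src/a.py", 31.0, 120, [(5, 60), (80, 90)]),
    ],
    key=lambda t: t.coverage_pct,  # lowest coverage first
)
```

With `frozen=True`, a typo such as `target.coverage` fails loudly at attribute access instead of silently returning `None` the way `dict.get` would.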
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/testbot-respond.yaml:
- Around line 35-38: Update the respond job-level if condition so the job only
runs when the comment contains the trigger and the PR is from the same repo: use
a combined expression with contains(github.event.comment.body, '/testbot') AND
github.event.pull_request.head.repo.full_name == github.repository instead of
the current label-only check on the respond job; apply the same change to the
parallel job block referenced (lines ~80-92) so both jobs perform this preflight
gating before any checkout or secret injection.
In `@src/scripts/testbot/coverage_targets.py`:
- Around line 213-224: Validate parsed values for --max-targets and
--max-uncovered immediately after args = parser.parse_args(): check that
args.max_targets >= 0 and args.max_uncovered >= 0 (allow 0 for --max-uncovered
per the help meaning), and call parser.error("...") with a clear message if a
negative value is provided; reference the parser.add_argument entries for
"--max-targets" and "--max-uncovered" and the variables args.max_targets /
args.max_uncovered so the check is colocated with the existing argument parsing
(also apply the same non-negative check to the other parser.add_argument
occurrence handling these flags).
- Around line 109-115: The fetch currently only catches urllib.error.HTTPError;
extend the excepts around the urllib.request.urlopen/json.loads block to also
catch urllib.error.URLError (for DNS/timeouts) and json.JSONDecodeError (or
ValueError for older Pythons) so those failures are logged with context via
logger.error and then call sys.exit(1); reference the existing
urllib.request.urlopen call and the json.loads call and reuse the same
logging/exit pattern used for HTTPError to handle these additional error cases.
In `@src/scripts/testbot/create_pr.py`:
- Around line 36-44: After calling run(...) for the "gh pr list" command, don't
treat a non-zero exit as "no open PRs" — check result.returncode (or call
result.check_returncode()) immediately and raise/abort with the command stderr
(and exit code) so failures don't get parsed as zero PRs; then only parse
int(result.stdout.strip()) > 0 as before and remove the ValueError fallback that
masks command failures (refer to the run(...) call, the result variable, and the
int(result.stdout.strip()) > 0 return).
In `@src/scripts/testbot/guardrails.py`:
- Around line 14-26: Remove the catch-all "BUILD" from TEST_FILE_PATTERNS and
change is_test_file(file_path: str) so BUILD files are only allowed when they
live inside an explicit test package (e.g., a "tests" or "test" directory).
Concretely: keep the existing basename+fnmatch check for normal patterns, but if
basename == "BUILD" return True only when any path segment in
file_path.split("/") equals "tests" or "test"; otherwise treat BUILD as
non-test. Update TEST_FILE_PATTERNS and the is_test_file function accordingly
(references: TEST_FILE_PATTERNS and is_test_file).
In `@src/scripts/testbot/respond.py`:
- Around line 393-416: The reply_to_comment function currently always allows the
thread to be resolved even if the API post failed; change reply_to_comment to
return a boolean success flag and only return True when run_gh returns a zero
returncode and the posted body meets the inline structured schema
(validate/serialize message to the expected schema before calling run_gh).
Specifically, update reply_to_comment to call run_gh as before but on non-zero
result.returncode log the error and return False, and on success return True;
then update the caller (the code that resolves threads) to check
reply_to_comment's boolean and only call the thread-resolve logic when it
returns True. Ensure you keep using reply_comment_id and the same run_gh
invocation but gate resolution on the new success flag.
- Around line 269-282: Update the prompt text added in the lines.extend([...])
call inside respond.py to tighten the BUILD-file rule: replace the line
currently reading "5. Do NOT create git commits or branches. Do NOT modify
source code — only test files and BUILD files." with a stricter sentence such as
"5. Do NOT create git commits or branches. Only create/modify test files
(test_*.py, *_test.go, *.test.ts, *.test.tsx) and Python test-directory BUILD
files; do not edit source/config/other non-test files." Locate the string within
the lines.extend array and swap the sentence so the prompt enforces the intended
guardrail.
- Around line 336-345: The tier-2 JSON extraction currently returns
data["replies"] without validating shape; update the block that processes
result_text/claude_output to only return replies if data is a list, non-empty,
and each item is an object (e.g., isinstance(item, dict))—otherwise fall through
so main() doesn't crash when it later calls reply_data.get(...). Reference the
symbols result_text, claude_output, data, and logger.info("Parsed replies from
result text (tier 2)") and ensure the check rejects strings or wrong-shaped
values like {"replies":"done"}.
---
Nitpick comments:
In `@src/scripts/testbot/coverage_targets.py`:
- Around line 170-175: Replace the ad-hoc dicts used for coverage entries with a
typed dataclass (e.g., CoverageTarget) and update usages to attribute access:
define a dataclass CoverageTarget(file_path: str, coverage_pct: float,
uncovered_lines: int, uncovered_ranges: List[Tuple[int,int]]), change places
that append dicts to append CoverageTarget(...) (the block that builds entries),
update format_targets() and main() to accept and reference target.file_path,
target.coverage_pct, target.uncovered_lines, target.uncovered_ranges instead of
dict keys, and add type annotations to function signatures and tests to
construct CoverageTarget instances rather than raw dicts so all callers and
tests compile cleanly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d71b59f9-7102-4c0f-b09a-f98725d2d8f4
📒 Files selected for processing (17)
- .github/workflows/testbot-respond.yaml
- .github/workflows/testbot.yaml
- AGENTS.md
- src/scripts/testbot/BUILD
- src/scripts/testbot/README.md
- src/scripts/testbot/TESTBOT_PROMPT.md
- src/scripts/testbot/__init__.py
- src/scripts/testbot/coverage_targets.py
- src/scripts/testbot/create_pr.py
- src/scripts/testbot/guardrails.py
- src/scripts/testbot/respond.py
- src/scripts/testbot/tests/BUILD
- src/scripts/testbot/tests/__init__.py
- src/scripts/testbot/tests/test_coverage_targets.py
- src/scripts/testbot/tests/test_create_pr.py
- src/scripts/testbot/tests/test_guardrails.py
- src/scripts/testbot/tests/test_respond.py
✅ Files skipped from review due to trivial changes (10)
- src/scripts/testbot/__init__.py
- AGENTS.md
- src/scripts/testbot/tests/__init__.py
- src/scripts/testbot/BUILD
- src/scripts/testbot/tests/BUILD
- src/scripts/testbot/TESTBOT_PROMPT.md
- src/scripts/testbot/README.md
- src/scripts/testbot/tests/test_create_pr.py
- src/scripts/testbot/tests/test_guardrails.py
- .github/workflows/testbot.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- src/scripts/testbot/tests/test_coverage_targets.py
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/scripts/testbot/coverage_targets.py (2)
230-230: ⚠️ Potential issue | 🟡 Minor

Validate negative CLI bounds after argument parsing.

On line 230, negative values for `--max-targets`/`--max-uncovered` are still accepted and lead to surprising slicing/capping behavior.

🔧 Proposed fix

```diff
     args = parser.parse_args()
+    if args.max_targets < 0:
+        parser.error("--max-targets must be >= 0")
+    if args.max_uncovered < 0:
+        parser.error("--max-uncovered must be >= 0")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/coverage_targets.py` at line 230, After parsing CLI args (args = parser.parse_args()), validate that numeric bounds for --max-targets and --max-uncovered are non-negative; if either args.max_targets or args.max_uncovered is < 0 call parser.error (or raise a SystemExit with a clear message) to fail fast. Locate the argument parser setup (parser) and the post-parse usage of args in this module (e.g., main() or whichever function uses args) and add the validation immediately after parser.parse_args() so invalid negative values are rejected before any slicing/capping logic runs.
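The post-parse validation can be sketched with an argparse CLI; the defaults (`1`, `300`) come from the PR description's parameter table, and the function shape is an assumption rather than the module's actual `main()`:

```python
import argparse


def parse_args(argv: list) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Select low-coverage targets.")
    parser.add_argument("--max-targets", type=int, default=1)
    parser.add_argument("--max-uncovered", type=int, default=300)
    args = parser.parse_args(argv)
    # Fail fast on negative bounds before any slicing/capping logic runs.
    if args.max_targets < 0:
        parser.error("--max-targets must be >= 0")
    if args.max_uncovered < 0:
        parser.error("--max-uncovered must be >= 0")
    return args
```

`parser.error` prints a usage message and exits with status 2, which is the conventional argparse failure mode, so the workflow step fails visibly instead of silently selecting zero targets.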
26-32: ⚠️ Potential issue | 🟠 Major

Expand ignore globs to exclude nested test directories.

Right now only `src/tests/**` is excluded. Files like `src/service/core/tests/...` can still be selected as “coverage targets,” which can cause testbot to generate tests for test code.

🔧 Proposed fix

```diff
 IGNORE_PATTERNS = [
     "src/tests/**",
+    "src/**/tests/**",
     "src/scripts/**",
     "bzl/**",
     "run/**",
     "deployments/**",
 ]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/coverage_targets.py` around lines 26 - 32, The IGNORE_PATTERNS list currently only excludes top-level src/tests/**; update IGNORE_PATTERNS to also exclude nested test directories by adding globs like "**/tests/**", "**/test/**", and "**/__tests__/**" (or any other project-specific test folder names) so files such as src/service/core/tests/... are ignored; ensure these new patterns are added alongside the existing entries in the IGNORE_PATTERNS variable.
🧹 Nitpick comments (5)
src/scripts/testbot/tests/test_respond.py (1)
278-284: Consider adding `stderr` to mock CompletedProcess for completeness.

Some test mocks omit `stderr`, which could cause `AttributeError` if the code under test ever accesses it. While the current `run_claude` implementation only logs `stderr` on failure paths (which are tested with `stderr=""`), adding it consistently improves mock fidelity.

Proposed fix

```diff
 @patch("src.scripts.testbot.respond.subprocess.run")
 def test_successful_run_returns_parsed_json(self, mock_run):
     expected = {"structured_output": {"replies": []}, "result": "ok"}
     mock_run.return_value = subprocess.CompletedProcess(
-        [], 0, stdout=json.dumps(expected),
+        [], 0, stdout=json.dumps(expected), stderr="",
     )
     result = run_claude("test prompt")
     self.assertEqual(result, expected)
```

🤖 Prompt for AI Agents
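The stderr-complete mock shape can be illustrated with a simplified stand-in for `run_claude` (this is not the repository's implementation, just the pattern under test):

```python
import json
import subprocess
from unittest.mock import patch


def run_claude(prompt: str) -> dict:
    """Toy stand-in: invoke a CLI and parse its JSON stdout."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        # Reads stderr on the failure path, so mocks should provide it.
        raise RuntimeError(result.stderr)
    return json.loads(result.stdout)


expected = {"structured_output": {"replies": []}, "result": "ok"}
with patch("subprocess.run") as mock_run:
    # Include stderr so the mock matches the real CompletedProcess shape.
    mock_run.return_value = subprocess.CompletedProcess(
        [], 0, stdout=json.dumps(expected), stderr="",
    )
    assert run_claude("test prompt") == expected
```

Because the real `CompletedProcess` always carries both streams when `capture_output=True`, keeping `stderr` in every mock means a later refactor that logs `result.stderr` on success will not break the test suite with `AttributeError`.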
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/tests/test_respond.py` around lines 278 - 284, The test test_successful_run_returns_parsed_json mocks subprocess.CompletedProcess without stderr which can lead to AttributeError if run_claude ever reads stderr; update the mock_return_value creation in that test to include stderr (e.g., stderr="") when constructing subprocess.CompletedProcess so the mocked object matches the real CompletedProcess shape used by run_claude.

src/scripts/testbot/guardrails.py (1)
71-71: Consider using iterable unpacking for cleaner syntax.

Per Ruff RUF005, prefer unpacking over list concatenation.

Proposed fix

```diff
-    subprocess.run(["git", "checkout", "--"] + non_test_files, check=False)
+    subprocess.run(["git", "checkout", "--", *non_test_files], check=False)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/guardrails.py` at line 71, Replace the list concatenation in the subprocess.run call with iterable unpacking: update the subprocess.run invocation that currently builds the args with ["git", "checkout", "--"] + non_test_files to use unpacking so the command becomes ["git", "checkout", "--", *non_test_files] (referencing the subprocess.run call and the non_test_files variable in guardrails.py).

src/scripts/testbot/respond.py (1)
105-105: Consider using iterable unpacking for cleaner syntax.

Per Ruff RUF005, prefer unpacking over list concatenation at lines 105 and 373.

Proposed fixes

```diff
 def run_gh(args: str) -> subprocess.CompletedProcess:
     """Run a gh CLI command and return the result."""
     return subprocess.run(
-        ["gh"] + shlex.split(args),
+        ["gh", *shlex.split(args)],
         capture_output=True,
         text=True,
         check=False,
     )
```

```diff
-    subprocess.run(["git", "add"] + files, check=True)
+    subprocess.run(["git", "add", *files], check=True)
```

Also applies to: 373-373
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/respond.py` at line 105, Replace the list-concatenation expressions that build command args (the occurrences of ["gh"] + shlex.split(args)) with iterable unpacking so the command list is constructed as ["gh", *shlex.split(args)]; update both occurrences (the one at the top-level call and the similar one later) to use this unpacking syntax to satisfy Ruff RUF005 and improve readability.

src/scripts/testbot/create_pr.py (1)
87-91: Minor inconsistency: direct subprocess.run vs run() helper.

Lines 88-91 use `subprocess.run` directly while nearby calls use the `run()` helper. This is functionally fine but slightly inconsistent. Consider using the helper for uniformity, or keep as-is since commit needs different options (the `input` parameter isn't supported by the helper).

Also, per Ruff RUF005, prefer unpacking on line 87:

Proposed fix for unpacking

```diff
-    run(["git", "add"] + changed_files)
+    run(["git", "add", *changed_files])
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/create_pr.py` around lines 87 - 91, Change the inconsistent subprocess invocation to use the existing run() helper and unpack the changed_files list: replace the run(["git", "add"] + changed_files call with run(["git", "add", *changed_files]) to satisfy Ruff RUF005, and replace the subprocess.run([...], check=True) commit call with run(["git", "commit", "-m", f"Add AI-generated tests for {files_summary}"], check=True) so both git operations use the same run() helper (noting the helper supports the check=True option used here).

src/scripts/testbot/coverage_targets.py (1)
123-127: Use a dataclass for target records instead of untyped dicts.

`select_targets`/`format_targets` pass a multi-field record as `dict`, which makes schema drift easy. A small dataclass would improve safety and readability.

As per coding guidelines, "Prefer dataclasses over dictionaries when passing structured data with multiple fields in Python".
Also applies to: 176-181, 187-197
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scripts/testbot/coverage_targets.py` around lines 123 - 127, Create a small dataclass (e.g., TargetRecord) representing the multi-field record currently passed as dict (fields: id/name/path, covered_lines, uncovered_lines, coverage_pct, etc. as used by select_targets and format_targets), update select_targets to build and return List[TargetRecord] instead of list[dict], change the function signature and type hints accordingly, and update format_targets to accept List[TargetRecord] and access attributes instead of dict keys; then adjust the callers referenced in the review (the blocks around the former dict construction/consumption at the noted sites) to construct and consume TargetRecord instances rather than raw dicts so typing and schema are enforced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/scripts/testbot/coverage_targets.py`:
- Line 110: Ruff flags the call to urllib.request.urlopen(request, timeout=60)
as S310; since the target URL is a fixed, trusted Codecov HTTPS API, add a
targeted suppression for Ruff S310 on the urlopen call (e.g., append "# noqa:
S310") and include a short inline rationale comment referencing that the request
URL is fixed/trusted (Codecov HTTPS API) so reviewers understand the exception;
locate the call to urllib.request.urlopen and apply the suppression and
rationale there.
---
Duplicate comments:
In `@src/scripts/testbot/coverage_targets.py`:
- Line 230: After parsing CLI args (args = parser.parse_args()), validate that
numeric bounds for --max-targets and --max-uncovered are non-negative; if either
args.max_targets or args.max_uncovered is < 0 call parser.error (or raise a
SystemExit with a clear message) to fail fast. Locate the argument parser setup
(parser) and the post-parse usage of args in this module (e.g., main() or
whichever function uses args) and add the validation immediately after
parser.parse_args() so invalid negative values are rejected before any
slicing/capping logic runs.
- Around line 26-32: The IGNORE_PATTERNS list currently only excludes top-level
src/tests/**; update IGNORE_PATTERNS to also exclude nested test directories by
adding globs like "**/tests/**", "**/test/**", and "**/__tests__/**" (or any
other project-specific test folder names) so files such as
src/service/core/tests/... are ignored; ensure these new patterns are added
alongside the existing entries in the IGNORE_PATTERNS variable.
---
Nitpick comments:
In `@src/scripts/testbot/coverage_targets.py`:
- Around line 123-127: Create a small dataclass (e.g., TargetRecord)
representing the multi-field record currently passed as dict (fields:
id/name/path, covered_lines, uncovered_lines, coverage_pct, etc. as used by
select_targets and format_targets), update select_targets to build and return
List[TargetRecord] instead of list[dict], change the function signature and type
hints accordingly, and update format_targets to accept List[TargetRecord] and
access attributes instead of dict keys; then adjust the callers referenced in
the review (the blocks around the former dict construction/consumption at the
noted sites) to construct and consume TargetRecord instances rather than raw
dicts so typing and schema are enforced.
In `@src/scripts/testbot/create_pr.py`:
- Around line 87-91: Change the inconsistent subprocess invocation to use the
existing run() helper and unpack the changed_files list: replace the run(["git",
"add"] + changed_files call with run(["git", "add", *changed_files]) to satisfy
Ruff RUF005, and replace the subprocess.run([...], check=True) commit call with
run(["git", "commit", "-m", f"Add AI-generated tests for {files_summary}"],
check=True) so both git operations use the same run() helper (noting the helper
supports the check=True option used here).
In `@src/scripts/testbot/guardrails.py`:
- Line 71: Replace the list concatenation in the subprocess.run call with
iterable unpacking: update the subprocess.run invocation that currently builds
the args with ["git", "checkout", "--"] + non_test_files to use unpacking so the
command becomes ["git", "checkout", "--", *non_test_files] (referencing the
subprocess.run call and the non_test_files variable in guardrails.py).
In `@src/scripts/testbot/respond.py`:
- Line 105: Replace the list-concatenation expressions that build command args
(the occurrences of ["gh"] + shlex.split(args)) with iterable unpacking so the
command list is constructed as ["gh", *shlex.split(args)]; update both
occurrences (the one at the top-level call and the similar one later) to use
this unpacking syntax to satisfy Ruff RUF005 and improve readability.
In `@src/scripts/testbot/tests/test_respond.py`:
- Around line 278-284: The test test_successful_run_returns_parsed_json mocks
subprocess.CompletedProcess without stderr which can lead to AttributeError if
run_claude ever reads stderr; update the mock_return_value creation in that test
to include stderr (e.g., stderr="") when constructing
subprocess.CompletedProcess so the mocked object matches the real
CompletedProcess shape used by run_claude.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f6dcd1bb-a269-4af9-9a2f-1f25d8a94d2f
📒 Files selected for processing (13)
- .github/workflows/testbot-respond.yaml
- .github/workflows/testbot.yaml
- src/scripts/testbot/BUILD
- src/scripts/testbot/__init__.py
- src/scripts/testbot/coverage_targets.py
- src/scripts/testbot/create_pr.py
- src/scripts/testbot/guardrails.py
- src/scripts/testbot/respond.py
- src/scripts/testbot/tests/BUILD
- src/scripts/testbot/tests/test_coverage_targets.py
- src/scripts/testbot/tests/test_create_pr.py
- src/scripts/testbot/tests/test_guardrails.py
- src/scripts/testbot/tests/test_respond.py
✅ Files skipped from review due to trivial changes (6)
- src/scripts/testbot/__init__.py
- src/scripts/testbot/BUILD
- src/scripts/testbot/tests/BUILD
- .github/workflows/testbot-respond.yaml
- .github/workflows/testbot.yaml
- src/scripts/testbot/tests/test_guardrails.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/scripts/testbot/tests/test_create_pr.py
- Add comment explaining DISABLE_PROMPT_CACHING=1 (NVIDIA gateway)
- Change copyright years from 2025-2026 to 2026
- Switch BUILD files from full Apache text to 2-line SPDX header
- Remove __init__.py files (not needed for Bazel or Python 3)
- Move env vars (MAX_RESPONSES, MAX_TURNS, MODEL) to CLI args
- Strengthen _is_ignored: catch all */tests/* dirs, test files, __init__.py
- Restrict BUILD whitelist to /tests/ directories only
- Handle URLError and JSONDecodeError from Codecov fetch
- Fail closed when gh pr list fails (prevent duplicate PRs)
- Validate tier-2 replies shape before returning
- Only resolve threads when inline reply post succeeds
- Fix dry_run to only skip PR creation, not test generation
- Only trigger on last non-bot human comment (prevent stale retrigger)
- Catch git add/commit failures in commit_and_push fallback path
- Normalize comment_id to int before lookup (handle string IDs)
- Add author association check: only OWNER/MEMBER/COLLABORATOR can trigger

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Tighten --allowedTools: Bash(bazel:*) → Bash(bazel test *), Bash(pnpm:*) → Bash(pnpm test *). Prevents bazel run, pnpm dlx, and other arbitrary command execution via prompt injection.
- Pin claude-code to @2.1.91 instead of @latest to prevent supply chain attacks via npm package compromise.
- Delete untracked non-test files (os.remove) instead of git checkout which silently no-ops on new files. Closes guardrail bypass where Claude could create malicious files that persist during bazel test.
- Sanitize commit messages: enforce testbot: prefix, strip git trailers (Signed-off-by, Co-authored-by), cap at 500 chars.
- Update TS test instruction to pnpm --dir (no cd needed).
- Filter has_open_testbot_pr by --author svc-osmo-ci so dev PRs manually labeled ai-generated don't block testbot runs.
- Add environment: testbot to both workflows. Requires GitHub environment configured with deployment branches main + testbot/* and a branch ruleset restricting testbot/* to maintain/admin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Add testbot, an automated pipeline that identifies low-coverage code via Codecov, generates tests using Claude Code, validates them, and opens PRs for human review. Includes a review-response module that addresses inline PR feedback via `/testbot`.

Issue #760
Architecture
Test Generation (`testbot.yaml`)

- `coverage_targets.py` selects low-coverage files from Codecov
- Claude Code generates and validates tests
- `guardrails.py` discards any non-test changes
- `create_pr.py` opens a PR with the `ai-generated` label

Claude Code is sandboxed: it can only read files, edit test files, and run test commands. It cannot run `git`, `gh`, or modify source code — all git and GitHub operations are in deterministic harness scripts.

Review Response (`testbot-respond.yaml`)

- Triggered by `/testbot` on inline review threads of `ai-generated` PRs
- `--json-schema` returns per-comment replies with resolve verdict and commit message

Security Boundary
- Claude Code: `bazel test` / `pnpm test` only
- `git` commands: `create_pr.py`, `respond.py`
- `gh` commands: `create_pr.py`, `respond.py`
- Non-test change enforcement: `guardrails.py`

Key Design Decisions
- Claude Code cannot run `git` or `gh` — prevents prompt injection from modifying version control or creating unauthorized PRs
- `guardrails.py` discards any non-test changes, enforced at commit time regardless of what Claude does
- Inference goes through `inference-api.nvidia.com` using the existing `NVIDIA_NIM_KEY`
- Coverage comes from Codecov rather than `bazel coverage`; leverages existing CI uploads
- `--json-schema` for respond replies
- `/testbot` trigger (must be first text in comment)

Configurable Parameters
`testbot.yaml` (dispatch inputs):

| Input | Default |
| --- | --- |
| `max_targets` | 1 |
| `max_uncovered` | 300 |
| `max_turns` | 50 |
| `timeout_minutes` | 30 |
| `model` | `aws/anthropic/claude-opus-4-5` |
| `dry_run` | false |

`testbot-respond.yaml` (env vars):

| Variable | Default |
| --- | --- |
| `TESTBOT_MAX_TURNS` | 50 |
| `TESTBOT_MAX_RESPONSES` | 10 |
| `ANTHROPIC_MODEL` | `aws/anthropic/claude-opus-4-5` |

Testing
- `/testbot` comment on PR "[testbot] Add tests for date-range-utils.test.ts" #794 → fix committed, inline reply posted, thread resolved

Files
Checklist
Summary by CodeRabbit
Release Notes
New Features
- `/testbot` comments in pull request reviews

Documentation