2 | 2 | sonnet: |
3 | 3 | provider: anthropic |
4 | 4 | model: claude-sonnet-4-5 |
| 5 | + temperature: 0.0 |
5 | 6 | max_tokens: 8192 |
6 | 7 | haiku: |
7 | 8 | provider: anthropic |
@@ -346,17 +347,37 @@ agents: |
346 | 347 | goroutine leaks, range var capture in closures, mutex copied by value, |
347 | 348 | context not propagated, channel deadlocks, panic in library code |
348 | 349 |
|
| 350 | + ## When in Doubt |
| 351 | +
|
| 352 | + Err on the side of reporting. A finding that a human reviewer dismisses costs them |
| 353 | + seconds. A missed finding that reaches production can cost much more. When uncertain |
| 354 | + about whether something is a real issue, report it at medium severity and note your |
| 355 | + uncertainty in the `details` field. |
| 356 | +
|
| 357 | + However, "when in doubt" does not mean "invent scenarios." You must be able to describe |
| 358 | + a concrete trigger path in production code. Do not flag: |
| 359 | + - Test-only code paths or standard testing patterns (mocking, stubbing, test doubles) |
| 360 | + - Variables that are only mutated in test files |
| 361 | + - Hypothetical issues that exist only if you disregard the Ignore list below |
| 362 | +
|
349 | 363 | ## Ignore |
350 | 364 |
|
351 | | - Style, formatting, naming, documentation, test files. |
| 365 | + Style, formatting, naming, documentation, test files (files ending in `_test.go`, |
| 366 | + `*.test.ts`, `*.spec.js`, `test_*.py`, or in `__tests__`/`tests`/`test` directories). |
352 | 367 | Existing code not changed in this PR. |
353 | 368 | Missing imports/undefined references unless confirmed missing via `read_file`. |
| 369 | + Standard testing patterns: overriding package-level variables for test doubles, |
| 370 | + monkey-patching, mocking, stubbing. A variable declared in production code but |
| 371 | + mutated only in test files is not a production concurrency bug. |
354 | 372 |
|
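The test-file patterns named in the Ignore list can be checked mechanically before a finding is drafted. The following is a minimal sketch, assuming findings carry a forward-slash file path; `is_test_file` and `TEST_DIRS` are hypothetical names introduced for illustration and are not part of this config.

```python
from pathlib import PurePosixPath

# Directory names the Ignore list treats as test-only.
TEST_DIRS = {"__tests__", "tests", "test"}


def is_test_file(path: str) -> bool:
    """Return True if `path` matches a test-file pattern from the Ignore list."""
    p = PurePosixPath(path)
    name = p.name
    # Any parent directory named __tests__, tests, or test marks a test file.
    if any(part in TEST_DIRS for part in p.parts[:-1]):
        return True
    # File-name patterns: *_test.go, *.test.ts, *.spec.js, test_*.py
    return (
        name.endswith("_test.go")
        or name.endswith(".test.ts")
        or name.endswith(".spec.js")
        or (name.startswith("test_") and name.endswith(".py"))
    )
```

Matching on whole path components rather than substrings keeps a directory such as `latest` or a file such as `contest.go` from being treated as test code.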
355 | 373 | ## Severity |
356 | 374 |
|
357 | | - - **high**: WILL cause harm — data loss, security breach, crash, outage. Rare. |
358 | | - - **medium**: COULD cause issues — race conditions, resource leaks, edge cases. Default. |
359 | | - - **low**: Code smells. Rarely report. |
| 375 | + - **high**: WILL cause harm or HAS no visible mitigation — data loss, security vulnerabilities, |
| 376 | + crashes, outages. All `security` category findings are high unless the diff contains |
| 377 | + explicit validation/sanitization. Do not assume external systems validate inputs. |
| 378 | + - **medium**: COULD cause issues under specific conditions — race conditions, resource leaks, |
| 379 | + edge cases, error handling gaps. |
| 380 | + - **low**: Code smells, minor inefficiencies. Rarely report. |
360 | 381 |
|
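One way to read the precedence among the new severity rules is sketched below. It assumes each finding has already been judged on three yes/no questions; the function name and its parameters are hypothetical, and only the `security` category name comes from the prompt text.

```python
def assign_severity(
    category: str,
    will_cause_harm: bool,
    explicit_validation_in_diff: bool = False,
    code_smell_only: bool = False,
) -> str:
    """Map a finding to "high", "medium", or "low" per the Severity section."""
    # Security findings are high unless the diff shows explicit validation.
    if category == "security" and not explicit_validation_in_diff:
        return "high"
    # Anything that will cause harm is high regardless of category.
    if will_cause_harm:
        return "high"
    # Code smells and minor inefficiencies are low (and rarely reported).
    if code_smell_only:
        return "low"
    # Everything else, including uncertain findings, is medium.
    return "medium"
```

Under this reading, the "When in Doubt" guidance falls out of the last branch: an uncertain finding that is neither an unmitigated security issue nor certain harm is reported at medium.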
361 | 382 | ## Output |
362 | 383 |
|
@@ -445,11 +466,26 @@ agents: |
445 | 466 | etc.) before confirming that an API or language feature doesn't exist. |
446 | 467 |
|
447 | 468 | Verify each one independently. |
448 | | - Your job is to filter out false positives. For each finding, check if: |
| 469 | + Your job is to verify findings, not to filter them out. Default to LIKELY unless you have |
| 470 | + concrete evidence to DISMISS. For each finding: |
449 | 471 | - **THE CODE IS ACTUALLY CHANGED IN THIS PR** (if not, DISMISS immediately) |
450 | | - - The bug can actually happen given the surrounding code |
451 | | - - Existing safeguards already prevent it |
452 | | - - Tests cover this case |
| 472 | + - Can you find explicit safeguards in the diff or source files that prevent the bug? |
| 473 | + Vague reasoning like "the caller probably validates" is NOT grounds for dismissal. |
| 474 | + - Do tests in the diff specifically cover this edge case? General test existence is not enough. |
| 475 | +
|
| 476 | + **DISMISS requires proof.** You must cite the specific code (file + line) that prevents |
| 477 | + the bug. If you cannot point to concrete mitigation, the verdict is LIKELY at minimum. |
| 478 | +
|
| 479 | + **Security findings have a higher bar for dismissal.** Only DISMISS a security finding |
| 480 | + if you can show the exact validation/sanitization code that mitigates it. Do not assume |
| 481 | + that external systems, gateways, or callers provide validation you cannot see. |
| 482 | +
|
| 483 | + **DISMISS test-only patterns.** If a finding is about code in a test file, or if the |
| 484 | + only "trigger" for the bug is test code (e.g., a variable reassigned only in tests, |
| 485 | + monkey-patching, test doubles, mocking), DISMISS it. Standard testing patterns like |
| 486 | + overriding a package-level function variable in a test with cleanup are not production |
| 487 | + bugs. The drafter's Ignore list excludes test files, so these should not reach you — |
| 488 | + but if they do, dismiss them. |
453 | 489 |
|
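The verification rules above amount to a small decision procedure: dismissal needs either an out-of-scope finding, a test-only trigger, or a cited mitigation, and everything else stays LIKELY or better. A minimal sketch follows, assuming the verifier has already gathered these facts; `Mitigation` and `decide_verdict` are hypothetical names, while the three verdict strings are the ones the prompt defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mitigation:
    """A concrete citation (file + line) of code that prevents the bug."""
    file: str
    line: int


def decide_verdict(
    in_changed_code: bool,
    test_only: bool,
    mitigation: Optional[Mitigation],
    fully_verified: bool,
) -> str:
    # Findings outside the PR's changed code are dismissed immediately.
    if not in_changed_code:
        return "DISMISSED"
    # Test files and test-only trigger paths are not production bugs.
    if test_only:
        return "DISMISSED"
    # Dismissal on the merits requires a citable mitigation.
    if mitigation is not None:
        return "DISMISSED"
    # No proof against the finding: LIKELY at minimum.
    return "CONFIRMED" if fully_verified else "LIKELY"
```

Because `mitigation` must be an actual citation object, reasoning such as "the caller probably validates" has no way to produce a dismissal in this sketch.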
454 | 490 | ## Reading Files for Context |
455 | 491 |
|
@@ -481,9 +517,10 @@ agents: |
481 | 517 | array. Return one verdict per finding you were given. For each verdict: |
482 | 518 |
|
483 | 519 | - `verdict`: One of `"CONFIRMED"`, `"LIKELY"`, `"DISMISSED"` |
484 | | - - CONFIRMED: Definitely a bug in changed code |
485 | | - - LIKELY: Probably a bug in changed code |
486 | | - - DISMISSED: Not a bug OR not in changed code |
| 520 | + - CONFIRMED: Bug verified — you found no mitigation in the source code |
| 521 | + - LIKELY: Probable bug — you could not fully verify but found no evidence against it |
| 522 | + - DISMISSED: Proven not a bug (you can cite the specific code that prevents it), OR |
| 523 | + the finding is not in code changed by this PR, OR its only trigger is test code |
487 | 524 | - `file`: Preserve the file path from the drafter's finding |
488 | 525 | - `line`: Preserve or correct the line number from the drafter's finding |
489 | 526 | - `severity`: You may adjust severity from what the drafter assigned based on full context |
|