
Parse the goal reviewer verdict from the last line of its reply #24262

Draft
AAraKKe wants to merge 1 commit into loa/openmetrics-ai-gen from aarakke/goal-verdict-last-line

AAraKKe commented Jun 30, 2026
Collaborator

What does this PR do?

Parses the goal reviewer's verdict from the last line of its reply instead of requiring the entire reply to be a bare JSON object.

The reviewer agent commonly writes its reasoning as prose and then emits the {"valid": ..., "reason": ...} verdict at the end. parse_reviewer_output previously ran json.loads() on the whole reply, which failed whenever any prose preceded the JSON. That failure triggered an extra "reply with only JSON" parse-retry turn on nearly every check, wasting tokens and adding latency.
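A minimal illustration of the old failure mode, assuming a reply shaped like the one described above. The reply text and the old_parse helper are hypothetical and only mimic "run json.loads() on the whole reply"; they are not the project's code.

```python
import json

# Hypothetical reviewer reply: prose reasoning first, verdict on the last line.
reply = (
    "The check collects the expected metrics and the tests pass.\n"
    '{"valid": true, "reason": "all goals met"}'
)


def old_parse(text):
    """Hypothetical sketch of the previous behavior: parse the entire reply."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # In the reviewer loop, this is the point where a parse-retry turn was requested.
        return None


print(old_parse(reply))  # None: the leading prose makes the whole reply invalid JSON
print(old_parse('{"valid": true, "reason": "all goals met"}'))  # a bare object parses fine
```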

Changes:

  • parse_reviewer_output now reads the last non-empty line as the verdict, falling back to the whole reply (covers a pure-JSON or pretty-printed response).
  • The reviewer system prompt and the parse-retry prompt now instruct the agent to emit the verdict as a single-line JSON object on the last line, with prose allowed before it.
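The parsing strategy in the first bullet can be sketched as follows. This is a hypothetical stand-in, not the actual parse_reviewer_output: its name, its return value (the verdict dict or None), and the check that the object contains a "valid" key are all assumptions made for illustration.

```python
import json


def parse_verdict(reply):
    """Hypothetical sketch: return the verdict dict from a reviewer reply, or None.

    Tries the last non-empty line first, then falls back to the whole reply
    so that a pure-JSON or pretty-printed response still parses.
    """
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    candidates = []
    if lines:
        candidates.append(lines[-1])
    candidates.append(reply.strip())

    for candidate in candidates:
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        # Assumption: only accept a JSON object that carries the "valid" key.
        if isinstance(data, dict) and "valid" in data:
            return data
    return None


# Prose followed by a single-line verdict: the last line parses.
print(parse_verdict('Looks fine.\n{"valid": true, "reason": "ok"}'))
# Pretty-printed verdict: the last line is "}", so the whole-reply fallback parses.
print(parse_verdict('{\n  "valid": false,\n  "reason": "missing metrics"\n}'))
# No verdict at all: both candidates fail and the caller would retry.
print(parse_verdict("No verdict here."))
```

Trying the last line before the whole reply keeps the common "reasoning, then verdict" shape on the fast path, while the fallback still covers the response formats the old parser accepted.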

Motivation

The goal reviewer was failing JSON parsing and re-running on essentially every check, because its natural output is "reasoning, then verdict". Reading just the final line matches how the model actually responds and removes the spurious retries.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

The reviewer often writes its reasoning as prose and then the JSON verdict at
the end. parse_reviewer_output previously ran json.loads() on the whole reply,
which failed on any preceding prose and triggered a wasteful parse-retry turn.

Now parse the last non-empty line as the verdict (falling back to the whole
reply for a pure-JSON response), and instruct the reviewer to emit the verdict
as a single-line JSON object on the last line.
AAraKKe added the qa/skip-qa label Jun 30, 2026
dd-octo-sts Bot added the ddev label Jun 30, 2026

dd-octo-sts Bot commented Jun 30, 2026

Contributor

Validation Report

All 21 validations passed.

Validation Description Status
agent-reqs Verify check versions match the Agent requirements file
ci Validate CI configuration and code coverage settings
codeowners Validate every integration has a CODEOWNERS entry
config Validate default configuration files against spec.yaml
dep Verify dependency pins are consistent and Agent-compatible
http Validate integrations use the HTTP wrapper correctly
imports Validate check imports do not use deprecated modules
integration-style Validate check code style conventions
jmx-metrics Validate JMX metrics definition files and config
labeler Validate PR labeler config matches integration directories
legacy-signature Validate no integration uses the legacy Agent check signature
license-headers Validate Python files have proper license headers
licenses Validate third-party license attribution list
metadata Validate metadata.csv metric definitions
models Validate configuration data models match spec.yaml
openmetrics Validate OpenMetrics integrations disable the metric limit
package Validate Python package metadata and naming
qa-label Validate the pull request declares whether it needs QA for the next Agent release
readmes Validate README files have required sections
saved-views Validate saved view JSON file structure and fields
version Validate version consistency between package and changelog


datadog-datadog-prod-us1 Bot commented Jun 30, 2026

Contributor


⚠️ Warnings

🚦 8 Pipeline jobs failed

PR | test / test (linux, ubuntu-22.04, ddev, ddev on Linux) / ddev on Linux

PR | test / test (windows, windows-2022, ddev, ddev on Windows) / ddev on Windows

PR | test / test-minimum-base-package (linux, ubuntu-22.04, ddev, ddev on Linux) / minimum-base-package-ddev on Linux

🧪 2 Tests failed in 1 job

PR | run

test_from_names_multiple_native from test_registry.py
assert ('web_search', 'web_fetch') == ['web_search', 'web_fetch']

  Full diff:
  - [
  + (
        'web_search',
        'web_fetch',
  - ]
  + )
test_from_names_multiple_native from test_registry.py
assert ('web_search', 'web_fetch') == ['web_search', 'web_fetch']

  Full diff:
  - [
  + (
        'web_search',
        'web_fetch',
  - ]
  + )
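Both reported failures come from the same comparison: one side is a tuple and the other a list with identical elements. In Python these never compare equal, which is what the `[`/`(` lines in the diff show. A minimal reproduction is below. The report does not say whether the test or from_names should change, so this only illustrates the mismatch and does not prescribe a fix.

```python
# Same elements, different sequence types, mirroring the failed assertion.
names = ("web_search", "web_fetch")     # tuple side of the assertion
expected = ["web_search", "web_fetch"]  # list side of the assertion

print(names == expected)        # False: a tuple never equals a list
print(list(names) == expected)  # True once both sides are lists
```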

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

🎯 Code Coverage
Patch Coverage: 100.00%
Overall Coverage: 88.49%

🔗 Commit SHA: ecbc907


Labels

ddev, qa/skip-qa (Automatically skip this PR for the next QA)


1 participant