Parse the goal reviewer verdict from the last line of its reply by AAraKKe · Pull Request #24262 · DataDog/integrations-core

AAraKKe · 2026-06-30T14:34:51Z

What does this PR do?

Parses the goal reviewer's verdict from the last line of its reply instead of requiring the entire reply to be a bare JSON object.

The reviewer agent commonly writes its reasoning as prose and then emits the {"valid": ..., "reason": ...} verdict at the end. parse_reviewer_output previously ran json.loads() on the whole reply, which failed whenever any prose preceded the JSON. Each failure triggered an extra "reply with only JSON" parse-retry turn, which happened on nearly every check and wasted tokens and latency.
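A minimal illustration of the failure mode, assuming a typical "reasoning, then verdict" reply. The reply text below is made up for illustration and is not taken from a real reviewer run:

```python
import json

# Hypothetical reviewer reply: reasoning prose first, then the verdict on the last line.
reply = (
    "The generated check covers all metrics listed in the spec.\n"
    '{"valid": true, "reason": "all goals met"}'
)

try:
    json.loads(reply)  # old approach: parse the entire reply as JSON
    whole_reply_parsed = True
except json.JSONDecodeError:
    # The leading prose makes the whole string invalid JSON; this is the
    # failure that used to trigger the extra parse-retry turn.
    whole_reply_parsed = False

# The final line on its own is valid JSON.
verdict = json.loads(reply.splitlines()[-1])
print(whole_reply_parsed, verdict["valid"])
```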

Changes:

parse_reviewer_output now reads the last non-empty line as the verdict, falling back to the whole reply (covers a pure-JSON or pretty-printed response).
The reviewer system prompt and the parse-retry prompt now instruct the agent to emit the verdict as a single-line JSON object on the last line, with prose allowed before it.
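The parsing order described in the changes can be sketched as follows. This is a minimal sketch assuming the verdict is a JSON object; `parse_verdict` is a hypothetical stand-in and does not reproduce the actual signature, return type, or error handling of `parse_reviewer_output` in this PR:

```python
import json
from typing import Any


def parse_verdict(reply: str) -> dict[str, Any]:
    """Hypothetical sketch: extract a {"valid": ..., "reason": ...} verdict.

    Tries the last non-empty line first, then falls back to the whole reply so
    that a pure-JSON or pretty-printed (multi-line) object still parses.
    Raises ValueError if neither candidate is a JSON object.
    """
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    candidates = [lines[-1]] if lines else []
    candidates.append(reply.strip())  # fallback: the entire reply

    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    # Neither candidate parsed; a caller could issue a parse-retry turn here.
    raise ValueError("no JSON verdict found in reviewer reply")


# Usage: prose followed by a single-line verdict, and a pretty-printed verdict.
prose_reply = 'All metrics are mapped.\n{"valid": true, "reason": "complete"}'
pretty_reply = '{\n  "valid": false,\n  "reason": "missing metrics"\n}'
print(parse_verdict(prose_reply)["valid"])    # True
print(parse_verdict(pretty_reply)["reason"])  # missing metrics
```

Trying the last line first keeps the common "prose, then verdict" reply to a single json.loads() call; the whole-reply fallback only matters when the model returns bare or multi-line JSON, whose last line (for example a closing `}`) is not valid JSON on its own.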

Motivation

The goal reviewer was failing JSON parsing and re-running on essentially every check, because its natural output is "reasoning, then verdict". Reading just the final line matches how the model actually responds and removes the spurious retries.

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

The reviewer often writes its reasoning as prose and then the JSON verdict at the end. parse_reviewer_output previously called json.loads() on the whole reply, which failed on any preceding prose and triggered a wasteful parse-retry turn. Now parse the last non-empty line as the verdict (falling back to the whole reply for a pure-JSON response), and instruct the reviewer to emit the verdict as a single-line JSON object on the last line.

dd-octo-sts · 2026-06-30T14:41:31Z

Validation Report

All 21 validations passed.


Validation	Description	Status
`agent-reqs`	Verify check versions match the Agent requirements file	✅
`ci`	Validate CI configuration and code coverage settings	✅
`codeowners`	Validate every integration has a CODEOWNERS entry	✅
`config`	Validate default configuration files against spec.yaml	✅
`dep`	Verify dependency pins are consistent and Agent-compatible	✅
`http`	Validate integrations use the HTTP wrapper correctly	✅
`imports`	Validate check imports do not use deprecated modules	✅
`integration-style`	Validate check code style conventions	✅
`jmx-metrics`	Validate JMX metrics definition files and config	✅
`labeler`	Validate PR labeler config matches integration directories	✅
`legacy-signature`	Validate no integration uses the legacy Agent check signature	✅
`license-headers`	Validate Python files have proper license headers	✅
`licenses`	Validate third-party license attribution list	✅
`metadata`	Validate metadata.csv metric definitions	✅
`models`	Validate configuration data models match spec.yaml	✅
`openmetrics`	Validate OpenMetrics integrations disable the metric limit	✅
`package`	Validate Python package metadata and naming	✅
`qa-label`	Validate the pull request declares whether it needs QA for the next Agent release	✅
`readmes`	Validate README files have required sections	✅
`saved-views`	Validate saved view JSON file structure and fields	✅
`version`	Validate version consistency between package and changelog	✅


datadog-datadog-prod-us1 · 2026-06-30T14:43:32Z


⚠️ Warnings

🚦 8 Pipeline jobs failed

PR | test / test (linux, ubuntu-22.04, ddev, ddev on Linux) / ddev on Linux

PR | test / test (windows, windows-2022, ddev, ddev on Windows) / ddev on Windows

PR | test / test-minimum-base-package (linux, ubuntu-22.04, ddev, ddev on Linux) / minimum-base-package-ddev on Linux

View all 8 failed jobs.

🧪 2 Tests failed in 1 job

PR | run

test_from_names_multiple_native from test_registry.py

assert ('web_search', 'web_fetch') == ['web_search', 'web_fetch']
  
  Full diff:
  - [
  + (
        'web_search',
        'web_fetch',
  - ]
  + )

test_from_names_multiple_native from test_registry.py

assert ('web_search', 'web_fetch') == ['web_search', 'web_fetch']
  
  Full diff:
  - [
  + (
        'web_search',
        'web_fetch',
  - ]
  + )

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

🎯 Code Coverage
• Patch Coverage: 100.00%
• Overall Coverage: 88.49%

🔗 Commit SHA: ecbc907

AAraKKe added the qa/skip-qa label (Automatically skip this PR for the next QA) Jun 30, 2026

dd-octo-sts Bot added the ddev label Jun 30, 2026

AAraKKe wants to merge 1 commit into loa/openmetrics-ai-gen from aarakke/goal-verdict-last-line
