
fix(prompt): Extract display value from prompt result options for grading #148

Merged
mattbrailsford merged 2 commits into dev from feature/fix-prompt-test-grader-output
May 5, 2026

@mattbrailsford
Contributor

Summary

  • Fixes #142 ([Tests] Prompt test graders evaluate raw JSON output instead of plain text value). Prompt test graders evaluated the raw structured-output JSON envelope ({"value":"..."}) instead of the unwrapped text, so the Regex grader failed length checks and the LLM Judge threw "The JSON value could not be converted to System.Double" when scoring.
  • PromptTestFeature now overrides ExtractOutputValue to prefer resultOptions[].displayValue from the transcript's FinalOutput, joining multi-option responses with newlines and falling back to content when no options are present, so transcripts with OptionCount == 0 and error transcripts behave as before.
  • Adds PromptTestFeatureTests covering single option, multiple options, no options, missing resultOptions, and error-shape transcripts.
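
The extraction order described in the summary can be sketched as below. This is a minimal illustration, not the code from this PR: the class name OutputValueSketch, the string-in/string-out signature, and the empty-string result for a transcript with neither options nor content are all assumptions. Only the JSON property names (resultOptions, displayValue, content) come from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper. The real override lives on PromptTestFeature and
// its actual signature is not shown in this PR.
public static class OutputValueSketch
{
    public static string ExtractOutputValue(string finalOutputJson)
    {
        using JsonDocument doc = JsonDocument.Parse(finalOutputJson);
        JsonElement root = doc.RootElement;

        // Prefer resultOptions[].displayValue when at least one option has one.
        if (root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty("resultOptions", out JsonElement options)
            && options.ValueKind == JsonValueKind.Array)
        {
            var values = new List<string>();
            foreach (JsonElement option in options.EnumerateArray())
            {
                if (option.ValueKind == JsonValueKind.Object
                    && option.TryGetProperty("displayValue", out JsonElement display)
                    && display.ValueKind == JsonValueKind.String)
                {
                    values.Add(display.GetString()!);
                }
            }

            if (values.Count > 0)
            {
                // Multi-option responses are joined with newlines.
                return string.Join("\n", values);
            }
        }

        // Fall back to content when no options are present.
        if (root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty("content", out JsonElement content)
            && content.ValueKind == JsonValueKind.String)
        {
            return content.GetString() ?? string.Empty;
        }

        // Assumption: a transcript with neither field yields an empty string.
        return string.Empty;
    }
}
```

The five cases listed for PromptTestFeatureTests (single option, multiple options, no options, missing resultOptions, error shape) map onto the three return paths in this sketch.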

Test plan

  • dotnet test Umbraco.AI.Prompt/Umbraco.AI.Prompt.slnx — 60/60 pass (5 new)
  • dotnet test Umbraco.AI/Umbraco.AI.slnx — 714 unit + 25 integration pass
  • Manual: create a Single Option prompt, run a test with the regex ^[\s\S]{1,160}$ and an LLM Judge, confirm graders evaluate the unwrapped text
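
To see why the length regex in the manual step is sensitive to the envelope, the sketch below applies the same pattern to a plain string and to that string wrapped as {"value":"..."}. RegexGraderSketch and its members are hypothetical names for illustration; the actual Regex grader implementation is not shown in this PR. Wrap does no JSON escaping, so it is only meaningful for text without quotes or backslashes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the Regex grader's length check.
public static class RegexGraderSketch
{
    // The pattern from the manual test step: 1 to 160 characters of any kind.
    public const string LengthPattern = @"^[\s\S]{1,160}$";

    public static bool Passes(string output) => Regex.IsMatch(output, LengthPattern);

    // Mimics the structured-output envelope, which adds 12 characters
    // around the text ({"value":" before and "} after).
    public static string Wrap(string text) => "{\"value\":\"" + text + "\"}";
}
```

With a 155-character answer, Passes returns true for the unwrapped text and false for the wrapped text, because the envelope pushes the length to 167.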

🤖 Generated with Claude Code

mattbrailsford and others added 2 commits May 5, 2026 10:28
…ding

Prompts with Single Option / Multiple Options return a structured-output JSON
envelope on result.Content (e.g. {"value":"..."}). The unwrapped text lives on
ResultOptions[].DisplayValue. The default ExtractOutputValue read the raw JSON
back out of FinalOutput.content, so graders evaluated the envelope instead of
the actual generated text - the regex grader failed on the JSON wrapper, and
the LLM judge embedded the JSON in its prompt and produced unparseable scores.

PromptTestFeature now overrides ExtractOutputValue to prefer
resultOptions[].displayValue (joined with newlines for multi-option), falling
back to content when no options are present.

Fixes #142

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the anonymous-object writes and the property-name-based reads in
PromptTestFeature with a single FinalOutputEnvelope type. Now both
ExecuteAsync (write) and ExtractOutputValue (read) reference the same
properties, so a future rename can't desynchronise serialisation from
extraction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
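
The idea in the second commit, one type shared by the write and read paths, might look roughly like this. The shape of the real FinalOutputEnvelope is not shown in this PR, so every type and member name below is an assumption; only the JSON names content, resultOptions, and displayValue are taken from the commit messages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical option shape; the real type may carry more fields.
public sealed class ResultOptionSketch
{
    [JsonPropertyName("displayValue")]
    public string? DisplayValue { get; set; }
}

// Hypothetical stand-in for FinalOutputEnvelope.
public sealed class FinalOutputEnvelopeSketch
{
    [JsonPropertyName("content")]
    public string? Content { get; set; }

    [JsonPropertyName("resultOptions")]
    public List<ResultOptionSketch>? ResultOptions { get; set; }
}

public static class EnvelopeRoundTripSketch
{
    // Write path: serialise the typed envelope instead of an anonymous object.
    public static string Write(FinalOutputEnvelopeSketch envelope) =>
        JsonSerializer.Serialize(envelope);

    // Read path: deserialise into the same type, so a renamed property
    // changes both sides at once.
    public static string Read(string json)
    {
        FinalOutputEnvelopeSketch? envelope =
            JsonSerializer.Deserialize<FinalOutputEnvelopeSketch>(json);

        List<string?>? values = envelope?.ResultOptions?
            .Select(o => o?.DisplayValue)
            .Where(v => !string.IsNullOrEmpty(v))
            .ToList();

        if (values is { Count: > 0 })
        {
            return string.Join("\n", values);
        }

        return envelope?.Content ?? string.Empty;
    }
}
```

Because both methods go through FinalOutputEnvelopeSketch, the JSON property names are declared once rather than repeated as string literals on the read side.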
@mattbrailsford mattbrailsford merged commit 81e5250 into dev May 5, 2026
0 of 4 checks passed
@mattbrailsford mattbrailsford deleted the feature/fix-prompt-test-grader-output branch May 5, 2026 11:19
Development

Successfully merging this pull request may close these issues.

[Tests] Prompt test graders evaluate raw JSON output instead of plain text value