fix(prompt): Extract display value from prompt result options for grading #148
Merged
Prompts with Single Option / Multiple Options return a structured-output JSON
envelope on result.Content (e.g. {"value":"..."}). The unwrapped text lives on
ResultOptions[].DisplayValue. The default ExtractOutputValue read the raw JSON
back out of FinalOutput.content, so graders evaluated the envelope instead of
the actual generated text - the regex grader failed on the JSON wrapper, and
the LLM judge embedded the JSON in its prompt and produced unparseable scores.
PromptTestFeature now overrides ExtractOutputValue to prefer
resultOptions[].displayValue (joined with newlines for multi-option), falling
back to content when no options are present.
Fixes #142
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
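The extraction order described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `PromptTestFeature.ExtractOutputValue` override: the class and method names (`OutputValueSketch`, `ExtractDisplayText`) are hypothetical, and the JSON shape is assumed to use only the `content`, `resultOptions`, and `displayValue` property names mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class OutputValueSketch
{
    // Hypothetical helper (assumption): prefers resultOptions[].displayValue,
    // joins multiple options with newlines, and falls back to content.
    public static string? ExtractDisplayText(string finalOutputJson)
    {
        using JsonDocument doc = JsonDocument.Parse(finalOutputJson);
        JsonElement root = doc.RootElement;

        // TryGetProperty throws on non-object elements, so guard first.
        if (root.ValueKind != JsonValueKind.Object)
        {
            return null;
        }

        if (root.TryGetProperty("resultOptions", out JsonElement options)
            && options.ValueKind == JsonValueKind.Array)
        {
            var values = new List<string>();
            foreach (JsonElement option in options.EnumerateArray())
            {
                if (option.ValueKind == JsonValueKind.Object
                    && option.TryGetProperty("displayValue", out JsonElement display)
                    && display.ValueKind == JsonValueKind.String)
                {
                    values.Add(display.GetString()!);
                }
            }

            if (values.Count > 0)
            {
                // Multi-option responses are joined with newlines.
                return string.Join("\n", values);
            }
        }

        // No usable options: fall back to the raw content string.
        return root.TryGetProperty("content", out JsonElement content)
            && content.ValueKind == JsonValueKind.String
            ? content.GetString()
            : null;
    }
}
```

With this ordering, a grader given a single-option transcript sees the unwrapped text rather than the `{"value":"..."}` envelope, while transcripts without options still yield whatever `content` holds.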
Replaces the anonymous-object writes and the property-name-based reads in PromptTestFeature with a single FinalOutputEnvelope type. Now both ExecuteAsync (write) and ExtractOutputValue (read) reference the same properties, so a future rename can't desynchronise serialisation from extraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
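The idea of one shared type for both the write and the read side can be illustrated with a small round-trip sketch. The record shapes below are assumptions for illustration only: the real `FinalOutputEnvelope` may have different members, and `FinalOutputEnvelopeSketch`, `ResultOptionSketch`, and `EnvelopeRoundTrip` are hypothetical names. The sketch also assumes camelCase JSON, matching the `resultOptions` and `displayValue` names used in this PR.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical shapes (assumption): only the members named in this PR are modelled.
public sealed record ResultOptionSketch(string? DisplayValue);

public sealed record FinalOutputEnvelopeSketch(
    string? Content,
    IReadOnlyList<ResultOptionSketch>? ResultOptions);

public static class EnvelopeRoundTrip
{
    // Web defaults give camelCase property names and case-insensitive reads.
    private static readonly JsonSerializerOptions Options =
        new(JsonSerializerDefaults.Web);

    // Write side: serialise the typed envelope instead of an anonymous object.
    public static string Write(FinalOutputEnvelopeSketch envelope) =>
        JsonSerializer.Serialize(envelope, Options);

    // Read side: deserialise into the same type, so a property rename
    // changes both directions at once.
    public static FinalOutputEnvelopeSketch? Read(string json) =>
        JsonSerializer.Deserialize<FinalOutputEnvelopeSketch>(json, Options);
}
```

Because both methods go through the same record, renaming `DisplayValue` would change the serialised name and the deserialised name together, which is the desynchronisation risk the commit message describes.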
Summary
- Graders for prompts with Single Option / Multiple Options evaluated the structured-output JSON envelope (e.g. `{"value":"..."}`) instead of the unwrapped text, causing the Regex grader to fail length checks and the LLM Judge to throw `The JSON value could not be converted to System.Double` when scoring.
- `PromptTestFeature` now overrides `ExtractOutputValue` to prefer `resultOptions[].displayValue` from the transcript's `FinalOutput`, joining multi-option responses with newlines and falling back to `content` when no options are present (preserves `OptionCount == 0` and error transcripts).
- Tests in `PromptTestFeatureTests` cover single option, multiple options, no options, missing `resultOptions`, and error-shape transcripts.

Test plan
- `dotnet test Umbraco.AI.Prompt/Umbraco.AI.Prompt.slnx`: 60/60 pass (5 new)
- `dotnet test Umbraco.AI/Umbraco.AI.slnx`: 714 unit + 25 integration pass
- With a Regex grader using `^[\s\S]{1,160}$` and an LLM Judge, confirm graders evaluate the unwrapped text

🤖 Generated with Claude Code