Skip to content

[Tests] Prompt test graders evaluate raw JSON output instead of plain text value #142

@BoletteKern

Description

@BoletteKern

Which component is this issue related to?

Umbraco.AI (Core)

Which Umbraco AI version are you using? (Please write the exact version, example: 10.1.0)

17.2.0

Bug summary

When running a prompt test, the graders evaluate the raw JSON output {"value": "..."} instead of extracting the plain text value before evaluation. This causes both Regex and LLM Judge graders to fail even when the actual generated content is correct.

Specifics

No response

Steps to reproduce

  1. Create a prompt with Result Type set to Single Option
  2. Create a test targeting that prompt
  3. Add a Regex grader to check output length (e.g. ^[\s\S]{1,160}$)
  4. Add an LLM Judge grader to check compelling search results (eg Evaluation criteria: The meta description should be written in natural language, be relevant to the page topic, and read as something a real user would find compelling in a search result. It should not be generic, robotic, or simply repeat the page title.) and a pass threshold set to 70.0.
  5. Run the test

Expected result / actual result

Expected result: Graders should evaluate the plain text value extracted from the prompt output.

Actual result: Graders receive the full raw JSON object {"value": "..."} instead of the plain text string inside it.

Regex grader fails because the full JSON string exceeds the character limit
LLM Judge grader throws: Grader execution failed: The JSON value could not not be converted to System.Double | LineNumber: 0 | BytePositionInLine: 306

Workaround: For the Regex grader, the pattern can be adjusted to match inside the JSON wrapper:
"value":\s*"[\s\S]{1,160}"

Dependencies

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions