feat(ai-builder): Improve eval verifier and mock handler reliability (no-changelog) by JoseBra · Pull Request #28255 · n8n-io/n8n

JoseBra · 2026-04-09T14:02:24Z

Summary

- Use structured output on the verifier to eliminate "No verification result" failures: the response is now schema-validated instead of parsed from free-form text
- Add retry logic to the mock handler with configurable maxRetries (default 1) to recover from transient LLM failures
- Improve failure categorization: the verifier now traces data flow before attributing failures, correctly distinguishing builder logic errors from mock data issues
- Remove hardcoded maxTokens overrides and let the AI SDK use model defaults
- Make test case scenarios less brittle (remove exact count expectations, use minimal response hints)
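
To make the first point concrete, here is a minimal sketch of schema validation replacing free-form parsing. It is not the PR's code: the PR uses a Zod schema (`checklistResultSchema`), whereas this sketch uses a hand-written type guard so it stays dependency-free. The `ChecklistResult` fields and the `results` wrapper are assumptions made for illustration.

```typescript
// Hypothetical result shape; the field names are illustrative, not taken from the PR.
interface ChecklistResult {
  id: number;
  pass: boolean;
  reasoning: string;
}

// Type guard for a single result entry.
function isChecklistResult(value: unknown): value is ChecklistResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "number" &&
    typeof v.pass === "boolean" &&
    typeof v.reasoning === "string"
  );
}

// Returns the validated results, or null when the payload does not match the
// expected shape. The caller can then log a warning instead of trying to
// recover results from free-form text.
function validateVerifierOutput(payload: unknown): ChecklistResult[] | null {
  if (typeof payload !== "object" || payload === null) return null;
  const results = (payload as { results?: unknown }).results;
  if (!Array.isArray(results) || !results.every(isChecklistResult)) return null;
  return results;
}
```

With this shape, a malformed response surfaces as an explicit `null` at one well-defined point rather than as a silent parsing miss further down the pipeline.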

Related Linear ticket

https://linear.app/n8n/issue/TRUST-34

Test plan

Mock handler tests: 32/32 pass
Typecheck clean (instance-ai)
Full suite run: zero "No verification result", 1 mock_issue out of 27 scenarios

🤖 Generated with Claude Code

…(no-changelog)

- Use structured output on the verifier to eliminate "No verification result" failures: the LLM response is now schema-validated instead of parsed from free-form text
- Add retry logic to the mock handler with configurable maxRetries (default 1) to recover from transient LLM failures
- Improve failure categorization in the verifier prompt: trace data flow before attributing failures, distinguishing builder logic errors from mock data issues
- Remove hardcoded maxTokens overrides, let the AI SDK use model defaults
- Make test case scenarios less brittle: remove exact count expectations from rest-api-data-pipeline, use minimal Notion response hints

Ref: https://linear.app/n8n/issue/TRUST-34

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-04-09T14:06:17Z

Codecov Report

❌ Patch coverage is 92.30769% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...s/cli/src/modules/instance-ai/eval/mock-handler.ts	92.30%	1 Missing ⚠️


cubic-dev-ai

2 issues found across 5 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/@n8n/instance-ai/evaluations/data/workflows/rest-api-data-pipeline.json">

<violation number="1" location="packages/@n8n/instance-ai/evaluations/data/workflows/rest-api-data-pipeline.json:11">
P2: The happy-path success criteria dropped verification of the required post count, so this test can pass even if the workflow misses part of the requested Slack summary.</violation>
</file>

<file name="packages/@n8n/instance-ai/evaluations/checklist/verifier.ts">

<violation number="1" location="packages/@n8n/instance-ai/evaluations/checklist/verifier.ts:12">
P1: Structured output schema does not match the verifier prompt contract (array + nullable fields), so valid model responses can be rejected and still fall back to missing verification results.</violation>
</file>
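
As a rough illustration of the P1 finding, the sketch below shows how a validator that is stricter than the prompt contract can reject a response the model produced correctly. Everything here is assumed for illustration: the item fields, the idea that `failureCategory` is `null` for passing items, and the top-level array. Only the category names `builder_issue` and `mock_issue` come from this PR.

```typescript
type FailureCategory = "builder_issue" | "mock_issue";

// Hypothetical item shape: failureCategory is null when the item passed.
interface VerifierItem {
  id: number;
  pass: boolean;
  failureCategory: FailureCategory | null;
}

function isCategory(value: unknown): value is FailureCategory {
  return value === "builder_issue" || value === "mock_issue";
}

// Too strict: requires a category even for passing items, so a response
// that follows a "null when passing" contract is rejected.
function strictItemCheck(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "number" &&
    typeof v.pass === "boolean" &&
    isCategory(v.failureCategory)
  );
}

// Matches the assumed contract: the category may be null.
function nullableItemCheck(value: unknown): value is VerifierItem {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "number" &&
    typeof v.pass === "boolean" &&
    (v.failureCategory === null || isCategory(v.failureCategory))
  );
}

// The assumed contract is a top-level array of items, not a wrapper object.
function acceptsPromptContract(payload: unknown): boolean {
  return Array.isArray(payload) && payload.every(nullableItemCheck);
}
```

The point of the sketch is only that the schema and the prompt have to describe the same shape; which side should change is a decision for the PR author.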

Architecture diagram

sequenceDiagram
    participant Runner as Eval Runner
    participant Mock as Mock Handler
    participant LLM as AI Provider (SDK)
    participant Verifier as Eval Verifier

    Note over Runner,Verifier: Phase 1: Workflow Execution (Mocking)

    Runner->>Mock: interceptRequest(node, context)
    
    loop NEW: Retry Logic (maxRetries)
        Mock->>LLM: generate(userPrompt)
        Note right of LLM: CHANGED: Using model default maxTokens
        alt Success
            LLM-->>Mock: JSON specification
            Mock->>Mock: materializeSpec()
        else LLM Error / Timeout
            LLM-->>Mock: Transient Failure
            Note over Mock: Log warning & decrement retries
        end
    end

    alt All Retries Failed
        Mock-->>Runner: Return _evalMockError object
    else Success
        Mock-->>Runner: Return mocked response body/status
    end

    Note over Runner,Verifier: Phase 2: Result Verification

    Runner->>Verifier: verify(artifact, checklist)
    
    Verifier->>LLM: NEW: structuredOutput(schema)
    Note right of LLM: Uses checklistResultSchema (Zod)
    
    LLM-->>Verifier: Validated JSON Object
    
    alt Schema Validation Success
        Verifier->>Verifier: CHANGED: Trace data flow
        Note right of Verifier: Categorize: builder_issue vs mock_issue
        Verifier-->>Runner: List of ChecklistResult
    else NEW: Schema Validation Failure (null)
        Verifier->>Verifier: Log warning (No results)
        Verifier-->>Runner: Empty results array
    end
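
The retry loop in Phase 1 of the diagram could look roughly like the sketch below. This is an assumption-laden illustration rather than the mock handler's actual code: `withRetries` and the `MockError` shape are hypothetical names, and it assumes `maxRetries` counts retries after the first attempt (so the default of 1 allows two attempts in total). Only the `_evalMockError` key and the `maxRetries` default come from this PR.

```typescript
// Hypothetical error object returned when every attempt fails.
interface MockError {
  _evalMockError: true;
  message: string;
}

// Runs `attempt` once, then retries up to `maxRetries` more times on failure.
// Assumption: maxRetries counts retries, not total attempts.
async function withRetries<T>(
  attempt: () => Promise<T>,
  maxRetries = 1,
): Promise<T | MockError> {
  let lastError: unknown;
  for (let i = 0; i <= maxRetries; i++) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error;
      console.warn(`mock generation failed (attempt ${i + 1} of ${maxRetries + 1})`);
    }
  }
  // All attempts failed: return an error object instead of throwing, so the
  // runner can record the failure and continue with the scenario.
  return {
    _evalMockError: true,
    message: lastError instanceof Error ? lastError.message : String(lastError),
  };
}
```

Returning an error object rather than throwing mirrors the "All Retries Failed" branch of the diagram, where the runner still receives a value it can inspect.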


n8n-assistant bot added the labels "core" (Enhancement outside /nodes-base and /editor-ui) and "n8n team" (Authored by the n8n team) on Apr 9, 2026

JoseBra marked this pull request as ready for review April 9, 2026 18:54

JoseBra requested review from a team and schrothbn and removed request for a team April 9, 2026 18:55

cubic-dev-ai bot reviewed Apr 9, 2026

packages/@n8n/instance-ai/evaluations/checklist/verifier.ts (resolved)

packages/@n8n/instance-ai/evaluations/data/workflows/rest-api-data-pipeline.json (outdated, resolved)

fix(ai-builder): Include post count in rest-api success criteria (no-…

f7196c4

…changelog) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

JoseBra wants to merge 2 commits into master from trust-34-improve-llm-response-parsing-in-eval-verifier-and-mock
