feat(ai-builder): Improve eval verifier and mock handler reliability (no-changelog)#28255
Open
…(no-changelog)

- Use structured output on the verifier to eliminate "No verification result" failures; the LLM response is now schema-validated instead of parsed from free-form text
- Add retry logic to the mock handler with configurable maxRetries (default 1) to recover from transient LLM failures
- Improve failure categorization in the verifier prompt: trace data flow before attributing failures, distinguishing builder logic errors from mock data issues
- Remove hardcoded maxTokens overrides, let the AI SDK use model defaults
- Make test case scenarios less brittle: remove exact count expectations from rest-api-data-pipeline, use minimal Notion response hints

Ref: https://linear.app/n8n/issue/TRUST-34

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 issues found across 5 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/@n8n/instance-ai/evaluations/data/workflows/rest-api-data-pipeline.json">
<violation number="1" location="packages/@n8n/instance-ai/evaluations/data/workflows/rest-api-data-pipeline.json:11">
P2: The happy-path success criteria dropped verification of the required post count, so this test can pass even if the workflow misses part of the requested Slack summary.</violation>
</file>
<file name="packages/@n8n/instance-ai/evaluations/checklist/verifier.ts">
<violation number="1" location="packages/@n8n/instance-ai/evaluations/checklist/verifier.ts:12">
P1: Structured output schema does not match the verifier prompt contract (array + nullable fields), so valid model responses can be rejected and still fall back to missing verification results.</violation>
</file>
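To make the P1 concern concrete, here is a minimal sketch of a validator that accepts an array of results with a nullable field and falls back to an empty array when validation fails. It is a dependency-free stand-in for the Zod schema referenced in the PR, not the actual `checklistResultSchema`. The field names (`id`, `pass`, `reasoning`, `failureCategory`) and the function `parseChecklistResults` are hypothetical; only the `builder_issue` / `mock_issue` categories and the empty-array fallback come from this PR.

```typescript
// Hypothetical result shape: field names are assumptions for illustration.
type FailureCategory = "builder_issue" | "mock_issue";

interface ChecklistResult {
  id: number;
  pass: boolean;
  reasoning: string;
  // Nullable: assumed to be null when the checklist item passes.
  failureCategory: FailureCategory | null;
}

// Type guard that checks one item, allowing an explicit null category.
function isChecklistResult(value: unknown): value is ChecklistResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "number" &&
    typeof v.pass === "boolean" &&
    typeof v.reasoning === "string" &&
    (v.failureCategory === null ||
      v.failureCategory === "builder_issue" ||
      v.failureCategory === "mock_issue")
  );
}

// Accepts only an array in which every item matches the shape above.
// Anything else logs a warning and yields an empty results array.
function parseChecklistResults(
  raw: unknown,
  warn: (msg: string) => void = console.warn,
): ChecklistResult[] {
  if (!Array.isArray(raw) || !raw.every(isChecklistResult)) {
    warn("No verification result: response did not match the schema");
    return [];
  }
  return raw as ChecklistResult[];
}
```

If the prompt asks the model for an array with nullable fields, the schema has to accept exactly that shape. In this sketch a response that omits `failureCategory` instead of sending `null` is rejected, which is the kind of mismatch the review comment describes.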
Architecture diagram
sequenceDiagram
participant Runner as Eval Runner
participant Mock as Mock Handler
participant LLM as AI Provider (SDK)
participant Verifier as Eval Verifier
Note over Runner,Verifier: Phase 1: Workflow Execution (Mocking)
Runner->>Mock: interceptRequest(node, context)
loop NEW: Retry Logic (maxRetries)
Mock->>LLM: generate(userPrompt)
Note right of LLM: CHANGED: Using model default maxTokens
alt Success
LLM-->>Mock: JSON specification
Mock->>Mock: materializeSpec()
else LLM Error / Timeout
LLM-->>Mock: Transient Failure
Note over Mock: Log warning & decrement retries
end
end
alt All Retries Failed
Mock-->>Runner: Return _evalMockError object
else Success
Mock-->>Runner: Return mocked response body/status
end
Note over Runner,Verifier: Phase 2: Result Verification
Runner->>Verifier: verify(artifact, checklist)
Verifier->>LLM: NEW: structuredOutput(schema)
Note right of LLM: Uses checklistResultSchema (Zod)
LLM-->>Verifier: Validated JSON Object
alt Schema Validation Success
Verifier->>Verifier: CHANGED: Trace data flow
Note right of Verifier: Categorize: builder_issue vs mock_issue
Verifier-->>Runner: List of ChecklistResult
else NEW: Schema Validation Failure (null)
Verifier->>Verifier: Log warning (No results)
Verifier-->>Runner: Empty results array
end
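The retry loop in Phase 1 of the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `generateMockWithRetry`, `MockResponse`, and the `message` field are hypothetical names, and the sketch assumes `maxRetries` counts additional attempts after the first one. Only `maxRetries` (default 1) and the `_evalMockError` marker come from this PR.

```typescript
// Hypothetical shapes for illustration; only `_evalMockError` appears in the PR.
type MockResponse = { status: number; body: unknown };
type MockError = { _evalMockError: true; message: string };

// Calls `generate` once, then retries up to `maxRetries` more times on failure.
// Assumption: maxRetries counts retries, so the default of 1 means two attempts.
async function generateMockWithRetry(
  generate: () => Promise<MockResponse>,
  maxRetries = 1,
  warn: (msg: string) => void = console.warn,
): Promise<MockResponse | MockError> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await generate();
    } catch (error) {
      // Treat any thrown error as transient and try again if attempts remain.
      lastError = error;
      warn(`Mock generation failed (attempt ${attempt + 1} of ${maxRetries + 1})`);
    }
  }
  // All attempts failed: return an error object instead of throwing,
  // so the runner can record the mock failure and continue.
  const message = lastError instanceof Error ? lastError.message : String(lastError);
  return { _evalMockError: true, message };
}
```

Returning an error object rather than throwing keeps a mock failure distinguishable from a failure in the workflow under test, which matters for the builder-versus-mock categorization in Phase 2.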
…changelog) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Add retry logic to the mock handler with configurable maxRetries (default 1) to recover from transient LLM failures
- Remove hardcoded maxTokens overrides, let the AI SDK use model defaults

Related Linear ticket
https://linear.app/n8n/issue/TRUST-34
Test plan
🤖 Generated with Claude Code