fix(eval): prevent RangeError on large eval table payloads #7716
Strip large/redundant fields from table cell payloads to prevent RangeError
crashes from JSON.stringify on evals with base64 images.
- Add trimTableCellForApi() that strips: rendered prompt (set to ''), response (keep only cached/tokenUsage/prompt), testCase (keep only provider), and unnecessary spread fields from EvalResult
- Apply trimming in GET /:id/table before building the response
- Strip config.tests from table response (unused by frontend)
- Add GET /:evalId/results/:resultId/detail endpoint for on-demand fetching of full prompt, response, and testCase data
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
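The trimming described in this commit can be pictured with a minimal sketch. The types and the name `trimCellSketch` are hypothetical and much smaller than the real promptfoo shapes; the sketch follows the field list in this commit message only and is not the actual `trimTableCellForApi()` implementation.

```typescript
// Hypothetical, simplified shapes; the real types carry many more fields.
interface CellResponse {
  output?: unknown;
  cached?: boolean;
  tokenUsage?: { total?: number };
  prompt?: string;
  raw?: unknown;
}

interface TableCell {
  id: string;
  text: string;
  pass: boolean;
  score: number;
  prompt: string;
  response?: CellResponse;
  testCase?: { provider?: unknown; vars?: Record<string, unknown> };
  // Stand-ins for fields that leak in through a `...result` spread.
  promptIdx?: number;
  testIdx?: number;
}

type TrimmedCell = Pick<TableCell, 'id' | 'text' | 'pass' | 'score' | 'prompt'> & {
  response?: Pick<CellResponse, 'cached' | 'tokenUsage' | 'prompt'>;
  testCase?: { provider?: unknown };
};

// Build the trimmed cell from an explicit allow-list, so unexpected large
// fields are dropped by default instead of being copied along.
function trimCellSketch(cell: TableCell): TrimmedCell {
  const trimmed: TrimmedCell = {
    id: cell.id,
    text: cell.text,
    pass: cell.pass,
    score: cell.score,
    prompt: '', // the rendered prompt is fetched on demand instead
  };
  if (cell.response) {
    const { cached, tokenUsage, prompt } = cell.response;
    trimmed.response = { cached, tokenUsage, prompt };
  }
  if (cell.testCase) {
    trimmed.testCase = { provider: cell.testCase.provider };
  }
  return trimmed;
}
```

An allow-list is used here rather than deleting known-large keys, on the assumption that new fields added later should not silently inflate the table payload.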
The table endpoint now strips per-cell prompt content to reduce payload
size. Update the frontend to fetch full prompt data on demand when the
user opens the prompt dialog.
- Add fetchCellDetail() API function
- Add lazy-loading state in EvalOutputCell with on-demand fetch
- Always show prompt dialog button (prompt data loads on click)
- Pass testVars prop from ResultsTable for variable display
- Use output.text as fallback for image alt text
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
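A rough sketch of what an on-demand detail fetch could look like. The route is the one named in the PR summary; `cellDetailUrl`, `fetchCellDetailSketch`, the `CellDetail` shape, and the injected `FetchLike` are assumptions made for illustration and may differ from the real `fetchCellDetail()`.

```typescript
// Hypothetical shape, following the fields the detail endpoint is described as returning.
interface CellDetail {
  prompt: string;
  response?: unknown;
  testCase?: unknown;
}

// Minimal fetch-like signature so the sketch does not depend on DOM or Node typings.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Encode both IDs so unusual characters cannot change the path structure.
function cellDetailUrl(evalId: string, resultId: string): string {
  return `/api/eval/${encodeURIComponent(evalId)}/results/${encodeURIComponent(resultId)}/detail`;
}

// Resolves to null on any failure (assumed behavior), so the caller can
// keep showing the trimmed cell instead of surfacing an error.
async function fetchCellDetailSketch(
  fetchFn: FetchLike,
  evalId: string,
  resultId: string,
): Promise<CellDetail | null> {
  try {
    const res = await fetchFn(cellDetailUrl(evalId, resultId));
    if (!res.ok) {
      return null;
    }
    return (await res.json()) as CellDetail;
  } catch {
    return null;
  }
}
```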
getPromptsWithPredicate and getTestCasesWithPredicate called
eval_.toResultsFile() which loads all results into memory, but only
config was ever accessed. Use { config: eval_.config } directly.
Also catch RangeError in CLI export when outputting large evals to
console, showing the exact command to export to a file instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
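The console-export guard might look roughly like the sketch below. `stringifyForConsole` and its parameters are hypothetical; the hint text is passed in by the caller rather than hard-coded, since the exact command shown by the CLI is not reproduced here.

```typescript
type StringifyResult = { ok: true; json: string } | { ok: false; hint: string };

// Serializes `data` for console output. A RangeError (which V8 raises when the
// resulting string would be too long) is turned into a hint; anything else is rethrown.
function stringifyForConsole(
  data: unknown,
  exportHint: string,
  serialize: (value: unknown) => string = (value) => JSON.stringify(value, null, 2),
): StringifyResult {
  try {
    return { ok: true, json: serialize(data) };
  } catch (error) {
    if (error instanceof RangeError) {
      return { ok: false, hint: `Output is too large to print. Try: ${exportHint}` };
    }
    throw error;
  }
}
```

The serializer is injectable only so the failure path can be exercised without building a multi-hundred-megabyte object.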
- Test that table cells have prompt stripped, spread fields removed, and response trimmed to essential fields only
- Test that config.tests is stripped from table response
- Test detail endpoint returns full prompt/response/testCase
- Test detail endpoint returns 404 for missing or wrong-eval results
- Unit tests for trimTableCellForApi covering all field handling
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
👍 All Clear
I reviewed this PR for LLM security vulnerabilities focusing on the six critical vulnerability classes (Prompt Injection, Data Exfiltration, PII/Secrets in Prompts, Insecure Output Handling, Excessive Agency, and Jailbreak Risks). This PR is a performance optimization that reduces API payload sizes and adds lazy-loading for evaluation results. No LLM security vulnerabilities were identified.
Minimum severity threshold: 🟡 Medium | To re-scan after changes, comment @promptfoo-scanner
1. Remove eval ownership check from detail endpoint — in comparison
mode the frontend passes the base eval ID but comparison results
belong to a different eval, causing 404. Result IDs are unique UUIDs.
2. Reset cellDetail state when output changes to prevent stale prompt
content when React reuses a component instance for a different row.
3. Auto-fetch prompt when "Show Prompts" is toggled on and prompt was
stripped, so inline prompts appear without clicking each cell.
4. Replace `{} as any` with `{} as AtomicTestCase` for proper typing
(AtomicTestCase has all optional fields so `{}` is valid).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Restore eval ownership check in detail endpoint. Pass evalId through
trimmed cells (from ...result spread) so the frontend uses the correct
evalId for comparison-mode cells instead of the base eval ID.
2. Strip response.prompt from trimmed cells. For multimodal providers
this can contain base64 images duplicated across every cell. The
providerPrompt in the dialog now reads from cellDetail?.response.
3. Add stale-request cancellation to the showPrompts auto-fetch effect
using the cleanup pattern (let cancelled = false; return () => ...).
Prevents older responses from winning races during rapid updates.
4. Update tests: restore wrong-eval 404, verify evalId is preserved,
verify response.prompt is stripped.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TypeScript's strict mode rejects direct cast from a typed interface to Record<string, unknown>. Cast through `unknown` first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
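For illustration, the double cast mentioned in this commit could look like the following. `EvalResultLike` and `withoutKeys` are hypothetical names, and the sketch only shows the `as unknown as Record<string, unknown>` step, not the surrounding production code.

```typescript
// Hypothetical stand-in for a typed result object.
interface EvalResultLike {
  id: string;
  evalId: string;
  promptIdx: number;
}

// Returns a plain copy of `source` without the keys listed in `drop`.
function withoutKeys(source: EvalResultLike, drop: ReadonlySet<string>): Record<string, unknown> {
  // Go through `unknown` first, as the commit describes, so the typed
  // interface can be viewed as a string-keyed record.
  const view = source as unknown as Record<string, unknown>;
  const kept: Record<string, unknown> = {};
  for (const key of Object.keys(view)) {
    if (!drop.has(key)) {
      kept[key] = view[key];
    }
  }
  return kept;
}
```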
The providerPrompt now reads from cellDetail?.response but the state type was missing the response field. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
✅ Generated 14 tests - 14 passed (e4c52cb)
Test Summary
Tusk's tests are all passing and validate the core changes in this PR: lazy-loading of cell details via the new …
📈 Coverage gains
Line coverage - avg 11% gain for 2 files
Branch coverage - avg 9% gain for 2 files
Coverage is calculated by running tests directly associated with each source file.
Security Review ✅
No critical issues found. The changes properly validate inputs via Zod schemas, use parameterized Drizzle ORM queries (no SQL injection risk), and enforce eval ownership on the new detail endpoint to prevent IDOR.
🟡 Minor Observations (3 items)
Last updated: 2026-02-16 | Reviewing: e4c52cb
👍 All Clear
This PR introduces lazy loading for evaluation result data to optimize payload sizes. The security review found no LLM-specific vulnerabilities - the changes are purely infrastructure-level optimizations for data retrieval and display that don't affect LLM interactions.
Minimum severity threshold: 🟡 Medium | To re-scan after changes, comment @promptfoo-scanner
📝 Walkthrough
The PR introduces lazy-loaded cell detail retrieval to reduce API payload sizes. A new GET endpoint returns full prompt, response, and testCase details for specific result cells on demand. The table API response is trimmed via a new …
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/app/src/pages/eval/components/EvalOutputCell.tsx`:
- Around line 186-211: The useEffect that auto-fetches prompt details
(React.useEffect) can leave loadingDetail stuck because the cleanup only sets
cancelled=true and the async .then never runs setLoadingDetail(false); update
the effect to use async/await inside an inner async function called (e.g.,
fetchDetailAsync) that awaits fetchCellDetail(detailEvalId, output.id), and in
the cleanup always call setLoadingDetail(false) as well as flip the cancelled
flag; ensure you still only call setCellDetail(detail) when not cancelled and
that the effect dependencies remain [showPrompts, cellDetail, loadingDetail,
output.prompt, output.id, detailEvalId].
In `@src/server/routes/eval.ts`:
- Around line 353-372: Wrap the async route handler for
evalRouter.get('/:evalId/results/:resultId/detail') in a try/catch; validate
params using EvalSchemas.ResultDetail.Params.parse as currently done, but if
parse throws return res.status(400).json({ success: false, error: '<validation
message>' }); after DB lookup (EvalResult.findById) return 404 via
res.status(404).json({ success: false, error: 'Result not found' }) when missing
or mismatched, and on success return res.json({ success: true, data: { prompt:
result.prompt.raw, response: result.response, testCase: result.testCase } });
catch any other errors and return res.status(500).json({ success: false, error:
String(err) }) to ensure all responses follow the {success, data/error} contract
and Zod/DB errors are handled.
// Auto-fetch prompt when "Show Prompts" is toggled on and prompt was stripped.
// Uses cleanup function to cancel stale responses during rapid cell/toggle changes.
React.useEffect(() => {
  if (
    showPrompts &&
    !cellDetail &&
    !loadingDetail &&
    !output.prompt &&
    output.id &&
    detailEvalId
  ) {
    let cancelled = false;
    setLoadingDetail(true);
    fetchCellDetail(detailEvalId, output.id).then((detail) => {
      if (!cancelled) {
        if (detail) {
          setCellDetail(detail);
        }
        setLoadingDetail(false);
      }
    });
    return () => {
      cancelled = true;
    };
  }
}, [showPrompts, cellDetail, loadingDetail, output.prompt, output.id, detailEvalId]);
🧩 Analysis chain
🏁 Script executed:
# Check if the file exists and read the specific lines
wc -l src/app/src/pages/eval/components/EvalOutputCell.tsx
Repository: promptfoo/promptfoo
Length of output: 120
🏁 Script executed:
# Read the context around lines 186-211
sed -n '180,220p' src/app/src/pages/eval/components/EvalOutputCell.tsx
Repository: promptfoo/promptfoo
Length of output: 1624
🏁 Script executed:
# Find the fetchCellDetail function to understand its signature
rg -n "fetchCellDetail" src/app/src/pages/eval/components/EvalOutputCell.tsx -A 2 -B 2
Repository: promptfoo/promptfoo
Length of output: 714
🏁 Script executed:
# Search for fetchCellDetail definition in the codebase
rg -n "const fetchCellDetail|function fetchCellDetail|export.*fetchCellDetail" --type ts --type tsx
Repository: promptfoo/promptfoo
Length of output: 90
🏁 Script executed:
# Search for fetchCellDetail definition more broadly
rg "export.*fetchCellDetail|const fetchCellDetail\s*=" --type ts -A 5
Repository: promptfoo/promptfoo
Length of output: 419
🏁 Script executed:
# Also check the api utils file directly
fd -t f "api.ts" src/app/src/utils/
Repository: promptfoo/promptfoo
Length of output: 87
🏁 Script executed:
# Read the api utility file
cat -n src/app/src/utils/api.ts | head -150
Repository: promptfoo/promptfoo
Length of output: 3063
🏁 Script executed:
# Search for fetchCellDetail in api.ts
rg -n "fetchCellDetail" src/app/src/utils/api.ts -A 10 -B 2
Repository: promptfoo/promptfoo
Length of output: 400
Reset loadingDetail in cleanup to prevent stuck loading state.
When the effect cleanup runs before the async request resolves (e.g., showPrompts toggles off), the if (!cancelled) guard prevents setLoadingDetail(false) from executing. This leaves the dialog in a perpetual loading state and blocks subsequent fetch attempts. Reset loadingDetail in the cleanup function and use async/await for consistency with coding guidelines.
🛠️ Suggested fix
React.useEffect(() => {
- if (
- showPrompts &&
- !cellDetail &&
- !loadingDetail &&
- !output.prompt &&
- output.id &&
- detailEvalId
- ) {
- let cancelled = false;
- setLoadingDetail(true);
- fetchCellDetail(detailEvalId, output.id).then((detail) => {
- if (!cancelled) {
- if (detail) {
- setCellDetail(detail);
- }
- setLoadingDetail(false);
- }
- });
- return () => {
- cancelled = true;
- };
- }
+ if (
+ !showPrompts ||
+ cellDetail ||
+ loadingDetail ||
+ output.prompt ||
+ !output.id ||
+ !detailEvalId
+ ) {
+ return;
+ }
+
+ let cancelled = false;
+ const loadDetail = async () => {
+ setLoadingDetail(true);
+ try {
+ const detail = await fetchCellDetail(detailEvalId, output.id);
+ if (!cancelled && detail) {
+ setCellDetail(detail);
+ }
+ } finally {
+ if (!cancelled) {
+ setLoadingDetail(false);
+ }
+ }
+ };
+ loadDetail();
+
+ return () => {
+ cancelled = true;
+ setLoadingDetail(false);
+ };
}, [showPrompts, cellDetail, loadingDetail, output.prompt, output.id, detailEvalId]);
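One alternative way to rule out a stuck loading flag, sketched here under stated assumptions, is to keep the loading state and the in-flight request ID in a single state value and update it through a reducer. The names `DetailState`, `DetailAction`, and `detailReducer` are hypothetical, and this is not what the PR or the suggestion above implements. Note also that `loadingDetail` is itself in the effect's dependency list, so toggling it inside the effect causes the cleanup to run on the next render; that interaction may be worth checking against both versions above.

```typescript
interface Detail {
  prompt: string;
}

type DetailState =
  | { status: 'idle' }
  | { status: 'loading'; requestId: number }
  | { status: 'loaded'; detail: Detail };

type DetailAction =
  | { type: 'start'; requestId: number }
  | { type: 'resolve'; requestId: number; detail: Detail | null }
  | { type: 'reset' };

// Pure transition function: a response is applied only if it belongs to the
// request currently in flight, and `reset` always returns to a non-loading state.
function detailReducer(state: DetailState, action: DetailAction): DetailState {
  switch (action.type) {
    case 'start':
      return { status: 'loading', requestId: action.requestId };
    case 'resolve':
      if (state.status !== 'loading' || state.requestId !== action.requestId) {
        return state; // stale or unexpected response: ignore it
      }
      return action.detail ? { status: 'loaded', detail: action.detail } : { status: 'idle' };
    case 'reset':
      return { status: 'idle' };
  }
}
```

In a component, an effect cleanup would dispatch `reset` (or simply start a request with a new ID), so a cancelled request cannot leave the state at `loading`.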
// Returns the full prompt, response, and testCase for a single result cell.
// The table endpoint strips these fields to keep payloads small; the frontend
// fetches them on demand when the user clicks "Show Prompt".
evalRouter.get(
  '/:evalId/results/:resultId/detail',
  async (req: Request, res: Response): Promise<void> => {
    const { evalId, resultId } = EvalSchemas.ResultDetail.Params.parse(req.params);

    const result = await EvalResult.findById(resultId);
    if (!result || result.evalId !== evalId) {
      res.status(404).json({ error: 'Result not found' });
      return;
    }

    res.json({
      prompt: result.prompt.raw,
      response: result.response,
      testCase: result.testCase,
    });
  },
Wrap the result-detail handler with try/catch and {success,data/error} responses.
Uncaught Zod parse or DB errors will bubble, and the response shape doesn’t match the API contract.
🛠 Suggested fix
evalRouter.get(
'/:evalId/results/:resultId/detail',
async (req: Request, res: Response): Promise<void> => {
- const { evalId, resultId } = EvalSchemas.ResultDetail.Params.parse(req.params);
-
- const result = await EvalResult.findById(resultId);
- if (!result || result.evalId !== evalId) {
- res.status(404).json({ error: 'Result not found' });
- return;
- }
-
- res.json({
- prompt: result.prompt.raw,
- response: result.response,
- testCase: result.testCase,
- });
+ try {
+ const { evalId, resultId } = EvalSchemas.ResultDetail.Params.parse(req.params);
+ const result = await EvalResult.findById(resultId);
+ if (!result || result.evalId !== evalId) {
+ res.status(404).json({ success: false, error: 'Result not found' });
+ return;
+ }
+
+ res.json({
+ success: true,
+ data: {
+ prompt: result.prompt.raw,
+ response: result.response,
+ testCase: result.testCase,
+ },
+ });
+ } catch (error) {
+ if (error instanceof z.ZodError) {
+ res.status(400).json({ success: false, error: z.prettifyError(error) });
+ return;
+ }
+ logger.error('[GET /:evalId/results/:resultId/detail] Failed to fetch result detail', {
+ error,
+ evalId: req.params.evalId,
+ resultId: req.params.resultId,
+ });
+ res.status(500).json({ success: false, error: 'Failed to fetch result detail' });
+ }
},
);
As per coding guidelines, "src/server/routes/**/*.{ts,tsx}: Validate requests with Zod schemas from src/types/api/, wrap all responses in { success, data/error } format, handle errors with try-catch blocks in async route handlers."
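The `{ success, data/error }` contract quoted above can be expressed as a small discriminated union. The sketch below is framework-free and uses hypothetical names (`ApiEnvelope`, `StoredResult`, `resultDetailResponse`); it only models the lookup outcome, including the ownership check, and leaves Zod validation and the 500 path to the route handler.

```typescript
type ApiEnvelope<T> = { success: true; data: T } | { success: false; error: string };

// Hypothetical, reduced shapes for the stored result and the detail payload.
interface StoredResult {
  evalId: string;
  prompt: { raw: string };
  response?: unknown;
  testCase?: unknown;
}

interface ResultDetail {
  prompt: string;
  response?: unknown;
  testCase?: unknown;
}

// Maps a lookup outcome to an HTTP status and an enveloped body.
// A result that belongs to a different eval is reported as not found.
function resultDetailResponse(
  evalId: string,
  result: StoredResult | null,
): { status: number; body: ApiEnvelope<ResultDetail> } {
  if (!result || result.evalId !== evalId) {
    return { status: 404, body: { success: false, error: 'Result not found' } };
  }
  return {
    status: 200,
    body: {
      success: true,
      data: {
        prompt: result.prompt.raw,
        response: result.response,
        testCase: result.testCase,
      },
    },
  };
}
```

Keeping this mapping as a pure function is a design choice for the sketch: it can be tested without an Express request or a database.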
Summary
- Fix `RangeError` when `JSON.stringify` exceeds V8's ~512MB limit (e.g. base64 images duplicated across every cell)
- Add `trimTableCellForApi()` utility that strips the rendered prompt, `...result` spread fields (`evalId`, `promptIdx`, `testIdx`, etc.), and trims `response` to only `cached`/`tokenUsage`/`prompt`
- Add `GET /api/eval/:evalId/results/:resultId/detail` endpoint so the frontend can fetch full cell data on demand
- Strip `config.tests` from table response (unused by frontend, potentially large)
- Avoid loading all results in `database.ts` just to check config (`toResultsFile()` → `{ config }`)
- Catch `RangeError` in `export` command with helpful message suggesting `-o` flag

Supersedes #7599: addresses the root cause (payload bloat) rather than only catching the error.
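To see how quickly duplicated images reach the limit mentioned above, here is a back-of-the-envelope sketch. The image size and cell count are made-up illustrative numbers, and the limit constant simply reuses the "~512MB" figure from the summary rather than an exact engine value.

```typescript
// Length of the padded base64 encoding of `byteLength` raw bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Rough lower bound on the serialized table size when the same image is
// embedded in every cell; ignores all other fields and JSON overhead.
function estimateTableChars(imageBytes: number, cellCount: number): number {
  return base64Length(imageBytes) * cellCount;
}

// Illustrative limit only, taken from the "~512MB" figure in the summary.
const ASSUMED_STRING_LIMIT = 512 * 1024 * 1024;

const perCell = base64Length(2 * 1024 * 1024); // a hypothetical 2 MiB image
const total = estimateTableChars(2 * 1024 * 1024, 200); // hypothetical 200 cells
console.log({ perCell, total, exceedsLimit: total > ASSUMED_STRING_LIMIT });
```

Under these assumptions a 2 MiB image repeated across 200 cells already exceeds the assumed limit, which is why stripping per-cell copies matters more than catching the error.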
Test plan
- Unit tests for `trimTableCellForApi` (strips prompt, spread fields, trims response, preserves essentials, handles edge cases)
- Test that `config.tests` is stripped from the table response

🤖 Generated with Claude Code