fix(workflows): write large node outputs to temp file to prevent bash substitution corruption (fixes #1717)#1718
Conversation
… substitution corruption (coleam00#1717) When a bash node references $nodeId.output from an upstream node whose output exceeds ~32KB, inlining the full value as a bash -c argument causes silent data corruption. This adds a size threshold (NODE_OUTPUT_FILE_THRESHOLD = 32KB): outputs below it are still shell-quoted inline; outputs at or above it are written to a temp file in logDir and substituted with $(cat '<path>') so bash reads the value at runtime without argv size issues. Affected paths: executeBashNode and loop-node until_bash. Closes coleam00#1717
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds size-aware substitution for ChangesLarge output file-backed bash substitution
Sequence DiagramsequenceDiagram
participant Executor
participant substituteNodeOutputRefs
participant shellQuoteOrFile
participant FileSystem
participant BashSubprocess
Executor->>substituteNodeOutputRefs: prepare bash script with $nodeId.output
substituteNodeOutputRefs->>shellQuoteOrFile: value, escapedForBash=true, outputFileDir
alt value size >= 32KB
shellQuoteOrFile->>FileSystem: writeFileSync(outputFileDir/<nodeId>.<field?>.nodeoutput)
FileSystem-->>shellQuoteOrFile: file written
shellQuoteOrFile-->>substituteNodeOutputRefs: $(cat /path/to/file)
else value size < 32KB
shellQuoteOrFile-->>substituteNodeOutputRefs: shellQuote(value)
end
substituteNodeOutputRefs-->>Executor: substituted script text
Executor->>BashSubprocess: execute substituted script
BashSubprocess->>FileSystem: cat /path/to/file (if file-based)
FileSystem-->>BashSubprocess: original value bytes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/workflows/src/dag-executor.test.ts (1)
835-836: ⚡ Quick winHarden temp-dir naming to avoid rare cross-test collisions.
Using only
Date.now()fortempDircan collide under tight scheduling and make this suite flaky. Add a random suffix (as used elsewhere in this file) for deterministic isolation.♻️ Suggested patch
- tempDir = join(tmpdir(), `archon-test-large-output-${Date.now()}`); + tempDir = join( + tmpdir(), + `archon-test-large-output-${Date.now()}-${Math.random().toString(36).slice(2)}` + );As per coding guidelines, “Prefer reproducible commands and locked dependency behavior in CI-sensitive paths; keep tests deterministic with no flaky timing or network dependence without guardrails.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.test.ts` around lines 835 - 836, The tempDir name construction using Date.now() alone (where tempDir is set via join(tmpdir(), `archon-test-large-output-${Date.now()}`) and then created with mkdir) can collide; change the naming to include a short random or unique suffix (reuse the same random-suffix pattern used elsewhere in this test file — e.g., append a crypto/random or Math.random()-derived token or process.pid) so the tempDir becomes `archon-test-large-output-${Date.now()}-<random>` before calling mkdir to guarantee isolation and avoid flaky cross-test collisions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/workflows/src/dag-executor.test.ts`:
- Around line 835-836: The tempDir name construction using Date.now() alone
(where tempDir is set via join(tmpdir(),
`archon-test-large-output-${Date.now()}`) and then created with mkdir) can
collide; change the naming to include a short random or unique suffix (reuse the
same random-suffix pattern used elsewhere in this test file — e.g., append a
crypto/random or Math.random()-derived token or process.pid) so the tempDir
becomes `archon-test-large-output-${Date.now()}-<random>` before calling mkdir
to guarantee isolation and avoid flaky cross-test collisions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 79bf9f1f-532c-486f-91e3-15955fa5b8b0
📒 Files selected for processing (2)
packages/workflows/src/dag-executor.test.tspackages/workflows/src/dag-executor.ts
Review SummaryVerdict: blocking-issues Your change adds a file-spill mechanism for large node outputs (≥32KB) in bash scripts, preventing silent data corruption when shell argument limits are exceeded. The implementation is well-structured and the threshold rationale is clearly documented. One critical bug and two items need to be addressed before merge. Blocking issues
Suggested fixes
Minor / nice-to-have
Compliments
Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality, docs-impact. |
… fallback Address review feedback from @Wirasm: - Wrap writeFileSync in try/catch so disk-full or permission errors produce a structured log instead of an unhandled exception - Fall back to inline shell-quoting on failure (pre-file-spill behavior) - Add test for fallback path using a non-existent directory Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
|
Thanks for the thorough review @Wirasm! Addressed the blocking issue: writeFileSync try/catch — Wrapped in try/catch with structured logging via Also confirmed the existing test suite (all tests in Re the suggested fixes:
|
Summary
$nodeId.outputfrom an upstream LLM node whose output is large (~42KB+), the substituted value is silently corrupted because the full text is inlined as abash -cargument, hitting process argument passing limits.maintainer-standupworkflow (and any workflow with large intermediate outputs) fails silently — the downstream consumer gets garbled data and raises parse errors, despite working fine with the same input directly.NODE_OUTPUT_FILE_THRESHOLD = 32KB). Outputs below this continue to be shell-quoted inline (existing behavior). Outputs at or above this are written to a temp file inlogDirand substituted with$(cat '<path>')— bash reads the value at runtime via command substitution, bypassing the argv size issue entirely.escapedForBash=falseand never hit the file path. Small outputs (<32KB) still inline as before.UX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory:
Validation Evidence
bun test packages/workflows/src/dag-executor.test.ts— 236 pass, 0 failbun run type-check— all 5 packages cleanSecurity Impact
logDir(already under the run's artifact directory) — no new paths introducedshellQuoteis used on the file path in the$(cat ...)command to prevent path injectionCompatibility/Migration
.nodeoutputextension and are scoped to the run's logDirRisks and Mitigations
Rollback Plan
Revert the commit — the only behavioral change is the file-based path for >32KB outputs.
Closes #1717
Summary by CodeRabbit
New Features
Bug Fixes
Tests