fix(workflows): write large node outputs to temp file to prevent bash substitution corruption (fixes #1717) by kagura-agent · Pull Request #1718 · coleam00/Archon

kagura-agent · 2026-05-18T08:25:36Z

Summary

Problem: When a bash node references $nodeId.output from an upstream LLM node whose output is large (~42KB+), the substituted value is silently corrupted because the full text is inlined as a bash -c argument, hitting process argument passing limits.
Why it matters: The maintainer-standup workflow (and any workflow with large intermediate outputs) fails silently — the downstream consumer gets garbled data and raises parse errors, despite working fine with the same input directly.
What changed: Added a size threshold (NODE_OUTPUT_FILE_THRESHOLD = 32KB). Outputs below this continue to be shell-quoted inline (existing behavior). Outputs at or above this are written to a temp file in logDir and substituted with $(cat '<path>') — bash reads the value at runtime via command substitution, bypassing the argv size issue entirely.
What did NOT change: Non-bash substitution paths (AI prompts, command nodes) are unaffected — they pass escapedForBash=false and never hit the file path. Small outputs (<32KB) still inline as before.
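
The threshold behavior described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `quoteOrSpill` is a hypothetical name, the `shellQuote` shown here is an assumed POSIX single-quote implementation, and measuring the size in UTF-8 bytes is an assumption (the PR only states the threshold value).

```typescript
import { Buffer } from "node:buffer";
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Threshold value named in the PR description (32KB).
const NODE_OUTPUT_FILE_THRESHOLD = 32 * 1024;

// Assumed quoting scheme: wrap in single quotes, escape embedded single quotes.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: inline small values, write large ones to a file
// and return a command substitution that reads the file at runtime.
function quoteOrSpill(value: string, nodeId: string, dir: string): string {
  if (Buffer.byteLength(value, "utf8") < NODE_OUTPUT_FILE_THRESHOLD) {
    return shellQuote(value);
  }
  const filePath = join(dir, `${nodeId}.nodeoutput`);
  writeFileSync(filePath, value);
  return `$(cat ${shellQuote(filePath)})`;
}

const demoDir = mkdtempSync(join(tmpdir(), "nodeoutput-sketch-"));
const smallRef = quoteOrSpill("hello", "synthesize", demoDir);
const largeRef = quoteOrSpill("x".repeat(42 * 1024), "synthesize", demoDir);
const spilledLength = readFileSync(join(demoDir, "synthesize.nodeoutput"), "utf8").length;
console.log(smallRef, largeRef, spilledLength);
```

With this shape, the text handed to `bash -c` stays short for large values because only the file path is embedded in the script.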

UX Journey

Before

Workflow              DAG Executor           bash -c
────────              ────────────           ───────
runs synthesize ───▶  captures 42KB output
runs persist node     substituteNodeOutputRefs
                      shellQuote(42KB) ─────▶ bash -c '<42KB inline>'
                                              ❌ data corrupted silently
                                              downstream parse fails

After

Workflow              DAG Executor           bash -c
────────              ────────────           ───────
runs synthesize ───▶  captures 42KB output
runs persist node     substituteNodeOutputRefs
                      output >= 32KB → write to logDir/synthesize.nodeoutput
                      substitute with $(cat path) ─▶ bash -c '...$(cat /path)...'
                                                      ✅ cat reads full file at runtime
                                                      downstream parse succeeds

Architecture Diagram

Before

substituteNodeOutputRefs(prompt, nodeOutputs, escapedForBash)
      │
      └─▶ shellQuote(nodeOutput.output)  ← always inline, regardless of size
            │
            └─▶ embedded in bash -c argv

After

substituteNodeOutputRefs(prompt, nodeOutputs, escapedForBash, outputFileDir?)
      │
      ├─▶ output < 32KB: shellQuote(value)  ← existing behavior
      │
      └─▶ output >= 32KB AND outputFileDir:
            shellQuoteOrFile(value, nodeId, field, dir) [+]
              │
              ├─▶ writeFileSync(logDir/nodeId.nodeoutput, value)
              └─▶ returns $(cat '<path>')

Connection inventory:

From	To	Status
executeBashNode	substituteNodeOutputRefs	[~] passes logDir as 4th arg
executeLoopNode (until_bash)	substituteNodeOutputRefs	[~] passes logDir as 4th arg
substituteNodeOutputRefs	shellQuoteOrFile	[+] new helper for size-aware quoting
shellQuoteOrFile	writeFileSync + shellQuote	[+] writes file when threshold exceeded
Other callers (AI nodes, approval, cancel)	substituteNodeOutputRefs	unchanged (no outputFileDir)
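
The wiring in the inventory above could look roughly like this. It is a simplified sketch under stated assumptions: `substituteRefs` is a hypothetical stand-in for `substituteNodeOutputRefs`, it handles only `$nodeId.output` (no field access), it leaves unknown node references untouched, and the `spill` callback replaces the real file write so the example has no side effects.

```typescript
type NodeOutputs = Map<string, { output: string }>;

const THRESHOLD = 32 * 1024;

// Assumed POSIX single-quote escaping.
const shellQuote = (v: string): string => "'" + v.replace(/'/g, "'\\''") + "'";

// Hypothetical simplified version of the substitution function.
function substituteRefs(
  script: string,
  outputs: NodeOutputs,
  escapedForBash: boolean,
  outputFileDir?: string,
  spill: (path: string, value: string) => void = () => {},
): string {
  return script.replace(/\$([A-Za-z_][\w-]*)\.output\b/g, (match: string, nodeId: string) => {
    const node = outputs.get(nodeId);
    if (!node) return match; // assumption: unknown references are left as-is
    if (!escapedForBash) return node.output; // AI prompt path: raw text, never a file
    if (outputFileDir !== undefined && node.output.length >= THRESHOLD) {
      const path = `${outputFileDir}/${nodeId}.nodeoutput`;
      spill(path, node.output);
      return `$(cat ${shellQuote(path)})`;
    }
    return shellQuote(node.output);
  });
}

const outputs: NodeOutputs = new Map([["synthesize", { output: "x".repeat(40_000) }]]);
const written: string[] = [];

// Bash caller: passes a directory, so the large value is spilled.
const bashScript = substituteRefs("echo $synthesize.output", outputs, true, "/tmp/run-logs", (p) => written.push(p));
// Bash caller without a directory: inlines regardless of size.
const inlined = substituteRefs("echo $synthesize.output", outputs, true);
// Non-bash caller: no escaping and no file.
const prompt = substituteRefs("Summarize: $synthesize.output", outputs, false);
console.log(bashScript, inlined.length, prompt.length);
```

Keeping the directory optional means call sites that never pass it behave exactly as before, which matches the "unchanged" row in the inventory.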

Validation Evidence

bun test packages/workflows/src/dag-executor.test.ts — 236 pass, 0 fail
bun run type-check — all 5 packages clean
lint-staged passed on commit (eslint + prettier)
New tests cover: small output still inlines, large output writes to file, field access with large value, non-bash paths unaffected

Security Impact

Temp files are written to the existing logDir (already under the run's artifact directory) — no new paths introduced
Files contain workflow intermediate outputs (same data that was previously in argv) — no escalation of access
shellQuote is used on the file path in the $(cat ...) command to prevent path injection
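
To illustrate the last point, here is a small sketch of why quoting the path matters. The `shellQuote` below is an assumed POSIX single-quote implementation, not necessarily the project's own, and the hostile path is invented for the example.

```typescript
// Assumed quoting scheme: close the quote, emit an escaped quote, reopen.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// A made-up path containing a quote and shell metacharacters.
const hostilePath = "/logs/it's; rm -rf ~/x.nodeoutput";

// The whole path stays a single argument to cat; nothing after the
// semicolon is interpreted as a separate command.
const catRef = `$(cat ${shellQuote(hostilePath)})`;
console.log(catRef);
```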

Compatibility/Migration

Fully backward compatible: existing small outputs behave identically
No config changes or environment variables needed
Temp files use .nodeoutput extension and are scoped to the run's logDir

Risks and Mitigations

Risk	Mitigation
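Disk write failure	writeFileSync throws inside shellQuoteOrFile; the error is caught, logged as dag.large_output_file_write_failed, and the value falls back to inline shell-quoting (behavior after the review follow-up commit)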
File not cleaned up	Files live in logDir which is already managed by run lifecycle cleanup
$(cat) timing	File is written synchronously before exec, guaranteed to exist when bash reads it

Rollback Plan

Revert the commit. The only behavioral change is the file-based path for outputs at or above 32KB.

Closes #1717

Summary by CodeRabbit

New Features
- Bash-based workflow steps now handle very large node outputs by writing them to files and referencing them, preventing oversized inline values from breaking scripts.
Bug Fixes
- Improved fallback behavior when file-based substitution fails — shell-quoting is used to keep scripts robust.
- Option to force full inlining preserved for cases that require it.
Tests
- Added tests covering small vs. large output handling, structured-field file naming, no-file inlining option, and file-write failure fallback.

… substitution corruption (coleam00#1717)

When a bash node references $nodeId.output from an upstream node whose output exceeds ~32KB, inlining the full value as a bash -c argument causes silent data corruption.

This adds a size threshold (NODE_OUTPUT_FILE_THRESHOLD = 32KB): outputs below it are still shell-quoted inline; outputs at or above it are written to a temp file in logDir and substituted with $(cat '<path>') so bash reads the value at runtime without argv size issues.

Affected paths: executeBashNode and loop-node until_bash.

Closes coleam00#1717

coderabbitai · 2026-05-18T08:25:49Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3143096f-e72c-4e7d-853f-f87c10bd4578

📥 Commits

Reviewing files that changed from the base of the PR and between 56caf90 and 9119dc8.

📒 Files selected for processing (2)

packages/workflows/src/dag-executor.test.ts
packages/workflows/src/dag-executor.ts

🚧 Files skipped from review as they are similar to previous changes (2)

packages/workflows/src/dag-executor.test.ts
packages/workflows/src/dag-executor.ts

📝 Walkthrough

Walkthrough

Adds size-aware substitution for $nodeId.output: when escapedForBash=true and a value is >=32KB, the value is written to an outputFileDir file and substituted as a $(cat /path) reference; substituteNodeOutputRefs now accepts an optional outputFileDir and executor paths pass logDir for bash contexts.

Changes

Large output file-backed bash substitution

Layer / File(s)	Summary
File I/O imports, threshold, shellQuoteOrFile helper `packages/workflows/src/dag-executor.ts`	Reorders imports, defines `NODE_OUTPUT_FILE_THRESHOLD` (32,768 bytes), and implements `shellQuoteOrFile(...)` which writes large values to a deterministic file under `outputFileDir` and returns `$(cat /path)`, falling back to inline shell-quoting on write errors.
substituteNodeOutputRefs signature & substitution logic `packages/workflows/src/dag-executor.ts`	Extends `substituteNodeOutputRefs(..., outputFileDir?)` and updates no-field and structured-field substitution to use `shellQuoteOrFile(...)` when `escapedForBash=true`; strings/JSON-stringified arrays/objects may spill to files, numbers/booleans remain unquoted, unknown types return empty quoted string.
Executor wiring: pass outputFileDir into bash paths `packages/workflows/src/dag-executor.ts`	Calls `substituteNodeOutputRefs(..., true, logDir)` from `executeBashNode` and passes `logDir` for `loop.until_bash`, enabling file-backed substitution for bash nodes and loop conditions.
Large output substitution test suite `packages/workflows/src/dag-executor.test.ts`	Adds tests creating a temp dir and validating small-output inlining, large-output file writing + `$(cat ...)` referencing (including field-file naming), suppression of file writes when `escapedForBash=false`, and fallback to shell-quoting when file writes fail.
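
The per-type handling summarized in the table can be sketched like this. It is an illustration only: `formatFieldForBash` is a hypothetical name, the `quote` parameter stands in for `shellQuoteOrFile`, and treating `null` and `undefined` as "unknown" is an assumption.

```typescript
// Assumed POSIX single-quote escaping, used here as the quoting strategy.
const shellQuote = (v: string): string => "'" + v.replace(/'/g, "'\\''") + "'";

// Hypothetical formatter for a structured field value in a bash context.
function formatFieldForBash(value: unknown, quote: (s: string) => string): string {
  if (typeof value === "string") return quote(value); // may spill when large
  if (typeof value === "number" || typeof value === "boolean") {
    return String(value); // left unquoted
  }
  if (value !== null && typeof value === "object") {
    return quote(JSON.stringify(value)); // arrays and objects, may spill when large
  }
  return "''"; // unknown types become an empty quoted string
}

console.log(formatFieldForBash(42, shellQuote));
console.log(formatFieldForBash(true, shellQuote));
console.log(formatFieldForBash("a b", shellQuote));
console.log(formatFieldForBash({ a: 1 }, shellQuote));
console.log(formatFieldForBash(undefined, shellQuote));
```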

Sequence Diagram

sequenceDiagram
  participant Executor
  participant substituteNodeOutputRefs
  participant shellQuoteOrFile
  participant FileSystem
  participant BashSubprocess

  Executor->>substituteNodeOutputRefs: prepare bash script with $nodeId.output
  substituteNodeOutputRefs->>shellQuoteOrFile: value, escapedForBash=true, outputFileDir
  alt value size >= 32KB
    shellQuoteOrFile->>FileSystem: writeFileSync(outputFileDir/<nodeId>.<field?>.nodeoutput)
    FileSystem-->>shellQuoteOrFile: file written
    shellQuoteOrFile-->>substituteNodeOutputRefs: $(cat /path/to/file)
  else value size < 32KB
    shellQuoteOrFile-->>substituteNodeOutputRefs: shellQuote(value)
  end
  substituteNodeOutputRefs-->>Executor: substituted script text
  Executor->>BashSubprocess: execute substituted script
  BashSubprocess->>FileSystem: cat /path/to/file (if file-based)
  FileSystem-->>BashSubprocess: original value bytes
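
The `<nodeId>.<field?>.nodeoutput` naming shown in the diagram could be built along these lines. `nodeOutputPath` is a hypothetical helper; the PR's actual path construction is not shown in this thread.

```typescript
import { basename, join } from "node:path";

// Hypothetical path builder: the field segment is present only for
// structured-field references such as $nodeId.output.field.
function nodeOutputPath(dir: string, nodeId: string, field?: string): string {
  const name = field === undefined
    ? `${nodeId}.nodeoutput`
    : `${nodeId}.${field}.nodeoutput`;
  return join(dir, name);
}

const wholeOutput = nodeOutputPath("/logs/run-1", "synthesize");
const fieldOutput = nodeOutputPath("/logs/run-1", "synthesize", "items");
console.log(basename(wholeOutput), basename(fieldOutput));
```

Because the name depends only on the node ID and field, substituting the same reference again maps to the same file rather than producing a new one each time.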

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

#1389: Both touch $nodeId.output substitution behavior and could be related to improvements in how outputs are embedded for downstream consumers.
#1132: Addresses unsafe substitution of node outputs into bash nodes; this PR implements bash-escaping and file-backed substitution that aligns with that objective.
#1585: Modifies substituteNodeOutputRefs and bash substitution paths similarly; this change extends those paths with size-aware file-backed handling.

Possibly related PRs

coleam00/Archon#1482: Related changes to handling structured $node.output.field values and JSON-stringification for substitution; this PR adds size-aware file-backed spill on top of that behavior.
coleam00/Archon#1654: Also modifies substituteNodeOutputRefs structured-output logic; the current change extends substitution to optionally spill large JSON-stringified outputs to files.
coleam00/Archon#1651: Related to bash-safe substitution patterns and env/quoting approaches for user/workflow variables; this PR provides an alternate file-backed fallback for very large values.

Poem

A rabbit nudges bytes to disk with care,
When thirty-two K is too much to wear.
It writes the spill, then whispers with glee,
"$(cat path)" brings the bytes back to me. 🐇📂

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title clearly summarizes the primary fix: writing large node outputs to temp files to prevent bash substitution corruption. It is specific, concise, and directly reflects the main change.
Description check	✅ Passed	Description covers most required sections: Problem, Why it matters, What changed, What did not change, UX Journey before/after, Architecture Diagram, Connection inventory, Validation Evidence, Security Impact, Compatibility, Risks & Mitigations, and Rollback Plan. Linked issue is clearly stated.
Linked Issues check	✅ Passed	PR fully addresses `#1717`'s primary objective: prevent silent corruption when substituting large $nodeId.output values into bash nodes. Implements file-based substitution (threshold-aware) with proper error handling (try/catch fallback); adds regression tests for large output handling.
Out of Scope Changes check	✅ Passed	All changes are scoped to the stated objectives: dag-executor.ts (substituteNodeOutputRefs signature update, shellQuoteOrFile helper, threshold logic), dag-executor.test.ts (new test suite), and executor invocation points (bash/loop nodes pass logDir). No unrelated refactoring or ancillary changes detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

coderabbitai

🧹 Nitpick comments (1)

packages/workflows/src/dag-executor.test.ts (1)
835-836: ⚡ Quick win

Harden temp-dir naming to avoid rare cross-test collisions.

Using only Date.now() for tempDir can collide under tight scheduling and make this suite flaky. Add a random suffix (as used elsewhere in this file) for deterministic isolation.
♻️ Suggested patch
-    tempDir = join(tmpdir(), `archon-test-large-output-${Date.now()}`);
+    tempDir = join(
+      tmpdir(),
+      `archon-test-large-output-${Date.now()}-${Math.random().toString(36).slice(2)}`
+    );
As per coding guidelines, “Prefer reproducible commands and locked dependency behavior in CI-sensitive paths; keep tests deterministic with no flaky timing or network dependence without guardrails.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.test.ts` around lines 835 - 836, The
tempDir name construction using Date.now() alone (where tempDir is set via
join(tmpdir(), `archon-test-large-output-${Date.now()}`) and then created with
mkdir) can collide; change the naming to include a short random or unique suffix
(reuse the same random-suffix pattern used elsewhere in this test file — e.g.,
append a crypto/random or Math.random()-derived token or process.pid) so the
tempDir becomes `archon-test-large-output-${Date.now()}-<random>` before calling
mkdir to guarantee isolation and avoid flaky cross-test collisions.
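
For reference, a sketch of the two naming approaches. The first function mirrors the suffix pattern in the suggested patch; the second uses Node's `mkdtempSync`, which is an alternative not proposed in the review. `uniqueTempDirName` is a hypothetical helper name.

```typescript
import { existsSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Pattern from the suggested patch: timestamp plus a random base-36 suffix.
function uniqueTempDirName(prefix: string): string {
  return `${prefix}-${Date.now()}-${Math.random().toString(36).slice(2)}`;
}

// Alternative: mkdtempSync appends random characters and creates the
// directory in one call, so two calls cannot return the same path.
const dirA = mkdtempSync(join(tmpdir(), "archon-test-large-output-"));
const dirB = mkdtempSync(join(tmpdir(), "archon-test-large-output-"));
const bothExist = existsSync(dirA) && existsSync(dirB);

console.log(uniqueTempDirName("archon-test-large-output"), dirA, dirB);

// Clean up the directories created for the demonstration.
rmSync(dirA, { recursive: true, force: true });
rmSync(dirB, { recursive: true, force: true });
```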

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 79bf9f1f-532c-486f-91e3-15955fa5b8b0

📥 Commits

Reviewing files that changed from the base of the PR and between 45bc5e5 and 56caf90.

📒 Files selected for processing (2)

packages/workflows/src/dag-executor.test.ts
packages/workflows/src/dag-executor.ts

Wirasm · 2026-05-19T08:03:21Z

Review Summary

Verdict: blocking-issues

Your change adds a file-spill mechanism for large node outputs (≥32KB) in bash scripts, preventing silent data corruption when shell argument limits are exceeded. The implementation is well-structured and the threshold rationale is clearly documented. One critical bug and two items need to be addressed before merge.

Blocking issues

packages/workflows/src/dag-executor.ts:247–249 (shellQuoteOrFile): writeFileSync is not wrapped in a try/catch. If the disk is full, permissions are wrong, or any other filesystem error occurs, the throw propagates as an unhandled exception — no structured log, no user-facing message, just a crash with an opaque Node.js stack trace. Wrap it and fall back to shell-quoting on failure:
```
try {
  writeFileSync(filePath, value);
  return `$(cat ${shellQuote(filePath)})`;
} catch (fileErr) {
  const err = fileErr as Error;
  getLog().error({ err, nodeId, field, valueSize: value.length, filePath }, 'dag.large_output_file_write_failed');
  return shellQuote(value); // fallback: inline (pre-PR behavior)
}
```

Suggested fixes

packages/workflows/src/dag-executor.ts (test presence): The new describe('substituteNodeOutputRefs -- large output file substitution') block appears in the diff but could not be confirmed in the actual test file. Please ensure those tests are committed so they run in CI.
packages/workflows/src/dag-executor.ts (return type): substituteNodeOutputRefs is missing an explicit return type annotation. Per CLAUDE.md, add : string.
packages/workflows/src/dag-executor.ts:244–248 (JSDoc): Add a doc comment for the new outputFileDir parameter so future maintainers understand why it's intentionally omitted for non-bash call sites (prompt/approval/cancel nodes).

Minor / nice-to-have

JSON.parse fallback not tested with large payloads (packages/workflows/src/dag-executor.ts:300–318): The large-value path is tested via structuredOutput but not via raw JSON output. Consider adding a test that passes makeOutput('completed', JSON.stringify({ data: 'x'.repeat(33_000) })) with no structuredOutput field.
shellQuoteOrFile closing over constant: NODE_OUTPUT_FILE_THRESHOLD is read from outer scope. Low priority — fine as-is for now.
Test style (dag-executor.test.ts): new tests use sync it() with await import(...). Cosmetic; not blocking.
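
The raw-JSON case mentioned in the first item can be sketched as follows. This is a guess at the shape of the fallback, not the project's code: `resolveField` and the `NodeOutput` interface are hypothetical, and preferring `structuredOutput` over parsing `output` is an assumption.

```typescript
// Hypothetical shape of a captured node output.
interface NodeOutput {
  output: string;
  structuredOutput?: Record<string, unknown>;
}

// Assumed resolution order: structuredOutput first, then JSON.parse of the raw text.
function resolveField(node: NodeOutput, field: string): unknown {
  if (node.structuredOutput !== undefined) {
    return node.structuredOutput[field];
  }
  try {
    const parsed: unknown = JSON.parse(node.output);
    if (parsed !== null && typeof parsed === "object") {
      return (parsed as Record<string, unknown>)[field];
    }
  } catch {
    // The raw output is not JSON, so there is no field to resolve.
  }
  return undefined;
}

// Raw JSON output with a field larger than the 32,768-byte threshold
// and no structuredOutput, as in the suggested test input.
const rawNode: NodeOutput = { output: JSON.stringify({ data: "x".repeat(33_000) }) };
const resolved = resolveField(rawNode, "data");
console.log(typeof resolved === "string" ? resolved.length : resolved);
```

A regression test for this path would then check that the resolved value is routed to the file-backed substitution rather than inlined.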

Compliments

The NODE_OUTPUT_FILE_THRESHOLD comment (lines 1267–1270) explains the why (silent corruption risk) rather than just stating the number, which is exactly the kind of context that helps future maintainers.
The substituteNodeOutputRefs execution-ordering comment ("Called AFTER the standard substituteWorkflowVariables pass") documents a non-obvious dependency that would be very hard to debug if violated.
The design decision to make file-spill opt-in via outputFileDir and only wire it for bash-context nodes is sound and avoids over-engineering for non-bash call sites.

Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality, docs-impact.

@Wirasm

… fallback

Address review feedback from @Wirasm:
- Wrap writeFileSync in try/catch so disk-full or permission errors produce a structured log instead of an unhandled exception
- Fall back to inline shell-quoting on failure (pre-file-spill behavior)
- Add test for fallback path using a non-existent directory

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>

kagura-agent · 2026-05-19T08:13:33Z

Thanks for the thorough review @Wirasm! Addressed the blocking issue:

writeFileSync try/catch — Wrapped in try/catch with structured logging via getLog().error() and fallback to inline shell-quoting (pre-file-spill behavior). Added a test that verifies the fallback path by using a non-existent directory.
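
A self-contained sketch of the fallback and the non-existent-directory check described above. `spillOrInline` is a hypothetical name, the `logError` callback stands in for `getLog().error()`, and `shellQuote` is an assumed POSIX single-quote implementation.

```typescript
import { writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed POSIX single-quote escaping.
const shellQuote = (v: string): string => "'" + v.replace(/'/g, "'\\''") + "'";

// Hypothetical helper: try the file-backed path, fall back to inline quoting.
function spillOrInline(
  value: string,
  filePath: string,
  logError: (event: string, err: Error) => void,
): string {
  try {
    writeFileSync(filePath, value);
    return `$(cat ${shellQuote(filePath)})`;
  } catch (fileErr) {
    logError("dag.large_output_file_write_failed", fileErr as Error);
    return shellQuote(value); // same result as the pre-file-spill behavior
  }
}

// A directory that was never created, so the write fails with ENOENT.
const missingDir = join(tmpdir(), `does-not-exist-${Date.now()}-${Math.random().toString(36).slice(2)}`);
const loggedEvents: string[] = [];
const fallbackResult = spillOrInline(
  "payload",
  join(missingDir, "node.nodeoutput"),
  (event) => loggedEvents.push(event),
);
console.log(fallbackResult, loggedEvents);
```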

Also confirmed the existing test suite (all tests in dag-executor.test.ts) passes cleanly — 237 tests, 0 failures.

Re the suggested fixes:

Tests are committed and running in CI ✅
Return type annotation — substituteNodeOutputRefs already has : string return type ✅
JSDoc for outputFileDir — good idea, happy to add if you'd like

coderabbitai Bot reviewed May 18, 2026

Wirasm merged commit ef5a381 into coleam00:dev May 20, 2026
4 checks passed

Wirasm mentioned this pull request May 20, 2026

Fix: bash node $nodeId.output silently corrupts large multi-KB inputs #1719

Closed

zerocool0133700-lgtm mentioned this pull request May 24, 2026

fix(web): bound tool_result output size to prevent chat UI crash on large payloads #1755

Open

4 tasks

Wirasm merged 2 commits into coleam00:dev from kagura-agent:fix/bash-node-large-output-corruption
