Fix: bash node $nodeId.output silently corrupts large multi-KB inputs #1719

Closed
Wirasm wants to merge 3 commits into dev from archon/thread-91a73a83

Conversation

Collaborator

Wirasm commented May 18, 2026

Summary

  • Problem: bash nodes that reference $nodeId.output from an upstream LLM node inline-embed the value via shellQuote() into the bash script string passed to bash -c. For outputs ≥42KB this silently corrupts the value through OS argument-size boundaries and bash parser edge cases.
  • Why it matters: Any workflow where a bash node consumes large LLM output (e.g. maintainer-standup's persist node) fails on every run with no user-visible error — the upstream output is stored correctly but the bash node receives a truncated or corrupted copy.
  • What changed: Added extractNodeOutputEnvVars() helper that replaces $nodeId.output (whole-output) refs with "${_ARCHON_NODE_<ID>_OUTPUT}" bash expansions and injects the actual values via subprocessEnv. Applied at both call sites: executeBashNode and until_bash loop condition. Field-access refs ($nodeId.output.field) remain inline as before — they are small scalars.
  • What did NOT change: substituteNodeOutputRefs itself is unchanged. Script nodes, prompt/LLM nodes, and field-access ref handling are all out of scope.
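The helper's source is not reproduced in this description, so what follows is only a minimal sketch of the behaviour the summary describes. The regex, the `NodeOutputs` map shape, the `envVarNameFor` helper, and the uppercasing of the node ID (inferred from `_ARCHON_NODE_UPSTREAM_OUTPUT` in the review's test skeleton) are assumptions for illustration, not the code in dag-executor.ts.

```typescript
// Hypothetical shape of the upstream results; the real type is not shown in this PR.
type NodeOutputs = Map<string, { output: string }>;

interface ExtractResult {
  script: string;
  envVars: Record<string, string>;
}

// Matches $nodeId.output but not $nodeId.output.field (negative lookahead).
const WHOLE_OUTPUT_REF = /\$([a-zA-Z_][a-zA-Z0-9_-]*)\.output(?!\.[a-zA-Z_])/g;

// Assumed naming rule: hyphens become underscores, ID is uppercased.
function envVarNameFor(nodeId: string): string {
  return `_ARCHON_NODE_${nodeId.replace(/-/g, '_').toUpperCase()}_OUTPUT`;
}

function extractNodeOutputEnvVars(script: string, outputs: NodeOutputs): ExtractResult {
  const envVars: Record<string, string> = {};
  const rewritten = script.replace(WHOLE_OUTPUT_REF, (match: string, nodeId: string) => {
    const node = outputs.get(nodeId);
    // Unknown node IDs are left untouched so a later pass can warn about them.
    if (node === undefined) return match;
    const name = envVarNameFor(nodeId);
    envVars[name] = node.output;
    // Double-quoted expansion: bash reads the value from the environment at run time.
    return `"\${${name}}"`;
  });
  return { script: rewritten, envVars };
}

// Usage: the whole-output ref is rewritten, the field ref stays as-is.
const demo = extractNodeOutputEnvVars(
  'echo $synth.output > out.md; echo $synth.output.title',
  new Map([['synth', { output: 'large LLM output' }]])
);
console.log(demo.script);
console.log(Object.keys(demo.envVars));
```

Because the value never appears in the script text, its quotes, backticks, and `$` characters are not parsed by bash at all; only the expansion reference is.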

UX Journey

Before

User triggers workflow
  bash node receives $nodeId.output
  → substituteNodeOutputRefs embeds 42KB value as single-quoted literal inline in bash script
  → execFileAsync('bash', ['-c', '<preamble + 42KB quoted text>'], ...)
  → bash parser or OS silently truncates/corrupts the value
  bash node stdout: corrupted/incomplete data  ← silent failure

After

User triggers workflow
  bash node receives $nodeId.output
  → extractNodeOutputEnvVars() replaces $nodeId.output with "${_ARCHON_NODE_<ID>_OUTPUT}"
  → value set in subprocessEnv._ARCHON_NODE_<ID>_OUTPUT
  → execFileAsync('bash', ['-c', '<script with env var ref>'], { env: { ...subprocessEnv, _ARCHON_NODE_<ID>_OUTPUT: '42KB value' } })
  bash node stdout: correct full value  ← [fixed]
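To make the difference between the two flows concrete, the sketch below compares the size of the script string handed to `bash -c` in each case. `shellQuote` here is an assumed re-implementation using the usual POSIX single-quote idiom, not the project's function, and the node name `synth` is hypothetical.

```typescript
// Assumed behaviour of shellQuote(): wrap in single quotes, escape embedded ones.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// A stand-in for a 42KB upstream LLM output.
const large = 'x'.repeat(42 * 1024);

// Before: the value itself becomes part of the script text.
const inlineScript = "printf '%s' " + shellQuote(large);

// After: the script only carries a fixed-size reference to an env var.
const envScript = `printf '%s' "\${_ARCHON_NODE_SYNTH_OUTPUT}"`;

const scriptSizes = { inline: inlineScript.length, viaEnv: envScript.length };
console.log(scriptSizes);
```

The inline script grows with the upstream output, while the env-var script stays the same length regardless of how large the value is.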

Architecture Diagram

Before

dag-executor.ts::executeBashNode
  substituteWorkflowVariables (shellSafe: true)
    → skips $USER_MESSAGE, $ARGUMENTS, $LOOP_* (env vars)
  substituteNodeOutputRefs (escapedForBash: true)
    → embeds $nodeId.output inline via shellQuote()  ← fragile for large values
  execFileAsync('bash', ['-c', script], { env: subprocessEnv })

After

dag-executor.ts::executeBashNode
  substituteWorkflowVariables (shellSafe: true)
    → skips $USER_MESSAGE, $ARGUMENTS, $LOOP_* (env vars)
  [+] extractNodeOutputEnvVars()
    → replaces $nodeId.output with "${_ARCHON_NODE_<ID>_OUTPUT}"  [~]
    → collects node output values into nodeOutputEnvVars map
  substituteNodeOutputRefs (escapedForBash: true)
    → processes only remaining $nodeId.output.field refs (small scalars)
  execFileAsync('bash', ['-c', script], { env: { ...subprocessEnv, ...nodeOutputEnvVars } })  [~]

Same fix applied at until_bash loop condition call site.
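The call-site wiring in the diagram can be sketched as a pure function that builds the arguments for `execFileAsync`. This is an illustration under assumptions: `buildBashInvocation`, the `Env` type, and the optional `userEnvVars` parameter are hypothetical names, and placing user-supplied variables last (so they win on conflict) follows the spread order noted in the review rather than code shown here.

```typescript
type Env = Record<string, string | undefined>;

interface BashInvocation {
  file: string;
  args: string[];
  options: { env: Env };
}

// Hypothetical helper: assembles what would be passed to execFileAsync.
function buildBashInvocation(
  script: string,
  subprocessEnv: Env,
  nodeOutputEnvVars: Record<string, string>,
  userEnvVars?: Record<string, string>
): BashInvocation {
  return {
    file: 'bash',
    args: ['-c', script],
    options: {
      // Later spreads override earlier ones: base env, then node outputs, then user vars.
      env: { ...subprocessEnv, ...nodeOutputEnvVars, ...(userEnvVars ?? {}) },
    },
  };
}

const invocation = buildBashInvocation(
  'printf "%s" "${_ARCHON_NODE_SYNTH_OUTPUT}"',
  { PATH: '/usr/bin:/bin' },
  { _ARCHON_NODE_SYNTH_OUTPUT: 'large LLM output' }
);
console.log(invocation.args[0], Object.keys(invocation.options.env));
```

If the `nodeOutputEnvVars` spread were dropped, the script would still run but expand the reference to an empty string, which is the regression the review's spy-based test is meant to catch.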

Connection inventory:

| From | To | Status | Notes |
| --- | --- | --- | --- |
| executeBashNode | extractNodeOutputEnvVars | new | New helper call before substituteNodeOutputRefs |
| executeBashNode | substituteNodeOutputRefs | modified | Now receives script with env var refs; only field-access refs remain |
| executeBashNode | execFileAsync | modified | subprocessEnv now includes node output env vars |
| until_bash (loop) | extractNodeOutputEnvVars | new | Same fix applied to loop condition call site |
| until_bash (loop) | execFileAsync | modified | env now includes node output env vars |

Label Snapshot

  • Risk: risk: low
  • Size: size: S
  • Scope: workflows
  • Module: workflows:dag-executor

Change Metadata

  • Change type: bug
  • Primary scope: workflows

Linked Issue

Validation Evidence (required)

bun run validate

All checks passed:

| Check | Result |
| --- | --- |
| Bundled defaults | ✅ 36 commands, 20 workflows up to date |
| Bundled skill | ✅ 21 files up to date |
| Type check (all 10 packages) | ✅ No errors |
| Lint | ✅ 0 errors, 0 warnings |
| Format | ✅ All files formatted |
| Tests | ✅ All passed, 0 failed |
| Build | ✅ All packages built |

  • Evidence provided: full bun run validate pass documented in validation.md artifact
  • No commands intentionally skipped

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No

Compatibility / Migration

  • Backward compatible? Yes — existing workflows with $nodeId.output in bash scripts continue to work; the env var injection is transparent to the script author
  • Config/env changes? No
  • Database migration needed? No

Human Verification (required)

  • Verified scenarios: unit tests cover the extractNodeOutputEnvVars helper with 8 cases including 50KB+ outputs with special shell characters ($, backticks, double/single quotes, globs), hyphenated node IDs, field-access refs left untouched, multiple refs to same node, unknown node refs left unchanged
  • Edge cases checked: negative lookahead correctly excludes $nodeId.output.field; hyphens in node IDs are normalized to underscores in env var names; unknown node IDs are left unchanged so substituteNodeOutputRefs can emit its existing warning
  • What was not verified: manual end-to-end run of maintainer-standup with a live ≥42KB synthesis output (no running Archon instance available in this session)
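The partitioning that these edge cases rely on can be illustrated with two regexes, one per kind of reference. This is a sketch only: both patterns and the `partitionRefs` name are assumptions modelled on the negative lookahead described above, not the patterns in dag-executor.ts.

```typescript
// Whole-output refs: $nodeId.output NOT followed by ".<field-start>".
const WHOLE = /\$[a-zA-Z_][a-zA-Z0-9_-]*\.output(?!\.[a-zA-Z_])/g;
// Field-access refs: $nodeId.output.<field>.
const FIELD = /\$[a-zA-Z_][a-zA-Z0-9_-]*\.output\.[a-zA-Z_][a-zA-Z0-9_]*/g;

interface RefPartition {
  wholeOutput: string[];
  fieldAccess: string[];
}

function partitionRefs(script: string): RefPartition {
  return {
    // String.prototype.match with a global regex returns all matches or null.
    wholeOutput: script.match(WHOLE) ?? [],
    fieldAccess: script.match(FIELD) ?? [],
  };
}

const parts = partitionRefs('echo $a.output $b-c.output.title; cat <<< $d.output.');
console.log(parts);
```

A trailing sentence period after `.output` is not treated as field access because the lookahead requires a letter or underscore after the dot, so `$d.output.` still counts as a whole-output reference.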

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: bash nodes and until_bash loop conditions in any workflow that references $nodeId.output
  • Potential unintended effects: none — field-access refs ($nodeId.output.field) are explicitly excluded by the negative lookahead and still go through the existing inline path
  • Guardrails: env var name convention _ARCHON_NODE_<ID>_OUTPUT is private/namespaced; unlikely to collide with user env vars

Rollback Plan (required)

  • Fast rollback: revert the two call-site changes in dag-executor.ts (remove the extractNodeOutputEnvVars call and spread, revert substituteNodeOutputRefs to receive substitutedScript directly)
  • Feature flags or config toggles: none
  • Observable failure symptoms: bash node stdout is truncated or garbled for large upstream outputs; reported symptom is the persist node in maintainer-standup receiving corrupted input

Risks and Mitigations

  • Risk: env var name collision when two node IDs differ only by hyphen vs underscore (e.g. my-node and my_node)
    • Mitigation: the map overwrite is deterministic (last write wins); collision is extremely unlikely in practice since node IDs in a single workflow are user-controlled and distinct
  • Risk: outputs approaching macOS ARG_MAX (256KB) could fail even with env var injection
    • Mitigation: 42KB output + ~15KB env total = ~57KB, well under the 256KB limit; outputs near that size would fail with a clear OS error rather than silent corruption
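The collision risk in the first item can be detected up front. The sketch below groups node IDs by the env var name they would map to; `envVarNameFor` and `findEnvVarCollisions` are hypothetical names, and the uppercasing rule is an assumption, which also means IDs differing only by case would collide under this sketch.

```typescript
// Assumed naming rule: hyphens become underscores, ID is uppercased.
function envVarNameFor(nodeId: string): string {
  return `_ARCHON_NODE_${nodeId.replace(/-/g, '_').toUpperCase()}_OUTPUT`;
}

// Returns only the env var names that more than one distinct node ID maps to.
function findEnvVarCollisions(nodeIds: string[]): Record<string, string[]> {
  const byName = new Map<string, string[]>();
  for (const id of nodeIds) {
    const name = envVarNameFor(id);
    const ids = byName.get(name) ?? [];
    if (ids.indexOf(id) === -1) ids.push(id);
    byName.set(name, ids);
  }
  const collisions: Record<string, string[]> = {};
  byName.forEach((ids, name) => {
    if (ids.length > 1) collisions[name] = ids;
  });
  return collisions;
}

const collisions = findEnvVarCollisions(['my-node', 'my_node', 'persist']);
console.log(collisions);
```

A caller could log a warning for each entry instead of relying on last-write-wins, which keeps the overwrite deterministic but no longer silent.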

…#1717)

When a bash node references $nodeId.output from an upstream LLM node whose
output is large (~40KB+), the value was embedded inline as a single-quoted
shell literal in the script passed to `bash -c`. For large or adversarial
inputs this inline substitution can mishandle the value, causing the bash
subprocess to receive corrupted or truncated content.

This extends the shellSafe env-var pattern (PR #1651) to cover whole-output
$nodeId.output refs in bash nodes and until_bash loop conditions.
Field-access refs ($nodeId.output.field) remain inline because they are
small scalars.

Changes:
- Add extractNodeOutputEnvVars helper that rewrites $nodeId.output to
  "\${_ARCHON_NODE_<ID>_OUTPUT}" expansions and returns the env-var map
- Use the helper in executeBashNode so the value travels via subprocessEnv
- Use the helper in the until_bash loop condition check
- Add unit tests covering 50KB+ outputs, special shell characters, hyphenated
  node IDs, mixed whole/field refs, repeated refs, and unknown node refs

Fixes #1717

coderabbitai Bot commented May 18, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c9cf74ec-4b54-4fea-bae5-c4ba0816578e


Collaborator Author

Wirasm commented May 18, 2026

Comprehensive PR Review — 5 Agents

PR: #1719 (draft) — Fix: bash node $nodeId.output silently corrupts large multi-KB inputs
Reviewed by: code-review · error-handling · test-coverage · comment-quality · docs-impact
Date: 2026-05-18


Verdict: REQUEST_CHANGES

The core fix is technically sound. `extractNodeOutputEnvVars()` correctly partitions whole-output refs from field-access refs, injects via `_ARCHON_NODE_*` env vars, avoids the OS argument-size boundary, and is well-tested at the unit level. The issues below are documentation accuracy (one HIGH), comment style (CLAUDE.md), and integration test coverage.

| Severity | Count |
| --- | --- |
| 🟠 HIGH | 1 |
| 🟡 MEDIUM | 3 |
| 🟢 LOW | 5 |

🟠 HIGH — variables.md describes the old shell-quoting behavior

📍 packages/docs-web/src/content/docs/reference/variables.md:70

Line 70 currently says:

$nodeId.output values are auto shell-quoted (single-quoted, with embedded ' escaped) when substituted into bash: scripts

After this PR, whole-output refs are not inline-quoted — they are replaced with "${_ARCHON_NODE_<ID>_OUTPUT}" and passed via the subprocess environment. Only field-access refs ($nodeId.output.field) still go through shellQuote(). The doc is factually wrong.

Suggested replacement for line 70:

In bash: nodes, $nodeId.output (whole-output) is passed into the script via a dedicated environment variable (_ARCHON_NODE_<ID>_OUTPUT) and referenced as "${_ARCHON_NODE_<ID>_OUTPUT}". This avoids size limits that inline argument quoting imposes on large LLM outputs. Field-access refs ($nodeId.output.field) are still substituted inline as shell-quoted strings. Values are not shell-quoted when substituted into script: bodies — the raw value is embedded as-is.


🟡 MEDIUM — Issue-number refs in 3 comment locations

📍 dag-executor.ts:242, dag-executor.ts:1338, dag-executor.ts:2181

Three comments contain #1717 or #1651. CLAUDE.md: "Don't reference the current task, fix, or callers... those belong in the PR description and rot as the codebase evolves." The why prose is valid — just drop the #NNN suffixes.

View 3-line fix
// Line 242 JSDoc — change:
//   * the same pattern used for $USER_MESSAGE / $ARGUMENTS / $LOOP_* since #1651.
// to:
//   * pattern used for $USER_MESSAGE / $ARGUMENTS / $LOOP_* env vars.

// Line 1337–1338 — change:
//   // Pass $nodeId.output (whole output) via env vars to avoid inline shell-quoting of
//   // large values — see #1717. Field-access refs ($nodeId.output.field) remain inline.
// to:
//   // Pass $nodeId.output (whole output) via env vars to avoid inline shell-quoting of
//   // large values. Field-access refs ($nodeId.output.field) remain inline.

// Line 2181 — change:
//   // Pass $nodeId.output via env vars to avoid inline shell-quoting of large values — see #1717.
// to:
//   // Pass $nodeId.output via env vars to avoid inline shell-quoting of large values.

🟡 MEDIUM — No spy-based integration test for executeBashNode env-var wiring

📍 dag-executor.ts:1339-1360 — the ...nodeOutputEnvVars spread

Unit tests confirm extractNodeOutputEnvVars produces the right map. But no test asserts that _ARCHON_NODE_* actually reaches execFileAsync. If the spread ...nodeOutputEnvVars at line 1359 were accidentally dropped, all tests stay green while large-input handling silently regresses.

View suggested test skeleton
it('passes $nodeId.output values via _ARCHON_NODE_* env vars in executeBashNode', async () => {
  const execSpy = spyOn(git, 'execFileAsync').mockResolvedValue({ stdout: 'ok\n', stderr: '' });
  try {
    // run workflow with upstream node output pre-seeded, consumer node references $upstream.output
    // then:
    const consumerCall = execSpy.mock.calls.find(
      (call) => (call[1] as string[])[1]?.includes('_ARCHON_NODE_UPSTREAM_OUTPUT')
    );
    expect(consumerCall).toBeDefined();
    const envArg = (consumerCall![2] as { env: NodeJS.ProcessEnv }).env;
    expect(envArg?._ARCHON_NODE_UPSTREAM_OUTPUT).toBe('some upstream value');
  } finally {
    execSpy.mockRestore();
  }
});

See test-coverage-findings.md for the full test with mock setup.


🟡 MEDIUM — script-nodes.md cross-reference is now inaccurate

📍 packages/docs-web/src/content/docs/guides/script-nodes.md:202-203

"unlike bash: nodes, where $nodeId.output values are auto-quoted"

This contrast is stale — bash nodes no longer auto-quote whole-output refs inline. Suggested replacement:

unlike bash: nodes, where $nodeId.output whole-output values are passed safely via environment variables


🟢 LOW Issues

View 5 low-priority suggestions
| Issue | Location | Note |
| --- | --- | --- |
| Multi-line JSDoc (7 lines; CLAUDE.md says 1 max) | dag-executor.ts:236-244 | Pre-existing debt on substituteNodeOutputRefs makes this a grey area; trimming recommended |
| Regex comment explains WHAT not WHY | dag-executor.ts:251 | Remove; regex is readable without it |
| escapedForBash — field refs only remain trailing note is fragile | dag-executor.ts:2187 | Trim to true // escapedForBash or remove |
| until_bash env-var path has zero integration-level coverage | dag-executor.ts:2181-2203 | Add minimal spy test (criticality 5/10) |
| Pre-existing: until_bash EACCES/ETIMEDOUT errors are silent to user | dag-executor.ts:2206-2222 | Follow-up issue; not introduced here — executeBashNode handles these correctly |

✅ What's Good

  • Correct regex: (?!\.[a-zA-Z_]) precisely mirrors the field-name start pattern in substituteNodeOutputRefs — clean partitioning, no overlap.
  • Well-namespaced env vars: _ARCHON_NODE_*_OUTPUT — user collision negligibly unlikely.
  • Correct call-site ordering: extractNodeOutputEnvVars → substituteNodeOutputRefs — field refs see unmodified text.
  • Thorough unit tests: 8 cases covering large inputs, field exclusion, mixed refs, hyphen normalization, unknown nodes, duplicate refs, and negative-lookahead edge case.
  • Both call sites fixed: executeBashNode and until_bash both receive the fix without creating new asymmetry.
  • Correct env spread order: { ...nodeOutputEnvVars, ...(envVars ?? {}) } lets user envVars override _ARCHON_* vars correctly.
  • No silent failures introduced: Pure function; unknown node IDs intentionally deferred to substituteNodeOutputRefs warn path.
  • .catch chains are complete: Every fire-and-forget createWorkflowEvent call logs err, workflowRunId, and eventType.
  • Env-var approach vs PR #1718 temp-file approach: No threshold edge case, no temp file lifecycle — env-var injection is the cleaner fix.

Suggested Follow-up Issues

| Title | Priority |
| --- | --- |
| until_bash system errors (EACCES, ETIMEDOUT) are silent to user | P2 |
| Add integration test for until_bash env-var injection path | P3 |

Reviewed by Archon comprehensive-pr-review workflow · 5 agents · Artifacts: ~/.archon/workspaces/coleam00/Archon/artifacts/runs/ccfd331ff8a6b094408c7e34771a89ef/review/

- Remove issue number refs (#1717, #1651) from comments per CLAUDE.md
- Trim multi-line JSDoc on extractNodeOutputEnvVars to 2-line comment
- Remove regex comment that explained WHAT not WHY
- Trim escapedForBash trailing comment
- Add collision guard warn log when two node IDs map to the same env var
- Spread config.envVars into until_bash subprocess for parity with executeBashNode
- Update variables.md: replace stale "auto shell-quoted" description with env-var mechanism
- Update script-nodes.md: fix stale contrast with bash node auto-quoting
- Add CHANGELOG entry for #1717 fix
- Add integration tests: executeBashNode and until_bash env var wiring via spy
- Add empty-string output unit test for extractNodeOutputEnvVars
Collaborator Author

Wirasm commented May 18, 2026

⚡ Self-Fix Report (Aggressive)

Status: COMPLETE
Pushed: ✅ Changes pushed to archon/thread-91a73a83
Commit: ea8e2a6
Philosophy: Fix everything unless clearly a new concern


Fixes Applied (11 total)

| Severity | Count |
| --- | --- |
| 🔴 CRITICAL | 0 |
| 🟠 HIGH | 1 |
| 🟡 MEDIUM | 3 |
| 🟢 LOW | 7 |

View all fixes
  • variables.md stale shell-quoting docs (variables.md:70) — Replaced "auto shell-quoted" paragraph with accurate env-var injection description
  • Issue refs #1717 ×2, #1651 ×1 in comments (dag-executor.ts:235,1338,2181) — Stripped issue refs; kept WHY prose
  • Missing executeBashNode integration test (dag-executor.test.ts) — Added spy-based test asserting _ARCHON_NODE_* reaches execFileAsync
  • script-nodes.md stale bash cross-reference (script-nodes.md:202) — Updated "auto-quoted" contrast to describe env-var approach
  • Multi-line JSDoc on extractNodeOutputEnvVars (dag-executor.ts:236) — Trimmed to 2-line // comment per CLAUDE.md
  • Regex comment explains WHAT not WHY (dag-executor.ts:251) — Removed
  • Fragile escapedForBash trailing comment (dag-executor.ts:2187) — Trimmed to true // escapedForBash
  • until_bash path has zero integration tests (dag-executor.test.ts) — Added spy-based test for until_bash call site
  • Node ID collision silently overwrites env var (dag-executor.ts:259) — Added log.warn collision guard
  • until_bash missing config.envVars spread (dag-executor.ts:2202) — Added ...(config.envVars ?? {}) to match executeBashNode
  • Missing CHANGELOG entry (CHANGELOG.md) — Added ### Fixed entry under [Unreleased]

Tests Added

  • handles empty string node output — unit test for extractNodeOutputEnvVars with ''
  • passes $nodeId.output value via _ARCHON_NODE_* env var in executeBashNode — spy asserts env var reaches subprocess
  • passes $nodeId.output via _ARCHON_NODE_* env var in until_bash — spy asserts env var reaches until_bash subprocess

Skipped

(none — all findings addressed)


Suggested Follow-up Issues

  1. until_bash system errors (EACCES, ETIMEDOUT) are silent to user — pre-existing gap in until_bash catch block; executeBashNode handles these with safeSendMessage, until_bash does not

Validation

✅ Type check | ✅ Lint | ✅ Tests (243+ passed across all packages)


Self-fix by Archon · aggressive mode · fixes pushed to archon/thread-91a73a83

Collaborator Author

Wirasm commented May 20, 2026

Closing in favor of #1718 (merged).

Quick comparison of the two approaches for #1717:

#1718 (temp-file): above a 32KB threshold, writes the value to <logDir>/<nodeId>.nodeoutput and replaces the inline ref with $(cat '<path>'). Below the threshold, normal shell-quote. Covers both $nodeId.output and $nodeId.output.field (structured field refs). File-write failure falls back to inline shellQuote(value) with an error log.

#1719 (env-var, this PR): $nodeId.output always becomes "${_ARCHON_NODE_<ID>_OUTPUT}" with the value passed through subprocess env. Field refs left inline. Matches the #1651 shellSafe precedent, no file I/O, no cleanup.
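For contrast with the env-var path, the threshold-based temp-file approach could look roughly like the sketch below. It is not the code merged in #1718: `planSubstitution`, `SubstitutionPlan`, and the local `shellQuote` are hypothetical, the threshold is measured in string length rather than bytes as a simplifying assumption, and the function returns a plan instead of touching the file system.

```typescript
const THRESHOLD = 32 * 1024;

// Assumed quoting helper: wrap in single quotes, escape embedded ones.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

interface SubstitutionPlan {
  replacement: string;
  fileToWrite?: { path: string; contents: string };
}

function planSubstitution(nodeId: string, value: string, logDir: string): SubstitutionPlan {
  // At or below the threshold: keep the inline shell-quoted literal.
  if (value.length <= THRESHOLD) {
    return { replacement: shellQuote(value) };
  }
  // Above the threshold: the script reads the value back from a file at run time.
  const path = `${logDir}/${nodeId}.nodeoutput`;
  return {
    replacement: `$(cat ${shellQuote(path)})`,
    fileToWrite: { path, contents: value },
  };
}

const smallPlan = planSubstitution('synth', 'short value', '/tmp/logs');
const largePlan = planSubstitution('synth', 'x'.repeat(42 * 1024), '/tmp/logs');
console.log(smallPlan.replacement, largePlan.replacement);
```

Returning a plan leaves the write, and the fallback to inline quoting when the write fails, to the caller; it also makes the threshold boundary easy to test without any I/O.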

Trade-offs:

For the actual #1717 symptom (42KB synth output), both fix it. Since #1718 already merged and covers a strictly broader set of cases, this PR is no longer needed.

Thanks @wirasm-archon for the env-var experiment — the test scaffolding here was a useful sanity check while reviewing #1718.

Wirasm closed this May 20, 2026

Development

Successfully merging this pull request may close these issues.

bash node: $nodeId.output substitution silently corrupts large multi-KB inputs (~42KB+)
