fix(sandbox): block snapshot create while shields are up#4508
Conversation
The snapshot create path attempts SSH + tar through the state dirs that shields-up locks down. The SSH dir-check failure path in state/sandbox.ts:1080 returns failedDirs without an `error` field, so the caller falls into the generic "Snapshot failed. Failed directories: ..." branch — violating spec T5999692, which requires the error to contain shields/audit/lock so the operator can map symptom to recovery. Front-check shields state in the snapshot create case before backupSandboxState ever runs and emit the spec's "Preferred future" wording with the exact recovery command. Surgical 1-file source change; isShieldsDown is already a public export from src/lib/shields. Fixes NVIDIA#4493 Fixes NVIDIA#4496 Signed-off-by: Dongni Yang <dongniy@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a shields precondition to ChangesShields-aware snapshot creation gating
🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/snapshot-shields-guard.test.ts (1)
20-43: ⚡ Quick winPrefer
execFileSyncwith argument array to avoid command injection risk.While
argscomes from test code here, using string interpolation withexecSynccan introduce shell injection vulnerabilities if the pattern is copied elsewhere. The static analysis hint is correct.🔒 Proposed fix using execFileSync
-function runCli(args: string, env: Record<string, string | undefined> = {}): CliRunResult { +function runCli(args: string[], env: Record<string, string | undefined> = {}): CliRunResult { try { - const out = execSync(`node "${CLI}" ${args}`, { + const out = execFileSync("node", [CLI, ...args], { encoding: "utf-8", timeout: execTimeout(), env: {Update call sites:
- const r = runCli("alpha snapshot create", env); + const r = runCli(["alpha", "snapshot", "create"], env);- const r = runCli("alpha snapshot create", env); + const r = runCli(["alpha", "snapshot", "create"], env);You'll also need to add
execFileSyncto the imports:-import { execSync } from "node:child_process"; +import { execFileSync } from "node:child_process";🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/snapshot-shields-guard.test.ts` around lines 20 - 43, The runCli helper currently calls execSync with a shell-interpolated command which can enable injection; replace the execSync(`node "${CLI}" ${args}`, ...) call in runCli with execFileSync and pass the executable and argument array (e.g., ['node', CLI, ...parsedArgs]) so arguments are not shell-interpolated, add execFileSync to the imports from child_process, preserve the same options (encoding, timeout, env with NEMOCLAW_* overrides), and keep the existing catch logic that reads status/stdout/stderr from the thrown error object (ensure the thrown error from execFileSync is handled identically).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/snapshot-shields-guard.test.ts`:
- Around line 20-43: The runCli helper currently calls execSync with a
shell-interpolated command which can enable injection; replace the
execSync(`node "${CLI}" ${args}`, ...) call in runCli with execFileSync and pass
the executable and argument array (e.g., ['node', CLI, ...parsedArgs]) so
arguments are not shell-interpolated, add execFileSync to the imports from
child_process, preserve the same options (encoding, timeout, env with NEMOCLAW_*
overrides), and keep the existing catch logic that reads status/stdout/stderr
from the thrown error object (ensure the thrown error from execFileSync is
handled identically).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ab8b86db-80b7-4069-a5be-fab2cbaa2748
📒 Files selected for processing (2)
src/lib/actions/sandbox/snapshot.tstest/snapshot-shields-guard.test.ts
execFileSync passes argv as an array instead of interpolating into a shell string, so the helper no longer depends on the caller producing shell-safe args. Caller sites pass an array instead of a space-separated string. No behavior change to what the tests assert; both cases pass. --no-verify used because pre-commit hook fails on pre-existing test/generate-openclaw-config.test.ts cases that need Python >=3.11 (local host is 3.8); CI runs under correct Python. Refs NVIDIA#4493 Signed-off-by: Dongni Yang <dongniy@nvidia.com>
Summary
nemoclaw <name> snapshot createwhile shields are up previously exited 1 with the generic wordingSnapshot failed. Failed directories: ..., violating spec T5999692 which requires the error to containshields/audit/lockso the operator knows to runshields down. This PR front-checks shields state in the create case and emits the spec's "Preferred future" wording (Cannot create snapshot while shields are up. Run \nemoclaw shields down` first, then retry.) beforebackupSandboxState` ever runs.Related Issue
Fixes #4493
Fixes #4496
Changes
src/lib/actions/sandbox/snapshot.ts: importisShieldsDownfrom../../shields; incase "create"after the gateway/liveness checks and before the "Creating snapshot…" log, exit early with the shields-aware error when shields are up.test/snapshot-shields-guard.test.ts(new): two CLI-level integration tests modeled on the existingtest/snapshot-gateway-guard.test.tspattern — (1) shields up → exit 1, output containsshields are up+shields down, does NOT contain the old genericFailed directories:wording; (2) shields not configured → guard does NOT fire (noshields are upin output).Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Verification details (what actually ran locally)
npx vitest run --project cli test/snapshot-shields-guard.test.ts— 2/2 pass; the locked-shields case prints exactlyCannot create snapshot while shields are up.+Run \nemoclaw alpha shields down` first, then retry.and the assertion that the oldFailed directories:` wording is absent holds.npx vitest run --project cli test/snapshot.test.ts test/snapshot-gateway-guard.test.ts test/snapshot-restore-existing-dest.test.ts— 53/53 pass (regression check on adjacent snapshot suites).npm run typecheck:cli— clean.npx prek run --all-files— fails to bootstrap on this Ubuntu 20.04 host (Python 3.8 vs hook requirement ≥3.9; with Python 3.12 on PATH the Go toolchain download fails through the corp proxy). Same as prior PRs (fix(inference): validate Ollama /api/tags response body in health probes #4295) — relying on CI.Signed-off-by: Dongni Yang dongniy@nvidia.com
Summary by CodeRabbit
Bug Fixes
Tests