feat(strands-py): add Sandbox core abstraction (TS→Python port, core only 1/N) #2665
Port the core Sandbox interface from strands-ts/src/sandbox/ to strands-py/src/strands/sandbox/, mirroring the TypeScript public barrel (Sandbox, PosixShellSandbox, the data types, and LANGUAGE_PATTERN). The streaming methods (execute_streaming/execute_code_streaming) are the abstract primitives; non-streaming and text wrappers consume them. PosixShellSandbox implements file/code ops over the shell (base64 heredoc transport, env export prefix, ls -1ap parsing), so subclasses implement only execute_streaming. Mirrors the TS posix-shell tests plus recovered strands-agents#2198 security cases. Docker/SSH backends and Agent<->Sandbox integration are intentionally out of scope (follow-up PRs).
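The primitive/wrapper split described in this commit can be illustrated with a minimal sketch. Everything in it is hypothetical and simplified: SketchSandbox, CannedSandbox, and the field layouts of StreamChunk/ExecutionResult (including an aggregated stdout on the result) are assumptions for illustration, not the port's actual classes. It shows a single abstract streaming primitive, a non-streaming execute wrapper that consumes it, and an ls -1ap based list_files built on top, so a concrete subclass only has to supply execute_streaming.

```python
import asyncio
import shlex
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import AsyncIterator, List, Literal, Tuple, Union

StreamType = Literal["stdout", "stderr"]


@dataclass
class StreamChunk:
    data: str
    stream_type: StreamType


@dataclass
class ExecutionResult:
    exit_code: int
    stdout: str
    stderr: str


# What the streaming primitive yields: chunks first, then one final result.
Event = Union[StreamChunk, ExecutionResult]


class SketchSandbox(ABC):
    """Hypothetical reduction of the Sandbox / PosixShellSandbox split."""

    @abstractmethod
    def execute_streaming(self, command: str) -> AsyncIterator[Event]:
        """The only primitive a concrete backend has to implement."""

    async def execute(self, command: str) -> ExecutionResult:
        # Non-streaming wrapper: drain the stream, return the terminal result.
        async for event in self.execute_streaming(command):
            if isinstance(event, ExecutionResult):
                return event
        raise RuntimeError("stream ended without an ExecutionResult")

    async def list_files(self, path: str) -> List[Tuple[str, bool]]:
        # A file op expressed as a shell command over the same primitive.
        result = await self.execute(f"ls -1ap {shlex.quote(path)}")
        if result.exit_code != 0:
            raise OSError(f"cannot list {path!r}: {result.stderr.strip()}")
        entries: List[Tuple[str, bool]] = []
        for line in result.stdout.split("\n"):
            line = line.rstrip("\r")
            if not line or line in ("./", "../"):
                continue  # skip blanks and the ./ ../ entries that -a adds
            is_dir = line.endswith("/")  # -p appends "/" to directories
            entries.append((line[:-1] if is_dir else line, is_dir))
        return entries


class CannedSandbox(SketchSandbox):
    """Test double: replays fixed stdout instead of running a real shell."""

    def __init__(self, stdout: str) -> None:
        self.stdout = stdout
        self.commands: List[str] = []

    async def execute_streaming(self, command: str) -> AsyncIterator[Event]:
        self.commands.append(command)
        yield StreamChunk(self.stdout, "stdout")
        yield ExecutionResult(0, self.stdout, "")


sandbox = CannedSandbox("./\n../\nnotes.txt\nsrc/\n")
print(asyncio.run(sandbox.list_files("my dir")))
print(sandbox.commands)
```

With the canned output above, list_files returns [('notes.txt', False), ('src', True)], and the recorded command is ls -1ap 'my dir', with the path quoted by shlex.quote. Keeping every file op on top of execute_streaming is what lets a backend implement one method and inherit the rest.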
Python's $ matches before a trailing newline, unlike JS /^...$/.test() in the strands-ts oracle. LANGUAGE_PATTERN/ENV_KEY_PATTERN were matched with re.match(), so 'python3\n' and env key 'FOO\n' passed the Python validator while the TS oracle rejects them — a behavioral-parity break in a security-sensitive validator (a trailing newline is a shell statement separator). Switch both call sites to re.fullmatch (idiomatic JS-$ parity), document the gotcha in constants.py, and add regression tests that fail on the old .match code.
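To make the trailing-newline gotcha concrete, here is a small sketch comparing the two call styles. HYPOTHETICAL_LANGUAGE_PATTERN is a stand-in assumption; the real LANGUAGE_PATTERN in constants.py may use a different character class, but any ^...$ pattern shows the same re.match behavior.

```python
import re

# Hypothetical stand-in for LANGUAGE_PATTERN; the real pattern may differ.
HYPOTHETICAL_LANGUAGE_PATTERN = r"^[A-Za-z0-9._+-]+$"


def is_valid_language_match(name: str) -> bool:
    # Old behavior: `$` also matches just before a single trailing "\n".
    return re.match(HYPOTHETICAL_LANGUAGE_PATTERN, name) is not None


def is_valid_language(name: str) -> bool:
    # Fixed behavior: the entire string must match, like JS /^...$/.test().
    return re.fullmatch(HYPOTHETICAL_LANGUAGE_PATTERN, name) is not None


for candidate in ["python3", "python3\n", "python3; rm -rf /"]:
    print(repr(candidate), is_valid_language_match(candidate), is_valid_language(candidate))
```

re.match accepts "python3\n" because $ also matches just before a final newline, leaving the "\n" unconsumed; re.fullmatch rejects it, which agrees with the JS /^...$/.test() result. Both reject inputs with ; or spaces.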
Issue: This PR introduces a new public abstraction (…). The PR description is thorough and does document the API surface, divergences from TS, and exports, which is great. However:
Suggestion: Add the …
Assessment: Comment

This is a well-structured port with faithful parity to the TypeScript oracle, good security testing, and clear documentation of intentional divergences.

Review Themes
The code quality, testing coverage (57 tests including adversarial/security cases), and documentation are strong. The …
@agent-of-mkmeral check the reviews here, update as necessary
Addresses the review comments on PR strands-agents#2665:

- shell.py: replace the hand-rolled shell_quote() with stdlib shlex.quote() (answers @mkmeral's 'is there a better library' question; tenet strands-agents#6 'embrace common standards'). shlex.quote is shell-equivalent and reduces maintenance surface. Updated build_shell_env_prefix and all file-op call sites; updated tests (shlex leaves shell-safe values unquoted, e.g. export FOO=bar).
- shell.py read_file: wrap the base64 decode in try/except so corrupt or unexpected stdout surfaces a clear OSError naming the path instead of a cryptic binascii.Error. Added a regression test.
- shell.py PosixShellSandbox docstring: document the subclass contract for timeout and cwd symmetrically with env (silently ignored if not honored).
- test_shell.py _stream_process: document idle-timeout (not wall-clock) semantics in the docstring and at the wait_for site.
- sandbox/__init__.py: note that the top-level strands re-export is deferred to the Agent<->Sandbox integration follow-up (per the Flat Namespaces decision record).

Validated (OTEL_* unset to avoid the hatch-env import artifact): ruff check + format clean, mypy clean (5 files), 58 tests pass.
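Two of these changes, the shlex.quote-based env prefix and the guarded base64 decode in read_file, can be sketched as below. The helper names env_prefix and decode_read_output, the single "export K=V ... && " form, and the identifier-style key pattern are assumptions for illustration; the real build_shell_env_prefix, ENV_KEY_PATTERN, and read_file in shell.py may differ in detail.

```python
import base64
import binascii
import re
import shlex
from typing import Mapping

# Hypothetical env-key rule (a POSIX-style identifier); the real
# ENV_KEY_PATTERN in strands/sandbox/constants.py may differ.
HYPOTHETICAL_ENV_KEY_PATTERN = r"[A-Za-z_][A-Za-z0-9_]*"


def env_prefix(env: Mapping[str, str]) -> str:
    """Hypothetical analogue of build_shell_env_prefix ("export K=V && ")."""
    if not env:
        return ""
    assignments = []
    for key, value in env.items():
        # fullmatch, so a key such as "FOO\n" is rejected outright.
        if re.fullmatch(HYPOTHETICAL_ENV_KEY_PATTERN, key) is None:
            raise ValueError(f"invalid environment variable name: {key!r}")
        # shlex.quote leaves shell-safe values bare and single-quotes the rest.
        assignments.append(f"{key}={shlex.quote(value)}")
    return "export " + " ".join(assignments) + " && "


def decode_read_output(path: str, stdout: str) -> bytes:
    """Hypothetical read_file tail: decode base64 stdout or raise OSError."""
    # base64(1) wraps its output lines, so drop all whitespace first.
    compact = "".join(stdout.split())
    try:
        return base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise OSError(f"could not decode contents of {path!r}: {exc}") from exc


print(env_prefix({"FOO": "bar", "GREETING": "hello world"}) + "env")
print(decode_read_output("greeting.txt", "aGVs\nbG8=\n"))
```

shlex.quote leaves a shell-safe value such as bar unquoted and wraps anything else in single quotes, which is why the updated tests expect export FOO=bar. A malformed payload (non-alphabet bytes or bad padding) surfaces as an OSError that names the path rather than a bare binascii.Error.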
@mkmeral went through every review thread and pushed …

On your question (…): …

Other findings addressed: …

Validation (with the runner's ambient …

One thing I can't do myself: the bot flagged this needs the …

The only red check is …

🤖 Updated by an autonomous Strands agent.
Assessment: Approve

All previous review feedback has been addressed in …

Verification of addressed items

No new issues found on re-review. The only open process item is the …
@mkmeral re-checked the reviews: everything actionable is already addressed in …

The Strands review bot independently re-reviewed at 13:30Z and flipped to Approve with zero new findings. Two items only a maintainer can clear (I lack write perms from a fork PR):

The only red check is …
@mkmeral thanks for the approval (15:41Z); it unblocked the access-gated CI, so the full matrix ran. Re-checked everything:

✅ Green: …

🔴 The two remaining red checks are NOT from this PR:

Still maintainer-only: the …

So: code is approved and fully green on its own surface; the only blockers are a flaky unrelated Windows timing test (needs a re-run) and the two process labels. Nothing actionable left in the diff on my end.
…assert on Windows 3.12 runner, unrelated to this PR)
Assessment: Approve

Clean TS→Python port with faithful parity, comprehensive security testing, and proper adherence to project conventions. All prior review feedback has been addressed.

Review Summary

No new issues found. The only remaining action is for a maintainer to apply the …
@mkmeral re-checked: no new code feedback, and things actually improved:

✅ Now two human approvals: you (15:41Z) + @gautamsirdeshmukh (16:40Z). The Strands review bot also re-approved at 16:45Z with zero new findings. All 7 review threads resolved, …

On the flaky Windows test: I pushed an empty re-trigger commit (…

Two maintainer-only items remain; nothing actionable on my end:

The …
Description
This PR ports the core Sandbox abstraction from the strands-ts reference implementation into strands-py, under strands-py/src/strands/sandbox/. It is a cross-language parity port: the behavior is derived from the TypeScript oracle, then re-expressed idiomatically in Python.

Behavioral oracle
The source of truth is the TypeScript sandbox public barrel strands-ts/src/sandbox/index.ts in this monorepo, which exports exactly Sandbox, PosixShellSandbox, the data types, and LANGUAGE_PATTERN. The Python module mirrors this barrel one-to-one (strands/sandbox/__init__.py), porting:

- Sandbox (ABC): two abstract streaming primitives (execute_streaming, execute_code_streaming), abstract file I/O (read_file, write_file, remove_file, list_files), and non-streaming/text convenience wrappers (execute, execute_code, read_text, write_text).
- PosixShellSandbox: shell-based defaults, namely base64-heredoc code execution, base64 file read/write with mkdir -p, rm, and ls -1ap parsing. Subclasses implement only execute_streaming.
- StreamChunk, FileInfo, OutputFile, ExecutionResult, and the StreamType literal (as dataclasses plus a Literal).
- LANGUAGE_PATTERN: interpreter-name validation, plus the internal ENV_KEY_PATTERN and the shell_quote/validate_env_keys/build_shell_env_prefix helpers.

Docker/SSH concrete backends, the stream-process event-pump, and the Agent↔Sandbox integration are deliberately out of scope for this "core only (1/N)" PR (see Scope below).

Prior-history note (what happened to the earlier Python sandbox attempt)
A previous Python sandbox attempt, tracked around PR #2198 ("feat: add Sandbox abstraction - core only (1/N)"), was merged to a feature/sandbox branch ONLY, never to main. It used the pre-monorepo path src/strands/sandbox/. Since then the design has evolved: it now lives in the monorepo, and the TypeScript implementation is the canonical reference. This port follows the CURRENT strands-ts oracle, not that stale attempt, though its adversarial/security tests were cross-referenced when mirroring coverage.

Accepted Python↔TypeScript divergences
These are intentional, idiomatic translations (behavior preserved, surface adapted to Python):
| TypeScript | Python |
|---|---|
| ExecuteOptions options-object | timeout / cwd / env (strands-py call-option style) |
| AbortSignal cancellation | asyncio task cancellation (no signal param in the core ABC) |
| type tag ('streamChunk' / 'executionResult') | isinstance() checks on @dataclass types |
| Uint8Array | bytes |
| executeStreaming, isDir, streamType, outputFiles | execute_streaming, is_dir, stream_type, output_files |
| throw new Error(...) | FileNotFoundError / OSError / ValueError / RuntimeError |
| JS /^...$/.test() | re.fullmatch on ^...$ patterns |

Iteration-1 divergence note (security-relevant): Python's $ in a regex matches just before a trailing \n, unlike JavaScript's /^...$/.test(), which anchors to the true end of the string. Using re.match on the $-anchored LANGUAGE_PATTERN/ENV_KEY_PATTERN would have accepted language="python3\n" and env key "FOO\n" (a trailing newline is a shell statement separator) while the TS oracle rejects them. Both validator call sites use re.fullmatch to reproduce the JS semantics; the gotcha is documented in constants.py and covered by two dedicated regression tests.

Scope: "core only (1/N)" + follow-ups
This PR is the first in a series. Out of scope here, to follow as separate PRs mirroring the corresponding TypeScript work:
- … (stream-process helper)
- … (strands-agents/harness-sdk#2563)
- … (strands-agents/harness-sdk#2649)

Related Issues
Parity-port tracking issue: agent-of-mkmeral/strands-coder-private#248.

Documentation PR
No user-facing docs are required for this internal core abstraction: there are no public package export surface changes beyond the new strands.sandbox module, which is documented via module/class docstrings. Docs will accompany the Agent-integration follow-up.

Type of Change
New feature
Review Loop Summary
This PR went through a fresh-context pre-PR review loop (task-pre-pr-review, max 3 iterations) before being opened. State was tracked in the REVIEW_LOOP_248 repo variable.

- strands-ts/src/sandbox/ core → strands-py/src/strands/sandbox/: Sandbox ABC + PosixShellSandbox + types + LANGUAGE_PATTERN, 55 mirrored tests incl. security/adversarial.
- re.match vs JS $ transliteration bug in LANGUAGE_PATTERN/ENV_KEY_PATTERN: a trailing-newline bypass in a security-sensitive validator. Switched both call sites to re.fullmatch, documented the gotcha in constants.py, added 2 regression tests (verified to fail on the old .match code). 57 tests.
- … fullmatch; grepped for any other .match/.search validator sites (none); empirically re-verified that the iteration-1 regression tests genuinely fail when reverted to .match and pass with fullmatch. lint + mypy (5 files) + 57 tests all green. Opening the PR.

Remaining Findings
None blocking. One 🟢 cosmetic nit was considered and dismissed as a non-issue:
list_files strips trailing CR with str.rstrip("\r") (removes all trailing CRs), where the TS oracle uses replace(/\r$/, '') (removes at most one). This can only differ for a filename literally containing a CR byte under CRLF line endings, which POSIX ls -1ap never emits (POSIX uses LF). No behavioral impact; not filed.

Testing
Readiness was run from strands-py/ (with the runner's ambient OTEL_* env unset, which otherwise breaks the hatch-test env at import; this is an env artifact, not a code defect):

- hatch run lint → All checks passed!
- hatch run hatch-static-analysis:mypy ./src/strands/sandbox → Success: no issues found in 5 source files
- hatch test tests/strands/sandbox/test_shell.py → 57 passed

The Python tests mirror strands-ts/src/sandbox/__tests__/posix-shell.test.node.ts and its TestSandbox fixture (ported to _ShellTestSandbox + _stream_process), and cover: streaming primitives + non-streaming/text wrappers; base64-heredoc code exec; base64 binary/all-256-byte/unicode file roundtrip with mkdir -p; rm; ls -1ap parsing (excludes ./ and ../, trailing-slash dir detection); export K=V && env prefix with shell-quoting + ENV_KEY validation; shell_quote single-quote wrapping + '\'' escaping; LANGUAGE_PATTERN injection rejection (path traversal, ;, spaces, trailing newline); env-name injection; shell-metacharacter content/path safety; command-substitution neutralization; timeout; asyncio cancellation; exit-code/signal mapping (127, 128+signal).

hatch run prepare (scoped: lint + mypy + sandbox tests, all green). Note: hatch run format flags two unrelated pre-existing files (src/strands/tools/mcp/mcp_client.py, tests/strands/models/test_gemini.py) as drift on main; those are not part of this diff and were left untouched.

Checklist
- … (strands/sandbox/__init__.py and Sandbox docstrings)

cc @mkmeral @gautamsirdeshmukh (gautamsirdeshmukh owns the sandbox workstream: TS #2563/#2649).
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
🤖 Opened by an autonomous Strands agent after a fresh-context pre-PR review loop converged.