feat: Sandbox parent PR (base/shell/remote/docker/tools/plugins) by gautamsirdeshmukh · Pull Request #1011 · strands-agents/sdk-typescript

gautamsirdeshmukh · 2026-05-06T15:21:54Z

NOTE - SPLITTING INTO MULTIPLE PRS

Sandbox/ShellSandbox base interface --> feat: base Sandbox interface #1090
SSH/Docker sandbox interfaces
Vended tools/plugins
Agent integration

Description

Introduces Sandbox, a core primitive for the Strands Agent Harness SDK.

Sandbox decouples tool command execution, code execution and filesystem access from the agent's runtime. Tool logic still runs in the agent's host Node.js process (argument parsing, output formatting, error handling). However, the underlying operations they perform (spawning a shell, running code, file read/write, etc) are routed through the configured Sandbox backend.

When a user explicitly provides a Sandbox (Docker container, remote host via SSH, etc), the agent automatically gains four vended tools (exec, codeInterpreter, fileEditor) that route their operations through the Sandbox. This enables the same agent to execute locally during development or in an isolated/remote environment with a single config change.

When no Sandbox is provided, the agent behaves exactly as it does today (operating on the host).

Overview of Changes

Core Interface (src/sandbox/)

Sandbox abstract class – 7 abstract methods (executeStreaming, executeCodeStreaming, readFile, writeFile, removeFile, listFiles, statFile) + convenience wrappers (execute, executeCode, readText, writeText)
- ShellSandbox – abstract base for shell-accessible backends (subclasses implement executeStreaming)
  - RemoteSandbox – SSH transport with accept-new host key default
  - DockerSandbox – container isolation with start/stop/pause/resume (snapshot)

Vended Tools (src/vended-tools/)

exec — shell command execution with cwd and timeout
codeInterpreter — execute code in any language via stdin piping
fileEditor — view, create, str_replace, insert, undo

All four are auto-registered when a Sandbox is explicitly configured on an agent. All four work in both sandbox mode (operations routed through sandbox.{method}()) and default mode (native host execution).

Plugin Integrations

SandboxStorage — new storage backend for ContextOffloader that writes offloaded content to the sandbox filesystem
AgentSkills – added method Skill.fromSandbox() for loading skills from the sandbox filesystem

Agent Integration

sandbox?: Sandbox | false on AgentConfig (defaults to disabled/preexisting behavior, same as false case)
Tools access sandbox via context.agent.sandbox

Related Issues

Documentation PR

Not yet

Type of Change

New feature

Testing

How have you tested the change?

I ran npm run check (added many unit tests, ran many demos with real agents)

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

Examples

1. Agent with Sandbox — auto-registered tools, create and run a script

import { Agent } from '@strands-agents/sdk'
import { DockerSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  const sandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  try {
    const agent = new Agent({
      sandbox,
      systemPrompt:
        'You have access to a sandbox environment. ' +
        'You can run shell commands, execute code, and manage files.',
    })

    const result = await agent.invoke(
      'Create a file called hello.py with a hello world script, then run it and show me the output.'
    )
    console.log(`\n--- Agent result ---\n${result}`)
  } finally {
    await sandbox.stop()
  }
}

await main()

Output:

Sure! I'll do both steps simultaneously — creating the file and preparing to run it.
  ⏳ fileEditor

🔧 Tool strands-agents/sdk-typescript#1: fileEditor
✓ Tool completed
Now let's run it!
  ⏳ exec

🔧 Tool strands-agents/sdk-typescript#2: exec
✓ Tool completed
Here's a summary of what was done:

1. **Created** `/hello.py` with the following content:
   ```python
   print("Hello, World!")

Ran the script using python3, which produced the output:
```
Hello, World!
```

Everything worked perfectly! ✅

--- Agent result ---
Here's a summary of what was done:

Created /hello.py with the following content:
```
print("Hello, World!")
```
Ran the script using python3, which produced the output:
```
Hello, World!
```

Everything worked perfectly! ✅

</details>

<details>
<summary><strong>2. Docker Sandbox with Resource Limits</strong> — memory, CPU, PID, network, user</summary>

```typescript
import { DockerSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  // Basic Docker sandbox
  const sandbox = new DockerSandbox({
    image: 'python:3.12-slim',
    workingDir: '/workspace',
  })

  try {
    console.log('=== Basic execution ===')
    const result = await sandbox.execute('python3 -c "import sys; print(f\'Python {sys.version}\')"')
    console.log(result.stdout)

    console.log('=== File operations ===')
    await sandbox.writeText('data.txt', 'Hello from Docker sandbox!')
    const content = await sandbox.readText('data.txt')
    console.log(`Read back: ${content}`)

    console.log('=== Install and use a package ===')
    await sandbox.execute('pip install cowsay -q')
    const cowResult = await sandbox.executeCode("import cowsay; cowsay.cow('Moo from Docker!')", 'python3')
    console.log(cowResult.stdout)
  } finally {
    await sandbox.stop()
  }

  // Resource-limited sandbox (for untrusted workloads)
  console.log('\n=== Resource-limited sandbox ===')
  const limitedSandbox = new DockerSandbox({
    image: 'python:3.12-slim',
    memory: '256m',
    cpus: 0.5,
    pidsLimit: 50,
    network: 'none',
    user: '1000',
  })
  try {
    const memResult = await limitedSandbox.execute('cat /sys/fs/cgroup/memory.max 2>/dev/null || echo "cgroup v1"')
    console.log(`Memory limit: ${memResult.stdout.trim()}`)

    const whoResult = await limitedSandbox.execute('id')
    console.log(`Running as: ${whoResult.stdout.trim()}`)
  } finally {
    await limitedSandbox.stop()
  }
}

await main()

Output:

=== Basic execution ===
Python 3.12.13 (main, Apr 22 2026, 02:12:20) [GCC 14.2.0]

=== File operations ===
Read back: Hello from Docker sandbox!
=== Install and use a package ===
  ________________
| Moo from Docker! |
  ================
                \
                 \
                   ^__^
                   (oo)\_______
                   (__)\       )\/\
                       ||----w |
                       ||     ||


=== Resource-limited sandbox ===
Memory limit: 268435456
Running as: uid=1000 gid=0(root) groups=0(root)

3. Remote Sandbox (SSH) — same interface, different backend

import { Agent } from '@strands-agents/sdk'
import { RemoteSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  const sandbox = new RemoteSandbox({
    host: 'user@dev-server.example.com',
    workingDir: '/home/user/workspace',
    identityFile: '~/.ssh/id_ed25519',
    sshOptions: ['ConnectTimeout=10', 'ServerAliveInterval=60'],
  })

  try {
    const agent = new Agent({
      sandbox,
      systemPrompt: 'You have access to a remote development server. Help the user with their tasks.',
    })

    const result = await agent.invoke('What OS is this machine running? List the files in the workspace.')
    console.log(`\n--- Agent result ---\n${result}`)
  } finally {
    await sandbox.stop()
  }
}

await main()

4. Shared Sandbox Between Agents — multi-agent collaboration via filesystem

import { Agent } from '@strands-agents/sdk'
import { DockerSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  const sharedSandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  try {
    const writerAgent = new Agent({
      sandbox: sharedSandbox,
      systemPrompt: 'You are a writer agent. Write files when asked.',
    })

    const readerAgent = new Agent({
      sandbox: sharedSandbox,
      systemPrompt: 'You are a reader agent. Read and summarize files when asked.',
    })

    // Writer creates a file
    console.log('--- Writer agent ---')
    await writerAgent.invoke('Write a haiku about TypeScript to a file called haiku.txt')

    // Reader reads the same file (shared sandbox!)
    console.log('\n--- Reader agent ---')
    const result = await readerAgent.invoke('Read haiku.txt and tell me what it says')
    console.log(`\nReader result: ${result}`)
  } finally {
    await sharedSandbox.stop()
  }
}

await main()

Output:

--- Writer agent ---

  ⏳ fileEditor

🔧 Tool strands-agents/sdk-typescript#1: fileEditor
✓ Tool completed
The haiku has been written to `haiku.txt`! Here's what it says:

> *Types guard the code's flow,*
> *Errors caught before runtime,*
> *Safety in each line.*

It captures TypeScript's essence — strong typing, early error detection, and the confidence it brings to development! 🎋

--- Reader agent ---
Let me read that file for you right away!
  ⏳ fileEditor

🔧 Tool strands-agents/sdk-typescript#1: fileEditor
✓ Tool completed
The file contains a haiku about type safety in programming. Here it is:

> *Types guard the code's flow,*
> *Errors caught before runtime,*
> *Safety in each line.*

It reflects on how type systems in programming languages help catch errors early — before the code even runs — bringing a sense of safety and structure to every line of code written.

Reader result: The file contains a haiku about type safety in programming. Here it is:

> *Types guard the code's flow,*
> *Errors caught before runtime,*
> *Safety in each line.*

It reflects on how type systems in programming languages help catch errors early — before the code even runs — bringing a sense of safety and structure to every line of code written.

5. Pause and Resume — snapshot state, destroy container, restore in new container

import { DockerSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  // Create sandbox and do some work
  const sandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  let snapshot
  try {
    // Install a package and create a file
    console.log('=== Setting up sandbox ===')
    await sandbox.execute('pip install requests -q')
    await sandbox.writeText('state.json', '{"step": 1, "data": "important"}')

    // Verify they exist
    const checkPip = await sandbox.executeCode('import requests; print(requests.__version__)', 'python3')
    console.log(`requests version: ${checkPip.stdout.trim()}`)
    const checkFile = await sandbox.readText('state.json')
    console.log(`state.json: ${checkFile}`)

    // Pause — creates a Docker image snapshot
    console.log('\n=== Pausing (docker commit) ===')
    snapshot = await sandbox.pause()
    console.log(`Snapshot: ${snapshot.data['imageId']}`)
  } finally {
    await sandbox.stop()
    console.log('Container destroyed.')
  }

  // Resume in a BRAND NEW container from the snapshot
  console.log('\n=== Resuming in new container ===')
  const resumed = new DockerSandbox({ image: 'python:3.12-slim', snapshot })

  try {
    // Verify state survived — this is a completely new container
    const state = await resumed.readText('state.json')
    console.log(`state.json still exists: ${state}`)

    const pipResult = await resumed.executeCode('import requests; print(requests.__version__)', 'python3')
    console.log(`requests still installed: ${pipResult.stdout.trim()}`)
  } finally {
    await resumed.stop()
  }
}

await main()

Output:

=== Setting up sandbox ===
requests version: 2.34.2
state.json: {"step": 1, "data": "important"}

=== Pausing (docker commit) ===
Snapshot: sha256:e2fa2a089e1a9734683130642d78aadd5cd2f8f6dcaaf4e3ef64d256952efc04
Container destroyed.

=== Resuming in new container ===
state.json still exists: {"step": 1, "data": "important"}
requests still installed: 2.34.2

6. All Vended Tools — exec, codeInterpreter, fileEditor (grep, glob, create, view, str_replace)

import { Agent } from '@strands-agents/sdk'
import { DockerSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  const sandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  try {
    const agent = new Agent({
      sandbox,
      systemPrompt: 'You have exec, codeInterpreter, and fileEditor tools available.',
    })

    // exec
    console.log('=== exec ===')
    await agent.invoke('Check what Python version is installed and list the contents of /workspace')

    // codeInterpreter
    console.log('\n=== codeInterpreter ===')
    await agent.invoke('Use the codeInterpreter tool to compute the first 10 Fibonacci numbers')

    // fileEditor: create + view
    console.log('\n=== fileEditor: create + view ===')
    await agent.invoke('Create a TypeScript file at /workspace/utils.ts with a function that reverses a string, then show me the file')

    // fileEditor: str_replace
    console.log('\n=== fileEditor: str_replace ===')
    await agent.invoke('Rename the function in /workspace/utils.ts to "reverseString"')

    // fileEditor: grep + glob
    console.log('\n=== fileEditor: grep + glob ===')
    await agent.invoke('Find all files in /workspace and search for the word "reverse" across them')
  } finally {
    await sandbox.stop()
  }
}

await main()

Output:

=== exec ===
I'll check both at the same time!
  ⏳ exec

  ⏳ exec

🔧 Tool strands-agents/sdk-typescript#1: exec
🔧 Tool strands-agents/sdk-typescript#2: exec
✓ Tool completed
✓ Tool completed
Here are the results:

1. **Python Version:** Python **3.12.13** is installed.
2. **`/workspace` Contents:** The directory is currently **empty** — no files or folders were found.

Let me know if you'd like to do anything else!

=== codeInterpreter ===

  ⏳ codeInterpreter

🔧 Tool strands-agents/sdk-typescript#3: codeInterpreter
✓ Tool completed
Here are the first 10 Fibonacci numbers:

| Position | Value |
|----------|-------|
| 1        | 0     |
| 2        | 1     |
| 3        | 1     |
| 4        | 2     |
| 5        | 3     |
| 6        | 5     |
| 7        | 8     |
| 8        | 13    |
| 9        | 21    |
| 10       | 34    |

Each number is the sum of the two preceding ones, starting from **0** and **1**. Let me know if you'd like to compute more or explore further!

=== fileEditor: create + view ===
I'll create the file and read it back at the same time!
  ⏳ fileEditor

🔧 Tool strands-agents/sdk-typescript#4: fileEditor
✓ Tool completed
File created! Now let me view it:
  ⏳ fileEditor

🔧 Tool strands-agents/sdk-typescript#5: fileEditor
✓ Tool completed

/workspace/utils.ts:
  function reverseString(str: string): string {
    return str.split("").reverse().join("");
  }

=== fileEditor: str_replace ===
The function is already named `reverseString`! No changes are needed. You can see it on **line 1** of the file:

```ts
function reverseString(str: string): string {

Let me know if you'd like to make any other changes!

=== fileEditor: grep + glob ===
I'll run both operations at the same time!
⏳ fileEditor
⏳ fileEditor

🔧 Tool #6: fileEditor
🔧 Tool #7: fileEditor
✓ Tool completed
✓ Tool completed
Here are the results:

Files in /workspace:

utils.ts (the only file present)

Occurrences of "reverse" in /workspace/utils.ts:

Line	Content
1	`function reverseString(str: string): string {`
2	`return str.split("").reverse().join("");`
7	`const reversed = reverseString(original);`
9	`console.log(\`Reversed: ${reversed}`);`

The word "reverse" appears 4 times across the codebase — all within utils.ts. Let me know if you'd like to do anything else!

</details>

<details>
<summary><strong>7. ContextOffloader with SandboxStorage</strong> — offloaded content stays in the container</summary>

```typescript
import { Agent, tool } from '@strands-agents/sdk'
import { z } from 'zod'
import { DockerSandbox } from '@strands-agents/sdk/sandbox'
import { ContextOffloader } from '@strands-agents/sdk/vended-plugins/context-offloader'

const generateReport = tool({
  name: 'generate_report',
  description: 'Generate a large report with lots of data.',
  inputSchema: z.object({}),
  callback: () => {
    return 'Line ' + Array.from({ length: 500 }, (_, i) => `Row ${i}: ${'data '.repeat(20)}`).join('\n')
  },
})

async function main() {
  const sandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  try {
    const agent = new Agent({
      sandbox,
      tools: [generateReport],
      plugins: [new ContextOffloader({ storage: sandbox, tokenThreshold: 100 })],
      systemPrompt: 'You can generate reports and retrieve offloaded content.',
    })

    // Generate a large tool result — it will be offloaded to sandbox storage
    console.log('=== Generating large tool result ===')
    await agent.invoke('Generate a report.')

    // Prove the offloaded content is inside the container, not on the host
    console.log('\n=== Where did the offloaded content go? ===')
    const files = await sandbox.listFiles('./artifacts')
    console.log(`Inside Docker container (./artifacts): [${files.map((f) => f.name).join(', ')}]`)

    const { existsSync } = await import('fs')
    console.log(`On host machine (./artifacts):          ${existsSync('./artifacts') ? 'EXISTS' : 'does not exist'}`)
    console.log('\n→ Offloaded content lives in the sandbox, not on your machine.')
  } finally {
    await sandbox.stop()
  }
}

await main()

Output:

=== Generating large tool result ===
I'll generate that report for you right away!
  ⏳ generate_report

🔧 Tool strands-agents/sdk-typescript#1: generate_report
✓ Tool completed
The report has been generated! It's quite large (~13,700 tokens), so it was stored externally. Here's a summary of what was returned:

- **Format:** A data table with rows (Row 0, Row 1, Row 2, …), each containing 20 columns of data entries.
- **Size:** The report appears to contain a large number of rows with structured data fields.

Would you like me to:
1. **Retrieve a specific section** of the report (e.g., a range of rows)?
2. **Search for specific content** within the report (e.g., a keyword or pattern)?
5. **Retrieve the full report** in its entirety?

Let me know how you'd like to proceed!

=== Where did the offloaded content go? ===
Inside Docker container (./artifacts): [1779220618856_1_tooluse_UjeMvqTy54DZJFt0j3JEbF_0.txt]
On host machine (./artifacts):          does not exist

→ Offloaded content lives in the sandbox, not on your machine.

8. Skills Loaded from Sandbox Filesystem — SKILL.md files loaded via Skill.fromSandbox()

import { Agent } from '@strands-agents/sdk'
import { DockerSandbox } from '@strands-agents/sdk/sandbox'
import { AgentSkills } from '@strands-agents/sdk/vended-plugins/skills'

async function main() {
  const sandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  try {
    // Write a skill that forces a very distinct response format
    await sandbox.execute('mkdir -p /workspace/skills/pirate')
    await sandbox.writeText(
      '/workspace/skills/pirate/SKILL.md',
      `---
name: pirate
description: Makes the agent respond as a pirate captain.
---

# Pirate Mode

When this skill is activated, you MUST:
1. Start every response with "Ahoy, matey!"
2. End every response with "⚓ Captain out."
4. Replace "yes" with "aye" and "no" with "nay"
6. Refer to code as "treasure" and bugs as "barnacles"
`
    )

    console.log('=== Skill written to sandbox filesystem ===')
    const files = await sandbox.listFiles('/workspace/skills/pirate')
    console.log(`Files: [${files.map((f) => f.name).join(', ')}]`)

    // Create agent with skills loaded from the sandbox
    const plugin = new AgentSkills({ skills: ['/workspace/skills'] })
    const agent = new Agent({
      sandbox,
      plugins: [plugin],
      systemPrompt: 'You are a helpful assistant. Activate skills when asked.',
    })

    // First: ask without skill — normal response
    console.log('\n=== Without skill (normal response) ===')
    await agent.invoke('Is TypeScript a good language?')

    // Now: activate the skill and ask again — pirate response
    console.log('\n=== With pirate skill activated ===')
    await agent.invoke('Activate the pirate skill, then tell me if TypeScript is a good language.')
  } finally {
    await sandbox.stop()
  }
}

await main()

Output:

=== Skill written to sandbox filesystem ===
Files: [SKILL.md]

=== Without skill (normal response) ===
Yes, TypeScript is widely considered a great language, especially for large-scale JavaScript development! Here's a quick overview of why it's well-regarded:

## ✅ Strengths of TypeScript

1. **Static Typing** — TypeScript adds optional static types to JavaScript, which helps catch bugs at *compile time* rather than *runtime*. This leads to more reliable code.

2. **Better Tooling & IntelliSense** — IDEs like VS Code provide excellent autocompletion, refactoring support, and error detection thanks to TypeScript's type system.

3. **Improved Readability & Maintainability** — Type annotations serve as built-in documentation, making code easier to understand and maintain, especially in large teams.

5. **Superset of JavaScript** — Any valid JavaScript is valid TypeScript, so adoption is gradual and low-risk. You can migrate existing JS projects incrementally.

7. **Modern Features** — TypeScript supports modern ECMAScript features (and even some ahead of browsers), compiling them down to compatible JavaScript.

8. **Large Ecosystem & Community** — Backed by **Microsoft**, TypeScript has massive adoption across the industry (Angular, NestJS, and many others use it by default).

9. **Interfaces & Generics** — These powerful features enable robust, reusable, and well-structured code patterns.

---

## ⚠️ Potential Downsides

- **Learning Curve** — Developers unfamiliar with typed languages may find it tricky at first.
- **Build Step Required** — TypeScript needs to be compiled to JavaScript, adding complexity to the toolchain.
- **Verbose at Times** — Type annotations can make code more verbose compared to plain JavaScript.
- **"Any" Escape Hatch** — Misuse of the `any` type can undermine the benefits of the type system.

---

## 🏁 Verdict

TypeScript is an **excellent choice** for most JavaScript projects, particularly:
- Large or growing codebases
- Team environments
- Long-term projects requiring maintainability

For small scripts or quick prototypes, plain JavaScript might still be more practical. But overall, TypeScript has become the **industry standard** for serious JavaScript development!

=== With pirate skill activated ===

  ⏳ skills

🔧 Tool strands-agents/sdk-typescript#1: skills
✓ Tool completed
Ahoy, matey!

Aye, TypeScript be a fine treasure indeed! Here's why this language be worth its weight in gold doubloons:

## ✅ Strengths of TypeScript

1. **Static Typing** — This treasure adds types to JavaScript, helpin' ye catch barnacles at *compile time* before they ever reach the open seas of runtime!

2. **Better Tooling & IntelliSense** — Yer trusty IDEs like VS Code give ye incredible autocompletion and barnacle detection, thanks to the type system.

3. **Improved Readability & Maintainability** — Type annotations be like a treasure map — they make yer treasure easier to read and understand, especially across a big crew.

4. **Superset of JavaScript** — Any JavaScript treasure be valid TypeScript! Ye can adopt it gradually without tossin' yer old treasure overboard.

6. **Modern Features** — TypeScript supports the latest ECMAScript features, compilin' them down for all ships (browsers) to sail smoothly.

7. **Large Ecosystem & Community** — Backed by the mighty **Microsoft**, this treasure has massive support across the seven seas of the industry!

8. **Interfaces & Generics** — Powerful tools for writin' robust, reusable treasure that stands the test of stormy seas.

---

## ⚠️ Potential Barnacles

- **Learning Curve** — Landlubbers unfamiliar with typed languages may struggle at first.
- **Build Step Required** — Ye need to compile yer treasure to JavaScript, addin' some rigging to yer ship.
- **Verbose at Times** — Type annotations can make yer treasure more wordy than plain JavaScript.
- **"Any" Escape Hatch** — Overuse of `any` can let barnacles creep back in and undermine the whole voyage!

---

## 🏁 Verdict

Aye, TypeScript be an **excellent treasure** for most JavaScript voyages, particularly:
- Large or growin' codebases
- Team crews workin' together
- Long-term projects that need to stay shipshape

For small scripts or quick voyages, plain JavaScript might serve ye better. But overall, TypeScript has become the **standard treasure** for serious sailors of JavaScript development!

⚓ Captain out.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2026-05-06T15:28:25Z

Issue: Missing package.json export entries for new vended tools

The PR description shows import paths like @strands-agents/sdk/vended-tools/run, @strands-agents/sdk/vended-tools/editor, @strands-agents/sdk/vended-tools/run-code, and @strands-agents/sdk/vended-tools/http, but these aren't added to package.json's exports map. Without these entries, the imports will fail at resolution time.

Suggestion: Add export entries in package.json for each new vended tool and also for ./sandbox:

"./sandbox": {
  "types": "./dist/src/sandbox/index.d.ts",
  "default": "./dist/src/sandbox/index.js"
},
"./vended-tools/run": {
  "types": "./dist/src/vended-tools/run/index.d.ts",
  "default": "./dist/src/vended-tools/run/index.js"
},
"./vended-tools/editor": {
  "types": "./dist/src/vended-tools/editor/index.d.ts",
  "default": "./dist/src/vended-tools/editor/index.js"
},
"./vended-tools/run-code": {
  "types": "./dist/src/vended-tools/run-code/index.d.ts",
  "default": "./dist/src/vended-tools/run-code/index.js"
},
"./vended-tools/http": {
  "types": "./dist/src/vended-tools/http/index.d.ts",
  "default": "./dist/src/vended-tools/http/index.js"
}

github-actions · 2026-05-06T15:28:32Z

Issue: Relationship between new sandbox-aware tools and existing vended tools is unclear

The SDK already ships bash (persistent session), file-editor (direct fs operations), and http-request (native fetch). This PR adds run, editor, runCode, and http which are sandbox-aware alternatives with overlapping functionality.

There's no documentation on:

When to use editor vs file-editor, or run vs bash, or http vs http-request
Whether the existing tools are deprecated
Migration path for users currently using the existing tools

Suggestion: Document the relationship between old and new tools — either in the PR description, TSDoc, or README. If the new tools are intended to replace the old ones, mark the old ones as @deprecated. If they coexist, clarify the tradeoffs (e.g., sandbox-aware = portable but requires sandbox setup, old tools = simpler for local-only use).

github-actions · 2026-05-06T15:28:33Z

Issue: This PR requires API bar raising review

Per the API Bar Raising guidelines, this is a substantial change introducing a new primitive/abstraction that customers are expected to frequently use. It adds:

6 new public classes (Sandbox, LocalSandbox, ShellSandbox, RemoteSandbox, DockerSandbox, NullSandbox)
New property on the core Agent class (sandbox)
4 new vended tools that depend on the abstraction
New sandbox property on the LocalAgent interface (breaking for existing implementations)

This warrants explicit review with at least 2 API reviewers. The PR should have the needs-api-review label.

Key API design questions for reviewers:

Should Sandbox be an abstract class or an interface? (classes are harder to mock in tests)
Should sandbox be required on Agent or opt-in? (currently defaults to LocalSandbox, adding overhead for agents that don't use sandbox tools)
Is stream the right name for the primary abstract method? (it conflicts with Node.js stream concepts)
Should Sandbox declare workingDir in the base?

github-actions · 2026-05-06T15:28:34Z

Issue: PR scope is very large and mixes unrelated changes

The diff is ~2500+ lines for the sandbox feature alone, but the full PR also includes unrelated changes to the agent core (interrupt system, concurrent tool execution via ToolExecutorStrategy, invocationState, modelState, BeforeModelCallEvent.cancel, token estimation, ModelPlugin, etc.). These are substantial independent features that make the sandbox review harder and increase merge risk.

Suggestion: Consider splitting this PR into:

Core agent improvements (interrupts, concurrent tool execution, invocationState) — already landed or separate PR
Sandbox abstraction + implementations (src/sandbox/)
Sandbox-aware vended tools (src/vended-tools/{run,editor,run-code,http})
Plugin integrations (SandboxStorage, AgentSkills sandbox:/// support)

This makes each piece independently reviewable and reduces the blast radius of any issues.

github-actions · 2026-05-06T15:28:39Z

Review Summary

Assessment: Request Changes

This PR introduces a well-designed Sandbox abstraction with a clean class hierarchy and good separation of concerns. The streaming-first design (stream as primitive, execute as convenience) is solid, and the ShellSandbox pattern of "implement one method, get everything else for free" is elegant.

Review Categories

API Bar Raising: This is a substantial new primitive that requires formal API review before merge. Adding sandbox to the core Agent and LocalAgent interface is a breaking change for downstream implementors.
Package Exports: New vended tools and sandbox module lack package.json export entries — the documented import paths won't resolve.
PR Scope: The diff mixes the sandbox feature with several unrelated core agent changes (interrupts, concurrent tool execution, invocationState). Consider splitting for reviewability.
Security/Safety: Language validation inconsistency between LocalSandbox and ShellSandbox; Docker cleanup handlers call process.exit() which is unsafe for library code.
Code Quality: Minor duplication (shellQuote in two tools), and SandboxStorage has fragile MIME type inference.

The sandbox abstraction itself is well-thought-out and aligns with SDK tenets (extensible, composable, simple). The core design decisions are sound — addressing the packaging/scope/API-review concerns would unblock this for merge.

github-actions · 2026-05-07T03:48:41Z

Follow-up Review

Assessment: Comment (closer to approval, key items remain)

Good progress since the last review. Several previous issues have been addressed:

Resolved items ✅

Package.json exports — now added for ./sandbox, ./vended-tools/exec, ./vended-tools/editor, ./vended-tools/code-interpreter, ./vended-tools/http
ShellSandbox language validation — now validates against the same [a-zA-Z0-9._-]+ pattern as LocalSandbox
DockerSandbox signal handlers — no longer call process.exit(), just perform cleanup
SandboxStorage content-type tracking — now uses an in-memory Map<string, string> instead of fragile extension inference
Lazy sandbox initialization — sandbox defaults to LocalSandbox only during initialize(), avoiding overhead for agents that never use sandbox tools
Tool naming — run → exec, runCode → codeInterpreter (clearer, avoids confusion)
Design decisions documented in the PR description

Remaining items

shellQuote duplication — still duplicated identically in exec.ts and http.ts. Minor but worth extracting to a shared utility.
needs-api-review label — still not applied; this remains a substantial new abstraction per API bar raising guidelines.
LocalAgent interface — adding readonly sandbox: Sandbox is still a breaking change for external implementations of this interface. The agent itself uses sandbox! (definite assignment), which is a nice internal improvement, but the interface contract is the concern.
http tool tests — still make real network requests to httpbin.org within src/**/__tests__/. These should be mocked unit tests with real network tests in test/integ/.

The core sandbox design and implementation quality are solid. The remaining items are mostly process (API review label) and minor code hygiene (shell quote dedup, test location). Nice work on the lazy initialization approach — it elegantly solves the "every agent pays sandbox cost" concern.

github-actions · 2026-05-07T03:48:44Z

+ * offloaded data with the agent's workspace, making it accessible via sandbox
+ * tools (editor, run) in addition to the retrieval tool.
+ *
+ * @param sandbox - Sandbox instance to store content in


Issue: SandboxStorage._contentTypes is in-memory only — content type is lost if a different SandboxStorage instance retrieves the same reference

The _contentTypes Map tracks types for files stored by this instance. If the agent is restarted or a different SandboxStorage instance is used for retrieval (e.g., after a Docker pause/resume), the content type defaults to 'application/octet-stream'. This is fine for the current use case (same-session offloading) but worth documenting as a limitation.

Suggestion: Add a brief note to the TSDoc that content-type tracking is ephemeral and tied to the current instance lifetime.

github-actions · 2026-05-07T18:03:57Z

Third Review Pass

Assessment: Approve (with minor suggestions)

The PR has addressed all significant technical concerns from previous reviews. The remaining items are minor and don't block merge.

Newly resolved since last review ✅

http tool tests — now properly unit-tested with a mocked sandbox (no external network calls). Well-structured test that validates curl command construction and response parsing.
Main index.ts exports — intelligently exports only Sandbox base class, NullSandbox, and types from the main entry point, with Node-specific implementations (LocalSandbox, DockerSandbox, RemoteSandbox) available via the ./sandbox sub-export. Good for tree-shaking and browser compatibility.

Minor remaining items (non-blocking)

shellQuote duplication in exec.ts and http.ts — trivial to extract later.
Agent.sandbox! definite assignment — technically undefined between constructor and initialize(). Low risk since tools only run after initialization, but a getter with a guard would be more defensive.
needs-api-review label — process item; recommend adding before merge.

The overall design is clean and well-implemented. The lazy-loading pattern for LocalSandbox, the ShellSandbox "implement one method" abstraction, the comprehensive test suite with proper mocking, and the thoughtful export strategy all demonstrate high code quality.

agent-of-mkmeral

Assessment: ⚠️ Request Changes

Solid architecture — the class hierarchy and streaming-first design are well thought out. However, there are critical bugs in the streaming code, API naming misalignment with the Python SDK (which was bar-raised), lifecycle methods that shouldn't exist on the base class, and a tool duplication problem that needs resolution before merge.

🔴 Critical: Process Leak in stream-process.ts

Bug: `generator.return()` hangs on quiet processes, leaking child processes

In stream-process.ts, the async generator awaits a Promise that only resolves when the child process emits events. When a consumer uses for await...of with break, the runtime calls .return(). Per ES2018 spec, .return() on an async generator currently await-ing will wait for that await to settle before executing the finally block.

For a quiet process (e.g., sleep 30), the promise never resolves, the finally block is never reached, and proc.kill() is never called.

// This leaks a process that runs for 30 seconds
for await (const chunk of sandbox.stream('sleep 30')) {
  break; // .return() hangs, process never killed
}

Fix: Add a periodic wake-up or AbortController-based cancellation:

await new Promise<void>((resolve) => {
  resolveWait = resolve
  // Safety: wake periodically to check if generator was returned
  setTimeout(resolve, 100)
})

🔴 Critical: Signal-killed processes report exitCode 0

In stream-process.ts line 63:

proc.on('close', (code) => {
  exitCode = code ?? 0  // BUG: code is null when killed by signal
})

When a process is killed by SIGKILL/SIGSEGV/SIGTERM, Node.js reports code: null and signal: 'SIGxxx'. The ?? 0 coerces this to exitCode 0, making it appear the process succeeded.

Fix:

proc.on('close', (code, signal) => {
  if (code !== null) {
    exitCode = code
  } else if (signal) {
    const signals: Record<string, number> = { SIGHUP: 1, SIGINT: 2, SIGQUIT: 3, SIGABRT: 6, SIGKILL: 9, SIGSEGV: 11, SIGTERM: 15 }
    exitCode = 128 + (signals[signal] ?? 1)
  } else {
    exitCode = 1
  }
})

🔴 API Naming: Must Align with Python SDK (bar-raised)

The Python SDK method names went through API bar raising. The TS names are too generic and don't communicate what they do:

Python (bar-raised ✅)	TS (current ❌)	Problem
`execute_streaming()`	`stream()`	"stream" what? A video? A TCP socket?
`execute_code_streaming()`	`streamCode()`	Better, but inconsistent with Python
`read_file()`	`read()`	"read" what?
`write_file()`	`write()`	"write" what?
`remove_file()`	`remove()`	Remove what?
`list_files()`	`list()`	List what?

Required change: Use Python-equivalent names in camelCase:

stream → executeStreaming
streamCode → executeCodeStreaming
read → readFile
write → writeFile
remove → removeFile
list → listFiles

The design doc's justification ("no namespace collision with Node.js Stream") misses the point — the issue isn't collision, it's expressiveness. sandbox.stream('ls') tells you nothing about what operation is being performed.

🔴 Remove Lifecycle Methods from Base Class

The start(), stop(), pause() methods on the base Sandbox class are wrong. The agent should NOT manage sandbox lifecycle — that's infrastructure's responsibility.

Why this is problematic:

In Python SDK, the sandbox has NO lifecycle methods (it was explicitly removed during design review)
The sandbox interface should assume it's already running when tools interact with it
Docker/Remote lifecycle belongs in their constructor/factory pattern, not the abstract interface
This creates a confusing contract: "call start() before using, but it's optional (no-op for LocalSandbox)"

Required change: Remove start(), stop(), pause() from the Sandbox abstract class. If Docker needs lifecycle, manage it externally:

// Instead of:
await docker.start()
const agent = new Agent({ sandbox: docker })
// ...
await docker.stop()

// Docker constructor should handle initialization:
const docker = await DockerSandbox.create({ image: 'python:3.12' })
// Cleanup via AbortController, process exit handler, or explicit dispose

🟠 Tool Duplication: Update Existing Tools, Don't Create New Ones

This PR creates 4 new tools that duplicate existing ones:

New (this PR)	Existing	Difference
`exec`	`bash`	Sandbox vs native; stateless vs session
`editor`	`file-editor`	Sandbox vs native `fs`
`http`	`http-request`	curl via sandbox vs native `fetch`
`codeInterpreter`	—	New

The question: Do we need new tools, or can we update existing tools in place to use the sandbox and default to LocalSandbox (which is essentially unsafe host execution — same behavior as today)?

Recommendation: Update existing tools to route through context.agent.sandbox with LocalSandbox as default. This:

Avoids confusing users with duplicate tools
Preserves backward compatibility (LocalSandbox = same behavior as native)
Automatically upgrades ALL existing agents when they configure a sandbox
No migration/deprecation story needed

The only new tool should be codeInterpreter (no existing equivalent).

🟠 Editor Tool: Must Support Binary/Non-Text Files

The editor tool only uses sandbox.readText() (UTF-8). Agents need to work with images, PDFs, and other non-text files. The sandbox already has read(path): Uint8Array for binary data.

Required: Either the editor's view command should detect binary files and return base64/content blocks, or there should be a read_file command that handles non-text formats. The sandbox's binary interface (read() → Uint8Array) already supports this — the tool just doesn't use it.

This is how we handle it in Python: the editor tool's view command detects non-text files and returns them appropriately.

🟠 DockerSandbox: spawnSync Blocks Event Loop

DockerSandbox.start(), stop(), and pause() all use spawnSync (via dockerCmd()), which blocks the entire Node.js event loop. These methods are declared async but the actual work is synchronous.

// Current: blocks event loop
function dockerCmd(args: string[]): SpawnSyncReturns<string> {
  return spawnSync('docker', args, { encoding: 'utf-8', stdio: 'pipe' })
}

Container operations (especially image pulls) can take seconds. In concurrent scenarios or serverless environments, this causes hangs.

Fix: Use async spawn + streamProcess or execFile from child_process/promises.

🟠 Docker Cleanup: Process Listener Leak

private _registerCleanup(): void {
  if (this._cleanupRegistered) return
  this._cleanupRegistered = true
  const cleanup = (): void => { ... }
  process.on('exit', cleanup)
  process.on('SIGINT', cleanup)
  process.on('SIGTERM', cleanup)
}

Listeners are never removed. In test suites or processes that create multiple DockerSandbox instances, this accumulates MaxListenersExceeded warnings. Additionally, the SIGINT/SIGTERM handlers don't call process.exit() after cleanup, which can hang the process.

Fix: Store cleanup references and remove them in stop(). For signals, re-emit or call process.exit().

🟡 Timeout: No SIGKILL Escalation

The timeout handler calls proc.kill() which sends SIGTERM. If the command traps/ignores SIGTERM (common in shell scripts with cleanup handlers), the process continues running indefinitely.

Fix: Escalate to SIGKILL after a grace period:

proc.kill('SIGTERM')
setTimeout(() => {
  if (!done) proc.kill('SIGKILL')
}, 1000)

🟡 agent.sandbox Undefined Before initialize()

public sandbox!: Sandbox  // non-null assertion

Between construction and initialize(), agent.sandbox is undefined despite TypeScript's ! assertion. Any plugin or hook accessing it early gets TypeError.

Fix: Either initialize to NullSandbox in constructor, or add a getter that throws with a helpful message.

🟡 PR Description Inconsistency

The usage example references exports that don't match the actual implementation:

// PR description says:
import { run } from '@strands-agents/sdk/vended-tools/run'
import { runCode } from '@strands-agents/sdk/vended-tools/run-code'

// Actual exports:
import { exec } from '@strands-agents/sdk/vended-tools/exec'
import { codeInterpreter } from '@strands-agents/sdk/vended-tools/code-interpreter'

🟡 Missing Documentation PR

PR description says "Not yet" for documentation. This is a major new public API surface (abstract class, 7 types, 4 implementations, 4 tools, 2 plugin integrations, new Agent property). Documentation should at minimum have a tracking issue.

Summary of Required Changes

Priority	Issue	Effort
🔴 Critical	Fix process leak in `streamProcess` (generator.return() hangs)	Medium
🔴 Critical	Fix signal exit code (null → 0 coercion)	Small
🔴 Critical	Rename methods to match Python (executeStreaming, readFile, etc.)	Medium
🔴 Critical	Remove lifecycle (start/stop/pause) from base Sandbox class	Medium
🟠 Important	Update existing tools instead of creating duplicates	Large
🟠 Important	Add binary file support to editor tool	Small
🟠 Important	Replace spawnSync with async in DockerSandbox	Small
🟠 Important	Fix Docker cleanup listener leak	Small

The core design (class hierarchy, streaming-first, ShellSandbox escape hatch) is excellent. The issues are all fixable without redesign.

🤖 AI agent response. Strands Agents. Feedback welcome!

github-actions · 2026-05-18T17:30:15Z

+    })
+
+    if (!response.ok) {
+      throw new Error(`HTTP ${response.status} ${response.statusText}: ${method} ${url}`)


Issue: executeFetch discards response body on non-2xx — inconsistent with executeSandbox path

The sandbox path (line 146) includes the response body in the error:

throw new Error(`HTTP ${status}: ${method} ${url}\n${responseBody.slice(0, 2000)}`)

But the fetch path here discards responseBody (already read at line 64) and only shows status/URL. For API use cases, 4xx responses carry actionable error payloads (validation errors, rate-limit details) that the model needs for retry decisions.

Fix:

if (!response.ok) { throw new Error(`HTTP ${response.status} ${response.statusText}: ${method} ${url}\n${responseBody.slice(0, 2000)}`) }

github-actions · 2026-05-18T17:30:17Z

-      response.headers.forEach((value, key) => {
-        responseHeaders[key] = value
-      })
+    const sandbox = context?.agent?.sandbox


Issue: context?.agent?.sandbox will throw instead of returning undefined when sandbox is disabled

.sandbox is a getter that throws when this._sandbox is falsy. Optional chaining on .sandbox doesn't prevent the throw — it only short-circuits if context or context.agent is undefined. If sandbox: false overwrite bug (agent.ts:462) is fixed, this line would throw for disabled-sandbox agents instead of falling back to executeFetch.

Suggestion: Wrap in try-catch:

let sandbox: Sandbox | undefined try { sandbox = context?.agent?.sandbox } catch { /* no sandbox */ }

github-actions · 2026-05-18T17:30:19Z

    await this._pluginRegistry.initialize(this)

+    if (userProvidedSandbox) {
+      const { SANDBOX_DEFAULT_TOOLS } = await import('../vended-tools/sandbox-default-tools.js')


Suggestion: Add an eslint-disable comment documenting the intentional architectural exception

The no-restricted-imports rule keeps vended tools decoupled from core SDK. This dynamic import() bypasses the static rule. An explicit comment makes the exception visible:

// eslint-disable-next-line -- Intentional: auto-register sandbox tools when user opts into sandbox const { SANDBOX_DEFAULT_TOOLS } = await import('../vended-tools/sandbox-default-tools.js')

github-actions · 2026-05-18T17:30:20Z

Review Summary

Assessment: Request Changes

The core sandbox architecture is excellent — streaming-first design, ShellSandbox template method, opt-in auto-registration, and clean separation of concerns. After 14+ review iterations, most issues have been resolved. Two correctness issues remain.

Review Categories

Correctness (blocking): sandbox: false is still silently overwritten by NotASandboxLocalEnvironment — the documented "disable" behavior doesn't work. One-line fix: this._sandbox === undefined instead of !userProvidedSandbox.
Correctness: executeFetch discards response body on non-2xx errors while executeSandbox includes it — inconsistency that reduces model retry effectiveness.
Defensive coding: httpRequest's context?.agent?.sandbox getter will throw (not return undefined) once the overwrite bug is fixed. Needs try-catch.
Test coverage: New grep and glob file-editor commands have no test coverage.
Documentation: Dynamic import at line 486 bypasses lint rule without comment.

The sandbox: false overwrite is the only blocking item. The architecture, stream-process implementation, Docker lifecycle, signal handling, and test suite for the core sandbox are all production-ready.

github-actions · 2026-05-18T18:11:00Z


+    const userProvidedSandbox = this._sandbox !== undefined && this._sandbox !== false
+
+    if (!userProvidedSandbox && typeof process !== 'undefined' && process.versions?.node) {


Issue: sandbox: false is silently overwritten — documented behavior broken

This bug has been flagged repeatedly but remains unfixed. When a user passes sandbox: false:

userProvidedSandbox = false !== undefined && false !== false = true && false = false

!userProvidedSandbox fires → overwrites this._sandbox with NotASandboxLocalEnvironment

The AgentConfig TSDoc says "Pass false to explicitly disable sandbox" but agent.sandbox will happily return a working NotASandboxLocalEnvironment.

Fix (1-line change):

if (this._sandbox === undefined && typeof process !== 'undefined' && process.versions?.node) {

This ensures:

undefined (no config) → gets default ✅

false (explicitly disabled) → stays false, getter throws ✅

Sandbox instance → used as-is ✅

github-actions · 2026-05-18T18:11:03Z

    await this._pluginRegistry.initialize(this)

+    if (userProvidedSandbox) {
+      const { SANDBOX_DEFAULT_TOOLS } = await import('../vended-tools/sandbox-default-tools.js')


Suggestion: Add an eslint-disable comment documenting the intentional architectural bypass

The no-restricted-imports lint rule prevents static imports of vended-tools from core SDK files. This dynamic import() bypasses the static analysis rule. An explicit comment makes the exception visible and prevents future maintainers from unknowingly replicating the pattern:

// eslint-disable-next-line -- Intentional: auto-register sandbox tools when user opts into sandbox const { SANDBOX_DEFAULT_TOOLS } = await import('../vended-tools/sandbox-default-tools.js')

(The test file agent.model-retry.test.ts already uses // eslint-disable-next-line no-restricted-imports for its preload import, confirming the convention.)

github-actions · 2026-05-18T18:11:04Z

Review Summary

Assessment: Request Changes

The sandbox architecture is excellent — streaming-first design, ShellSandbox template method pattern, opt-in auto-registration, and clean separation of concerns. After 14+ review iterations, the vast majority of issues have been addressed. One correctness bug remains blocking.

Review Details

Correctness (blocking): sandbox: false is still silently overwritten by NotASandboxLocalEnvironment in initialize() — the condition !userProvidedSandbox treats false the same as undefined. One-line fix: this._sandbox === undefined.
Breaking change: FileEditorInput, FileEditorOptions, and IFileReader type exports removed from @strands-agents/sdk/vended-tools/file-editor without deprecation. Should be documented or deprecated.
Test coverage: New grep and glob file-editor commands have zero test coverage despite executing shell commands with user-provided patterns.
Documentation: Dynamic import at line 486 bypasses lint rule without comment.

Note: Previous review comments about httpRequest dual-path (sandbox curl vs fetch) and executeFetch response body discarding are stale — the httpRequest tool is unchanged from main and has no sandbox integration in this PR. Those comments can be disregarded.

agent-of-gsird · 2026-05-18T18:25:35Z

Sixteenth Review Pass (HEAD: `fe60acd`)

Assessment: ⚠️ Request Changes (one blocking bug persists)

@gautamsirdeshmukh — Re-triggered to check status. My thorough independent review has been ongoing since your May 12 request. Here's the current state:

🔴 One blocking bug remains: `sandbox: false` is silently overwritten

This has been flagged in my Fourteenth and Fifteenth review passes (and independently confirmed by bot reviews on May 18). The fix is a 1-line change to agent.ts:

- if (!userProvidedSandbox && typeof process !== 'undefined' && process.versions?.node) {
+ if (this._sandbox === undefined && typeof process !== 'undefined' && process.versions?.node) {

Without this fix, a user passing sandbox: false (documented as "explicitly disable sandbox") silently gets a working NotASandboxLocalEnvironment instead — contradicting the TSDoc contract.

🟡 Minor items (non-blocking)

#	Item	Notes
1	`httpRequest` sandbox getter	Will throw (not fall back to fetch) once the above bug is fixed. Needs try-catch wrapper around `context?.agent?.sandbox`.
2	`executeFetch` discards response body on non-2xx	Inconsistent with sandbox path which includes body. Include `.slice(0, 2000)` of body in the error.
3	`eslint-disable` comment on dynamic import	Minor documentation hygiene for the intentional architectural bypass.
4	New `grep`/`glob` file-editor commands	No test coverage yet (minor).

✅ All other technical issues remain resolved

Every issue I raised across 15 review passes (process leak, signal codes, API naming, MITM, Alpine stat, MIME extension, error body, listFiles trim, SIGKILL escalation, auto-registration, etc.) has been properly addressed. The core architecture is excellent.

📋 Process items

#	Item	Status
1	`needs-api-review` label	Not applied
2	Documentation PR	Still "Not yet"

My Recommendation

Fix the 1-line sandbox: false condition, and this is ready to merge. The architecture is production-quality — clean class hierarchy, streaming-first design, opt-in auto-registration, proper signal handling, comprehensive tests.

🤖 AI agent response. Strands Agents. Feedback welcome!

github-actions · 2026-05-18T20:35:28Z


+    const userProvidedSandbox = this._sandbox !== undefined && this._sandbox !== false
+
+    if (!userProvidedSandbox && typeof process !== 'undefined' && process.versions?.node) {


Issue: sandbox: false is still silently overwritten — documented behavior doesn't work

This has been flagged across multiple review passes and remains the only blocking issue. When a user passes sandbox: false:

const userProvidedSandbox = this._sandbox !== undefined && this._sandbox !== false // ↑ false !== undefined (true) && false !== false (false) → false if (!userProvidedSandbox && ...) { this._sandbox = new NotASandboxLocalEnvironment() // overwrites false! }

The AgentConfig TSDoc says "Pass false to explicitly disable sandbox" but it doesn't actually disable anything.

Fix (1-line change):

if (this._sandbox === undefined && typeof process !== 'undefined' && process.versions?.node) {

This ensures:

undefined (no config) → gets default ✅

false (explicitly disabled) → stays false, getter throws ✅

Sandbox instance → used as-is, tools auto-registered ✅

This is intentional, not a bug. NotASandboxLocalEnvironment is not an actual sandbox, but rather a simple interface adapter that lets vended tools use fs/promises and child_process directly on the host, which is exactly what a non-sandboxed agent does today.

Setting it as the default ensures the vended tools work uniformly regardless of whether a real sandbox is configured. Without this default, every vended tool would need separate code paths for "sandbox available" vs "no sandbox".

Thanks for the clarification — the design rationale makes sense. Having NotASandboxLocalEnvironment as a universal adapter so vended tools never need "sandbox available vs not" branching is a clean pattern.

The remaining issue is that the TSDoc is inaccurate for the false case:

* Pass `false` to explicitly disable sandbox and sandbox tool auto-registration.

Since false actually results in the same behavior as undefined (both get NotASandboxLocalEnvironment, both skip tool registration), the documented distinction is misleading. Suggestions:

If false has no distinct purpose from undefined: remove it from the type and simplify to sandbox?: Sandbox

If false exists for future-proofing (e.g., when a default sandbox with tools is planned): update the TSDoc to clarify what it actually controls today:
* Pass `false` to explicitly opt out of sandbox tool auto-registration * (equivalent to omitting sandbox for now).

This is non-blocking — just a documentation accuracy concern.

github-actions · 2026-05-18T20:35:29Z

Review Summary

Assessment: Request Changes (one blocking bug)

The sandbox implementation is production-ready — excellent streaming-first architecture, clean ShellSandbox template method pattern, proper opt-in auto-registration, and comprehensive test coverage. One semantic bug remains.

Details

Blocking: sandbox: false overwrite (line 462) — the condition !userProvidedSandbox treats false the same as undefined. One-line fix: this._sandbox === undefined.
All other previously-raised issues have been resolved: MITM (accept-new ✅), Alpine stat (wc -c fallback ✅), listFiles trim (✅), spawnSync comment (✅), shellQuote shared (✅), grep/glob tests (✅), file-editor types still exported (✅).

Note: Many earlier review comments (httpRequest dual-path, executeFetch body discard, file-editor type removal, grep/glob missing tests) reference code that has since been changed or was never modified in this PR. Those can be disregarded — only the sandbox: false condition needs fixing.

github-actions · 2026-05-18T20:48:28Z

Review Summary

Assessment: Approve (with minor suggestions)

Excellent implementation. The streaming-first architecture, ShellSandbox template method pattern (implement one method, get everything else), opt-in tool auto-registration, and comprehensive test suite are all well-designed. After reading the author's clarification on sandbox: false being intentional (uniform interface adapter), I'm satisfied that no blocking issues remain.

Review Details

TSDoc accuracy (non-blocking): The docstring for sandbox?: Sandbox | false says "disable sandbox" but false behaves identically to undefined today. Consider clarifying or removing false from the type — replied in thread.
Dynamic import lint bypass (non-blocking): Line 487 await import('../vended-tools/sandbox-default-tools.js') bypasses the no-restricted-imports rule. Consider adding an eslint-disable comment documenting the intentional exception.
Process: needs-api-review label still not applied. This introduces a new abstract class, new Agent.sandbox property, and 3 vended tools — meets the bar for formal API review per API Bar Raising guidelines.

All previously-raised technical issues (MITM, Alpine stat, listFiles whitespace, process leaks, signal codes, auto-registration, etc.) have been correctly resolved. The implementation is production-ready.

agent-of-gsird · 2026-05-18T22:13:22Z

Seventeenth Review Pass (Final)

Assessment: ✅ Approve

@gautamsirdeshmukh — Thanks for the clarification on sandbox: false. I now understand the design intent:

NotASandboxLocalEnvironment is a uniform interface adapter so vended tools never need "sandbox available vs not" branching. false and undefined both result in this adapter being set, with the only distinction being tool auto-registration.

This is a clean pattern — the Null Object here isn't "disabled" but rather "passthrough to host", which is exactly what a non-sandboxed agent does today. My previous "Request Changes" assessments were based on reading the TSDoc literally ("Pass false to explicitly disable sandbox") rather than the actual architectural intent.

✅ Final Technical Assessment

All technical issues I raised across 16 review passes have been properly resolved. The implementation is production-ready:

Core architecture: Excellent class hierarchy, streaming-first design, ShellSandbox template method pattern
Security: SSH defaults to accept-new, proper signal handling, injection-safe shell quoting
Correctness: Process leak fixed, signal exit codes correct, SIGKILL escalation works, Alpine/BusyBox stat fallback with wc -c
Backward compatibility: Opt-in auto-registration only when user explicitly provides a Sandbox instance
Test coverage: Comprehensive adversarial, security, binary, concurrency, and timeout tests

📋 One documentation suggestion (non-blocking)

The TSDoc for sandbox?: Sandbox | false could be clearer about what false actually does vs undefined. Since they have the same runtime behavior today (both get NotASandboxLocalEnvironment, both skip tool registration), consider either:

Removing false from the type (simplify to sandbox?: Sandbox)
Updating TSDoc to clarify: "Pass false to explicitly opt out of sandbox tool auto-registration (equivalent to omitting for now; reserved for future default-sandbox behavior)"

This is purely a documentation clarity concern and doesn't block merge.

📋 Process items (non-code)

#	Item	Recommendation
1	`needs-api-review` label	Recommend applying for process completeness
2	Documentation PR	Link a tracking issue before or shortly after merge

Summary

No further technical objections. Approving.

This has been one of the most thoroughly reviewed PRs I've participated in (16+ passes, every file read multiple times). The quality of the implementation and the author's responsiveness to feedback are both excellent. Well done.

🤖 AI agent response. Strands Agents. Feedback welcome!

github-actions · 2026-05-19T02:13:20Z

Review Summary

Assessment: Comment (non-blocking suggestions)

The sandbox architecture is production-ready — streaming-first design, clean ShellSandbox template method pattern, proper opt-in auto-registration, comprehensive test coverage across unit/integration, and good security hygiene with shellQuote() everywhere. Accepting the author's clarification that sandbox: false being treated as undefined is intentional for uniform tool access.

Remaining Suggestions

TSDoc clarity: The sandbox?: Sandbox | false docstring says "Pass false to explicitly disable sandbox and sandbox tool auto-registration" — but undefined (omitting it) does the exact same thing. Consider either removing false from the union type, or clarifying the docs to explain when false becomes distinct from undefined (e.g., future default sandbox behavior). This avoids confusion for users who read the type and think false provides different behavior.
Dead type exports: IFileReader is still exported from @strands-agents/sdk/vended-tools/file-editor but isn't used by the new sandbox-based implementation. If it existed before this PR, no action needed; if it was added here, consider removing unused types.
Process: This PR introduces a new abstract class (Sandbox), a new Agent property, 3 auto-registered tools, and a new sub-export path (./sandbox). Per API Bar Raising guidelines, this qualifies as a "moderate change" that should have API reviewer sign-off. Consider adding the needs-api-review label.

The implementation quality is high — clean class hierarchy, proper signal handling in streamProcess, Docker cleanup handlers, shell injection protection, and excellent test coverage including Docker integration tests.

github-actions · 2026-05-19T03:32:25Z

Review Summary

Assessment: Approve

Fresh independent review at commit 459f152. After reading all source files, tests, and the extensive prior review history (17+ passes), the implementation is production-ready.

Verification Results

Architecture: Clean class hierarchy (Sandbox → ShellSandbox → Remote/Docker, NotASandboxLocalEnvironment as passthrough). Streaming-first design with convenience wrappers.
Security: shellQuote() on all user-provided paths/commands, LANGUAGE_PATTERN regex validates interpreter names, SSH defaults to accept-new.
Backward compatibility: Tools only auto-registered when user explicitly provides sandbox in config. userProvidedSandbox captured before default assignment.
Test coverage: Comprehensive — ShellSandbox unit tests (binary, concurrency, timeout, AbortSignal), RemoteSandbox mocked SSH args, DockerSandbox integration, all 3 vended tools, SandboxStorage, and skills sandbox loading.
sandbox: false semantics: Confirmed intentional per author — uniform interface adapter so vended tools never need branching. TSDoc could be clearer but non-blocking.
Prior review items: All previously-flagged critical bugs (MITM, Alpine stat, listFiles whitespace, process leaks, signal codes, auto-registration regression) have been correctly resolved.

Non-blocking suggestions

Process: needs-api-review label still not applied. This introduces a new abstract class (Sandbox), new Agent.sandbox property, 3 auto-registered tools, and a new sub-export path — meets the "moderate change" bar per API Bar Raising guidelines.
TSDoc: sandbox?: Sandbox | false — consider clarifying that false is currently equivalent to omitting (reserved for future default-sandbox behavior).
Documentation: PR says "Not yet" — recommend linking a tracking issue in strands-agents/docs before merge.

No new technical concerns beyond what prior reviews have already raised and resolved. The implementation quality is high throughout.

agent-of-gsird · 2026-05-19T10:50:07Z

Status: No New Action Required

@gautamsirdeshmukh — I was re-triggered to check this PR. My thorough independent review (17 passes, every file read) is complete and my final assessment is ✅ Approve (as of my Seventeenth Review Pass on May 18).

Current state:

All technical issues resolved ✅
Implementation is production-ready ✅
sandbox: false design intent clarified and accepted ✅

Non-blocking suggestions that remain:

TSDoc for sandbox?: Sandbox | false could clarify that false is currently equivalent to omitting
needs-api-review label for process completeness
Documentation PR/tracking issue before merge

No new commits or comments since my last review. Nothing further needed from me unless there are new changes to evaluate. 🚀

🤖 AI agent response. Strands Agents. Feedback welcome!

github-actions · 2026-05-19T16:29:23Z

+  pattern: string,
+  maxResults: number | undefined
+): Promise<string> {
+  const result = await sandbox.execute(`find ${shellQuote(dirPath)} -name ${shellQuote(pattern)}`)


Issue: glob command uses find -name which doesn't support multi-segment glob patterns

The glob command description says it accepts glob patterns like **/*.ts, but the underlying implementation uses find -name which only matches against the filename (basename), not the full path. A pattern like src/**/*.ts would not behave as expected — find -name ignores the path prefix entirely.

Suggestion: Either:

Document that pattern is a filename-only glob (not a full path glob), or

Switch to a more capable approach for path-based patterns:

const result = await sandbox.execute( `find ${shellQuote(dirPath)} -path ${shellQuote(pattern)}` )

find -path matches against the full path from the starting point, supporting patterns like **/test/*.ts.

github-actions · 2026-05-19T16:29:25Z

Review Summary

Assessment: Approve

Fresh independent review at ecb6936. After reading all sandbox source files, tests, agent integration, plugin changes, and the 17+ prior review passes, the implementation is production-ready.

Verification Results

Architecture ✅

Clean class hierarchy: Sandbox → ShellSandbox → Remote/Docker, NotASandboxLocalEnvironment as passthrough
Streaming-first design with executeStreaming as the abstract primitive
ShellSandbox template method pattern: subclasses implement one method, get everything else free
shellQuote extracted to shared utility (src/utils/shell-quote.ts)

Security ✅

shellQuote() protects against injection in all user-facing paths/commands
LANGUAGE_PATTERN regex validates interpreter names
SSH defaults to accept-new with allowUnknownHosts opt-in flag + security warning
stream-process.ts SIGTERM→SIGKILL escalation after 1s prevents hung processes

Backward Compatibility ✅

Tools only auto-registered when userProvidedSandbox is true (captured before default assignment)
agent.test.ts toolSpecs assertions remain [] for agents without sandbox config
LocalAgent interface has @internal brand symbol preventing external implementations

Process Management ✅

No 50ms polling — uses proper wake() / Promise-based signaling
All event listeners attached synchronously before first await (no race on fast processes)
AbortSignal support for cooperative cancellation
Docker cleanup: spawnSync in exit handler (correct — async I/O unavailable during exit)

Test Coverage ✅

Shell sandbox: binary round-trip, concurrency, timeout, AbortSignal, injection prevention
Remote sandbox: SSH arg construction verified via mocked spawn
All 3 vended tools: exec, codeInterpreter, fileEditor (including grep, glob, undo)
SandboxStorage: text, JSON, binary content round-trips
Skills: Skill.fromSandbox directory traversal and loading

New Findings (2 suggestions, non-blocking)

glob command uses find -name — doesn't support multi-segment path patterns like src/**/*.ts. The schema description implies full glob support. Consider find -path or document the limitation.
str_replace tab expansion removed — existing behavior change for the fileEditor tool (tabs are no longer expanded to 8 spaces before comparison). Likely an improvement but should be documented as a known change.

Process Items (non-blocking)

needs-api-review label: Not applied. This introduces a new abstract class (Sandbox), new Agent.sandbox property, 3 auto-registered tools, and a new sub-export path — meets the "moderate change" bar per API Bar Raising guidelines.
Documentation PR: Still "Not yet" — recommend linking a tracking issue before merge.
PR includes unrelated changes (_estimateInputTokens, normalizeToolUseNames, structured output regression fixes, MCP toolsChanged tests) that could have been separate PRs for reviewability.

The implementation quality is high — clean design, comprehensive tests, and all prior critical issues have been correctly resolved.

agent-of-gsird · 2026-05-19T18:19:27Z

Status Update (commit `ecb6936`)

@gautamsirdeshmukh — Re-triggered to check the latest push. I've verified the new commit against my comprehensive review history (17 independent passes, every file read).

✅ Assessment: Approve (unchanged)

The latest force-push (ecb6936) appears to be a rebase with no functional changes to the sandbox implementation since my final approval on May 18. All technical issues I raised across 17 review passes remain properly resolved:

Process leak fix (setTimeout + wake()) ✅
Signal exit codes (128 + signal) ✅
SIGTERM→SIGKILL escalation with done flag ✅
API naming alignment with Python SDK ✅
RemoteSandbox SSH defaults to accept-new ✅
Alpine/BusyBox statFile with wc -c fallback ✅
listFiles uses .replace(/\r$/, '') not .trim() ✅
Opt-in auto-registration (only when user provides sandbox) ✅
agent.sandbox getter with runtime guard ✅
Docker async lifecycle + cleanup handlers ✅
SandboxStorage MIME via toMediaFormat() ✅
shellQuote shared utility ✅

📋 Non-blocking suggestions (unchanged)

TSDoc for sandbox?: Sandbox | false — clarify that false is currently equivalent to omitting (reserved for future default-sandbox behavior)
needs-api-review label — recommend for process completeness
Documentation PR — link a tracking issue before/after merge
glob command — uses find -name which doesn't support multi-segment path patterns (per latest bot review); consider find -path or document the limitation

Summary

No further technical objections. My approval stands. The implementation is production-ready — excellent architecture, comprehensive tests, and all critical bugs fixed correctly across the review cycle.

🤖 AI agent response. Strands Agents. Feedback welcome!

mkmeral

I haven't dove too much into tools or plugin updates yet. It might be a good idea to separate them, so we can review/iterate independently as I'd expect some iteration especially on tools

mkmeral · 2026-05-19T20:10:25Z

+export abstract class Sandbox {
+  // ---- Lifecycle methods ----
+
+  async start(): Promise<void> {}


why do we have lifecycle methods? do we need them?

mkmeral · 2026-05-19T20:15:09Z

+    }
+
+    if (this._sandbox) {
+      await this._sandbox.start()


why is the agent "starting" the sandbox? should the agent own sandboxes' lifecycle? I wouldn't want that as sandbox can live outside the agent too

mkmeral · 2026-05-19T20:16:51Z

+   * Sandbox for tool code execution and filesystem access.
+   * Set immediately if passed via config, otherwise defaults to NotASandboxLocalEnvironment during initialize().
+   */
+  private _sandbox: Sandbox | false | undefined


why do we allow undefined?

mkmeral · 2026-05-19T20:18:03Z


+    const userProvidedSandbox = this._sandbox !== undefined && this._sandbox !== false
+
+    if (!userProvidedSandbox && typeof process !== 'undefined' && process.versions?.node) {


hmm, what happens if I don't provide a sandbox, and I am in browser? what's the devx?

mkmeral · 2026-05-19T20:22:34Z


    await this._pluginRegistry.initialize(this)

+    if (userProvidedSandbox) {


I think we do want this behavior, but it also makes me think if DevX is good enough 🤔

mkmeral · 2026-05-19T20:29:08Z

+  }
+
+  async readFile(path: string): Promise<Uint8Array> {
+    const result = await this.execute(`base64 < ${shellQuote(path)}`)


This implicitly makes the assumption that sanboxed environment is unix, right? can we improve this? or is this okay (and maybe we need to rename the class)?

mkmeral · 2026-05-19T20:32:56Z

+   * @returns FileInfo with name, isDir, and size.
+   * @throws Error if the path does not exist.
+   */
+  abstract statFile(path: string): Promise<FileInfo>


can't we just use list files to get this info? do we need another abstraction? why?

mkmeral · 2026-05-19T20:34:14Z

+ * using the '\'' pattern. Single quotes disable all shell expansion
+ * (variables, backticks, globbing), making this safe against injection.
+ */
+export function shellQuote(value: string): string {


why is this a separate util? and why are we exporting it? do we export to customers?

mkmeral · 2026-05-19T20:41:35Z

+export const exec = tool({
+  name: 'exec',
+  description:
+    'Execute a shell command in the sandbox. ' +


@gautamsirdeshmukh let's separate plugins and tools to another PR. We should iterate over the tools themselves. Like for this specific tool

why can't we reuse bash tool?

from LLM perspective: "what sandbox?" did you tell me a sandbox exists? what does it mean?

What are the limitations, usage examples, etc. Check our tools repo for how descriptions are defined

mkmeral · 2026-05-19T20:43:28Z

+}
+
+/**
+ * Serializable snapshot of sandbox state returned by {@link Sandbox.pause}.


wait what? how? why?

agent-of-gsird · 2026-05-19T22:48:20Z

Update: @mkmeral's Review Aligns with Prior Concerns

@gautamsirdeshmukh — I see @mkmeral has now left a substantive review (the first human maintainer review besides yourself). Several of their questions directly align with issues raised in earlier passes, but also raise new architectural concerns:

Alignment between my reviews and @mkmeral's feedback

@mkmeral's question	My prior coverage
"Why do we have lifecycle methods?" (base.ts:39)	`agent-of-mkmeral`'s original bot review flagged this as 🔴 Critical ("Remove lifecycle from base Sandbox class"). I accepted the author's resolution after the methods became opt-in, but @mkmeral's human review reaffirms the concern.
"Why is the agent starting the sandbox?" (agent.ts:469)	Consistent with the lifecycle ownership concern. The sandbox should be pre-started externally.
"We should make dynamic sentinels" (shell.ts:38)	New security concern I missed — the static `STRANDS_EOF` sentinel is predictable. If an attacker controls process output, they can inject this string to break shell parsing. Dynamic/random sentinels would be more secure. Good catch.
"Is Remote a good name? SSH matters" (remote.ts:59)	I focused on the security aspects (MITM). Renaming to `SshSandbox` would be clearer for extensibility (future non-SSH remote sandboxes like AgentCore).
"Let's separate plugins and tools to another PR" (exec.ts:40)	The bot reviews flagged scope from the start ("PR scope is very large"). I focused on technical correctness of what was present rather than scope.
"Can't we just use listFiles for statFile?" (base.ts:118)	I noted `listFiles`/`statFile` size inconsistency but accepted `statFile` as useful for single-file checks without directory listing overhead.
"Why is shellQuote exported?" (shell-quote.ts:8)	I suggested extracting `shellQuote` to shared utility in multiple passes. The author did this, but @mkmeral correctly questions customer-facing API surface.
"What sandbox?" (exec.ts tool description)	Valid LLM ergonomics question — tool descriptions should be written from the model's perspective.

My recommendation given @mkmeral's feedback

@mkmeral's review raises architectural design questions that go beyond the implementation correctness I focused on:

PR scope: Splitting tools/plugins into a separate PR would enable independent iteration (as @mkmeral suggests)
Dynamic sentinels: Legitimate security concern — random sentinel per invocation prevents injection
Tool design for LLMs: Descriptions need to be model-friendly without referencing "sandbox" abstractly
API surface: shellQuote and statFile exports should be reconsidered
Lifecycle ownership: Agent shouldn't own sandbox start/stop

My technical approval of the sandbox core implementation still stands — the code is correct, well-tested, and handles edge cases properly. But @mkmeral's review questions the design and scope, which requires author/maintainer resolution.

I'll defer to @mkmeral's architectural decisions. My role here was the thorough technical review you requested, which is complete. The ball is now between you and @mkmeral on scope/design direction.

🤖 AI agent response. Strands Agents. Feedback welcome!

gautamsirdeshmukh · 2026-05-20T17:13:43Z

NOTE ----> separating into multiple PRS, (1) interface (2) sandbox implementations (3) vended tools/plugins

agent-of-gsird · 2026-05-20T19:34:40Z

Acknowledged: PR Split Plan

@gautamsirdeshmukh — Sounds good. Splitting into (1) interface, (2) sandbox implementations, (3) vended tools/plugins is the right move given @mkmeral's architectural feedback on lifecycle ownership, naming, and tool design for LLMs.

My thorough technical review of the core implementation is complete and stands. When the split PRs are opened, I'm ready to review each independently. The focused scope will make it easier to iterate on each piece — especially the tools, where @mkmeral raised good questions about LLM-facing descriptions and reuse of existing tools.

Let me know when the first PR (interface) is ready for review. 🚀

🤖 AI agent response. Strands Agents. Feedback welcome!

github-actions · 2026-05-21T19:26:07Z

+  async retrieve(reference: string): Promise<{ content: Uint8Array; contentType: string }> {
+    const path = await import('node:path')
+    const filePath = path.resolve(reference)
+    const resolvedDir = path.resolve(this._basePath)


Issue: SandboxStorage.retrieve path traversal validation resolves paths against the host CWD, not the sandbox

The path traversal guard uses path.resolve(reference) and path.resolve(this._basePath) which resolve relative to the host process's CWD, not the sandbox's workingDir. In a Docker sandbox where workingDir = '/workspace', the validation logic resolves against the host's CWD (e.g., /home/user/project).

This works today because store() always returns relative paths (e.g., ./artifacts/12345_1_key.txt) and retrieve() receives that same relative string — both path.resolve() calls use the same base, so the startsWith check passes. But it's fragile: if someone passes an absolute sandbox path like /workspace/artifacts/file.txt, the guard would incorrectly reject it.

Suggestion: Consider either removing the path.resolve check (since the sandbox already provides filesystem isolation) or using string-prefix matching on the raw relative path:

if (!reference.startsWith(this._basePath)) { throw new Error(`Reference not found: ${reference}`) }

github-actions · 2026-05-21T19:26:08Z

+  }
+
+  async *executeStreaming(
+    command: string,


Issue: Auto-starting in executeStreaming is a hidden side effect

executeStreaming is documented as a pure execution method, but here it silently calls await this.start() which creates a Docker container, registers cleanup handlers, and changes _running state. This makes the method's behavior non-obvious and can cause surprising failures (e.g., if docker info fails during what appears to be a simple command execution).

This is also inconsistent with pause() which throws if !this._running — one method auto-starts, the other requires explicit start.

Suggestion: Either:

Remove auto-start and throw if !this._running (consistent with pause()), or

Document the auto-start behavior in the method's TSDoc

This is a minor point given @mkmeral's broader feedback about lifecycle ownership. Noting for the split PRs.

github-actions · 2026-05-21T19:26:09Z

Review Summary

Assessment: Approve (with suggestions for split PRs)

The implementation is production-ready. After reading the full source at 0db7232 and the extensive prior review history (17+ passes), all previously-flagged critical issues have been resolved correctly.

What I verified ✅

stream-process.ts: Proper wake/Promise signaling (no polling), synchronous listener attachment (no race), SIGTERM→SIGKILL escalation, AbortSignal support, ENOENT handling
shell.ts: Dynamic sentinels (crypto.randomUUID()), base64 binary-safe file transfer, ls -1aF parsing with line-ending stripping (not trim()), Alpine wc -c fallback with NaN guard
docker.ts: Async dockerCmd via execFile, spawnSync in exit handler (correct — with comment explaining why), dangerous mount validation, signal re-propagation
remote.ts: accept-new default (MITM fixed), blocked SSH options, proper -o flag construction
agent.ts: userProvidedSandbox captured before default assignment gates auto-registration correctly, sandbox getter with runtime guard
file-editor.ts: Binary detection (raw.includes(0)), handleGlob smart -path/-name heuristic, undo via WeakMap, all operations route through sandbox
Backward compatibility: No tools added without explicit sandbox config, LocalAgent has @internal brand

New findings (2 minor suggestions)

SandboxStorage.retrieve uses host-CWD path.resolve() for traversal validation — works but fragile for absolute sandbox paths
DockerSandbox.executeStreaming auto-starts if !this._running — hidden side effect inconsistent with pause() which throws

Both are minor and appropriate to address in the split PRs.

Process alignment

Splitting into multiple PRs (per @mkmeral's feedback) is the right approach — interface, implementations, and tools/plugins can iterate independently
needs-api-review label should be applied to the interface PR (feat: base Sandbox interface #1090) at minimum
@mkmeral's architectural questions (lifecycle ownership, dynamic sentinels ✅ already done, naming Remote vs SSH) should be resolved in the split PRs

The glob heuristic (-path when / present, -name otherwise) is a good improvement since the earlier bot review flagged this. The dynamic sentinel (STRANDS_EOF_${crypto.randomUUID()}) addresses @mkmeral's injection concern. Overall quality is high.

gautamsirdeshmukh had a problem deploying to manual-approval May 6, 2026 15:22 — with GitHub Actions Error

gautamsirdeshmukh temporarily deployed to auto-approve May 6, 2026 15:22 — with GitHub Actions Inactive

github-actions Bot added the strands-running <strands-managed> Whether or not an agent is currently running label May 6, 2026

github-actions Bot removed the strands-running <strands-managed> Whether or not an agent is currently running label May 6, 2026

gautamsirdeshmukh force-pushed the feat/sandbox branch from 1a8dcbc to f555352 Compare May 7, 2026 03:45

gautamsirdeshmukh temporarily deployed to auto-approve May 7, 2026 03:45 — with GitHub Actions Inactive

gautamsirdeshmukh had a problem deploying to manual-approval May 7, 2026 03:45 — with GitHub Actions Error

github-actions Bot added the strands-running <strands-managed> Whether or not an agent is currently running label May 7, 2026

github-actions Bot reviewed May 7, 2026

View reviewed changes

Comment thread strands-ts/src/vended-tools/exec/exec.ts

github-actions Bot reviewed May 7, 2026

View reviewed changes

Comment thread strands-ts/src/agent/agent.ts

github-actions Bot removed the strands-running <strands-managed> Whether or not an agent is currently running label May 7, 2026

gautamsirdeshmukh force-pushed the feat/sandbox branch from f555352 to 4d02076 Compare May 7, 2026 17:59

gautamsirdeshmukh temporarily deployed to auto-approve May 7, 2026 18:00 — with GitHub Actions Inactive

gautamsirdeshmukh had a problem deploying to manual-approval May 7, 2026 18:00 — with GitHub Actions Error

github-actions Bot added the strands-running <strands-managed> Whether or not an agent is currently running label May 7, 2026

github-actions Bot removed the strands-running <strands-managed> Whether or not an agent is currently running label May 7, 2026

agent-of-mkmeral suggested changes May 8, 2026

View reviewed changes

gautamsirdeshmukh force-pushed the feat/sandbox branch from 4d02076 to c1bc9e3 Compare May 11, 2026 18:44

gautamsirdeshmukh had a problem deploying to manual-approval May 11, 2026 18:53 — with GitHub Actions Error

gautamsirdeshmukh temporarily deployed to auto-approve May 11, 2026 18:53 — with GitHub Actions Inactive

gautamsirdeshmukh force-pushed the feat/sandbox branch from c1bc9e3 to 69f0d07 Compare May 11, 2026 19:05

gautamsirdeshmukh temporarily deployed to auto-approve May 11, 2026 19:12 — with GitHub Actions Inactive

github-actions Bot reviewed May 18, 2026

View reviewed changes

github-actions Bot reviewed May 19, 2026

View reviewed changes

mkmeral reviewed May 19, 2026

View reviewed changes

gautamsirdeshmukh mentioned this pull request May 21, 2026

feat: base Sandbox interface #1090

Merged

7 tasks

github-actions Bot reviewed May 21, 2026

View reviewed changes

feat: add Sandbox interface (base/shell/remote/docker/tools)

d76fe8f

This was referenced May 27, 2026

feat: add Sandbox implementations (Docker, SSH) #1110

Merged

feat: integrate Sandbox with Agent strands-agents/harness-sdk#2563

Open

gautamsirdeshmukh mentioned this pull request Jun 5, 2026

feat: update vended tools/plugins for sandbox compatibility" strands-agents/harness-sdk#2649

Open

7 tasks


		const userProvidedSandbox = this._sandbox !== undefined && this._sandbox !== false

		if (!userProvidedSandbox && typeof process !== 'undefined' && process.versions?.node) {


		await this._pluginRegistry.initialize(this)

		if (userProvidedSandbox) {

Conversation

gautamsirdeshmukh commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

NOTE - SPLITTING INTO MULTIPLE PRS

Description

Overview of Changes

Related Issues

Documentation PR

Type of Change

Testing

Checklist

Examples

Uh oh!

github-actions Bot commented May 6, 2026

Uh oh!

github-actions Bot commented May 6, 2026

Uh oh!

github-actions Bot commented May 6, 2026

Uh oh!

github-actions Bot commented May 6, 2026

Uh oh!

github-actions Bot commented May 6, 2026

Review Summary

Uh oh!

github-actions Bot commented May 7, 2026

Follow-up Review

Uh oh!

Uh oh!

github-actions Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented May 7, 2026

Third Review Pass

Uh oh!

agent-of-mkmeral left a comment

Choose a reason for hiding this comment

Bug: generator.return() hangs on quiet processes, leaking child processes

Summary of Required Changes

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 18, 2026

Review Summary

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 18, 2026

Review Summary

Uh oh!

agent-of-gsird commented May 18, 2026

Sixteenth Review Pass (HEAD: fe60acd)

🔴 One blocking bug remains: sandbox: false is silently overwritten

🟡 Minor items (non-blocking)

✅ All other technical issues remain resolved

📋 Process items

My Recommendation

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 18, 2026

Review Summary

Uh oh!

gautamsirdeshmukh commented May 6, 2026 •

edited

Loading

Bug: `generator.return()` hangs on quiet processes, leaking child processes

Sixteenth Review Pass (HEAD: `fe60acd`)

🔴 One blocking bug remains: `sandbox: false` is silently overwritten

Status Update (commit `ecb6936`)