
fix(deepagents): add default chunked upload/download to BaseSandbox #1402

Open
Dr. Tristan Behrens (AI-Guru) wants to merge 4 commits into langchain-ai:main from AI-Guru:fix/chunked-file-transfer

Conversation

@AI-Guru

Summary

  • Add concrete upload_files() and download_files() implementations to BaseSandbox, replacing the current @abstractmethod stubs
  • Small files (<64KB) use a single execute() call (same as before)
  • Large files are split into 64KB base64 chunks to avoid hitting the Linux kernel's MAX_ARG_STRLEN limit (~128KB per argument)
  • Downloads also chunk to avoid execute() output truncation
  • Methods are non-abstract so backends with native file transfer (SSH/SFTP, Daytona REST) can still override them

Problem

When using sandbox backends that route file transfers through execute() (Docker with tmpfs, microsandbox), uploading binary files larger than ~100KB fails with:

exec /bin/bash: argument list too long

This happens because the base64-encoded file content is embedded in the command string passed to exec_run(), which exceeds the kernel's per-argument limit (MAX_ARG_STRLEN). Docker's put_archive API cannot be used as a workaround because tmpfs mounts are invisible to it.

The error is particularly problematic for autonomous workflows that generate binary artifacts (PDFs, images) inside sandboxed containers.
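The failure mode can be sketched in a few lines (names here are illustrative, not the actual deepagents code): the whole base64 payload ends up inside a single command string, which `bash -c` receives as one argv entry.

```python
import base64

# Sketch of the failing pattern: the entire base64 payload is embedded
# in the command string handed to exec_run().
def naive_upload_command(path: str, content: bytes) -> str:
    b64 = base64.b64encode(content).decode()
    return f"echo '{b64}' | base64 -d > '{path}'"

# bash -c receives this as ONE argv entry; Linux caps a single argument
# at MAX_ARG_STRLEN (32 pages, i.e. 131072 bytes with 4 KiB pages).
cmd = naive_upload_command("/tmp/report.pdf", b"\x00" * 300_000)
print(len(cmd) > 131072)  # True -- "argument list too long"
```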

Solution

BaseSandbox already provides default implementations of read(), write(), edit(), ls_info(), glob_info(), and grep_raw() via execute(). This PR extends that pattern to upload_files() and download_files():

  • Upload: Files over 64KB are base64-encoded, split into chunks, each chunk appended to a temp file inside the sandbox via separate execute() calls, then the assembled base64 is decoded to the final path
  • Download: Files over 64KB are read in binary chunks inside the sandbox, each chunk base64-encoded and returned via separate execute() calls, then reassembled on the host

All operations go through execute(), respecting the full middleware chain.
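The upload path described above can be sketched roughly as follows. This is a minimal sketch under assumptions: the real BaseSandbox runs each command through execute() and module-level command templates; the function name, the temp-file suffix, and the exact shell text here are illustrative.

```python
import base64

CHUNK_SIZE = 64 * 1024      # raw bytes per chunk (~85 KB once base64-encoded)
MAX_ARG_STRLEN = 32 * 4096  # Linux per-argument limit with 4 KiB pages

def upload_commands(path: str, content: bytes) -> list[str]:
    """Build the shell commands that would upload `content` to `path`."""
    if len(content) < CHUNK_SIZE:
        # Small file: single command, payload delivered via heredoc.
        b64 = base64.b64encode(content).decode()
        return [f"base64 -d > '{path}' <<'EOF'\n{b64}\nEOF"]

    tmp = f"{path}.b64.part"  # hypothetical temp file inside the sandbox
    cmds = [f": > '{tmp}'"]   # create/truncate the staging file
    for i in range(0, len(content), CHUNK_SIZE):
        chunk = base64.b64encode(content[i : i + CHUNK_SIZE]).decode()
        # One command per chunk keeps every command string comfortably
        # under the kernel's per-argument limit.
        cmds.append(f"cat >> '{tmp}' <<'EOF'\n{chunk}\nEOF")
    # Decode the assembled base64 to the final path, then clean up.
    cmds.append(f"base64 -d < '{tmp}' > '{path}' && rm -f '{tmp}'")
    return cmds
```

Each returned command would be issued as a separate execute() call, so every one passes through the middleware chain and none approaches the 128KB per-argument ceiling.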

Test plan

  • Unit tests: 6 new tests covering small file upload, large file chunking, upload error handling, small file download, large file chunked download, and missing file error
  • All 15 sandbox backend tests pass
  • Verified end-to-end with Docker backend: 301KB PDF upload/download roundtrip succeeds
  • Verified 1MB random binary roundtrip with byte-level fidelity

🤖 Generated with Claude Code

BaseSandbox already provides default implementations of read(), write(),
edit(), ls_info(), glob_info(), and grep_raw() via execute(). However,
upload_files() and download_files() are abstract, forcing every sandbox
backend to reimplement the same base64+heredoc pattern.

More importantly, the common pattern of embedding the entire base64
payload in the command string hits the Linux kernel ARG_MAX limit
(~128KB per argument) for files larger than ~100KB. This causes
"argument list too long" errors when uploading binary files like PDFs
to Docker containers with tmpfs mounts (where put_archive doesn't work).

This commit:
- Adds concrete upload_files() and download_files() to BaseSandbox
- Small files (<64KB) use a single execute() call (no change)
- Large files are split into 64KB base64 chunks, each written via a
  separate execute() call, then decoded in the sandbox
- Downloads also chunk large files to avoid execute() output truncation
- Methods are non-abstract so backends with native file transfer (e.g.
  SSH/SFTP, Daytona REST) can still override them
- Adds 6 unit tests covering small/large/error paths for both operations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
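The chunked download direction can be sketched similarly. This is a sketch under assumptions, not the actual implementation: the real code first queries the file size via a size template, and the `dd`/`base64 -w0` invocation here merely illustrates the "read one byte range, base64-encode it, return it via stdout" step.

```python
import base64

CHUNK_SIZE = 64 * 1024  # raw bytes read per execute() call

def download_chunk_commands(path: str, size: int) -> list[str]:
    """Commands to stream `path` (of known `size` bytes) out in chunks."""
    cmds = []
    for i in range(0, size, CHUNK_SIZE):
        # Each command prints one base64-encoded slice of the file;
        # the host collects stdout from each call and reassembles.
        cmds.append(
            f"dd if='{path}' bs={CHUNK_SIZE} skip={i // CHUNK_SIZE} "
            f"count=1 2>/dev/null | base64 -w0"
        )
    return cmds

def reassemble(chunks_b64: list[str]) -> bytes:
    """Host-side: decode and concatenate the per-chunk outputs."""
    return b"".join(base64.b64decode(c) for c in chunks_b64)
```

Because each chunk's base64 output is bounded, no single execute() response risks the output truncation mentioned above.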
@github-actions github-actions bot added the `external` (user is not a member of the `langchain-ai` GitHub organization) and `deepagents` (related to the `deepagents` SDK / agent harness) labels, then removed the `external` label on Feb 18, 2026
@AI-Guru Dr. Tristan Behrens (AI-Guru) changed the title from "Add default chunked upload/download to BaseSandbox" to "fix(deepagents): add default chunked upload/download to BaseSandbox" on Feb 18, 2026
@github-actions github-actions bot added the `fix` (a bug fix, PATCH) label on Feb 18, 2026
- SIM108: use ternary for upload/download strategy selection
- Q003: use single outer quotes to avoid escaping inner double quotes
- BLE001: catch (ValueError, binascii.Error) instead of blind Exception
- F401: remove unused FileDownloadResponse/FileUploadResponse imports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

return responses

def _upload_single(self, file_path: str, b64: str) -> ExecuteResponse:

@eyurtsev Eugene Yurtsev (eyurtsev) Feb 18, 2026


If you're interested in working on this feature, could you try to match the style of the other provided implementations (e.g., write_file etc.)?

Offload as much as you can into the template, which attempts to make the boundary between Python and shell code easier to see (this makes it easier to verify that the implementation does not introduce security issues).

Make sure that everything that needs to be escaped is escaped properly so there's no additional attack surface.


Done! Refactored all upload/download methods to use module-level _COMMAND_TEMPLATE constants matching the existing _WRITE_COMMAND_TEMPLATE / _EDIT_COMMAND_TEMPLATE pattern.

Seven new templates added: _UPLOAD_COMMAND_TEMPLATE, _UPLOAD_CHUNK_COMMAND_TEMPLATE, _UPLOAD_DECODE_COMMAND_TEMPLATE, _REMOVE_COMMAND_TEMPLATE, _DOWNLOAD_SIZE_COMMAND_TEMPLATE, _DOWNLOAD_COMMAND_TEMPLATE, _DOWNLOAD_CHUNK_COMMAND_TEMPLATE.

All file paths are now base64-encoded before interpolation (safe charset [A-Za-z0-9+/=]), which also eliminates the shell injection risk from the previous string concatenation approach. Template format tests added for each.


Args:
file_path: Absolute path inside the sandbox.
b64: Base64-encoded file content (must fit within ARG_MAX).

is ARG_MAX relevant if we're feeding through stdin?


Good question! ARG_MAX itself (typically ~2MB) is the total size limit for all arguments + environment passed to execve(). But the more relevant constraint here is MAX_ARG_STRLEN (typically PAGE_SIZE * 32 = 128KB on Linux), which limits any single argument string.

When we use bash -c 'script<<heredoc', the entire string — including the heredoc content — is passed as a single argument to execve(). So heredocs via bash -c don't bypass the limit the way a standalone heredoc in an interactive shell would (where bash itself reads from stdin).

With 64KB raw chunks → ~87KB base64, we stay safely under the 128KB MAX_ARG_STRLEN per-argument limit.

I've updated the comments in the code to reference MAX_ARG_STRLEN instead of ARG_MAX to be more precise about which kernel limit actually applies.
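The arithmetic above checks out directly (assuming 4 KiB pages, the common x86-64/aarch64 default; `getconf PAGE_SIZE` confirms on a given host):

```python
import math

PAGE_SIZE = 4096                   # assumed 4 KiB pages
MAX_ARG_STRLEN = 32 * PAGE_SIZE    # 131072 bytes: per-argument execve() limit
ARG_MAX_TYPICAL = 2 * 1024 * 1024  # ~2 MB: total argv + environ limit (distinct)

def b64_len(n: int) -> int:
    """Length of base64 of n bytes (no newlines): 4 output chars per 3 input."""
    return 4 * math.ceil(n / 3)

# A 64 KB raw chunk stays safely under the per-argument limit...
assert b64_len(64 * 1024) == 87384
assert b64_len(64 * 1024) < MAX_ARG_STRLEN

# ...while a ~100 KB file embedded whole already exceeds it.
assert b64_len(100_000) > MAX_ARG_STRLEN
```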

Replace string concatenation in upload/download methods with
module-level _COMMAND_TEMPLATE constants, matching the existing
pattern used by _WRITE_COMMAND_TEMPLATE and _EDIT_COMMAND_TEMPLATE.

All file paths are now base64-encoded before interpolation,
eliminating shell injection risk from path concatenation.

Seven new templates added:
- _UPLOAD_COMMAND_TEMPLATE (small file, heredoc stdin)
- _UPLOAD_CHUNK_COMMAND_TEMPLATE (append chunk to temp file)
- _UPLOAD_DECODE_COMMAND_TEMPLATE (decode assembled base64)
- _REMOVE_COMMAND_TEMPLATE (cleanup helper)
- _DOWNLOAD_SIZE_COMMAND_TEMPLATE (get file size)
- _DOWNLOAD_COMMAND_TEMPLATE (small file download)
- _DOWNLOAD_CHUNK_COMMAND_TEMPLATE (chunked download)

Also updates ARG_MAX comments to reference MAX_ARG_STRLEN (128KB
per-argument limit on Linux) which is the actual constraint when
heredocs are used inside bash -c.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
