feat (sandbox): Unify container session management with Redis-backed SessionManager and add workspace filesystem facade (SandboxFS)#426
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces unified session management and a comprehensive workspace filesystem API for the sandbox runtime, addressing both Redis-backed persistence and refactoring workspace operations into dedicated components.
Changes:
- Introduces a unified
SessionManagerto replace per-client session managers in FC and AgentRun clients, supporting both in-memory and Redis modes - Adds
SandboxFS/SandboxFSAsyncfacade components for workspace file operations with dedicated mixins for client and server-side implementations - Removes the
timeoutparameter from sandbox constructors, standardizing on theTIMEOUTconstant from configuration
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 25 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_cloud_sandbox.py | Removes timeout parameter from CloudSandbox test fixtures |
| tests/sandbox/test_sandbox.py | Adds comprehensive sync and async filesystem operation tests for SandboxFS facade |
| src/agentscope_runtime/sandbox/registry.py | Replaces hardcoded timeout default with TIMEOUT constant |
| src/agentscope_runtime/sandbox/manager/workspace_mixin.py | Adds workspace filesystem mixins for SandboxManager with proxy support |
| src/agentscope_runtime/sandbox/manager/server/config.py | Extends field validator to parse FC_VSWITCH_IDS and AGENT_RUN_VSWITCH_IDS |
| src/agentscope_runtime/sandbox/manager/server/app.py | Adds /proxy/{identity}/{path:path} endpoint to forward requests to runtime |
| src/agentscope_runtime/sandbox/manager/sandbox_manager.py | Integrates WorkspaceFSMixin into SandboxManager |
| src/agentscope_runtime/sandbox/client/workspace_mixin.py | Implements workspace client mixins for HTTP operations |
| src/agentscope_runtime/sandbox/client/http_client.py | Removes legacy workspace methods, delegates to WorkspaceMixin |
| src/agentscope_runtime/sandbox/client/async_http_client.py | Removes legacy workspace methods, delegates to WorkspaceAsyncMixin |
| src/agentscope_runtime/sandbox/box/shared/routers/workspace.py | Complete refactor with new streaming API and threaded filesystem operations |
| src/agentscope_runtime/sandbox/box/shared/routers/init.py | Updates workspace_router import |
| src/agentscope_runtime/sandbox/box/shared/app.py | Adds /workspace prefix to workspace_router |
| src/agentscope_runtime/sandbox/box/sandbox.py | Integrates SandboxFS/SandboxFSAsync and removes timeout parameter |
| src/agentscope_runtime/sandbox/box/components/fs.py | New SandboxFS facade classes for workspace operations |
| src/agentscope_runtime/sandbox/box/components/init.py | Exports SandboxFS components |
| src/agentscope_runtime/sandbox/box/training_box/training_box.py | Removes timeout parameter and uses TIMEOUT constant |
| src/agentscope_runtime/sandbox/box/mobile/mobile_sandbox.py | Removes timeout parameter from constructors |
| src/agentscope_runtime/sandbox/box/gui/gui_sandbox.py | Removes timeout parameter from constructors |
| src/agentscope_runtime/sandbox/box/filesystem/filesystem_sandbox.py | Removes timeout parameter from constructors |
| src/agentscope_runtime/sandbox/box/dummy/dummy_sandbox.py | Removes timeout parameter from constructors |
| src/agentscope_runtime/sandbox/box/cloud/cloud_sandbox.py | Removes timeout parameter and updates get_info to not include timeout |
| src/agentscope_runtime/sandbox/box/browser/browser_sandbox.py | Removes timeout parameter from constructors |
| src/agentscope_runtime/sandbox/box/base/base_sandbox.py | Removes timeout parameter from constructors |
| src/agentscope_runtime/sandbox/box/agentbay/agentbay_sandbox.py | Removes timeout parameter and uses TIMEOUT constant |
| src/agentscope_runtime/sandbox/custom/example.py | Removes timeout parameter from constructor |
| src/agentscope_runtime/sandbox/custom/custom_sandbox.py | Removes timeout parameter from constructor |
| src/agentscope_runtime/common/container_clients/utils.py | New SessionManager with Redis/in-memory support |
| src/agentscope_runtime/common/container_clients/fc_client.py | Replaces FCSessionManager with shared SessionManager |
| src/agentscope_runtime/common/container_clients/agentrun_client.py | Replaces AgentRunSessionManager with shared SessionManager |
| src/agentscope_runtime/common/container_clients/boxlite_client.py | Adds error hint for async SDK not being supported |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated 11 comments.
Comments suppressed due to low confidence (1)
src/agentscope_runtime/sandbox/box/sandbox.py:55
- This change removes the
timeoutparameter from the publicSandboxBaseconstructor (and derived sandbox classes). That is a backward-incompatible API change for any callers passingtimeout=..., but the PR metadata indicates this is not a breaking change. Either restore compatibility (e.g., keeptimeoutas a deprecated kwarg) or update the PR classification/release notes accordingly.
def __init__(
self,
sandbox_id: Optional[str] = None,
base_url: Optional[str] = None,
bearer_token: Optional[str] = None,
sandbox_type: SandboxType = SandboxType.BASE,
workspace_dir: Optional[str] = None,
) -> None:
| if fmt == "stream": | ||
| r = self.http_session.get( | ||
| url, | ||
| params={"path": path, "format": "bytes"}, | ||
| stream=True, | ||
| timeout=TIMEOUT, | ||
| ) | ||
| r.raise_for_status() | ||
|
|
||
| def gen() -> Iterator[bytes]: | ||
| with r: | ||
| for c in r.iter_content(chunk_size=chunk_size): | ||
| if c: | ||
| yield c | ||
|
|
||
| return gen() |
There was a problem hiding this comment.
In fs_read(..., fmt="stream"), the requests.get(...) call is executed eagerly before returning the generator. If the caller never iterates the returned iterator (or abandons it early without closing), the underlying HTTP connection/response may remain open longer than necessary. Consider moving the get() call inside the generator (and ensuring the response is closed in a finally) so the connection lifetime is tied to iteration/closure.
| try: | ||
| os.remove(tmp_path) | ||
| except Exception: |
There was a problem hiding this comment.
In the async FS test cleanup, os.remove(tmp_path) is attempting to remove the pytest tmp_path directory (a Path), not the temporary file created via NamedTemporaryFile. This leaves tmp_file_path on disk and the cleanup is effectively a no-op due to the broad exception handler. Remove tmp_file_path instead (and consider using pathlib.Path.unlink(missing_ok=True) to avoid masking unrelated errors).
| def update_session(self, session_id: str, updates: Dict): | ||
| """Update an existing session with new data.""" | ||
| if self.sessions.get(session_id): | ||
| self.sessions.set(session_id, updates) | ||
| logger.debug(f"Updated session: {session_id}") |
There was a problem hiding this comment.
update_session() currently overwrites the stored session with updates instead of merging into the existing session data. Callers (e.g., AgentRunClient/FCClient) store identifiers like agent_runtime_id in the session; overwriting with {"status": ...} will discard those fields and break later operations. Merge updates into the existing session dict before saving (or implement a mapping-level partial update).
| base_directory = os.path.abspath(base_directory) | ||
|
|
||
| # Determine if the input path is absolute or relative | ||
| if os.path.isabs(path): | ||
| full_path = os.path.abspath(path) | ||
| else: | ||
| full_path = os.path.abspath(os.path.join(base_directory, path)) | ||
|
|
||
| # Check for path traversal attacks and ensure path is within base_directory | ||
| if not full_path.startswith(base_directory): | ||
| raise HTTPException( | ||
| status_code=403, | ||
| detail="Permission error. Access restricted to /workspace " | ||
| detail=f"Permission error. Access restricted to {BASE_DIR} " | ||
| "directory.", | ||
| ) |
There was a problem hiding this comment.
ensure_within_workspace() uses full_path.startswith(base_directory) for containment checks, which is vulnerable to prefix bypasses (e.g., /workspace2/... passes for base /workspace). Use os.path.commonpath([full_path, base_directory]) == base_directory (and/or ensure a trailing path separator) to robustly prevent path traversal / escaping the workspace directory.
| print(f"--{target_url}---") | ||
|
|
||
| # 3) forward headers | ||
| headers = self._filter_hop_by_hop_headers(dict(request.headers)) | ||
| if cm.runtime_token: | ||
| headers["Authorization"] = f"Bearer {cm.runtime_token}" | ||
|
|
||
| hop_by_hop = { | ||
| "host", | ||
| "connection", | ||
| "keep-alive", | ||
| "proxy-authenticate", | ||
| "proxy-authorization", | ||
| "te", | ||
| "trailers", | ||
| "transfer-encoding", | ||
| "upgrade", | ||
| } | ||
|
|
||
| client = httpx.AsyncClient(timeout=None) | ||
| upstream = None |
There was a problem hiding this comment.
proxy_to_runtime() contains a print() debug statement and creates a new httpx.AsyncClient(timeout=None) per request. The print can leak URLs into stdout in production, and timeout=None risks hanging connections indefinitely. Prefer using the module logger (at debug level) and reusing a shared AsyncClient with an explicit timeout strategy (e.g., bounded connect/write, optional unbounded read for streaming).
| except Exception: | ||
| pass | ||
|
|
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| except Exception: | |
| pass | |
| except Exception as exc: | |
| # Best-effort cleanup of the temporary file; ignore removal errors but log them. | |
| import logging | |
| logging.getLogger(__name__).warning( | |
| "Failed to remove temporary file %s: %s", | |
| tmp2, | |
| exc, | |
| ) |
|
|
||
| try: | ||
| await box.fs.remove_async(base_dir) | ||
| except Exception: |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| except Exception: | |
| except Exception: | |
| # Best-effort cleanup: failures during removal should not fail the test. |
| finally: | ||
| try: | ||
| os.remove(tmp_file_path) | ||
| except Exception: |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| except Exception: | |
| except Exception: | |
| # Best-effort cleanup: ignore failures if the temporary file | |
| # has already been removed or is otherwise inaccessible. |
| finally: | ||
| try: | ||
| os.remove(tmp2) | ||
| except Exception: |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| # directory delete policy may vary | ||
| try: | ||
| box.fs.remove(base_dir) | ||
| except Exception: |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
Description
This PR introduces two major improvements to the sandbox runtime:
1) Unified session management (in-memory or Redis)
SessionManager(common/container_clients/utils.py) that can persist sessions via:InMemoryMapping(default), orRedisMappingwhenredis_enabledis configured.SessionManager, improving consistency and enabling multi-process / multi-worker deployments with shared session state.2) Workspace filesystem API refactor + new
SandboxFSfacadeSandboxFS/SandboxFSAsynccomponents (sandbox/box/components/fs.py) and automatically attach them asbox.fson sandbox context enter/async-enter.SandboxHttpClient/SandboxHttpAsyncClientinto dedicatedWorkspaceMixin/WorkspaceAsyncMixin.WorkspaceFSMixinintegrated intoSandboxManager.Other changes
TIMEOUTconstant./proxy/{identity}/{path:path}endpoint to forward requests to runtime.tests/sandbox/test_sandbox.py).Example:
Type of Change
Component(s) Affected
Checklist
Testing
[How to test these changes]
Additional Notes
[Optional: any other context]