feat(workspace): sandbox-authoritative /workspace (SandboxFileBackend + agent-owned commit/MR) by srtab · Pull Request #1279 · srtab/daiv

srtab · 2026-05-30T19:52:22Z

Summary

Makes the daiv-sandbox container authoritative for both file and git state, with a single unified workspace under /workspace:

/workspace/repo — the repository working directory
/workspace/skills — seeded skills
/workspace/tmp — ephemeral per-run scratchpad (shared between file tools and bash, never committed)

The old local-clone-plus-sync model is retired: there is one source of truth (the sandbox), and the agent now owns publishing its own work.

What changed

SandboxFileBackend (middlewares/file_system.py) — a deepagents BackendProtocol that proxies file ops to the daiv-sandbox fs/* endpoints over the existing DAIVSandboxClient: one RPC per op, no local mirror. Root-configurable and bound to the session at start. Retires SandboxSyncer and the local patch-apply path.
Dual-mode GitManager (codebase/utils.py) via GitManager.for_sandbox / GitManager.for_local factories — git runs either through run_commands in /workspace/repo (sandbox-authoritative runs) or as a subprocess against a GitPython clone (sandbox-disabled / repoless runs), behind one async _git runner. Exit-code discipline means a failing git command raises instead of folding fatal text into its output; push failures are classified into GitPushPermissionError / GitPushNetworkError. open_git_manager() (git_utils.py) selects the mode from the run's session_id.
Agent-owned publishing — new commit_changes / create_merge_request tools (tools/git_publish.py) let the agent author its own commit and MR; commit/push/MR-API failures surface as tool-visible error Commands so the agent can recover rather than crash. GitMiddleware nudges the agent to publish before it stops (bounded by MAX_GIT_NUDGES, via jump_to=model) and falls back to a direct daiv publish as a safeguard. GIT_SYSTEM_PROMPT gains an agent_owns_commit variant; raw git add/commit/push/reset/rebase/config in bash stay hard-blocked by sandbox policy (read-only git remains allowed).
Oversized tool-output spill — gitlab/gh tool output that exceeds the inline cap is written verbatim to a /workspace/tmp scratch file (middlewares/git_platform.py).
SlashCommandMiddleware split out of the skills middleware into its own middlewares/slash_commands.py.
fs_* client methods + Fs* wire schemas mirrored byte-identically to daiv-sandbox (enforced by the schema-drift consistency test), with a regenerated schemas.dump.json. Sandbox system prompt, bash examples, and the output/working-directory invariants now reflect /workspace/repo and /workspace/tmp.
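The exit-code discipline described for the _git runner can be sketched roughly as below. This is a minimal illustration, not the PR's implementation: `run_checked` and `GitCommandError` are hypothetical names, and the sketch covers only the local subprocess mode (the sandbox mode would go through run_commands instead).

```python
import asyncio
import sys


class GitCommandError(RuntimeError):
    """Raised when a command exits non-zero (hypothetical name)."""

    def __init__(self, argv, returncode, stderr):
        super().__init__(f"{' '.join(argv)} exited {returncode}: {stderr.strip()}")
        self.returncode = returncode
        self.stderr = stderr


async def run_checked(*argv: str) -> str:
    """Run a command and return its stdout, raising on a non-zero exit code.

    The point is that "fatal: ..." text on stderr never gets folded into the
    returned output; a failure is always an exception.
    """
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        raise GitCommandError(argv, proc.returncode, stderr.decode(errors="replace"))
    return stdout.decode(errors="replace")
```

A caller would use it as `await run_checked("git", "status", "--porcelain")` and handle `GitCommandError` where a failure is recoverable.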

Prerequisite

Requires the companion daiv-sandbox fs/* endpoints to be deployed. The :main sandbox image does not have them yet; backend/client unit tests mock the sandbox and pass independently.

Test plan

Unit suite: 2848 passing (make test)
Lint/format clean (make lint — ruff check + ruff format + pyproject-fmt + djade)
Schema-drift consistency test green (runs as part of make test)
Re-run live e2e against the sandbox once the companion fs/* endpoints are deployed to the shared sandbox

…mmit/MR

Move the one true workspace into the daiv-sandbox container under /workspace (repo at /workspace/repo, seeded skills at /workspace/skills, per-run scratchpad at /workspace/tmp) and make the sandbox authoritative for both file and git state.

- SandboxFileBackend: a deepagents BackendProtocol that proxies file ops to the sandbox fs/* endpoints over DAIVSandboxClient (one RPC per op, no local mirror, root-configurable and bound at session start). Retires SandboxSyncer and the local patch-apply path.
- Dual-mode GitManager (for_sandbox / for_local factories): runs git either via run_commands in /workspace/repo (sandbox-authoritative runs) or as a subprocess against a GitPython clone (sandbox-disabled / repoless runs), behind one async _git runner with exit-code discipline so a failing git command raises instead of folding fatal text into its output. Push failures are classified into GitPushPermissionError / GitPushNetworkError. open_git_manager() builds the right mode from the run's session_id.
- Agent-owned publishing: new commit_changes / create_merge_request tools let the agent author its own commit and MR; commit, push and MR-API failures surface as tool-visible error Commands so the agent can recover rather than crashing the run. GitMiddleware nudges the agent to publish (bounded by MAX_GIT_NUDGES) and falls back to a direct daiv publish as a safeguard. GIT_SYSTEM_PROMPT gains an agent_owns_commit variant; raw git mutation in bash stays hard-blocked by policy.
- gitlab/gh tool output spills verbatim to a /workspace/tmp scratch file when it exceeds the inline cap. SlashCommandMiddleware split out of the skills middleware.
- fs_* client methods and Fs* wire schemas mirrored byte-identically to daiv-sandbox (enforced by the schema-drift consistency test); regenerated schemas.dump.json.

Requires the companion daiv-sandbox fs/* endpoints to be deployed.

…on close_session

…n-chat runs

Agent-level middleware hooks receive a langgraph Runtime, which has no .config attribute (unlike the ToolRuntime used in git_platform.py). Reading runtime.config would AttributeError at runtime on every chat turn. Read the conversation thread_id from the run config contextvar via get_config() instead, falling back to None (reuse disabled) outside a runnable context.
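The fallback described above might look like the following sketch. It assumes the config accessor raises RuntimeError when called outside a runnable context; `resolve_thread_id` is a hypothetical helper, and the accessor is passed in as a parameter so the example runs without langgraph installed.

```python
import logging
from typing import Any, Callable, Mapping, Optional

logger = logging.getLogger(__name__)


def resolve_thread_id(get_config: Callable[[], Mapping[str, Any]]) -> Optional[str]:
    """Return the conversation thread_id, or None when reuse must be disabled.

    `get_config` stands in for the run-config accessor; the assumption here is
    that it raises RuntimeError outside a runnable context.
    """
    try:
        config = get_config()
    except RuntimeError:
        # Log so that silently-disabled reuse stays diagnosable.
        logger.info("No runnable context; warm-session reuse disabled")
        return None
    configurable = config.get("configurable") or {}
    return configurable.get("thread_id")
```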

PR-review follow-ups:

- Wrap all Redis cache reads/writes (_cache_get/_cache_set/_cache_delete) so a cache outage degrades to a cold create+seed instead of crashing the agent run (the calls sat inside abefore_agent's except BaseException, turning a pure optimization into a hard Redis dependency).
- Log when get_config() has no runnable context so silently-disabled reuse is diagnosable.
- Fix session_exists docstring (404=>False, any other success=>True; 204 is the sandbox's current answer, not a client-enforced contract) and add cross-service contract pointers; reframe the TTL/reaper coupling as best-effort/self-healing.
- Add resilience tests: session_exists propagates non-404; _reuse_warm_session swallows httpx.HTTPError without dropping the mapping; cache read/write outages degrade gracefully; aafter_agent swallows a 404 from close_session.
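The session_exists contract from the docstring fix (404 means absent, any other success means present, everything else propagates) can be written down as a small status-code mapping. This is a sketch only: `SandboxStatusError` and `session_exists_from_status` are hypothetical names, and the real client presumably works on an HTTP response rather than a bare integer.

```python
class SandboxStatusError(Exception):
    """Stand-in for the HTTP error the real client would propagate."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"unexpected sandbox status {status_code}")
        self.status_code = status_code


def session_exists_from_status(status_code: int) -> bool:
    """Map a status-check response code to a liveness answer.

    404 means the session is gone; any 2xx means it exists (204 is only the
    sandbox's current answer, so it is not special-cased); anything else is an
    error that the caller must see.
    """
    if status_code == 404:
        return False
    if 200 <= status_code < 300:
        return True
    raise SandboxStatusError(status_code)
```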

Register SlashCommandMiddleware only when slash_commands.enabled, matching the conditional-registration pattern used by the sandbox/web/deferred middleware, instead of always registering it and short-circuiting inside abefore_agent. When disabled the middleware is now never instantiated or run rather than no-op'ing every turn.

Move the thread->session cache mapping out of SandboxMiddleware into a new SandboxSessionStore (core/sandbox/session_store.py). This separates the persistence mechanism (key scheme, TTL, best-effort cache I/O) from the reuse orchestration (validate liveness, remember/forget) that stays in the middleware. The store knows nothing of the sandbox client, so the backing store can be swapped without touching the middleware. Behavior is unchanged. Warm-session cache tests move to the store's own unit suite; the middleware tests now drive a fake store and assert orchestration only.
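To illustrate the split, a store of this shape could own the key scheme, the TTL, and the best-effort cache I/O while knowing nothing about the sandbox client. The class below is a hedged sketch, not the PR's SandboxSessionStore: the method names, key prefix, and default TTL are all assumptions, and `DictCache` is a toy stand-in for the real cache.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)


class DictCache:
    """Tiny in-memory stand-in for a cache client (ignores the timeout)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str, timeout: int) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class SessionStore:
    """Maps a conversation thread to a sandbox session id, best-effort."""

    def __init__(self, cache, ttl_seconds: int = 1800, prefix: str = "sandbox-session:") -> None:
        self._cache = cache
        self._ttl = ttl_seconds  # assumed default, not taken from the PR
        self._prefix = prefix    # assumed key scheme, not taken from the PR

    def _key(self, thread_id: str) -> str:
        return f"{self._prefix}{thread_id}"

    def lookup(self, thread_id: str) -> Optional[str]:
        try:
            return self._cache.get(self._key(thread_id))
        except Exception:
            # A cache outage degrades to "no warm session", i.e. a cold start.
            logger.warning("session cache read failed", exc_info=True)
            return None

    def remember(self, thread_id: str, session_id: str) -> None:
        try:
            self._cache.set(self._key(thread_id), session_id, self._ttl)
        except Exception:
            logger.warning("session cache write failed", exc_info=True)

    def forget(self, thread_id: str) -> None:
        try:
            self._cache.delete(self._key(thread_id))
        except Exception:
            logger.warning("session cache delete failed", exc_info=True)
```

With this shape the middleware keeps only the orchestration (validate liveness, then remember or forget), which is what lets its tests drive a fake store.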

The branch added commit_changes/create_merge_request tools plus a nudge loop so the agent published its own work. Revert to the prior behavior: GitMiddleware commits, pushes, and opens/updates the MR via GitChangePublisher when the agent's turn ends. Keep the sandbox/fs infrastructure intact — the publisher still threads session_id through open_git_manager so it runs git in sandbox or local mode. Remove the git_publish tools, the aafter_model nudge, MAX_GIT_NUDGES, and the agent_owns_commit prompt variant; GIT_SYSTEM_PROMPT returns to the "committing is automatic" contract.

The revert dropped _is_unpublished and let aafter_agent call the publisher unconditionally. The publisher only short-circuits on "dirty OR diff-vs-base", so on an idle follow-up turn of a warm-session chat (clean tree, but an earlier turn's commit is already pushed and ahead of base) it would re-run — burning a diff-to-metadata model call and doing a no-op push — where both main and the pre-revert branch skipped. Restore _is_unpublished as the gate: it adds the has_unpushed check that distinguishes "already live" from "needs pushing", so publishing now fires under the same circumstances as before. It is sandbox-aware publishing logic, not agent-owned, so it belongs after the revert.
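The difference between the two conditions can be made concrete with a small truth-table sketch. The field and function names here are hypothetical, and the real _is_unpublished may combine more signals than shown; the sketch only mirrors the idle-follow-up case described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoStatus:
    dirty: bool            # uncommitted changes in the working tree
    differs_from_base: bool  # branch has commits/diff relative to the base branch
    has_unpushed: bool     # local commits not yet on the remote


def publisher_would_run(status: RepoStatus) -> bool:
    """The publisher's own short-circuit: dirty OR diff-vs-base."""
    return status.dirty or status.differs_from_base


def is_unpublished(status: RepoStatus) -> bool:
    """Assumed shape of the gate: only work that is not yet live counts."""
    return status.dirty or status.has_unpushed


# Idle follow-up turn in a warm session: clean tree, earlier commit already
# pushed and ahead of base.
idle_turn = RepoStatus(dirty=False, differs_from_base=True, has_unpushed=False)
```

For `idle_turn`, `publisher_would_run` is true while `is_unpublished` is false, which is the case where the gate saves the model call and the no-op push.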

The gitlab/gh platform tools previously took an agent-chosen output_file path and wrote oversized output to a sandbox scratch file via a short-lived client, with bespoke inline truncation and auto-eviction. Replace that with an output_to_file boolean that writes the full result through the agent's deepagents filesystem backend, into the same large_tool_results dir the FilesystemMiddleware auto-evicts to. The tools are now closures built inside GitPlatformMiddleware capturing the bound backend; the sandbox backend is wrapped in a DAIVCompositeBackend to carry an artifacts_root under /workspace, and _bind_backend unwraps it to bind the live session. Write failures (raised, backend-rejected, or a missing tool_call_id) surface as agent-visible error strings. Drop the now-orphaned SCRATCH_PATH constant and the legacy truncation helpers.

…on (#1287)

* feat(sandbox): own the sandbox transport per-run in set_runtime_ctx
* refactor(sandbox): inject run client by construction; state-based session reuse
* refactor(git): move GitManager into automation layer, drop lazy import
* refactor(git): remove unused GitManager.commit_and_push_changes/checkout
* perf(git): batch publish reads into status_snapshot (<=2 round-trips)
* refactor(git): fold publish decision into publisher; inject run client into git path
* perf(git): run MR create/update and context-file suggestion off the event loop
* refactor(git): extract _effective_mr_iid helper for MR resolution
* refactor(git): harden publish path and sandbox client teardown

Address review feedback on the sandbox-transport-injection branch:

- Guard the run-scoped sandbox client teardown in set_runtime_ctx so a transport-close error can't mask the in-flight exception, and always reset the contextvar.
- Fold protected_branch_fallback_source into the frozen PublishOutcome, dropping the mutable publisher side-channel read by both callers.
- Remove the dead GitManager query methods (is_dirty / get_diff / has_unpushed / remote_branches) superseded by status_snapshot.
- Add status_snapshot error-branch tests; re-target the empty-results and no-index hard-error tests onto the surviving helpers.
- Fix the "read once" client comments (BaseManager is a second reader), annotate GitMiddleware.sandbox_client, and flag the possibly orphaned container in _session_exists.

…on handle (#1292)

* feat(sandbox): add run_commands to SandboxFileBackend
* refactor(git): thread bound sandbox backend through git/publish path
* refactor(sandbox): run bash through the bound backend; thread it to subagents
* test(sandbox): guard that the backend never advertises execution
* feat(sandbox): classify bash failures as transient or permanent

The bash tool degraded every transport/HTTP error to the same generic "sandbox call failed" string, so the agent could not tell a momentary blip (worth one retry) from a non-recoverable rejection (stop using the tool). Introduce a BashFailure enum that maps httpx errors to TRANSIENT (no response, or a retryable status: 408/425/429/5xx) vs PERMANENT (auth, session-gone, bad-request, not-implemented), and return distinct agent-facing guidance for each.

The transient message is byte-stable so the system prompt's "two identical error strings => stop" backstop still fires when a retry fails the same way. Non-httpx failures (malformed 200 body, unbound-backend RuntimeError) are left to propagate as loud wire/programming bugs.
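A classifier along these lines could be sketched as below. It works on a bare status code (None meaning no response arrived) instead of httpx exceptions so that it stays self-contained. The message strings are invented for illustration, and treating 501 as the permanent "not-implemented" case despite it being a 5xx is an assumption made to reconcile the two lists in the commit message.

```python
from enum import Enum
from typing import Optional


class BashFailure(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"


RETRYABLE_STATUSES = frozenset({408, 425, 429})
NOT_IMPLEMENTED = 501  # assumption: carved out of the retryable 5xx range

# Constant strings, so a retry that fails the same way yields an identical
# message (the "byte-stable" property). The wording is hypothetical.
TRANSIENT_MESSAGE = "error: sandbox temporarily unavailable; retry the command once."
PERMANENT_MESSAGE = "error: sandbox rejected the request; stop using this tool."


def classify(status_code: Optional[int]) -> BashFailure:
    """Classify a failed sandbox call; None means no response was received."""
    if status_code is None:
        return BashFailure.TRANSIENT
    if status_code == NOT_IMPLEMENTED:
        return BashFailure.PERMANENT
    if status_code in RETRYABLE_STATUSES or 500 <= status_code <= 599:
        return BashFailure.TRANSIENT
    return BashFailure.PERMANENT


def guidance(failure: BashFailure) -> str:
    """Return the agent-facing text for a failure class."""
    if failure is BashFailure.TRANSIENT:
        return TRANSIENT_MESSAGE
    return PERMANENT_MESSAGE
```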

Tool loading called MultiServerMCPClient.get_tools(), which gathers all servers with no timeout, so one server whose handshake hangs blocked the entire toolset build and froze every chat and run. Load each server independently via asyncio.wait_for with a per-server timeout (MCP_TOOL_LOAD_TIMEOUT, default 30s); a server that times out or errors is skipped instead of blocking the others. CancelledError still propagates so outer cancellation is preserved.
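The per-server timeout pattern can be sketched with plain asyncio. This is an illustration under assumptions: each server is represented by a hypothetical zero-argument async loader returning a list of tool names, and the servers are loaded concurrently with asyncio.gather, which the source does not confirm.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, List

logger = logging.getLogger(__name__)

Loader = Callable[[], Awaitable[List[str]]]


async def _load_one(name: str, loader: Loader, timeout: float) -> List[str]:
    """Load one server's tools, returning [] if it times out or errors."""
    try:
        return await asyncio.wait_for(loader(), timeout=timeout)
    except asyncio.TimeoutError:
        logger.warning("MCP server %s timed out after %ss; skipping", name, timeout)
        return []
    except Exception:
        # CancelledError derives from BaseException, so it is not caught here
        # and outer cancellation still propagates.
        logger.exception("MCP server %s failed to load; skipping", name)
        return []


async def load_tools(loaders: Dict[str, Loader], timeout: float = 30.0) -> List[str]:
    """Load every server independently so one hung handshake blocks nobody."""
    per_server = await asyncio.gather(
        *(_load_one(name, loader, timeout) for name, loader in loaders.items())
    )
    return [tool for tools in per_server for tool in tools]
```

Because each loader gets its own `wait_for`, the worst-case build time is bounded by the timeout rather than by the slowest server.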

…tives

Filesystem tools (ls/read_file/grep/glob/write_file/edit_file) now branch on a structured FsError/FsErrorCode from the sandbox instead of free-form strings: a missing path reads as "does not exist" (distinct from an empty dir / no match), reading a directory or writing over an existing file routes the agent to the right tool, and deletes are idempotent. Requires the matching daiv-sandbox release with the structured fs/* error responses.

Path directives and subagent prompts are now derived from the run's working directory (REPO_PATH in a sandbox, clone basename on disk) via a single _resolve_working_directory helper, so the main prompt and subagents address the same repo root. SandboxFileBackend._abs normalises the virtual root and prefix-dropped repo slips onto the workspace/repo root.
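Branching on a structured error code rather than on message text might look like the sketch below. The enum member names, the wire values, the result strings, and the choice of which tool to suggest are all hypothetical; only the behaviours (distinct "does not exist", routing to another tool, idempotent delete) come from the commit message.

```python
from enum import Enum
from typing import Optional


class FsErrorCode(str, Enum):
    # Hypothetical members; the real wire values are not shown in this PR text.
    NOT_FOUND = "not_found"
    IS_DIRECTORY = "is_directory"
    ALREADY_EXISTS = "already_exists"


def read_file_result(path: str, error: Optional[FsErrorCode], content: str = "") -> str:
    """Turn a read outcome into agent-facing text."""
    if error is None:
        return content
    if error is FsErrorCode.NOT_FOUND:
        return f"Error: {path} does not exist."
    if error is FsErrorCode.IS_DIRECTORY:
        return f"Error: {path} is a directory; use ls to list it."
    return f"Error: unexpected filesystem error '{error.value}' for {path}."


def write_file_result(path: str, error: Optional[FsErrorCode]) -> str:
    """Route an overwrite attempt to the editing tool instead of failing vaguely."""
    if error is None:
        return f"Wrote {path}."
    if error is FsErrorCode.ALREADY_EXISTS:
        return f"Error: {path} already exists; use edit_file to change it."
    return f"Error: unexpected filesystem error '{error.value}' for {path}."


def delete_result(path: str, error: Optional[FsErrorCode]) -> str:
    """Deleting a path that is already gone counts as success (idempotent)."""
    if error is None or error is FsErrorCode.NOT_FOUND:
        return f"Deleted {path}."
    return f"Error: unexpected filesystem error '{error.value}' for {path}."
```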

…fence (#1293)

* feat(agent-fs): add TMP_PATH and disk-mode workspace fence permissions
* feat(agent-fs): add build_disk_workspace_backend for the unified /workspace namespace
* feat(agent): unify graph workspace root to /workspace with disk-only fence
* feat(agent-skills): upload global skills under /workspace/skills
* feat(agent-subagents): fence subagents to /workspace subtrees in disk mode
* refactor(agent): drop unused GLOBAL_SKILLS_* constants after /workspace unification

  Remove GLOBAL_SKILLS_PATH and GLOBAL_SKILLS_ROUTE from constants.py now that the skills middleware addresses skills via SKILLS_PATH (/workspace/skills) directly. Update skills/services.py and associated tests to classify skill invocations against SKILLS_PATH instead of the removed /skills virtual route.

* fix(agent-fs): allow read-back of offloaded-artifact dirs under /workspace fence

  The disk-mode fence denied everything under /workspace except the three real subtrees, which also blocked reading the offloaded large_tool_results/ conversation_history files. deepagents eviction and git_platform's output_to_file write those through the backend directly (bypassing the fence) and hand the agent the path to read back, so the read-back dead-ended on the /workspace/** deny. Add a read-only carve-out for the artifact dirs ahead of the deny (write stays denied; the agent never writes there itself), mirror it in the explore permissions, and pin the prefixes with a drift-guard test against deepagents' computed values so a framework rename fails loudly.
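The ordering issue behind the read-back fix (a carve-out must be evaluated ahead of the blanket deny) can be shown with a first-match-wins rule list. Everything concrete here is an assumption: the rule representation, the artifact directory paths, the choice to make all three real subtrees read-write, and the default-deny for paths outside /workspace are illustrative only and do not reflect the deepagents permission API.

```python
import posixpath
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Rule:
    root: str
    operations: FrozenSet[str]
    allow: bool


READ = frozenset({"read"})
READ_WRITE = frozenset({"read", "write"})

# Evaluated top to bottom; the first rule covering both path and operation wins.
RULES: Tuple[Rule, ...] = (
    Rule("/workspace/repo", READ_WRITE, True),
    Rule("/workspace/skills", READ_WRITE, True),
    Rule("/workspace/tmp", READ_WRITE, True),
    # Read-only carve-out for offloaded artifacts (paths are hypothetical).
    Rule("/workspace/large_tool_results", READ, True),
    Rule("/workspace/conversation_history", READ, True),
    # Blanket deny for everything else under /workspace.
    Rule("/workspace", READ_WRITE, False),
)


def _covers(root: str, path: str) -> bool:
    return path == root or path.startswith(root + "/")


def is_allowed(path: str, operation: str) -> bool:
    """Return whether `operation` ("read" or "write") is permitted on `path`."""
    normalized = posixpath.normpath(path)  # collapse ".." before matching
    for rule in RULES:
        if operation in rule.operations and _covers(rule.root, normalized):
            return rule.allow
    return False  # assumed default for paths no rule covers
```

A write to an artifact directory skips the read-only carve-out (the operation does not match) and falls through to the blanket deny, which matches the "write stays denied" behaviour described above.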

When the model batches tool calls, LangGraph runs them concurrently, but the sandbox serves one op per session (a Redis lock with a short wait) and returns 409 "Session is busy" to the loser. The file backend's raise_for_status() propagated that 409, aborting the whole run. The agent-facing file tools (ls/read/grep/glob/write/edit + delete) now catch httpx transport/HTTP faults and return a soft, agent-actionable result (transient -> retry once; permanent -> tools unavailable) like the bash tool already does, logging WARNING for transient and ERROR for permanent so genuine faults still reach the logs/Sentry. The bash tool's classifier now also treats 409 as transient (was permanent). The shared transient/permanent classifier lives in core/sandbox/client.py to avoid an import cycle. Pairs with a daiv-sandbox change raising the per-session lock wait so batched ops queue instead of failing fast.
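The soft-failure behaviour for the file tools could be sketched as a small wrapper. It is a sketch under assumptions: `SandboxFault` stands in for the httpx transport/HTTP errors, the message strings are invented, and keeping 501 permanent is carried over as an assumption rather than something this commit states.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class SandboxFault(Exception):
    """Stand-in for an httpx error; status_code is None when no response arrived."""

    def __init__(self, status_code: Optional[int] = None) -> None:
        super().__init__(f"sandbox fault (status={status_code})")
        self.status_code = status_code


# 409 "Session is busy" is treated as transient, per the change described above.
TRANSIENT_STATUSES = frozenset({408, 409, 425, 429})

RETRY_MESSAGE = "error: sandbox busy or unreachable; retry this call once."
UNAVAILABLE_MESSAGE = "error: file tools are unavailable for this run."


def is_transient(status_code: Optional[int]) -> bool:
    if status_code is None:
        return True
    if status_code == 501:  # assumption: not-implemented stays permanent
        return False
    return status_code in TRANSIENT_STATUSES or 500 <= status_code <= 599


def soft_call(operation: Callable[[], str]) -> str:
    """Run a file operation, converting sandbox faults into agent-facing text."""
    try:
        return operation()
    except SandboxFault as exc:
        if is_transient(exc.status_code):
            logger.warning("transient sandbox fault: %s", exc)
            return RETRY_MESSAGE
        logger.error("permanent sandbox fault: %s", exc)
        return UNAVAILABLE_MESSAGE
```

Only the stand-in fault type is caught, so unrelated exceptions (programming errors, malformed responses) still propagate loudly.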

* feat(codebase): add ephemeral GitLab clone token provisioning
* fix(codebase): degrade clone-token provisioning on transport errors
* feat(codebase): clone GitLab repos with ephemeral project tokens
* feat(agent): log git push auth failures for diagnosability
* docs: document GitLab ephemeral clone token behavior
* fix(codebase): Harden clone-token provisioning failure handling

  Refine how clone-token provisioning degrades when GitLab cannot mint a project access token:

  - Negative-cache transient failures (network, 429, 5xx) for 5 minutes instead of an hour so a single blip doesn't park clones on the PAT.
  - Name the real culprit when GitLab rejects the configured PAT (401) rather than claiming a benign fallback, and guard against a created token arriving without a secret.
  - Raise ImproperlyConfigured when neither an ephemeral token nor a PAT is available instead of building an oauth2:None@ clone URL, and log per clone which credential was embedded.
  - Expand the git push auth-failure message with expired-clone-token and branch-protection guidance; log push network failures too.

* chore(docker): Build sandbox locally and pin compose network name

  Build the sandbox service from the sibling daiv-sandbox checkout with a source mount so local development runs against the unreleased sandbox the clone-token work depends on. Pin the default compose network to daiv_default so sandbox-launched containers can attach to it by a stable name, and pin the gitlab host via DAIV_SANDBOX_EXTRA_HOSTS since gVisor can't use Docker's embedded DNS resolver.
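Two of the hardening rules lend themselves to a short sketch: refusing to build a clone URL without a credential, and choosing a shorter negative-cache TTL for transient failures. The function names and URL layout are hypothetical, `ImproperlyConfigured` is a local stand-in for Django's exception, and using one hour for non-transient failures is inferred from "instead of an hour" rather than stated outright.

```python
from typing import Optional


class ImproperlyConfigured(Exception):
    """Local stand-in for django.core.exceptions.ImproperlyConfigured."""


TRANSIENT_TTL_SECONDS = 5 * 60   # network, 429, 5xx
DEFAULT_TTL_SECONDS = 60 * 60    # assumed TTL for other failures


def clone_url(host: str, project_path: str, ephemeral_token: Optional[str], pat: Optional[str]) -> str:
    """Build an authenticated clone URL, preferring the ephemeral token."""
    token = ephemeral_token or pat
    if not token:
        # Never fall through to an "oauth2:None@" URL.
        raise ImproperlyConfigured("No ephemeral clone token or PAT available for cloning")
    return f"https://oauth2:{token}@{host}/{project_path}.git"


def negative_cache_ttl(status_code: Optional[int]) -> int:
    """Pick how long to remember a failed provisioning attempt.

    None means a network failure with no response.
    """
    if status_code is None or status_code == 429 or 500 <= status_code <= 599:
        return TRANSIENT_TTL_SECONDS
    return DEFAULT_TTL_SECONDS
```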

srtab self-assigned this May 30, 2026

srtab force-pushed the feat/sandbox-filebackend-scratchpad branch from a6a5fe7 to 3e8db8a on June 1, 2026 16:03

srtab changed the title ~~feat(scratch): sandbox-backed /scratch scratchpad (SandboxFileBackend + /scratch route)~~ feat(workspace): sandbox-authoritative /workspace (SandboxFileBackend + agent-owned commit/MR) Jun 1, 2026

srtab and others added 20 commits June 1, 2026 17:07

feat(sandbox-client): add session_exists status check and force flag on close_session

25ea096

feat(sandbox-mw): add warm-session cache helpers

9974b3b

test(sandbox-mw): give runtime helper a real config/thread_id

5c34d96

feat(sandbox-mw): reuse warm session across turns, skipping re-seed

4465ffd

feat(sandbox-mw): stop reusable sessions at turn-end, force-remove non-chat runs

d62a60c

docs: changelog for warm sandbox session reuse

e9f550f

srtab wants to merge 21 commits into main from feat/sandbox-filebackend-scratchpad
