feat(workspace): sandbox-authoritative /workspace (SandboxFileBackend + agent-owned commit/MR)#1279
Draft
srtab wants to merge 21 commits into
…mmit/MR

Move the one true workspace into the daiv-sandbox container under /workspace (repo at /workspace/repo, seeded skills at /workspace/skills, per-run scratchpad at /workspace/tmp) and make the sandbox authoritative for both file and git state.

- SandboxFileBackend: a deepagents BackendProtocol that proxies file ops to the sandbox fs/* endpoints over DAIVSandboxClient, with one RPC per op and no local mirror; root-configurable and bound at session start. Retires SandboxSyncer and the local patch-apply path.
- Dual-mode GitManager (for_sandbox / for_local factories): runs git either via run_commands in /workspace/repo (sandbox-authoritative runs) or as a subprocess against a GitPython clone (sandbox-disabled / repoless runs), behind one async _git runner with exit-code discipline so a failing git command raises instead of folding fatal text into its output. Push failures are classified into GitPushPermissionError / GitPushNetworkError. open_git_manager() builds the right mode from the run's session_id.
- Agent-owned publishing: new commit_changes / create_merge_request tools let the agent author its own commit and MR; commit, push and MR-API failures surface as tool-visible error Commands so the agent can recover rather than crashing the run. GitMiddleware nudges the agent to publish (bounded by MAX_GIT_NUDGES) and falls back to a direct daiv publish as a safeguard. GIT_SYSTEM_PROMPT gains an agent_owns_commit variant; raw git mutation in bash stays hard-blocked by policy.
- gitlab/gh tool output spills verbatim to a /workspace/tmp scratch file when it exceeds the inline cap. SlashCommandMiddleware split out of the skills middleware.
- fs_* client methods and Fs* wire schemas mirrored byte-identically to daiv-sandbox (enforced by the schema-drift consistency test); regenerated schemas.dump.json.

Requires the companion daiv-sandbox fs/* endpoints to be deployed.
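The "exit-code discipline" described above can be illustrated with a minimal sketch. This is not the PR's `_git` runner: `GitCommandError`, `check_git_result`, and the stderr fragments used for classification are hypothetical, and the two push error classes are local stand-ins that only borrow the names from the commit message.

```python
class GitCommandError(Exception):
    """Hypothetical base error: carries the exit code and raw git output."""

    def __init__(self, exit_code: int, output: str) -> None:
        super().__init__(f"git exited with {exit_code}: {output.strip()}")
        self.exit_code = exit_code
        self.output = output


class GitPushPermissionError(GitCommandError):
    """Stand-in: push rejected for auth or branch-protection reasons."""


class GitPushNetworkError(GitCommandError):
    """Stand-in: push failed because the remote could not be reached."""


# Assumed stderr fragments; a real classifier may match different text.
_PERMISSION_MARKERS = ("permission denied", "authentication failed", "protected branch")
_NETWORK_MARKERS = ("could not resolve host", "connection timed out", "connection refused")


def check_git_result(exit_code: int, output: str, *, is_push: bool = False) -> str:
    """Return the output on success; raise on any non-zero exit code."""
    if exit_code == 0:
        return output
    lowered = output.lower()
    if is_push and any(marker in lowered for marker in _PERMISSION_MARKERS):
        raise GitPushPermissionError(exit_code, output)
    if is_push and any(marker in lowered for marker in _NETWORK_MARKERS):
        raise GitPushNetworkError(exit_code, output)
    # Anything else still raises, so "fatal: ..." never passes as normal output.
    raise GitCommandError(exit_code, output)
```

The point of the sketch is that callers only ever see a string when git succeeded, so fatal text cannot be mistaken for a diff or a branch name.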
a6a5fe7 to 3e8db8a
Agent-level middleware hooks receive a langgraph Runtime, which has no .config attribute (unlike the ToolRuntime used in git_platform.py). Reading runtime.config would raise AttributeError at runtime on every chat turn. Read the conversation thread_id from the run config contextvar via get_config() instead, falling back to None (reuse disabled) outside a runnable context.
PR-review follow-ups:

- Wrap all Redis cache reads/writes (_cache_get/_cache_set/_cache_delete) so a cache outage degrades to a cold create+seed instead of crashing the agent run (the calls sat inside abefore_agent's except BaseException, turning a pure optimization into a hard Redis dependency).
- Log when get_config() has no runnable context so silently-disabled reuse is diagnosable.
- Fix session_exists docstring (404=>False, any other success=>True; 204 is the sandbox's current answer, not a client-enforced contract) and add cross-service contract pointers; reframe the TTL/reaper coupling as best-effort/self-healing.
- Add resilience tests: session_exists propagates non-404; _reuse_warm_session swallows httpx.HTTPError without dropping the mapping; cache read/write outages degrade gracefully; aafter_agent swallows a 404 from close_session.
Register SlashCommandMiddleware only when slash_commands.enabled, matching the conditional-registration pattern used by the sandbox/web/deferred middleware, instead of always registering it and short-circuiting inside abefore_agent. When disabled the middleware is now never instantiated or run rather than no-op'ing every turn.
Move the thread->session cache mapping out of SandboxMiddleware into a new SandboxSessionStore (core/sandbox/session_store.py). This separates the persistence mechanism (key scheme, TTL, best-effort cache I/O) from the reuse orchestration (validate liveness, remember/forget) that stays in the middleware. The store knows nothing of the sandbox client, so the backing store can be swapped without touching the middleware. Behavior is unchanged. Warm-session cache tests move to the store's own unit suite; the middleware tests now drive a fake store and assert orchestration only.
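A rough sketch of what such a store might look like, assuming a cache object that exposes `get`/`set`/`delete`. The method names (`get`, `remember`, `forget`), the key prefix, and the default TTL are illustrative guesses and are not taken from core/sandbox/session_store.py.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


class SandboxSessionStore:
    """Maps a conversation thread to a sandbox session id (illustrative only)."""

    def __init__(self, cache: Any, ttl_seconds: int = 3600,
                 prefix: str = "sandbox-session:") -> None:
        self._cache = cache          # any object with get/set/delete
        self._ttl = ttl_seconds      # assumed default, not the PR's value
        self._prefix = prefix        # assumed key scheme

    def _key(self, thread_id: str) -> str:
        return f"{self._prefix}{thread_id}"

    def get(self, thread_id: str) -> Optional[str]:
        try:
            return self._cache.get(self._key(thread_id))
        except Exception:
            # A cache outage reads as a miss, so the caller does a cold create.
            logger.warning("session cache read failed", exc_info=True)
            return None

    def remember(self, thread_id: str, session_id: str) -> None:
        try:
            self._cache.set(self._key(thread_id), session_id, self._ttl)
        except Exception:
            logger.warning("session cache write failed", exc_info=True)

    def forget(self, thread_id: str) -> None:
        try:
            self._cache.delete(self._key(thread_id))
        except Exception:
            logger.warning("session cache delete failed", exc_info=True)
```

Because the store never touches the sandbox client, a test double for the middleware only needs these three methods.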
The branch added commit_changes/create_merge_request tools plus a nudge loop so the agent published its own work. Revert to the prior behavior: GitMiddleware commits, pushes, and opens/updates the MR via GitChangePublisher when the agent's turn ends. Keep the sandbox/fs infrastructure intact — the publisher still threads session_id through open_git_manager so it runs git in sandbox or local mode. Remove the git_publish tools, the aafter_model nudge, MAX_GIT_NUDGES, and the agent_owns_commit prompt variant; GIT_SYSTEM_PROMPT returns to the "committing is automatic" contract.
The revert dropped _is_unpublished and let aafter_agent call the publisher unconditionally. The publisher only short-circuits on "dirty OR diff-vs-base", so on an idle follow-up turn of a warm-session chat (clean tree, but an earlier turn's commit is already pushed and ahead of base) it would re-run — burning a diff-to-metadata model call and doing a no-op push — where both main and the pre-revert branch skipped. Restore _is_unpublished as the gate: it adds the has_unpushed check that distinguishes "already live" from "needs pushing", so publishing now fires under the same circumstances as before. It is sandbox-aware publishing logic, not agent-owned, so it belongs after the revert.
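One plausible reading of the difference between the two checks, as a sketch. `TreeState` and both function bodies are assumptions made for illustration; the commit message only says the gate adds a has_unpushed check on top of the publisher's "dirty OR diff-vs-base" short-circuit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeState:
    """Hypothetical snapshot of the working tree at the end of a turn."""
    is_dirty: bool          # uncommitted changes in the working tree
    has_diff_vs_base: bool  # branch differs from its base branch
    has_unpushed: bool      # local commits not yet on the remote


def publisher_would_run(state: TreeState) -> bool:
    # The publisher's own short-circuit, as described in the commit message.
    return state.is_dirty or state.has_diff_vs_base


def is_unpublished(state: TreeState) -> bool:
    # Assumed gate: work counts as unpublished only if something is
    # still local, either uncommitted or committed but not pushed.
    return state.is_dirty or state.has_unpushed


# Idle follow-up turn: clean tree, earlier commit already pushed.
idle_turn = TreeState(is_dirty=False, has_diff_vs_base=True, has_unpushed=False)
```

For `idle_turn`, the publisher's check alone would proceed while the gate would skip, which is the wasted model call and no-op push the commit describes.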
The gitlab/gh platform tools previously took an agent-chosen output_file path and wrote oversized output to a sandbox scratch file via a short-lived client, with bespoke inline truncation and auto-eviction. Replace that with an output_to_file boolean that writes the full result through the agent's deepagents filesystem backend, into the same large_tool_results dir the FilesystemMiddleware auto-evicts to. The tools are now closures built inside GitPlatformMiddleware capturing the bound backend; the sandbox backend is wrapped in a DAIVCompositeBackend to carry an artifacts_root under /workspace, and _bind_backend unwraps it to bind the live session. Write failures (raised, backend-rejected, or a missing tool_call_id) surface as agent-visible error strings. Drop the now-orphaned SCRATCH_PATH constant and the legacy truncation helpers.
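The closure-over-backend shape might look roughly like this. Everything here is a simplified assumption: `Backend`, `build_platform_tool`, `run_cli`, the artifact directory path, and the message wording are invented for the sketch and do not mirror the deepagents backend interface.

```python
from typing import Callable, Optional, Protocol


class Backend(Protocol):
    """Hypothetical minimal backend surface used by the sketch."""
    def write(self, path: str, content: str) -> None: ...


ARTIFACTS_DIR = "/workspace/large_tool_results"  # assumed location


def build_platform_tool(backend: Backend, run_cli: Callable[[str], str]):
    """Return a tool function that captures the bound backend."""

    def platform_tool(command: str, output_to_file: bool = False,
                      tool_call_id: Optional[str] = None) -> str:
        result = run_cli(command)
        if not output_to_file:
            return result
        if not tool_call_id:
            # No id means no stable file name; report instead of raising.
            return "error: cannot write output to file without a tool_call_id"
        path = f"{ARTIFACTS_DIR}/{tool_call_id}"
        try:
            backend.write(path, result)
        except Exception as exc:
            return f"error: failed to write output to {path}: {exc}"
        return f"Output written to {path}"

    return platform_tool
```

Every failure path returns a string, so the agent sees the problem as tool output rather than the run aborting.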
…on (#1287)

* feat(sandbox): own the sandbox transport per-run in set_runtime_ctx
* refactor(sandbox): inject run client by construction; state-based session reuse
* refactor(git): move GitManager into automation layer, drop lazy import
* refactor(git): remove unused GitManager.commit_and_push_changes/checkout
* perf(git): batch publish reads into status_snapshot (<=2 round-trips)
* refactor(git): fold publish decision into publisher; inject run client into git path
* perf(git): run MR create/update and context-file suggestion off the event loop
* refactor(git): extract _effective_mr_iid helper for MR resolution
* refactor(git): harden publish path and sandbox client teardown

  Address review feedback on the sandbox-transport-injection branch:

  - Guard the run-scoped sandbox client teardown in set_runtime_ctx so a transport-close error can't mask the in-flight exception, and always reset the contextvar.
  - Fold protected_branch_fallback_source into the frozen PublishOutcome, dropping the mutable publisher side-channel read by both callers.
  - Remove the dead GitManager query methods (is_dirty / get_diff / has_unpushed / remote_branches) superseded by status_snapshot.
  - Add status_snapshot error-branch tests; re-target the empty-results and no-index hard-error tests onto the surviving helpers.
  - Fix the "read once" client comments (BaseManager is a second reader), annotate GitMiddleware.sandbox_client, and flag the possibly orphaned container in _session_exists.
…on handle (#1292)

* feat(sandbox): add run_commands to SandboxFileBackend
* refactor(git): thread bound sandbox backend through git/publish path
* refactor(sandbox): run bash through the bound backend; thread it to subagents
* test(sandbox): guard that the backend never advertises execution
* feat(sandbox): classify bash failures as transient or permanent

  The bash tool degraded every transport/HTTP error to the same generic "sandbox call failed" string, so the agent could not tell a momentary blip (worth one retry) from a non-recoverable rejection (stop using the tool). Introduce a BashFailure enum that maps httpx errors to TRANSIENT (no response, or a retryable status: 408/425/429/5xx) vs PERMANENT (auth, session-gone, bad-request, not-implemented), and return distinct agent-facing guidance for each. The transient message is byte-stable so the system prompt's "two identical error strings => stop" backstop still fires when a retry fails the same way. Non-httpx failures (malformed 200 body, unbound-backend RuntimeError) are left to propagate as loud wire/programming bugs.
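The classification rule can be sketched on bare status codes, without httpx. The `BashFailure` name comes from the commit message; `classify`, `guidance`, and the message wording are hypothetical. Treating 501 as permanent is an assumption made here to reconcile "5xx is retryable" with "not-implemented is permanent"; the actual code may resolve that overlap differently.

```python
from enum import Enum
from typing import Optional


class BashFailure(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"


_RETRYABLE_STATUSES = frozenset({408, 425, 429})

# Assumed wording. The transient text is a constant, so two failed
# attempts in a row produce byte-identical strings.
TRANSIENT_MESSAGE = "Sandbox call failed transiently. Retry the command once."
PERMANENT_MESSAGE = "Sandbox rejected the call. Stop using this tool for this run."


def classify(status_code: Optional[int]) -> BashFailure:
    """Classify a failed call; None means no response was received."""
    if status_code is None:
        return BashFailure.TRANSIENT
    if status_code in _RETRYABLE_STATUSES:
        return BashFailure.TRANSIENT
    # Assumption: 501 (not implemented) is carved out of the 5xx range.
    if 500 <= status_code <= 599 and status_code != 501:
        return BashFailure.TRANSIENT
    return BashFailure.PERMANENT


def guidance(status_code: Optional[int]) -> str:
    if classify(status_code) is BashFailure.TRANSIENT:
        return TRANSIENT_MESSAGE
    return PERMANENT_MESSAGE
```

Keeping the status code out of the transient message is what makes it byte-stable: a 502 followed by a timeout yields the same string twice, so an "identical error strings" backstop can still trigger.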
Tool loading called MultiServerMCPClient.get_tools(), which gathers all servers with no timeout, so one server whose handshake hangs blocked the entire toolset build and froze every chat and run. Load each server independently via asyncio.wait_for with a per-server timeout (MCP_TOOL_LOAD_TIMEOUT, default 30s); a server that times out or errors is skipped instead of blocking the others. CancelledError still propagates so outer cancellation is preserved.
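The per-server timeout pattern can be shown with plain asyncio. `load_all_tools` and the `loaders` mapping are hypothetical stand-ins for the real per-server MCP calls; only `MCP_TOOL_LOAD_TIMEOUT` and its 30s default come from the commit message.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, List

logger = logging.getLogger(__name__)

MCP_TOOL_LOAD_TIMEOUT = 30.0  # seconds, per server


async def load_all_tools(
    loaders: Dict[str, Callable[[], Awaitable[List[str]]]],
    timeout: float = MCP_TOOL_LOAD_TIMEOUT,
) -> List[str]:
    """Load tools from each server independently; skip slow or failing ones."""

    async def load_one(name: str, loader: Callable[[], Awaitable[List[str]]]) -> List[str]:
        try:
            return await asyncio.wait_for(loader(), timeout=timeout)
        except asyncio.CancelledError:
            # Outer cancellation must not be swallowed.
            raise
        except Exception as exc:  # includes the timeout raised by wait_for
            logger.warning("skipping MCP server %s: %r", name, exc)
            return []

    results = await asyncio.gather(*(load_one(n, l) for n, l in loaders.items()))
    # gather preserves input order, so tools stay grouped by server.
    return [tool for server_tools in results for tool in server_tools]
```

With this shape, the worst case for building the toolset is roughly one timeout period rather than an unbounded wait on the slowest server.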
…tives

Filesystem tools (ls/read_file/grep/glob/write_file/edit_file) now branch on a structured FsError/FsErrorCode from the sandbox instead of free-form strings: a missing path reads as "does not exist" (distinct from an empty dir / no match), reading a directory or writing over an existing file routes the agent to the right tool, and deletes are idempotent. Requires the matching daiv-sandbox release with the structured fs/* error responses.

Path directives and subagent prompts are now derived from the run's working directory (REPO_PATH in a sandbox, clone basename on disk) via a single _resolve_working_directory helper, so the main prompt and subagents address the same repo root. SandboxFileBackend._abs normalises the virtual root and prefix-dropped repo slips onto the workspace/repo root.
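Branching on a structured code rather than on message text might look like the following. The enum members, the `FsError` fields, and the agent-facing wording are assumptions; only the type names FsError and FsErrorCode appear in the commit message.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FsErrorCode(str, Enum):
    # Hypothetical members; the real wire schema may differ.
    NOT_FOUND = "not_found"
    IS_DIRECTORY = "is_directory"
    ALREADY_EXISTS = "already_exists"


@dataclass(frozen=True)
class FsError:
    code: FsErrorCode
    path: str


def describe(err: FsError) -> str:
    """Turn a structured error into guidance that names the right tool."""
    if err.code is FsErrorCode.NOT_FOUND:
        return f"{err.path} does not exist"
    if err.code is FsErrorCode.IS_DIRECTORY:
        return f"{err.path} is a directory; use ls to list it"
    if err.code is FsErrorCode.ALREADY_EXISTS:
        return f"{err.path} already exists; use edit_file to change it"
    return f"{err.path}: {err.code.value}"


def delete_succeeded(err: Optional[FsError]) -> bool:
    """Idempotent delete: a path that is already gone counts as deleted."""
    return err is None or err.code is FsErrorCode.NOT_FOUND
```

A missing path and an empty directory now take different branches, which free-form strings could not guarantee.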
…fence (#1293)

* feat(agent-fs): add TMP_PATH and disk-mode workspace fence permissions
* feat(agent-fs): add build_disk_workspace_backend for the unified /workspace namespace
* feat(agent): unify graph workspace root to /workspace with disk-only fence
* feat(agent-skills): upload global skills under /workspace/skills
* feat(agent-subagents): fence subagents to /workspace subtrees in disk mode
* refactor(agent): drop unused GLOBAL_SKILLS_* constants after /workspace unification

  Remove GLOBAL_SKILLS_PATH and GLOBAL_SKILLS_ROUTE from constants.py now that the skills middleware addresses skills via SKILLS_PATH (/workspace/skills) directly. Update skills/services.py and associated tests to classify skill invocations against SKILLS_PATH instead of the removed /skills virtual route.

* fix(agent-fs): allow read-back of offloaded-artifact dirs under /workspace fence

  The disk-mode fence denied everything under /workspace except the three real subtrees, which also blocked reading the offloaded large_tool_results/ conversation_history files. deepagents eviction and git_platform's output_to_file write those through the backend directly (bypassing the fence) and hand the agent the path to read back, so the read-back dead-ended on the /workspace/** deny. Add a read-only carve-out for the artifact dirs ahead of the deny (write stays denied; the agent never writes there itself), mirror it in the explore permissions, and pin the prefixes with a drift-guard test against deepagents' computed values so a framework rename fails loudly.
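The ordering that makes the carve-out work (specific allow rules before the broad deny, first match wins) can be sketched as follows. The rule table is an assumption: the artifact directory paths, the write permissions on the three subtrees, and the default deny outside /workspace are all illustrative and not taken from the PR.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    prefix: str
    read: bool
    write: bool


# First match wins, so the read-only carve-outs must precede the deny.
RULES = (
    Rule("/workspace/repo", read=True, write=True),
    Rule("/workspace/skills", read=True, write=True),
    Rule("/workspace/tmp", read=True, write=True),
    Rule("/workspace/large_tool_results", read=True, write=False),
    Rule("/workspace/conversation_history", read=True, write=False),
    Rule("/workspace", read=False, write=False),
)


def is_allowed(path: str, op: str) -> bool:
    """Check `op` ("read" or "write") against the ordered rule table."""
    normalized = posixpath.normpath(path)  # collapses ".." before matching
    for rule in RULES:
        if normalized == rule.prefix or normalized.startswith(rule.prefix + "/"):
            return rule.read if op == "read" else rule.write
    return False  # assumed default outside the table
```

Matching on `prefix + "/"` rather than a bare `startswith(prefix)` keeps a sibling such as /workspace/repo-backup from inheriting the /workspace/repo rule.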
When the model batches tool calls, LangGraph runs them concurrently, but the sandbox serves one op per session (a Redis lock with a short wait) and returns 409 "Session is busy" to the loser. The file backend's raise_for_status() propagated that 409, aborting the whole run. The agent-facing file tools (ls/read/grep/glob/write/edit + delete) now catch httpx transport/HTTP faults and return a soft, agent-actionable result (transient -> retry once; permanent -> tools unavailable) like the bash tool already does, logging WARNING for transient and ERROR for permanent so genuine faults still reach the logs/Sentry. The bash tool's classifier now also treats 409 as transient (was permanent). The shared transient/permanent classifier lives in core/sandbox/client.py to avoid an import cycle. Pairs with a daiv-sandbox change raising the per-session lock wait so batched ops queue instead of failing fast.
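The interaction between batched calls and the per-session lock can be reproduced in miniature with asyncio. `FakeSession` is a toy model, not the sandbox: it uses an in-process `asyncio.Lock` where the real service uses a Redis lock, and the durations are arbitrary.

```python
import asyncio
from typing import List


class FakeSession:
    """Toy session that serves one op at a time, like the sandbox lock."""

    def __init__(self, lock_wait: float) -> None:
        self._lock = asyncio.Lock()
        self._lock_wait = lock_wait  # how long a caller waits for the lock

    async def run_op(self, duration: float) -> int:
        try:
            await asyncio.wait_for(self._lock.acquire(), timeout=self._lock_wait)
        except asyncio.TimeoutError:
            return 409  # "Session is busy"
        try:
            await asyncio.sleep(duration)  # simulated file op
            return 200
        finally:
            self._lock.release()


async def batch(lock_wait: float, ops: int = 2, duration: float = 0.1) -> List[int]:
    """Fire `ops` concurrent calls at one session; return sorted statuses."""
    session = FakeSession(lock_wait)
    statuses = await asyncio.gather(*(session.run_op(duration) for _ in range(ops)))
    return sorted(statuses)
```

With a lock wait shorter than the op duration, the second caller gives up and sees 409; with a longer wait, the calls simply queue. That is why treating 409 as transient on the client pairs naturally with a longer lock wait on the server.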
* feat(codebase): add ephemeral GitLab clone token provisioning
* fix(codebase): degrade clone-token provisioning on transport errors
* feat(codebase): clone GitLab repos with ephemeral project tokens
* feat(agent): log git push auth failures for diagnosability
* docs: document GitLab ephemeral clone token behavior
* fix(codebase): Harden clone-token provisioning failure handling

  Refine how clone-token provisioning degrades when GitLab cannot mint a project access token:

  - Negative-cache transient failures (network, 429, 5xx) for 5 minutes instead of an hour so a single blip doesn't park clones on the PAT.
  - Name the real culprit when GitLab rejects the configured PAT (401) rather than claiming a benign fallback, and guard against a created token arriving without a secret.
  - Raise ImproperlyConfigured when neither an ephemeral token nor a PAT is available instead of building an oauth2:None@ clone URL, and log per clone which credential was embedded.
  - Expand the git push auth-failure message with expired-clone-token and branch-protection guidance; log push network failures too.

* chore(docker): Build sandbox locally and pin compose network name

  Build the sandbox service from the sibling daiv-sandbox checkout with a source mount so local development runs against the unreleased sandbox the clone-token work depends on. Pin the default compose network to daiv_default so sandbox-launched containers can attach to it by a stable name, and pin the gitlab host via DAIV_SANDBOX_EXTRA_HOSTS since gVisor can't use Docker's embedded DNS resolver.
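The negative-caching rule for provisioning failures can be sketched with an injectable clock. `ProvisioningBackoff` and its methods are hypothetical; the 5-minute TTL for transient failures is from the commit message, while keeping one hour for other failures is an assumption based on the "instead of an hour" wording.

```python
import time
from typing import Callable, Dict

TRANSIENT_FAILURE_TTL = 5 * 60   # network, 429, 5xx
OTHER_FAILURE_TTL = 60 * 60      # assumed: the previous one-hour window


class ProvisioningBackoff:
    """Remembers recent provisioning failures so clones fall back quickly."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._blocked_until: Dict[str, float] = {}

    def record_failure(self, project: str, *, transient: bool) -> None:
        ttl = TRANSIENT_FAILURE_TTL if transient else OTHER_FAILURE_TTL
        self._blocked_until[project] = self._clock() + ttl

    def should_skip(self, project: str) -> bool:
        """True while a recent failure says not to try minting a token."""
        deadline = self._blocked_until.get(project)
        if deadline is None:
            return False
        if self._clock() >= deadline:
            del self._blocked_until[project]  # entry expired, try again
            return False
        return True
```

The shorter window means a single network blip costs at most five minutes of fallback clones instead of an hour.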
Summary
Makes the daiv-sandbox container authoritative for both file and git state, with a single unified workspace under /workspace:

- /workspace/repo: the repository working directory
- /workspace/skills: seeded skills
- /workspace/tmp: ephemeral per-run scratchpad (shared between file tools and bash, never committed)

The old local-clone-plus-sync model is retired: there is one source of truth (the sandbox), and the agent now owns publishing its own work.
What changed
- SandboxFileBackend (middlewares/file_system.py): a deepagents BackendProtocol that proxies file ops to the daiv-sandbox fs/* endpoints over the existing DAIVSandboxClient, with one RPC per op and no local mirror. Root-configurable and bound to the session at start. Retires SandboxSyncer and the local patch-apply path.
- GitManager (codebase/utils.py) via GitManager.for_sandbox / GitManager.for_local factories: git runs either through run_commands in /workspace/repo (sandbox-authoritative runs) or as a subprocess against a GitPython clone (sandbox-disabled / repoless runs), behind one async _git runner. Exit-code discipline means a failing git command raises instead of folding fatal text into its output; push failures are classified into GitPushPermissionError / GitPushNetworkError. open_git_manager() (git_utils.py) selects the mode from the run's session_id.
- commit_changes / create_merge_request tools (tools/git_publish.py) let the agent author its own commit and MR; commit/push/MR-API failures surface as tool-visible error Commands so the agent can recover rather than crash. GitMiddleware nudges the agent to publish before it stops (bounded by MAX_GIT_NUDGES, via jump_to=model) and falls back to a direct daiv publish as a safeguard. GIT_SYSTEM_PROMPT gains an agent_owns_commit variant; raw git add/commit/push/reset/rebase/config in bash stay hard-blocked by sandbox policy (read-only git remains allowed).
- gitlab/gh tool output that exceeds the inline cap is written verbatim to a /workspace/tmp scratch file (middlewares/git_platform.py).
- SlashCommandMiddleware split out of the skills middleware into its own middlewares/slash_commands.py.
- fs_* client methods + Fs* wire schemas mirrored byte-identically to daiv-sandbox (enforced by the schema-drift consistency test), with a regenerated schemas.dump.json.
- Sandbox system prompt, bash examples, and the output/working-directory invariants now reflect /workspace/repo and /workspace/tmp.

Prerequisite
Requires the companion daiv-sandbox fs/* endpoints to be deployed. The :main sandbox image does not have them yet; backend/client unit tests mock the sandbox and pass independently.

Test plan
- … (make test)
- … (make lint: ruff check + ruff format + pyproject-fmt + djade)
- … (make test)
- … fs/* endpoints are deployed to the shared sandbox