feat(workspace): sandbox-authoritative /workspace (SandboxFileBackend + agent-owned commit/MR)#1279

Draft
srtab wants to merge 21 commits into main from feat/sandbox-filebackend-scratchpad

@srtab srtab commented May 30, 2026

Summary

Makes the daiv-sandbox container authoritative for both file and git state, with a single unified workspace under /workspace:

  • /workspace/repo — the repository working directory
  • /workspace/skills — seeded skills
  • /workspace/tmp — ephemeral per-run scratchpad (shared between file tools and bash, never committed)

The old local-clone-plus-sync model is retired: there is one source of truth (the sandbox), and the agent now owns publishing its own work.

What changed

  • SandboxFileBackend (middlewares/file_system.py) — a deepagents BackendProtocol that proxies file ops to the daiv-sandbox fs/* endpoints over the existing DAIVSandboxClient: one RPC per op, no local mirror. Root-configurable and bound to the session at start. Retires SandboxSyncer and the local patch-apply path.
  • Dual-mode GitManager (codebase/utils.py) via GitManager.for_sandbox / GitManager.for_local factories — git runs either through run_commands in /workspace/repo (sandbox-authoritative runs) or as a subprocess against a GitPython clone (sandbox-disabled / repoless runs), behind one async _git runner. Exit-code discipline means a failing git command raises instead of folding fatal text into its output; push failures are classified into GitPushPermissionError / GitPushNetworkError. open_git_manager() (git_utils.py) selects the mode from the run's session_id.
  • Agent-owned publishing — new commit_changes / create_merge_request tools (tools/git_publish.py) let the agent author its own commit and MR; commit/push/MR-API failures surface as tool-visible error Commands so the agent can recover rather than crash. GitMiddleware nudges the agent to publish before it stops (bounded by MAX_GIT_NUDGES, via jump_to=model) and falls back to a direct daiv publish as a safeguard. GIT_SYSTEM_PROMPT gains an agent_owns_commit variant; raw git add/commit/push/reset/rebase/config in bash stay hard-blocked by sandbox policy (read-only git remains allowed).
  • Oversized tool-output spill: gitlab/gh tool output that exceeds the inline cap is written verbatim to a /workspace/tmp scratch file (middlewares/git_platform.py).
  • SlashCommandMiddleware split out of the skills middleware into its own middlewares/slash_commands.py.
  • fs_* client methods + Fs* wire schemas mirrored byte-identically to daiv-sandbox (enforced by the schema-drift consistency test), with a regenerated schemas.dump.json. Sandbox system prompt, bash examples, and the output/working-directory invariants now reflect /workspace/repo and /workspace/tmp.
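
The exit-code discipline and push-failure classification described for GitManager can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: it is synchronous and local-only (the PR describes one async _git runner), and `run_git`, `GitCommandError`, `classify_push_failure` and the stderr markers are hypothetical. Only the names GitPushPermissionError and GitPushNetworkError come from the description above.

```python
import subprocess
from typing import List


class GitCommandError(RuntimeError):
    """Hypothetical base error: raised whenever git exits non-zero."""

    def __init__(self, returncode: int, stderr: str) -> None:
        super().__init__(f"git exited with {returncode}: {stderr.strip()}")
        self.returncode = returncode
        self.stderr = stderr


class GitPushPermissionError(GitCommandError):
    """Push rejected for auth/permission reasons (name taken from the PR)."""


class GitPushNetworkError(GitCommandError):
    """Push failed because the remote was unreachable (name taken from the PR)."""


# Assumed stderr markers; the real classifier may inspect different text.
_PERMISSION_MARKERS = ("permission denied", "authentication failed", "protected branch")
_NETWORK_MARKERS = ("could not resolve host", "connection timed out", "connection refused")


def classify_push_failure(returncode: int, stderr: str) -> GitCommandError:
    """Map a failed push to the most specific error class we can infer."""
    lowered = stderr.lower()
    if any(marker in lowered for marker in _PERMISSION_MARKERS):
        return GitPushPermissionError(returncode, stderr)
    if any(marker in lowered for marker in _NETWORK_MARKERS):
        return GitPushNetworkError(returncode, stderr)
    return GitCommandError(returncode, stderr)


def run_git(args: List[str], cwd: str) -> str:
    """Run git and raise on a non-zero exit instead of returning fatal text."""
    proc = subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)
    if proc.returncode != 0:
        if args and args[0] == "push":
            raise classify_push_failure(proc.returncode, proc.stderr)
        raise GitCommandError(proc.returncode, proc.stderr)
    return proc.stdout
```

Raising typed errors lets a caller decide separately how to react to a rejected push and to an unreachable remote, rather than parsing "fatal:" text out of command output.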

Prerequisite

Requires the companion daiv-sandbox fs/* endpoints to be deployed. The :main sandbox image does not have them yet; backend/client unit tests mock the sandbox and pass independently.

Test plan

  • Unit suite: 2848 passing (make test)
  • Lint/format clean (make lint — ruff check + ruff format + pyproject-fmt + djade)
  • Schema-drift consistency test green (runs as part of make test)
  • Re-run live e2e against the sandbox once the companion fs/* endpoints are deployed to the shared sandbox

@srtab srtab self-assigned this May 30, 2026
…mmit/MR

Move the one true workspace into the daiv-sandbox container under /workspace
(repo at /workspace/repo, seeded skills at /workspace/skills, per-run scratchpad
at /workspace/tmp) and make the sandbox authoritative for both file and git state.

- SandboxFileBackend: a deepagents BackendProtocol that proxies file ops to the
  sandbox fs/* endpoints over DAIVSandboxClient — one RPC per op, no local mirror,
  root-configurable and bound at session start. Retires SandboxSyncer and the
  local patch-apply path.
- Dual-mode GitManager (for_sandbox / for_local factories): runs git either via
  run_commands in /workspace/repo (sandbox-authoritative runs) or as a subprocess
  against a GitPython clone (sandbox-disabled / repoless runs), behind one async
  _git runner with exit-code discipline so a failing git command raises instead of
  folding fatal text into its output. Push failures are classified into
  GitPushPermissionError / GitPushNetworkError. open_git_manager() builds the right
  mode from the run's session_id.
- Agent-owned publishing: new commit_changes / create_merge_request tools let the
  agent author its own commit and MR; commit, push and MR-API failures surface as
  tool-visible error Commands so the agent can recover rather than crashing the run.
  GitMiddleware nudges the agent to publish (bounded by MAX_GIT_NUDGES) and falls
  back to a direct daiv publish as a safeguard. GIT_SYSTEM_PROMPT gains an
  agent_owns_commit variant; raw git mutation in bash stays hard-blocked by policy.
- gitlab/gh tool output spills verbatim to a /workspace/tmp scratch file when it
  exceeds the inline cap. SlashCommandMiddleware split out of the skills middleware.
- fs_* client methods and Fs* wire schemas mirrored byte-identically to daiv-sandbox
  (enforced by the schema-drift consistency test); regenerated schemas.dump.json.

Requires the companion daiv-sandbox fs/* endpoints to be deployed.
@srtab srtab force-pushed the feat/sandbox-filebackend-scratchpad branch from a6a5fe7 to 3e8db8a June 1, 2026 16:03
@srtab srtab changed the title from "feat(scratch): sandbox-backed /scratch scratchpad (SandboxFileBackend + /scratch route)" to "feat(workspace): sandbox-authoritative /workspace (SandboxFileBackend + agent-owned commit/MR)" Jun 1, 2026
srtab and others added 20 commits June 1, 2026 17:07
Agent-level middleware hooks receive a langgraph Runtime, which has no
.config attribute (unlike the ToolRuntime used in git_platform.py). Reading
runtime.config would AttributeError at runtime on every chat turn. Read the
conversation thread_id from the run config contextvar via get_config()
instead, falling back to None (reuse disabled) outside a runnable context.
PR-review follow-ups:
- Wrap all Redis cache reads/writes (_cache_get/_cache_set/_cache_delete) so a
  cache outage degrades to a cold create+seed instead of crashing the agent run
  (the calls sat inside abefore_agent's except BaseException, turning a pure
  optimization into a hard Redis dependency).
- Log when get_config() has no runnable context so silently-disabled reuse is
  diagnosable.
- Fix session_exists docstring (404=>False, any other success=>True; 204 is the
  sandbox's current answer, not a client-enforced contract) and add cross-service
  contract pointers; reframe the TTL/reaper coupling as best-effort/self-healing.
- Add resilience tests: session_exists propagates non-404; _reuse_warm_session
  swallows httpx.HTTPError without dropping the mapping; cache read/write outages
  degrade gracefully; aafter_agent swallows a 404 from close_session.
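
A minimal sketch of the "cache outage degrades to a cold start" idea from the first follow-up, assuming a backend object with get/set/delete methods. `BestEffortCache` is a hypothetical name; the PR itself wraps _cache_get/_cache_set/_cache_delete.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


class BestEffortCache:
    """Wrap a cache so outages read as misses and writes fail silently."""

    def __init__(self, backend: Any) -> None:
        self._backend = backend

    def get(self, key: str) -> Optional[str]:
        try:
            return self._backend.get(key)
        except Exception:
            # A failed read is treated as a miss, so the caller cold-starts.
            logger.warning("cache read failed for %s", key, exc_info=True)
            return None

    def set(self, key: str, value: str, ttl: int) -> None:
        try:
            self._backend.set(key, value, ttl)
        except Exception:
            # The cache is an optimization only; losing a write is acceptable.
            logger.warning("cache write failed for %s", key, exc_info=True)

    def delete(self, key: str) -> None:
        try:
            self._backend.delete(key)
        except Exception:
            logger.warning("cache delete failed for %s", key, exc_info=True)
```

With this shape the caller never needs its own try/except around cache I/O, which keeps the cache from becoming a hard dependency of the agent run.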
Register SlashCommandMiddleware only when slash_commands.enabled, matching
the conditional-registration pattern used by the sandbox/web/deferred
middleware, instead of always registering it and short-circuiting inside
abefore_agent. When disabled the middleware is now never instantiated or run
rather than no-op'ing every turn.
Move the thread->session cache mapping out of SandboxMiddleware into a
new SandboxSessionStore (core/sandbox/session_store.py). This separates
the persistence mechanism (key scheme, TTL, best-effort cache I/O) from
the reuse orchestration (validate liveness, remember/forget) that stays
in the middleware. The store knows nothing of the sandbox client, so the
backing store can be swapped without touching the middleware.

Behavior is unchanged. Warm-session cache tests move to the store's own
unit suite; the middleware tests now drive a fake store and assert
orchestration only.
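
The split between persistence and orchestration might be sketched as follows. `InMemorySessionStore` is a hypothetical in-memory stand-in for SandboxSessionStore (no TTL, assumed key scheme), and `reuse_warm_session` is an illustrative version of the reuse logic that stays in the middleware.

```python
from typing import Callable, Dict, Optional


class InMemorySessionStore:
    """Hypothetical stand-in for SandboxSessionStore: thread_id -> session_id."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    @staticmethod
    def _key(thread_id: str) -> str:
        return f"sandbox-session:{thread_id}"  # assumed key scheme

    def get(self, thread_id: str) -> Optional[str]:
        return self._data.get(self._key(thread_id))

    def remember(self, thread_id: str, session_id: str) -> None:
        self._data[self._key(thread_id)] = session_id

    def forget(self, thread_id: str) -> None:
        self._data.pop(self._key(thread_id), None)


def reuse_warm_session(
    store: InMemorySessionStore,
    thread_id: Optional[str],
    session_exists: Callable[[str], bool],
) -> Optional[str]:
    """Return a live session id for the thread, or None to force a cold create."""
    if thread_id is None:
        return None
    session_id = store.get(thread_id)
    if session_id is None:
        return None
    if session_exists(session_id):
        return session_id
    # The mapping points at a dead session: drop it so the next turn starts clean.
    store.forget(thread_id)
    return None
```

Because the store takes no sandbox client, the liveness check is passed in as a callable, which is also what makes a fake store easy to drive in middleware tests.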
The branch added commit_changes/create_merge_request tools plus a nudge
loop so the agent published its own work. Revert to the prior behavior:
GitMiddleware commits, pushes, and opens/updates the MR via
GitChangePublisher when the agent's turn ends.

Keep the sandbox/fs infrastructure intact — the publisher still threads
session_id through open_git_manager so it runs git in sandbox or local
mode. Remove the git_publish tools, the aafter_model nudge,
MAX_GIT_NUDGES, and the agent_owns_commit prompt variant;
GIT_SYSTEM_PROMPT returns to the "committing is automatic" contract.
The revert dropped _is_unpublished and let aafter_agent call the
publisher unconditionally. The publisher only short-circuits on
"dirty OR diff-vs-base", so on an idle follow-up turn of a warm-session
chat (clean tree, but an earlier turn's commit is already pushed and
ahead of base) it would re-run — burning a diff-to-metadata model call
and doing a no-op push — where both main and the pre-revert branch
skipped.

Restore _is_unpublished as the gate: it adds the has_unpushed check that
distinguishes "already live" from "needs pushing", so publishing now
fires under the same circumstances as before. It is sandbox-aware
publishing logic, not agent-owned, so it belongs after the revert.
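
The difference between the two gates can be illustrated with a small sketch. The `RepoStatus` fields and both function names are assumptions chosen to mirror the wording of this commit message, not the PR's actual signatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoStatus:
    """Hypothetical snapshot of repository state; field names are assumptions."""

    is_dirty: bool      # uncommitted changes in the working tree
    has_unpushed: bool  # local commits that are not on the remote branch yet
    diff_vs_base: bool  # the branch differs from its base branch


def publisher_would_run(status: RepoStatus) -> bool:
    """The publisher's own short-circuit: 'dirty OR diff-vs-base'."""
    return status.is_dirty or status.diff_vs_base


def is_unpublished(status: RepoStatus) -> bool:
    """The restored gate: only publish when something still needs pushing."""
    return status.is_dirty or status.has_unpushed


# Idle follow-up turn: clean tree, earlier commit already pushed, ahead of base.
idle_turn = RepoStatus(is_dirty=False, has_unpushed=False, diff_vs_base=True)
```

For `idle_turn`, `publisher_would_run` is true while `is_unpublished` is false, which is the redundant re-run the gate is meant to prevent.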
The gitlab/gh platform tools previously took an agent-chosen output_file
path and wrote oversized output to a sandbox scratch file via a
short-lived client, with bespoke inline truncation and auto-eviction.

Replace that with an output_to_file boolean that writes the full result
through the agent's deepagents filesystem backend, into the same
large_tool_results dir the FilesystemMiddleware auto-evicts to. The
tools are now closures built inside GitPlatformMiddleware capturing the
bound backend; the sandbox backend is wrapped in a DAIVCompositeBackend
to carry an artifacts_root under /workspace, and _bind_backend unwraps
it to bind the live session.

Write failures (raised, backend-rejected, or a missing tool_call_id)
surface as agent-visible error strings. Drop the now-orphaned
SCRATCH_PATH constant and the legacy truncation helpers.
…on (#1287)

* feat(sandbox): own the sandbox transport per-run in set_runtime_ctx

* refactor(sandbox): inject run client by construction; state-based session reuse

* refactor(git): move GitManager into automation layer, drop lazy import

* refactor(git): remove unused GitManager.commit_and_push_changes/checkout

* perf(git): batch publish reads into status_snapshot (<=2 round-trips)

* refactor(git): fold publish decision into publisher; inject run client into git path

* perf(git): run MR create/update and context-file suggestion off the event loop

* refactor(git): extract _effective_mr_iid helper for MR resolution

* refactor(git): harden publish path and sandbox client teardown

Address review feedback on the sandbox-transport-injection branch:

- Guard the run-scoped sandbox client teardown in set_runtime_ctx so a
  transport-close error can't mask the in-flight exception, and always
  reset the contextvar.
- Fold protected_branch_fallback_source into the frozen PublishOutcome,
  dropping the mutable publisher side-channel read by both callers.
- Remove the dead GitManager query methods (is_dirty / get_diff /
  has_unpushed / remote_branches) superseded by status_snapshot.
- Add status_snapshot error-branch tests; re-target the empty-results
  and no-index hard-error tests onto the surviving helpers.
- Fix the "read once" client comments (BaseManager is a second reader),
  annotate GitMiddleware.sandbox_client, and flag the possibly orphaned
  container in _session_exists.
…on handle (#1292)

* feat(sandbox): add run_commands to SandboxFileBackend

* refactor(git): thread bound sandbox backend through git/publish path

* refactor(sandbox): run bash through the bound backend; thread it to subagents

* test(sandbox): guard that the backend never advertises execution

* feat(sandbox): classify bash failures as transient or permanent

The bash tool degraded every transport/HTTP error to the same generic
"sandbox call failed" string, so the agent could not tell a momentary
blip (worth one retry) from a non-recoverable rejection (stop using the
tool). Introduce a BashFailure enum that maps httpx errors to TRANSIENT
(no response, or a retryable status: 408/425/429/5xx) vs PERMANENT
(auth, session-gone, bad-request, not-implemented), and return distinct
agent-facing guidance for each. The transient message is byte-stable so
the system prompt's "two identical error strings => stop" backstop still
fires when a retry fails the same way.

Non-httpx failures (malformed 200 body, unbound-backend RuntimeError)
are left to propagate as loud wire/programming bugs.
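
A rough sketch of the transient/permanent split, assuming classification by HTTP status code alone (the commit maps httpx errors, which this sketch does not import). The message text and `classify_status` are hypothetical, and treating 501 as the "not-implemented" permanent case despite it being a 5xx is an assumption.

```python
from enum import Enum
from typing import Optional


class BashFailure(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"


# Retryable 4xx codes listed in the commit message.
_RETRYABLE_4XX = frozenset({408, 425, 429})

# Hypothetical wording; it is a constant so repeated failures are byte-identical.
TRANSIENT_MESSAGE = "sandbox call failed transiently; retry once"
PERMANENT_MESSAGE = "sandbox call failed permanently; stop using this tool"


def classify_status(status_code: Optional[int]) -> BashFailure:
    """Classify a failed call; None means no response was received at all."""
    if status_code is None:
        return BashFailure.TRANSIENT
    if status_code == 501:
        # Assumption: "not-implemented" is permanent even though it is a 5xx.
        return BashFailure.PERMANENT
    if status_code in _RETRYABLE_4XX or 500 <= status_code <= 599:
        return BashFailure.TRANSIENT
    return BashFailure.PERMANENT


def guidance_for(failure: BashFailure) -> str:
    """Return the agent-facing message for a failure class."""
    return TRANSIENT_MESSAGE if failure is BashFailure.TRANSIENT else PERMANENT_MESSAGE
```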
Tool loading called MultiServerMCPClient.get_tools(), which gathers all
servers with no timeout, so one server whose handshake hangs blocked the
entire toolset build and froze every chat and run.

Load each server independently via asyncio.wait_for with a per-server
timeout (MCP_TOOL_LOAD_TIMEOUT, default 30s); a server that times out or
errors is skipped instead of blocking the others. CancelledError still
propagates so outer cancellation is preserved.
…tives

Filesystem tools (ls/read_file/grep/glob/write_file/edit_file) now branch on a
structured FsError/FsErrorCode from the sandbox instead of free-form strings: a
missing path reads as "does not exist" (distinct from an empty dir / no match),
reading a directory or writing over an existing file routes the agent to the
right tool, and deletes are idempotent. Requires the matching daiv-sandbox
release with the structured fs/* error responses.

Path directives and subagent prompts are now derived from the run's working
directory (REPO_PATH in a sandbox, clone basename on disk) via a single
_resolve_working_directory helper, so the main prompt and subagents address the
same repo root. SandboxFileBackend._abs normalises the virtual root and
prefix-dropped repo slips onto the workspace/repo root.
…fence (#1293)

* feat(agent-fs): add TMP_PATH and disk-mode workspace fence permissions

* feat(agent-fs): add build_disk_workspace_backend for the unified /workspace namespace

* feat(agent): unify graph workspace root to /workspace with disk-only fence

* feat(agent-skills): upload global skills under /workspace/skills

* feat(agent-subagents): fence subagents to /workspace subtrees in disk mode

* refactor(agent): drop unused GLOBAL_SKILLS_* constants after /workspace unification

Remove GLOBAL_SKILLS_PATH and GLOBAL_SKILLS_ROUTE from constants.py now that
the skills middleware addresses skills via SKILLS_PATH (/workspace/skills) directly.
Update skills/services.py and associated tests to classify skill invocations
against SKILLS_PATH instead of the removed /skills virtual route.

* fix(agent-fs): allow read-back of offloaded-artifact dirs under /workspace fence

The disk-mode fence denied everything under /workspace except the three real
subtrees, which also blocked reading the offloaded large_tool_results/
conversation_history files. deepagents eviction and git_platform's
output_to_file write those through the backend directly (bypassing the fence)
and hand the agent the path to read back — so the read-back dead-ended on the
/workspace/** deny.

Add a read-only carve-out for the artifact dirs ahead of the deny (write stays
denied; the agent never writes there itself), mirror it in the explore
permissions, and pin the prefixes with a drift-guard test against deepagents'
computed values so a framework rename fails loudly.
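
The ordering that makes the carve-out work ("ahead of the deny") can be illustrated with a first-match-wins rule list. Everything here is an assumption for illustration: the `Rule` type, the exact artifact directory paths, the read/write grants on the three real subtrees, and the default deny outside /workspace. The sketch also assumes paths are already absolute and normalised.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Rule:
    prefix: str
    allowed_ops: FrozenSet[str]


READ_WRITE = frozenset({"read", "write"})
READ_ONLY = frozenset({"read"})
DENY = frozenset()

# First matching rule wins, so the carve-outs must precede the catch-all deny.
RULES: Tuple[Rule, ...] = (
    Rule("/workspace/repo", READ_WRITE),
    Rule("/workspace/skills", READ_WRITE),
    Rule("/workspace/tmp", READ_WRITE),
    Rule("/workspace/large_tool_results", READ_ONLY),    # assumed artifact path
    Rule("/workspace/conversation_history", READ_ONLY),  # assumed artifact path
    Rule("/workspace", DENY),
)


def is_allowed(path: str, op: str) -> bool:
    """Check an already-normalised absolute path against the ordered rules."""
    target = PurePosixPath(path)
    for rule in RULES:
        root = PurePosixPath(rule.prefix)
        if target == root or root in target.parents:
            return op in rule.allowed_ops
    return False  # assumption: anything outside /workspace is denied
```

Moving the two read-only rules below the `/workspace` deny would reproduce the dead end described above, since the deny would match first.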
When the model batches tool calls, LangGraph runs them concurrently, but
the sandbox serves one op per session (a Redis lock with a short wait)
and returns 409 "Session is busy" to the loser. The file backend's
raise_for_status() propagated that 409, aborting the whole run.

The agent-facing file tools (ls/read/grep/glob/write/edit + delete) now
catch httpx transport/HTTP faults and return a soft, agent-actionable
result (transient -> retry once; permanent -> tools unavailable) like the
bash tool already does, logging WARNING for transient and ERROR for
permanent so genuine faults still reach the logs/Sentry. The bash tool's
classifier now also treats 409 as transient (was permanent). The shared
transient/permanent classifier lives in core/sandbox/client.py to avoid
an import cycle.

Pairs with a daiv-sandbox change raising the per-session lock wait so
batched ops queue instead of failing fast.
* feat(codebase): add ephemeral GitLab clone token provisioning

* fix(codebase): degrade clone-token provisioning on transport errors

* feat(codebase): clone GitLab repos with ephemeral project tokens

* feat(agent): log git push auth failures for diagnosability

* docs: document GitLab ephemeral clone token behavior

* fix(codebase): Harden clone-token provisioning failure handling

Refine how clone-token provisioning degrades when GitLab cannot mint
a project access token:

- Negative-cache transient failures (network, 429, 5xx) for 5 minutes
  instead of an hour so a single blip doesn't park clones on the PAT.
- Name the real culprit when GitLab rejects the configured PAT (401)
  rather than claiming a benign fallback, and guard against a created
  token arriving without a secret.
- Raise ImproperlyConfigured when neither an ephemeral token nor a PAT
  is available instead of building an oauth2:None@ clone URL, and log
  per clone which credential was embedded.
- Expand the git push auth-failure message with expired-clone-token and
  branch-protection guidance; log push network failures too.

* chore(docker): Build sandbox locally and pin compose network name

Build the sandbox service from the sibling daiv-sandbox checkout with a
source mount so local development runs against the unreleased sandbox
the clone-token work depends on. Pin the default compose network to
daiv_default so sandbox-launched containers can attach to it by a
stable name, and pin the gitlab host via DAIV_SANDBOX_EXTRA_HOSTS since
gVisor can't use Docker's embedded DNS resolver.
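
Two of the clone-token hardening rules described earlier in this commit (a shorter negative-cache TTL for transient failures, and refusing to build a clone URL without a credential) might be sketched as follows. The function names and signatures are hypothetical, the local `ImproperlyConfigured` class is a stand-in for the framework exception of the same name, and the host and project values in any call are placeholders.

```python
from typing import Optional

TRANSIENT_NEGATIVE_TTL = 5 * 60  # seconds; "5 minutes" per the commit message
DEFAULT_NEGATIVE_TTL = 60 * 60   # seconds; "an hour" per the commit message


class ImproperlyConfigured(Exception):
    """Stand-in for the framework's ImproperlyConfigured exception."""


def negative_cache_ttl(status_code: Optional[int]) -> int:
    """Pick how long to remember a provisioning failure; None means network error."""
    transient = status_code is None or status_code == 429 or 500 <= status_code <= 599
    return TRANSIENT_NEGATIVE_TTL if transient else DEFAULT_NEGATIVE_TTL


def build_clone_url(
    host: str, project_path: str, ephemeral_token: Optional[str], pat: Optional[str]
) -> str:
    """Prefer the ephemeral token, fall back to the PAT, never embed 'None'."""
    token = ephemeral_token or pat
    if not token:
        raise ImproperlyConfigured("no ephemeral clone token or PAT is available")
    return f"https://oauth2:{token}@{host}/{project_path}.git"
```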