Gofannon user-ready#577
Merged
Clicking a composer or invokable chip on ViewAgent opens the model dialog pre-populated with that item's existing config, so users can edit a previously-added model in place instead of deleting and re-adding. Dialog title reflects whether it's an add or edit.
Adds OpenRouter alongside the existing providers with an 11-model catalog: grok-code-fast-1, grok-4.1-fast, claude-sonnet-4.5, claude-opus-4.1, gpt-5, gpt-5-mini, deepseek-v3.2, deepseek-chat-v3.1, qwen3-coder, qwen3-coder-next, llama-3.3-70b-instruct.
- New config/openrouter/ module with _make_entry helper for the catalog
- provider_config.py registers the new provider
- models/user.py adds openrouter_api_key field
- services/user_service.py extracts PROVIDER_KEY_MAP constant
- ApiKeysTab.jsx adds the OpenRouter row
No llm_service.py changes needed; existing model_string routing handles openrouter/* model ids.
Input fields in the agent Sandbox now match the declared schema type instead of always rendering as text. Number fields get numeric input, boolean gets a Switch, JSON gets a multiline textarea with inline parse-error feedback. Adds 'json' as a schema type option in the SchemaEditor. Values are cast on submit so backend receives correctly-typed payloads.
Backend portion of PR 6 only. SandboxScreen.jsx hunks deferred
because 3 of 5 hunks conflict with PR 5's handleRun restructuring
and partial apply would leave the frontend referencing undefined
variables (schemaWarnings, WarningAmberIcon).
Included:
- agent_factory/prompts.py: strengthens output directive prompts
with three ✅/❌ examples so the LLM returns structured results
matching the declared output_schema instead of wrapping in
{outputText: ...}.
- dependencies.py: validate_output_against_schema() — checks dict
shape, missing/extra keys, type mismatches (with bool-vs-int
gotcha handling).
- models/agent.py: adds output_schema to RunCodeRequest and
schema_warnings to RunCodeResponse.
- routes.py: sandbox route calls the validator, returns warnings.
- services/agentService.js: runCodeInSandbox signature extended
with outputSchema parameter.
Deferred to PR 8b:
- SandboxScreen.jsx: adds WarningAmberIcon import, schemaWarnings
state, capture from response, advisory banner JSX, outputSchema
arg on the service call. All land atomically when PR 8b rebuilds
handleRun.
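The validator described above (dict shape, missing/extra keys, type mismatches, bool-vs-int handling) can be sketched roughly as follows. This is a minimal illustration, not the code in dependencies.py: the function name `validate_output`, the flat `{field: type_name}` schema shape, and the `_TYPE_MAP` table are all assumptions made for the example.

```python
from typing import Any

# Hypothetical mapping from schema type names to Python types (an assumption,
# not the project's actual table).
_TYPE_MAP = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "json": (dict, list),
}


def validate_output(result: Any, schema: dict) -> list:
    """Return advisory warnings; an empty list means the result matches."""
    if not isinstance(result, dict):
        return [f"expected an object, got {type(result).__name__}"]

    warnings = []
    for key in schema.keys() - result.keys():
        warnings.append(f"missing key: {key}")
    for key in result.keys() - schema.keys():
        warnings.append(f"unexpected key: {key}")

    for key, type_name in schema.items():
        if key not in result:
            continue
        value = result[key]
        expected = _TYPE_MAP.get(type_name)
        if expected is None:
            continue  # unknown type names are not checked in this sketch
        # bool is a subclass of int in Python, so isinstance(True, int) is
        # True; reject booleans explicitly wherever a number is declared.
        if isinstance(value, bool) and bool not in expected:
            warnings.append(f"{key}: expected {type_name}, got bool")
        elif not isinstance(value, expected):
            warnings.append(
                f"{key}: expected {type_name}, got {type(value).__name__}"
            )
    return sorted(warnings)
```

Returning a list of warnings rather than raising matches the advisory role described above: the sandbox route can attach them to the response without failing the run.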
Adds a 'Chain View' accordion on ViewAgent showing the transitive
dependency tree of an agent: nested GofannonClient calls and MCP
servers, rendered as an indented MUI List. Root agent is expanded
by default. Cycles and missing agents are badged; depth capped at
8 to prevent runaway recursion.
- dependencies.py: build_agent_chain with ancestry-based cycle
detection and missing-agent handling
- routes.py: GET /agents/{id}/chain
- New components/AgentChainView.jsx with Launch icon to navigate
into a child agent
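As a rough sketch of ancestry-based cycle detection with a depth cap, assuming a hypothetical in-memory `agents` dict (`id -> {"calls": [child ids]}`) in place of the real agent store, and hypothetical `cycle` / `missing` / `truncated` flags on each node:

```python
MAX_DEPTH = 8  # depth cap taken from the commit message


def build_chain(agent_id, agents, ancestry=(), depth=0):
    """Build a nested dependency tree rooted at agent_id.

    `agents` is a hypothetical dict: id -> {"calls": [child ids]}.
    """
    node = {"id": agent_id, "children": []}
    if agent_id in ancestry:
        # The agent is already on the path from the root: a true cycle.
        node["cycle"] = True
        return node
    if agent_id not in agents:
        node["missing"] = True
        return node
    if depth >= MAX_DEPTH:
        node["truncated"] = True
        return node
    path = ancestry + (agent_id,)
    for child in agents[agent_id].get("calls", []):
        node["children"].append(build_chain(child, agents, path, depth + 1))
    return node
```

Tracking the ancestry path rather than a global visited set means a shared dependency reached through two different parents is expanded in both places and is not mistaken for a cycle.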
User-facing browser for the persistent data store.
- models/data_store.py: DataStoreRecord, NamespaceStats, NamespaceListResponse, SetRecordRequest, ClearNamespaceResponse
- routes.py: 6 new endpoints for namespace/record CRUD with path-matched keys
- services/dataStoreService.js: API client wrapper
- pages/DataStoresPage.jsx: stats cards + namespace table + clear confirmation
- pages/DataStoreBrowser.jsx: prefix-grouped record table with search, right drawer with Value/Metadata/Copy/Edit/Delete, JSON-or-raw-string edit dialog
- HomePage adds a 3rd column for Data Stores at xl breakpoints
- New routes /data-stores and /data-stores/:namespace
…rs PR 6 frontend (item 12 part 2)
Per-agent data store configuration and a live sandbox panel showing
every data store op the agent performed during its run.
- services/data_store_service.py: AgentDataStoreProxy instrumented
with an ops_log parameter; all 9 ops log structured entries
{op, namespace, agent, ts, key?, valuePreview?, found?, count?}
with 200-char value previews.
- dependencies.py: _execute_agent_code returns (result, ops_log);
three internal callsites updated.
- models/agent.py: DataStoreNamespaceConfig + data_store_config
field on Create/Update requests; ops_log on RunCodeResponse.
- components/SandboxDataPanel.jsx: right-side panel with
Operations tab (color-coded READ/WRITE/DEL chips, expandable
rows) + State tab (per-namespace aggregation).
- components/DataStoreConfigAccordion.jsx: flow preview
(Reads From → agent → Writes To) + namespace table with
Autocomplete suggestions.
- ViewAgent.jsx inserts the config accordion between Schemas and
Model Config; data_store_config threaded through save payloads.
- SandboxScreen.jsx rebuilt to integrate PR 5's typed inputs,
PR 6's schemaWarnings capture (deferred from PR 6), and PR 8b's
opsLog capture + two-column layout with the data panel. Fixes
the frontend half of PR 6 that was skipped due to hunk conflicts.
- Updates 5 unit tests (test_dependencies.py, test_context_window.py)
to unpack the new (result, ops_log) tuple from _execute_agent_code.
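The ops_log instrumentation can be pictured with a small stand-in. The sketch below is not AgentDataStoreProxy: the class name `LoggingStoreProxy`, the in-memory dict backing it, and the three op names shown are assumptions. Only the entry fields and the 200-character preview come from the commit message.

```python
import json
import time

PREVIEW_CHARS = 200  # preview length taken from the commit message


def _preview(value):
    """Render a value as text and keep only the first PREVIEW_CHARS characters."""
    text = value if isinstance(value, str) else json.dumps(value, default=str)
    return text[:PREVIEW_CHARS]


class LoggingStoreProxy:
    """Hypothetical stand-in for the proxy, backed by an in-memory dict."""

    def __init__(self, namespace, agent, ops_log):
        self._data = {}
        self._namespace = namespace
        self._agent = agent
        self._ops_log = ops_log  # shared list owned by the caller

    def _log(self, op, **extra):
        entry = {"op": op, "namespace": self._namespace,
                 "agent": self._agent, "ts": time.time()}
        entry.update(extra)
        self._ops_log.append(entry)

    def set(self, key, value):
        self._data[key] = value
        self._log("set", key=key, valuePreview=_preview(value))

    def get(self, key):
        self._log("get", key=key, found=key in self._data)
        return self._data.get(key)

    def list_keys(self):
        keys = sorted(self._data)
        self._log("list_keys", count=len(keys))
        return keys
```

Because the caller owns the list, a runner can create `ops_log = []`, hand it to every proxy used during a run, and return it alongside the result, which is the `(result, ops_log)` shape described above.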
Backend-only. Adds the session-based auth system gated behind the
AUTH_CONFIG_PATH env var. No user-visible changes without operator
opt-in; legacy Firebase auth continues to work untouched.
- config/__init__.py: loads AUTH_CONFIG_PATH YAML into settings
- models/session.py, workspace.py, auth.py: data models
- auth/base.py: AuthProvider ABC with get_authorize_url,
exchange_code, get_workspace_memberships, evaluate_login
- auth/ldap_client.py: ldap3 wrapper for ASF committer/PMC/banned
queries with soft-fail on LDAP outage
- auth/providers/dev_stub.py: local-dev provider with YAML-configured
test users and a plain HTML picker page
- auth/providers/asf.py: real oauth.apache.org + LDAP integration
with ASF-specific banned/emeritus/site-admin policy
- auth/__init__.py: ProviderRegistry with startup init
- services/session_service.py: CRUD + refresh with diff computation
- services/audit_service.py: append-only log (scaffolding for B-3)
- routes_auth.py: /auth/providers, /auth/login/{type},
/auth/callback/{type}, /auth/logout, /auth/refresh-workspaces,
/auth/me, /auth/dev-stub-picker
- routes.py: get_current_user is dual-mode (session cookie first,
Firebase bearer token fallback)
- app_factory.py: registry init and conditional auth router mount
- requirements.txt: +ldap3>=2.9
Three-tier role model: member (workspace), admin (workspace, from
LDAP PMC intersection), site_admin (global, from config allowlist).
Personal workspace auto-created per session. Soft-fail on LDAP
outage preserves existing memberships.
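The refresh-with-diff and soft-fail behaviour can be sketched as below. The names `refresh_memberships` and `DirectoryUnavailable`, the `{workspace_id: role}` mapping, and the `added` / `removed` / `changed` diff shape are all assumptions for illustration; the actual session_service.py may differ.

```python
class DirectoryUnavailable(Exception):
    """Hypothetical error raised when the directory (e.g. LDAP) is unreachable."""


def refresh_memberships(previous, fetch):
    """Return (memberships, diff).

    `previous` and the value returned by `fetch()` both map
    workspace id -> role in this sketch.
    """
    empty_diff = {"added": [], "removed": [], "changed": []}
    try:
        current = fetch()
    except DirectoryUnavailable:
        # Soft-fail: keep what the session already had and report no change.
        return dict(previous), empty_diff
    shared = current.keys() & previous.keys()
    diff = {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(k for k in shared if current[k] != previous[k]),
    }
    return dict(current), diff
```

Treating an outage as "no change" rather than "no memberships" is the point of the soft-fail: a directory hiccup should not silently strip a logged-in user of their workspaces.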
…ow tests
Infra hardening for the E2E test harness:
- playwright.config.js: fullyParallel:false, workers:1. Tests share a single backend user (local-dev-user) so parallel workers race on state mutations.
- packages/webui/vite.config.js: ignore test-results, playwright-report, .auth, coverage, htmlcov from watch. Prevents Vite HMR reloading the page mid-test.
- infra/docker/docker-compose.yml: uvicorn --reload excludes for *.pyc, __pycache__, tests, pytest_cache, htmlcov, coverage. Prevents uvicorn restarting mid-request during test runs.
- tests/e2e/api-keys.spec.js: skip 5 write-flow tests (add/update/remove/masked/success) that intermittently fail with 'Failed to fetch' despite the above fixes. Backend API-key write endpoints are covered by pytest integration tests; skipping gives a stable 11/16 passing baseline while root cause is investigated in a separate project.
Teaches the UI to talk to the B-1 session backend.
- services/authService.js: new sessionAuth implementation alongside firebase/mock/cognito. Selected when appConfig.auth.provider is 'session'. Uses the gofannon_sid cookie set by B-1's callback route; exposes refreshWorkspaces. Exports fetchAuthProviders helper for LoginPage.
- services/fetchInterceptor.js: wraps window.fetch once at load to auto-add credentials:'include' to same-origin API calls; this avoids editing 19 fetch sites across 5 service files.
- contexts/AuthContext.jsx: adds refreshWorkspaces and isSessionAuth to context value.
- pages/LoginPage.jsx: fetches /auth/providers on mount. Renders one button per provider when Phase B is enabled; legacy Firebase form shown below when operator sets legacyFirebaseEnabled=true.
- components/ProfileMenu.jsx: user identity header with site-admin chip, workspace list preview (top 5 with admin chips, '+N more'), 'Refresh workspaces' menu item with snackbar for the diff.
- App.jsx: imports fetchInterceptor at top.
Zero regression for teams not on Phase B: without AUTH_CONFIG_PATH the backend 404s /auth/providers and LoginPage renders legacy form identically to before.
Extends the Phase B pluggable auth infrastructure with three additional
identity providers. Backend-only; existing LoginPage (from B-2) picks
them up via /auth/providers.
New providers:
- auth/providers/google.py: Google Workspace OAuth + Admin SDK Directory
API for Google Groups. Hosted-domain enforcement, allowlist default,
OWNER/MANAGER -> admin role mapping.
- auth/providers/microsoft.py: Microsoft Entra ID OAuth + Graph
/me/transitiveMemberOf for security group memberships. Tenant-scoped
authorize, optional admin_groups subset for role promotion.
- auth/providers/github.py: GitHub OAuth + /user/memberships/orgs/{org}
for org role. Numeric id for external_id (rename-stable), case-
insensitive org normalization.
Common patterns:
- All three default to mode=allowlist (deny unless in configured
group/org); operators opt into open_domain/open_tenant/open_github
for public-style deployments.
- Access token stashed on UserInfo for the subsequent memberships call.
- 403 soft-fails to empty memberships.
Wired into the registry:
- auth/__init__.py: three new entries in _PROVIDER_CLASSES.
- auth/providers/__init__.py: re-exports new classes.
Tests: 46 mock-based unit tests across tests/unit/auth/, covering
config validation, authorize URL shape, exchange_code happy/error
paths, membership allowlist + role mapping + 403 soft-fail, and all
evaluate_login branches.
Docs: auth.example.yaml extended with disabled example blocks for
each new provider (same pattern as the existing asf + dev_stub blocks).
Reference configuration template. Operators copy this, fill in secrets, and point AUTH_CONFIG_PATH at the copy. Includes blocks for dev_stub (local dev), asf (PR B-1), and google/microsoft/github (PR B-1.1). All providers disabled by default except dev_stub for the example.
The couchdb-python library no longer accepts shard-count (n) and
shard-quorum (q) kwargs on Server.create(). These params only apply
to clustered CouchDB deployments anyway; the dev stack runs
single-node where they're ignored.
Symptom: 500 on any route that triggers DB auto-creation against a
fresh CouchDB instance ('TypeError: Server.create() got an
unexpected keyword argument n').
Pre-existing bug, surfaced when testing against a fresh CouchDB.
app_factory.py computed allowed_origins from FRONTEND_URL then discarded it, hardcoding allow_origins=['*']. With allow_credentials=True the browser blocks responses because the CORS spec forbids wildcard + credentials. Pre-existing bug. Exposed by PR B-2's fetchInterceptor adding credentials:'include' to every API call to support session cookies. Before B-2, nothing sent credentials, so the wildcard was tolerated. Fix: honor the computed allowed_origins list.
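A minimal sketch of the corrected behaviour, assuming a hypothetical helper `cors_settings` that turns a single FRONTEND_URL value into CORS middleware keyword arguments. The helper and its wildcard guard are illustrative and not taken from app_factory.py.

```python
def cors_settings(frontend_url):
    """Build CORS keyword arguments from one explicit frontend origin."""
    origin = frontend_url.strip().rstrip("/")
    if not origin or origin == "*":
        # Browsers reject Access-Control-Allow-Origin: * on credentialed
        # requests, so a wildcard here would break session cookies.
        raise ValueError("credentialed CORS needs an explicit origin")
    return {"allow_origins": [origin], "allow_credentials": True}


# Intended use with FastAPI/Starlette (shown as a comment so the sketch has
# no third-party imports):
#   app.add_middleware(CORSMiddleware, **cors_settings(frontend_url))
```

Failing fast on a wildcard makes the misconfiguration visible at startup instead of as blocked responses in the browser.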
…-tail
- Dockerfile.api: bump base image from python:3.10-slim to python:3.12-slim. Silences google.api_core's FutureWarning about Python 3.10 EOL in Oct 2026.
- Dockerfile.api: upgrade pip before installing requirements and pass --root-user-action=ignore to silence the 'running pip as root' warning in container builds.
- playwright.config.js: baseURL and dev-server command on port 3000 to match the backend's FRONTEND_URL default (which drives the CORS allowlist).
- dev-tail.sh: new local dev runner that starts docker + vite in a single script with tailing logs. Uses isolated COMPOSE_PROJECT_NAME=gofannon-dev so other compose projects on the host aren't disturbed. Supports --phase-b for Phase B auth testing with a dev_stub-configured yaml, and --stop for clean teardown.
Inline the auth.yaml mount into docker-compose.yml, commit a working .dev-auth.yaml at the repo root with three dev_stub users (alice/bob/site_admin_1), and remove the override mechanism that was conditionally generated by dev-tail.sh.
After this:
    ./dev-tail.sh          # auth is on, login works out of the box
    ./dev-tail.sh --stop   # unchanged
Operators deploying gofannon override AUTH_CONFIG_PATH and mount their own auth.yaml at deploy time; the committed .dev-auth.yaml exists only for local development. The header comment in that file spells out the dev-only nature loudly enough that nobody copies it into production by accident.
Personal customization without touching the committed file: copy to .dev-auth.local.yaml (gitignored) and point the api service's volume mount at that copy.
Documentation for the new flow lives at docs/developers/local-auth.md, which covers the committed fixtures, the .dev-auth.local.yaml escape hatch, production guidance, and troubleshooting.
Signed-off-by: Andrew Musselman <andrew.musselman@gmail.com>
The companion fix (a9add26 'fix(cors): use computed allowed_origins, don't hardcode wildcard') made the production code honor FRONTEND_URL instead of hardcoding ['*']. This test was asserting the old buggy behavior; update to assert the new correct behavior.
The previous global-setup seeded localStorage with a mock user, which worked when the frontend's authService picked mockAuth at module-load time. With session auth as the default, mockAuth is not loaded; the seeded localStorage is ignored; tests start unauthenticated and the AccountCircle menu (only rendered when logged in) never appears.
Replace with a setup that walks the dev_stub login flow:
- GET /auth/login/dev_stub kicks off the OAuth-shaped flow
- Picker page (rendered by backend) lists configured users as <a>s
- Click the alice link → backend callback → session cookie set
- storageState now contains gofannon_sid; tests inherit it
Adds E2E_STUB_USER env var so a suite can run as bob (deny path) or site_admin_1 (admin views) without editing the file. Adds a post-login /auth/me sanity check so authentication failures produce a useful error instead of cascading into 'AccountCircle not found' timeouts in every test.
Three unrelated fixes bundled into one commit.
1. sessionAuth.onAuthStateChanged was firing callback(null)
synchronously on first listener subscription, before /auth/me
resolved. AuthContext set loading=false on the null callback;
PrivateRoute saw {user: null, loading: false} and bounced to
/login; LoginPage's 'if (user) navigate(/)' fired the moment
/auth/me resolved with the real user; user landed on home with
no idea what happened. Refresh on /agent/<id> always landed on
home as a result. Fix: only emit synchronously if we already
have a resolved user (covers later subscriptions); otherwise
wait for _fetchMe to resolve and let _emit() send the real
value, keeping AuthContext in loading=true until then.
2. HomePage and DataStoreConfigAccordion fetched the namespace
list only on mount with [] deps. A namespace created in another
tab/page didn't appear until hard refresh. Fix: refetch on
visibilitychange so coming back to a tab gives fresh data.
3. run-all-tests.sh checked for a 'webui' container in 'docker ps'
as the gate on running e2e tests. With dev-tail.sh the webui is
vite on the host, not a container, so the check always failed.
Fix: replace with a curl probe of localhost:3000 — answers the
actual question (can e2e reach the frontend?) and works whether
the frontend is vite, nginx-in-compose, or anything else.
The agent editor required users to click Generate Code before they could see the code editor or save the agent. Two friction points:
1. The Agent Code accordion was gated on hasCode, so the editor was hidden entirely until code existed. Users with code already written (or copied from another agent) had no way to drop it in.
2. Save validation required a non-empty description even when code was already present. Description exists primarily as the prompt input for the LLM generator; once code exists, it's optional metadata.
Un-gate the accordion: always render, default-expanded in the creation flow or when code exists, with a hint pointing at both paths (paste here, or use Generate Code below). Reorder save validation to check code first, and only require description when code is absent.
The Generate Code button is unchanged and still requires a description (correct, since that operation does need it). This just adds 'paste your own' as an equally valid path.
Sandbox Progress Log
Adds a structured per-run trace and a Progress Log accordion in the sandbox right column above the data-store panel.
Backend (services/agent_trace.py): Trace collects events (agent_start, agent_end, llm_call, data_store, error, stdout, log). Bound to the asyncio task tree via contextvar so nested layers emit without threading the collector through every signature. capture_user_io() routes stdout/stderr/logging into the trace, with 4KB-per-event and 2000-event caps. GOFANNON_DISABLE_USER_TRACE=1 suppresses user-origin capture; structural events still emit. Failure path returns a structured response with partial trace instead of raising. friendly_name flows through the request to the trace's per-event agent_name.
Frontend (SandboxProgressLog): runs listed newest-first, each with per-agent groups. Outcome icons, durations, 'chained' badges for nested calls. Errors highlighted with red border + bg. Multi-line content truncated to 3 lines with a 'more' link to a Drawer side sheet (right anchor, 600px). In-memory history, refresh wipes. Transport errors get a synthetic event so the log doesn't spin. Trace ships in the bulk response, so events appear at run completion, not live. SSE streaming follow-up planned.
docs/developers/agent-trace.md covers the env var, caps, contextvar rationale, and how to add new event types.
Profile Menu Cleanup
Drop Basic Info / Usage / Billing menu items (placeholders with no content); ProfilePage now renders ApiKeysTab directly. Restyle ApiKeysTab to match other top-level pages: constrained max width, back-arrow + h5 title, single in-place TextField per row, 'Configured' chip only when keyed. Tests realigned.
Long-running agents (e.g. the ASVS auditor that emits hundreds of
print() lines per run) made the Progress Log accordion unwieldy:
the panel grew vertically without bound and the right column
spilled past its boundary. Reading the run shape — what LLM calls
happened, where errors hit — meant scrolling through walls of
debug output.
Bucket stdout/log events into per-agent collapsible rows. Each
contiguous run of stdout/log events between structural events
(llm_call, data_store, error, agent_end) becomes one row showing
'N lines of stdout/log output · click to view'. Buckets break at
structural events so chronological flow is preserved — you still
see 'agent prints → LLM call → agent prints more', just with each
'prints' segment collapsed.
Click → side sheet shows all the bucketed lines in a scrollable
monospace block, with timestamp gutter and per-line error
highlighting (heuristic: lines containing ERROR/FAIL/TRACEBACK
get a red left border).
The bucket summary row pops the latest error-flavored line into a
preview underneath ('Latest: ERROR: ...consolidated.md') and
shows an error count badge ('47 lines · 30 with errors'), so the
common 'agent caught an exception, printed it, kept going'
failure mode is visible without clicking.
Structural events (llm_call, data_store, error, agent_end) still
render inline. Errors keep their red border + bg highlight.
Stack traces still go to the existing single-event side sheet
view; bucket view is a sibling Drawer mode that's mutually
exclusive with the single-event view.
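The bucketing rule can be illustrated language-neutrally. The real implementation lives in the frontend; the Python below is only a sketch of the grouping logic, with a hypothetical event shape (`{"type": ..., "content": ...}`) and hypothetical bucket fields. It treats every non-stdout/log event as a bucket boundary and matches the error markers case-insensitively, both of which are assumptions. Per-agent grouping is omitted.

```python
BUCKETED = {"stdout", "log"}
ERROR_MARKERS = ("ERROR", "FAIL", "TRACEBACK")


def bucket_events(events):
    """Collapse contiguous stdout/log events into single bucket rows."""
    rows = []
    bucket = None  # the bucket currently being filled, if any
    for event in events:
        if event["type"] in BUCKETED:
            if bucket is None:
                bucket = {"type": "bucket", "lines": [],
                          "error_count": 0, "latest_error": None}
                rows.append(bucket)
            line = event.get("content", "")
            bucket["lines"].append(line)
            if any(marker in line.upper() for marker in ERROR_MARKERS):
                bucket["error_count"] += 1
                bucket["latest_error"] = line  # last match wins
        else:
            # Structural event: close the open bucket and render inline,
            # which keeps the chronological order intact.
            bucket = None
            rows.append(event)
    return rows
```

A summary row such as '47 lines · 30 with errors' would then be derived from `len(bucket["lines"])` and `bucket["error_count"]`.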
Sandbox runs waited for the agent to finish before showing any trace events; for long agents (file ingest, multi-step LLM flows) the Progress Log spun for minutes showing 'No events recorded' while api logs filled with the agent's actual output.
Adds POST /agents/run-code/stream returning text/event-stream. Each Trace event becomes one SSE 'trace' frame, dispatched to the client within ~50ms. A final 'done' frame carries result, error, schema_warnings, and ops_log.
Trace gains an optional asyncio.Queue. attach_queue() wires it up before the agent runs; every Trace.append() also publishes to the queue (put_nowait, never blocks emitters). The streaming endpoint runs the agent in an asyncio.Task and pulls from the queue, yielding SSE frames until a sentinel signals completion.
Frontend uses fetch + ReadableStream rather than EventSource (POST and custom headers needed). agentService gains runCodeInSandboxStreaming which parses SSE frames manually and dispatches each event to an onEvent callback. SandboxScreen wires onEvent into setRuns so events accumulate in the in-flight 'running' entry as they arrive.
Non-streaming /agents/run-code endpoint stays for callers that want the bulk response (deployed agents, scripts, batch tests). Both share _execute_agent_code; only the response shape differs.
30s heartbeat comment frames prevent proxy idle-timeout. X-Accel-Buffering: no header tells nginx not to buffer the response body.
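The queue-plus-sentinel pattern behind the streaming endpoint can be sketched with the standard library alone. Everything named here (`stream_run`, `sse_frame`, the `publish` callback handed to the agent) is hypothetical, and the 'done' frame carries only result and error rather than the full payload listed above.

```python
import asyncio
import json

_DONE = object()  # sentinel marking the end of the run


def sse_frame(event, data):
    """Format one server-sent-events frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def stream_run(agent, heartbeat_seconds=30.0):
    """Run `agent(publish)` in a task and yield SSE frames as events arrive."""
    queue = asyncio.Queue()  # unbounded, so put_nowait never blocks emitters

    async def runner():
        try:
            return await agent(queue.put_nowait)
        finally:
            queue.put_nowait(_DONE)

    task = asyncio.create_task(runner())
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=heartbeat_seconds)
        except asyncio.TimeoutError:
            yield ": heartbeat\n\n"  # SSE comment line; clients ignore it
            continue
        if item is _DONE:
            break
        yield sse_frame("trace", item)

    result, error = None, None
    try:
        result = await task
    except Exception as exc:  # the failure is reported in the done frame
        error = str(exc)
    yield sse_frame("done", {"result": result, "error": error})
```

In a web framework, an async generator like this would typically be wrapped in a streaming response with media type text/event-stream. The sketch does not cancel the task if the client disconnects early, which a production endpoint would need to handle.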
The test navigated to /profile/basic (a placeholder route that's
gone) then clicked text=API Keys, which drifted onto the h5 page
title because ProfilePage now always renders ApiKeysTab regardless
of the path segment. Click hit the page title behind the still-
open menu backdrop, timed out.
Start from / instead, and use getByRole('menuitem') to target the
menu item unambiguously.
Signed-off-by: Andrew Musselman <andrew.musselman@gmail.com>
Companion to 9e113f9 which dropped lines but missed statements.
Signed-off-by: Andrew Musselman <andrew.musselman@gmail.com>
Sandbox observability + dev-stack hardening
A bundle of medium features and stack hardening accumulated through testing the dev stack against real workloads.
Sandbox Progress Log
The sandbox previously showed an agent's final result and a panel of data store ops; users had to read the api container's stdout to see what an agent was actually doing. Adds a structured per-run trace surfaced in a new Progress Log accordion in the sandbox right column.
Backend. services/agent_trace.py collects events (agent_start/end, llm_call, data_store, error, stdout, log). Bound via contextvar so nested layers (LLM service, data store proxy, GofannonClient.call recursion) emit without threading the collector through every signature. capture_user_io() routes stdout/stderr/logging into the trace with caps of 4 KB per event and 2000 events per trace; streams are restored on exit, including on exception. GOFANNON_DISABLE_USER_TRACE=1 suppresses user-origin events; structural events still emit. The LLM call wrapper times each call so duration appears even when call_llm raises. Sandbox failure path returns a structured response with the partial trace instead of raising.
Frontend. SandboxProgressLog.jsx lists runs newest-first; each is a card with status chip and per-agent groups. Outcome icons (✓/✗/⏳), durations, "chained" badges for nested calls. Errors get red border + bg. In-memory history (lost on refresh).
Streaming. POST /agents/run-code/stream returns text/event-stream. Each Trace event becomes one SSE trace frame (~50 ms latency); final done frame carries result/error/opsLog/schemaWarnings. Trace gains an optional asyncio.Queue published to on each append. Frontend uses fetch + ReadableStream (not EventSource, since POST + custom headers are needed). 30s heartbeat comments + X-Accel-Buffering: no keep proxies from idling out the connection. Non-streaming endpoint stays for callers that want a bulk shape.
Bucketing. Long agents emit hundreds of lines; the panel got unwieldy. stdout/log events collapse into per-agent buckets ("47 lines of stdout/log output · click to view"), breaking at structural events so chronological flow is preserved. Click → Drawer side sheet with all lines in a scrollable monospace block, per-line error highlighting (lines with ERROR/FAIL/TRACEBACK), and a one-line preview of the latest error-flavored line in the bucket summary.
Side sheet for stack traces. Multi-line content truncated to 3 lines inline with a "more" link to the same side sheet.
Tests. test_agent_trace.py (33 unit tests) covers event collection, depth/agent stack, truncation cap, env-var disable, contextvar binding, line-buffering stdout wrapper, logging handler, and capture_user_io including stream restoration on exception. test_run_code_streaming.py (6 integration tests) covers the streaming endpoint end-to-end: success path, error path with structured done frame, opsLog/schemaWarnings in the done frame, response headers, friendly_name plumbing, SSE parser tolerance for heartbeat comments. agent_trace.py jumps from 0 % to 87 % coverage.
docs/developers/agent-trace.md covers the env var, leak vectors, caps, contextvar rationale, and how to add new event types.
Phase B (session) auth as the default dev mode
Session-cookie auth becomes the default dev-stack mode. dev-tail.sh no longer needs --phase-b; mockAuth is no longer the default frontend service.
Flow: GET /auth/login/dev_stub → backend redirects to picker → user clicks alice/bob/site_admin_1 → callback sets gofannon_sid httpOnly cookie → redirected back to frontend. .dev-auth.yaml committed as a dev fixture. developers/local-auth.md documents the flow.
Five bugs surfaced and were fixed during validation:
1. … FRONTEND_URL.
2. … FRONTEND_URL when relative.
3. AuthContext misrecognizing sessions because local.js had provider:'mock' → flipped to session.
4. … /auth/dev_stub/login → use the real GET → picker → callback flow.
5. sessionAuth.onAuthStateChanged fired a synchronous null callback before /auth/me resolved, so PrivateRoute bounced to /login and LoginPage's "already logged in" effect bounced to home; refresh on /agent/<id> always landed on /. Fix: only emit synchronously if a user is already resolved; otherwise wait for _fetchMe and let _emit() send the real value.
E2E tests rewritten: global-setup.js now walks the dev_stub flow and saves storageState. CORS unit test updated to match the fixed allowed_origins. E2E api-keys.spec.js realigned with the new ApiKeysTab DOM (h5 not h6, no "Not configured" chip, no "About API Keys" alert, profile menu trimmed).
Smaller features and fixes
Refresh redirect. See (5) above. Bonus: refresh on any deep route now stays on that route.
Stale namespace lists. HomePage and DataStoreConfigAccordion fetched namespaces only on mount with [] deps. A namespace created in another tab/page didn't appear until hard refresh. Refetch on visibilitychange so coming back to a tab gives fresh data.
webui readiness probe. run-all-tests.sh checked for a webui container in docker ps, but with dev-tail.sh the webui is vite on the host, not a container, so the check always failed and warned the stack was misconfigured. Replace with a curl localhost:3000 probe.
Paste agent code without generation. Agent Code accordion was gated on hasCode; pasted-in code couldn't be saved without first running the LLM generator. Un-gate the accordion (default-expanded in creation flow or when code exists). Save validation reordered: code required first, description only required when code is absent (description is the prompt input for the generator; once code exists it's optional metadata).
Sandbox shows agent's data store config. The agent page renders DataStoreConfigAccordion with configured namespaces + record counts; the sandbox page only had SandboxDataPanel (ops from the most recent run), so users had no view of "what data does this agent have access to" until after running it. Add a readOnly prop to DataStoreConfigAccordion (hides edit/add/delete) and render it on the sandbox page above the ops panel. Reverted in a follow-up; the additional pane cluttered the sandbox view.
Profile menu cleanup. Profile menu had Basic Info / Usage / Billing / API Keys; only API Keys did anything. Drop the placeholders, collapse ProfilePage to render ApiKeysTab directly. Restyle ApiKeysTab to match other top-level pages (constrained max width, back-arrow + h5 title, single in-place TextField per row, "Configured" chip only when keyed, absence implies not configured).
friendly_name → trace events. Plumb the agent's friendly name through RunCodeRequest so the trace's per-event agent_name reflects the actual agent (e.g. test_agent) instead of a placeholder. Frontend sources from agentData.friendlyName / agentData.name or the creation-flow context.
Roadmap
The end-to-end flow has been manually validated against a real Bedrock-backed ASVS auditor agent doing tarball ingest, multi-step LLM analysis, and bursty GitHub pushes.