feat(agents): file navigation, web browsing, scratchpad tools, and write security guardrails #495
- Enhanced PathValidator with write guardrails: blocked system directories, sensitive file protection (.env, credentials, keys), size limits (10 MB), overwrite confirmation prompts, timestamped backups, and audit logging
- Fixed ChatAgent write_file (had zero security checks) and added edit_file tool
- Fixed CodeAgent generic write_file and edit_file (missing PathValidator)
- Added FileSystemToolsMixin: browse_directory, tree, find_files, file_info, read_file with smart type detection, bookmarks
- Added BrowserToolsMixin: fetch_page, search_web, download_file
- Added ScratchpadToolsMixin: SQLite-backed data analysis tables
- Added FileSystemIndexService: persistent file index with FTS5 full-text search
- Added WebClient: HTTP client with rate limiting and content extraction
- Integrated all new tools into ChatAgent with config toggles
- 95 unit tests for write guardrails (all passing)
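The write guardrails listed in this commit message (blocked directories, sensitive file protection, a 10 MB size limit) can be pictured with a minimal sketch. Everything named here is hypothetical: `check_write`, `BLOCKED_DIRS`, and `SENSITIVE_NAMES` are illustrative stand-ins, not the PR's actual `PathValidator` API, and the real lists are likely longer.

```python
import os
from pathlib import Path
from typing import Tuple

# Hypothetical values for illustration only; not the PR's real lists.
BLOCKED_DIRS = ("/etc", "/usr", "/bin", "/var/log")
SENSITIVE_NAMES = {".env", "credentials.json", "id_rsa"}
SENSITIVE_SUFFIXES = {".pem", ".key"}
MAX_WRITE_BYTES = 10 * 1024 * 1024  # 10 MB, the limit named in the commit message


def check_write(path: str, content: bytes) -> Tuple[bool, str]:
    """Return (allowed, reason) for a proposed write."""
    # Resolve symlinks and ".." segments before comparing against the blocklist.
    resolved = Path(os.path.realpath(path))
    for blocked in BLOCKED_DIRS:
        blocked_path = Path(os.path.realpath(blocked))
        if resolved == blocked_path or blocked_path in resolved.parents:
            return False, f"blocked directory: {blocked}"
    if resolved.name in SENSITIVE_NAMES or resolved.suffix in SENSITIVE_SUFFIXES:
        return False, f"sensitive file: {resolved.name}"
    if len(content) > MAX_WRITE_BYTES:
        return False, "content exceeds 10 MB limit"
    return True, "ok"
```

Returning a reason string alongside the boolean lets a tool surface the refusal to the model or the user instead of failing silently. Overwrite prompts, backups, and audit logging are left out of this sketch.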
Fix black/isort formatting across all modified files to pass CI lint checks. Address all 17 open CodeQL code scanning alerts:
Python: Add path traversal validation with realpath/symlink checks (EMR server), sanitize API responses to strip stack traces, restrict returned fields from clear_database endpoint, redact URLs in Jira agent logs.
JavaScript: Add final path validation in eval webapp server, sanitize redirect URLs to reject protocol-relative paths, add in-memory rate limiters to docs server and dev server, remove identity replacement no-op, add crossorigin attributes to CDN scripts, add HTML sanitizer for XSS prevention in Jira webui, replace innerHTML with safe DOM APIs for user messages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude review this PR.
Claude finished @kovtcharov's task
PR Review Complete
Overall Assessment: ✅ Approve with Minor Suggestions
This is a well-designed, security-conscious PR that adds significant functionality to the ChatAgent. The write security guardrails, SSRF prevention, and audit logging are particularly well-implemented. Below are my detailed findings.
🟢 Strengths
Security Module (
Web Client (
Scratchpad Service (
CodeAgent File I/O (
🟡 Minor Issues (Non-blocking)

1. CodeQL Alert: URL Substring Sanitization in Tests
The CodeQL alerts about URL substring sanitization are false positives in this context. These are unit tests validating rate limiter domain tracking, not security-sensitive URL validation. The test code:

    self.client._rate_limit_wait("example.com")
    assert "example.com" in self.client._domain_last_request

is simply checking that domains are correctly stored in the rate limit tracking dictionary. No action needed.

2. CodeQL Alert: DOM Text as HTML
The

    // Current check:
    if (attr.name === 'href' && attr.value.trimStart().toLowerCase().startsWith('javascript:'))
    // Suggested enhancement:
    const dangerousSchemes = ['javascript:', 'data:', 'vbscript:'];
    if (attr.name === 'href' && dangerousSchemes.some(s => attr.value.trimStart().toLowerCase().startsWith(s)))

3. CodeQL Alert: Polynomial Regex
The regex patterns in

4. Potential Enhancement: Browser Tools Path Validation
The

    # After path validation:
    is_blocked, reason = mixin._path_validator.is_write_blocked(resolved_dir)
    if is_blocked:
        return f"Error: {reason}"

🔴 Security Alerts from CodeQL (Already Addressed or Pre-existing)
The following alerts appear to be pre-existing issues or already addressed in this PR:
📋 Test Coverage
The test coverage is excellent:
Summary
This PR significantly improves GAIA's security posture by:
The minor suggestions above are non-blocking. The code is ready to merge.
…profiles
Before: monolithic ChatAgent with 13K-token system prompt caused 95s TTFT
for a simple "Hi!" on Gemma-4-E4B. Eval scenarios timed out at 610s.
After: 5 focused agents (chat, doc, file, data, web) + lite variants,
each with a lean prompt profile. TTFT drops from 95s to 0.12s (chat)
and 3-10s (doc). Eval pass rate: 89% judged (34/38), avg score 9.4/10.
Agent architecture:
- chat: conversation only, ~2K tokens, no tools
- doc: RAG + file search, ~5K tokens, hallucination prevention
- file: filesystem ops + discovery, ~4K tokens
- data: CSV/Excel analysis with scratchpad, ~3K tokens
- web: browser tools, ~2K tokens
- Each has a -lite variant using ~4B model for low-memory hardware
Eval framework updates:
- Per-scenario agent_type field in YAML (overrides --agent-type CLI)
- Latency validation: warns when TTFT > 30s
- Preserve eval sessions for review (no delete_session)
- Increased startup timeout 120s → 240s for Windows
- Fixed shutil.which("claude") for Windows .cmd resolution
- Registry: chat-lite is now a first-class agent, not an alias
- ChatAgentConfig: enable_filesystem/scratchpad/browser default to False
- Eval timeout: startup overhead increased 120s → 240s for Windows
- Prompt tier test: FILE SEARCH AND AUTO-INDEX is filesystem-gated
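Two of the eval framework updates described in this commit (a per-scenario `agent_type` that overrides the `--agent-type` CLI value, and a warning when TTFT exceeds 30s) might look roughly like the sketch below. The function names and the scenario dictionary shape are assumptions made for illustration; the actual eval runner is not shown in this conversation.

```python
from typing import Optional

TTFT_WARN_SECONDS = 30.0  # threshold stated in the commit message


def resolve_agent_type(scenario: dict, cli_agent_type: str) -> str:
    """Prefer the scenario's own agent_type; fall back to the CLI value."""
    # Hypothetical scenario shape: a dict loaded from the scenario YAML.
    return scenario.get("agent_type") or cli_agent_type


def latency_warning(scenario_id: str, ttft_seconds: float) -> Optional[str]:
    """Return a warning string when time-to-first-token is too high."""
    if ttft_seconds > TTFT_WARN_SECONDS:
        return (
            f"{scenario_id}: TTFT {ttft_seconds:.1f}s exceeds "
            f"{TTFT_WARN_SECONDS:.0f}s"
        )
    return None
```

For example, `resolve_agent_type({"id": "s1", "agent_type": "doc"}, "chat")` picks the scenario's `doc` agent, while a scenario without the field falls back to the CLI value.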
itomek left a comment
Approving. The PR has had multiple rounds of automated review with substantive iteration from the author — most recently the system-prompt gating fix, the edit_file size-cap enforcement, the bookmarks per-instance fix, and the dev-server rate-limiter eviction. The remaining items from the latest bot pass (inline import sys, EMR comment polish) are appropriate for follow-up rather than blocking this merge.
Net: the security work (write guardrails, SSRF defense-in-depth, SQL injection layering, FTS5 self-healing index) is solid, test coverage is real, and the feature surface is gated correctly behind config flags. Ready to land.
Generated by Claude Code
subprocess.run(["claude"]) throws FileNotFoundError on Windows when the binary isn't on PATH. Wrap in try/except so preflight_check returns an error message instead of crashing. Also fixes agents router: cr.provider → cr.connector_id and handles plain strings in required_connections (connectors-demo regression).
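The fix described in this commit, turning a missing binary into an error message rather than a crash, can be sketched as follows. This is a hypothetical reconstruction: the real `preflight_check` signature, the arguments passed to the CLI, and the message wording are not shown in the conversation, and the `--version` argument is an assumption.

```python
import subprocess
from typing import Optional


def preflight_check(binary: str = "claude") -> Optional[str]:
    """Return None if the CLI can be launched, otherwise an error message."""
    try:
        # "--version" is an assumed, cheap invocation used only for this sketch.
        subprocess.run(
            [binary, "--version"],
            capture_output=True,
            check=False,
            timeout=30,
        )
    except FileNotFoundError:
        # Raised when the executable cannot be found on PATH.
        return f"'{binary}' was not found on PATH"
    except subprocess.TimeoutExpired:
        return f"'{binary}' did not respond within 30 seconds"
    return None
```

Returning a message keeps the caller in control of how the failure is reported, which matches the commit's stated goal of not crashing during preflight.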
…le-navigation
# Conflicts:
# src/gaia/agents/registry.py
# src/gaia/apps/webui/package-lock.json
# src/gaia/ui/routers/agents.py
@claude review this PR.
PR Review: #495
Strong PR overall. The
🟡 Important
1. SSRF: DNS rebinding bypass in
The standard fix is to resolve once, validate the IP, then connect to the IP directly with the original
2. The download path checks
3. Blocked-directories list misses common Linux write targets (
Only
🟢 Minor
4. Inconsistent
5. Setting
6. After
✅ Things that are clearly right
Suggested next steps
Tagging @kovtcharov-amd for the SSRF item: DNS rebinding is a known-tricky bypass and worth a maintainer's eye on the fix.
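The "resolve once, validate the IP" step that the review describes can be sketched with the standard library. This is an illustrative sketch, not the PR's code: `is_public_ip` and `resolve_and_validate` are hypothetical names, and a real client would then connect to the returned IP (typically still presenting the original hostname for TLS and the Host header), which is omitted here.

```python
import ipaddress
import socket


def is_public_ip(ip: str) -> bool:
    """True only for addresses that are safe to fetch from."""
    addr = ipaddress.ip_address(ip)
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def resolve_and_validate(host: str, port: int = 443) -> str:
    """Resolve host a single time and return an IP that passed validation."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for *_, sockaddr in infos:
        # Reject the host if ANY resolved address is internal.
        if not is_public_ip(sockaddr[0]):
            raise ValueError(f"{host} resolves to a non-public address")
    return infos[0][4][0]
```

Because the caller connects to the IP returned here instead of resolving the name a second time, a DNS answer that changes between the check and the connection has no effect.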
…le-navigation Resolve add/add conflict in src/gaia/web/client.py: keep our WebClient (extensively tested via agent eval) and integrate PinnedIPAdapter from PR #979 as a DNS-rebind TOCTOU guard mounted on the WebClient session.
Four must-fix bugs from the PR #495 code review:
1. Tool registry isolation: _TOOL_REGISTRY.pop()/.clear() corrupted tools for other agents in the same process. Replace with per-instance _snapshot_tools() that copies the global registry into self._instance_tools. All tool lookup methods (format, execute, resolve, schemas) now use self._tools_registry property which prefers the snapshot.
2. Prompt section gating: getattr(config, "enable_*", True) used True as fallback, injecting tool sections into prompts even when tools weren't registered. Fixed to check profile membership OR explicit enable flag with False fallback.
3. Bookmark isolation: _bookmarks class-level dict shared across all instances. Changed to None sentinel with per-instance init in register_filesystem_tools().
4. OAuth token expiry: added get_token_with_expiry() returning (str, float) for callers that need the wall-clock expiry alongside the access token.
The MCP tool counting block at the end of _register_tools uses _TOOL_REGISTRY directly to measure how many tools the @tool decorator added. The previous commit removed the local import when deleting the .pop() loop, causing a Pylint E0602 (undefined-variable) in CI.
…miter
- PinnedIPAdapter resolves DNS once and pins the connection to that IP, closing the DNS rebinding SSRF bypass identified in the #495 review
- Rate limiter switched from time.time() to time.monotonic() so NTP adjustments cannot disable throttling
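The reason for moving the rate limiter to `time.monotonic()` is that a wall-clock jump (for example an NTP correction) can make the elapsed time look arbitrarily large or negative. A minimal per-domain limiter along those lines might look like this; `DomainRateLimiter` and its injectable `clock`/`sleep` parameters are assumptions for the sketch, not the WebClient's actual interface.

```python
import time
from typing import Callable, Dict


class DomainRateLimiter:
    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.min_interval = min_interval
        self._clock = clock  # monotonic: unaffected by system clock changes
        self._sleep = sleep
        self._last_request: Dict[str, float] = {}

    def wait(self, domain: str) -> float:
        """Block until min_interval has elapsed for this domain; return seconds slept."""
        delay = 0.0
        last = self._last_request.get(domain)
        if last is not None:
            delay = max(0.0, self.min_interval - (self._clock() - last))
            if delay > 0:
                self._sleep(delay)
        self._last_request[domain] = self._clock()
        return delay
```

Injecting the clock and sleep functions keeps the limiter testable without real delays.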
…ardening
- browser_tools: validate final download file path (not just directory) to prevent path traversal via server-controlled Content-Disposition
- scratchpad: use PRAGMA page_count * page_size for accurate DB size instead of row-count estimate; fail loudly on size check errors
- security: backup timestamps include milliseconds to avoid collisions
- dev-server: improved error handling
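The scratchpad size check mentioned in this commit relies on two standard SQLite pragmas. A minimal sketch, assuming a plain `sqlite3` connection and a hypothetical helper name `database_size_bytes`:

```python
import sqlite3


def database_size_bytes(conn: sqlite3.Connection) -> int:
    """Database size as page_count * page_size."""
    # Any sqlite3.Error propagates to the caller, so a failed size check
    # is never silently treated as "size is fine".
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size
```

Unlike a row-count estimate, this reflects the pages the database actually occupies, including wide rows and index pages. Pages freed by deletes stay counted until the database is vacuumed.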
## Why this matters
v0.18.0 ships agent memory v2 (hybrid-search second brain with LLM extraction and observability dashboard), ChatAgent split into three composable agents (Chat/FileIO/DocumentQA), parallel tool calls, and a Telegram adapter scaffold. It also fixes the RAG-on-PDF timeout with Gemma 4 that broke document Q&A since v0.17.6 and adds CI gates that enforce RAG quality baselines on every future PR. Full notes: `docs/releases/v0.18.0.mdx`.

## What's New
- **Agent memory v2** ([amd#606](amd#606)): Hybrid semantic + keyword search, LLM extraction, observability dashboard via SSE streaming ([amd#1032](amd#1032)). Per-user isolation enforced; extraction runs async so it doesn't add latency.
- **ChatAgent split** ([amd#979](amd#979)): `ChatAgent`, `FileIOAgent`, and `DocumentQAAgent` replace the monolithic class; each composable via `tools=`. Backward-compatible shim preserved.
- **Parallel tool calls** ([amd#946](amd#946)): Multiple `tool_calls` from a single LLM turn are executed concurrently, cutting round-trips for multi-tool workflows.
- **Telegram adapter scaffold, Phase 0** ([amd#951](amd#951)): `gaia telegram start|stop|status`, per-user session isolation, `[telegram]` extras. Phase 1 (message handling + allowed-users gate) tracked in [amd#889](amd#889).
- **Connectors: per-MCP toggle + single-writer enforcement** ([amd#1018](amd#1018), [amd#998](amd#998)): Disable individual MCP servers without removing them; concurrent writes serialised with actionable errors on contention.
- **File navigation, web browsing, and write security** ([amd#495](amd#495)): `FileSearchToolsMixin`, web browsing tool, and scratchpad mixin in `KNOWN_TOOLS`; write tools check `allowed_paths` before dispatch.
- **Email UI and policy alerts** ([amd#995](amd#995), [amd#1039](amd#1039), [amd#952](amd#952)): Pre-scan triage card, in-chat Connect, policy alert cards, and durable receipts for confirmation-gated actions.

## Bug Fixes
- **RAG-on-PDF timeouts on Gemma 4** ([amd#1034](amd#1034), closes [amd#1030](amd#1030)): Prompt-size budget check added at composition time; CI gates enforce it on every PR ([amd#1040](amd#1040)).
- **Envelope-level parse failure crashed SD recovery** ([amd#1047](amd#1047), closes [amd#1023](amd#1023)): Falls through to a clean recovery path with step-1 context preserved.
- **Windows-path tool args corrupted** ([amd#1027](amd#1027)): Backslash normalisation now happens after argument parsing.
- **Blender `send_command` hung** ([amd#1026](amd#1026), closes [amd#1022](amd#1022)): Read timeout applied to persistent-connection servers.
- **`gaia chat init` in post-install banner** ([amd#1029](amd#1029), closes [amd#1024](amd#1024)): Replaced with the correct `gaia init`.
- **Keyring treated as required** ([amd#1028](amd#1028)): Import guarded; optional on systems without `keyring`.
- **electron-builder URLs stale** ([amd#953](amd#953)): Three doc/installer files updated to current download paths.

## Tooling & Docs
- **RAG eval CI gates** ([amd#1040](amd#1040), closes [amd#1033](amd#1033)): RAG quality baselines + prompt-size budget enforced on every PR.
- **Fork-PR authors now receive Claude review** ([amd#932](amd#932)): `allowed_non_write_users: "*"` with prompt-injection mitigations documented.
- **Eval runs mandated before merging** ([amd#1036](amd#1036)): `CLAUDE.md` requires `gaia eval agent` for LLM-affecting changes.
- **GAIA website** ([amd#369](amd#369)): [amd-gaia.ai](https://amd-gaia.ai) live.
- **Custom agent guide reorganised** ([amd#997](amd#997)), Lemonade PPA docs ([amd#801](amd#801)), broken Lemonade CLI URL fixed ([amd#996](amd#996)), WhatsApp adapter evaluation spec ([amd#950](amd#950)).

## Release checklist
- [x] `util/validate_release_notes.py docs/releases/v0.18.0.mdx --tag v0.18.0` passes
- [x] `src/gaia/version.py` → `0.18.0`
- [x] `src/gaia/apps/webui/package.json` → `0.18.0`
- [x] Navbar label in `docs/docs.json` → `v0.18.0 · Lemonade 10.2.0`
- [x] All 28 commits in range (v0.17.6..HEAD) are represented in the notes
- [ ] Review from @kovtcharov-amd addressed

Summary
Before: monolithic ChatAgent with a 13K-token system prompt and 22 tools caused 95s TTFT on local models. Write operations had zero security checks. Users had to manually find files, download web content, and do data analysis outside GAIA.
After: ChatAgent is split into 5 focused agents (chat, doc, file, data, web) with lean prompt profiles, plus a centralized write-guardrail layer and 3 new tool groups. TTFT drops from 95s to 0.12s (chat) / 3-10s (doc). Eval pass rate: 87-89% judged.
Agent split
chat, doc, file, data, web
Each has a `-lite` variant using a ~4B model for low-memory hardware. Per-scenario `agent_type` field in eval YAML routes scenarios to the right agent.
Security hardening
- (`src/gaia/security.py`): blocked directories (incl. `/var/log`, `/var/lib`, `/var/spool`, `/opt`), sensitive file protection, size limits, overwrite prompts, timestamped backups, rotating audit log (10 MB x 3), symlink resolution
- (`src/gaia/web/client.py`): `PinnedIPAdapter` closes DNS-rebinding TOCTOU window, monotonic rate limiter, per-hop redirect validation, blocked ports, private-IP rejection
- (`src/gaia/scratchpad/service.py`): column DDL validation, `PRAGMA`/`VACUUM`/`REINDEX` blocked in queries, VACUUM on clear_all, per-call OOM guards
- (`browser_tools.py`): post-download sensitive-filename check (`.env`, `credentials.json`, etc.); deletes and blocks if matched
- `_snapshot_tools()` prevents tool leakage across agent instances in the same process
Follow-up issues
Test plan