
feat(agents): file navigation, web browsing, scratchpad tools, and write security guardrails #495

Merged
kovtcharov-amd merged 55 commits into main from feature/chat-agent-file-navigation on May 8, 2026

@kovtcharov kovtcharov commented Mar 11, 2026

Collaborator

Summary

Before: a monolithic ChatAgent with a 13K-token system prompt and 22 tools caused a 95s time to first token (TTFT) on local models. Write operations had zero security checks. Users had to manually find files, download web content, and do data analysis outside GAIA.

After: ChatAgent is split into 5 focused agents (chat, doc, file, data, web) with lean prompt profiles, plus a centralized write-guardrail layer and 3 new tool groups. TTFT drops from 95s to 0.12s (chat) and 3-10s (doc). Judged eval pass rate: 87-89%.

Agent split

| Agent | Profile | Prompt size | Tools | Purpose |
|---|---|---|---|---|
| chat | conversation only | ~2K tokens | none | Fast greetings, general chat |
| doc | RAG + file search | ~5K tokens | RAG, file search | Document Q&A with hallucination prevention |
| file | filesystem ops | ~4K tokens | browse, tree, find, read, bookmark | File navigation and discovery |
| data | scratchpad + CSV | ~3K tokens | create_table, insert, query, list, drop | Multi-document structured analysis |
| web | browser tools | ~2K tokens | fetch_page, search_web, download_file | Web research and content extraction |

Each agent has a -lite variant that uses a ~4B model for low-memory hardware. A per-scenario agent_type field in the eval YAML routes each scenario to the right agent.

Security hardening

  • Write guardrails (src/gaia/security.py): blocked directories (incl. /var/log, /var/lib, /var/spool, /opt), sensitive file protection, size limits, overwrite prompts, timestamped backups, rotating audit log (10 MB x 3), symlink resolution
  • SSRF prevention (src/gaia/web/client.py): PinnedIPAdapter closes DNS-rebinding TOCTOU window, monotonic rate limiter, per-hop redirect validation, blocked ports, private-IP rejection
  • SQL injection defense (src/gaia/scratchpad/service.py): column DDL validation, PRAGMA/VACUUM/REINDEX blocked in queries, VACUUM on clear_all, per-call OOM guards
  • Download guardrail (browser_tools.py): post-download sensitive-filename check (.env, credentials.json, etc.); a matching file is deleted and the download is blocked
  • Per-instance tool registry: _snapshot_tools() prevents tool leakage across agent instances in the same process
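A rotating audit log with the 10 MB x 3 policy listed above can be sketched with the standard library's logging.handlers.RotatingFileHandler. This is a minimal illustration, not the code in src/gaia/security.py: the logger name gaia.audit.sketch, the helpers make_audit_logger and audit_write, and the JSON-lines record fields are all assumptions.

```python
import json
import logging
import logging.handlers
import tempfile
from pathlib import Path

# Hypothetical constants mirroring the "10 MB x 3" policy.
AUDIT_MAX_BYTES = 10 * 1024 * 1024  # rotate once audit.log reaches 10 MB
AUDIT_BACKUP_COUNT = 3              # keep audit.log.1 .. audit.log.3


def make_audit_logger(log_path: Path) -> logging.Logger:
    """Return a logger that writes one JSON record per line to a rotating file."""
    logger = logging.getLogger("gaia.audit.sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit records out of the normal application log
    handler = logging.handlers.RotatingFileHandler(
        log_path,
        maxBytes=AUDIT_MAX_BYTES,
        backupCount=AUDIT_BACKUP_COUNT,
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def audit_write(logger: logging.Logger, op: str, path: str, allowed: bool, reason: str = "") -> None:
    """Record a write/edit decision; file handlers flush after every record."""
    logger.info(json.dumps({"op": op, "path": path, "allowed": allowed, "reason": reason}))


# Usage: log one decision into a throwaway directory.
demo_dir = Path(tempfile.mkdtemp())
demo_logger = make_audit_logger(demo_dir / "audit.log")
audit_write(demo_logger, "write_file", "/home/user/notes.txt", allowed=True)
```

One JSON object per line keeps the log greppable and easy to parse during forensics.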

Follow-up issues

Test plan

  • ~500 PR-specific unit tests pass (11 new test files, ~8K LOC)
  • Full unit suite passes, lint clean
  • Agent eval: 87-89% judged pass rate, 100% on personality/RAG/adversarial/web scenarios
  • All 10 critical CI checks pass
  • 2 remaining CodeQL alerts documented as false positives (EMR dashboard)

- Enhanced PathValidator with write guardrails: blocked system directories,
  sensitive file protection (.env, credentials, keys), size limits (10 MB),
  overwrite confirmation prompts, timestamped backups, and audit logging
- Fixed ChatAgent write_file (had zero security checks) and added edit_file tool
- Fixed CodeAgent generic write_file and edit_file (missing PathValidator)
- Added FileSystemToolsMixin: browse_directory, tree, find_files, file_info,
  read_file with smart type detection, bookmarks
- Added BrowserToolsMixin: fetch_page, search_web, download_file
- Added ScratchpadToolsMixin: SQLite-backed data analysis tables
- Added FileSystemIndexService: persistent file index with FTS5 full-text search
- Added WebClient: HTTP client with rate limiting and content extraction
- Integrated all new tools into ChatAgent with config toggles
- 95 unit tests for write guardrails (all passing)
@github-actions github-actions Bot added documentation Documentation changes dependencies Dependency updates devops DevOps/infrastructure changes agents tests Test changes security Security-sensitive changes labels Mar 11, 2026
Comment threads marked Fixed:
  • tests/unit/test_browser_tools.py (4 threads)
@kovtcharov kovtcharov added this to the GAIA Agent UI - v0.17.0 milestone Mar 12, 2026
kovtcharov-amd previously approved these changes Mar 12, 2026
@kovtcharov kovtcharov changed the title Add chat agent file navigation and write security guardrails Enhance ChatAgent with file navigation, web browsing, scratchpad tools, and write security guardrails Mar 13, 2026
Fix black/isort formatting across all modified files to pass CI lint
checks. Address all 17 open CodeQL code scanning alerts:

Python: Add path traversal validation with realpath/symlink checks
(EMR server), sanitize API responses to strip stack traces, restrict
returned fields from clear_database endpoint, redact URLs in Jira
agent logs.

JavaScript: Add final path validation in eval webapp server, sanitize
redirect URLs to reject protocol-relative paths, add in-memory rate
limiters to docs server and dev server, remove identity replacement
no-op, add crossorigin attributes to CDN scripts, add HTML sanitizer
for XSS prevention in Jira webui, replace innerHTML with safe DOM
APIs for user messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added jira Jira agent changes eval Evaluation framework changes electron Electron app changes performance Performance-critical changes labels Mar 13, 2026
Comment threads marked Fixed:
  • src/gaia/apps/jira/webui/public/js/modules/chat-ui.js (3 threads)
  • docs/server.js (1 thread)
  • src/gaia/eval/webapp/public/index.html (2 threads)
  • src/gaia/agents/emr/dashboard/server.py (5 threads)
  • src/gaia/agents/jira/agent.py (1 thread)
@kovtcharov

Collaborator Author

@claude review this PR.

github-actions Bot commented Mar 16, 2026

Contributor

Claude finished @kovtcharov's task.


PR Review Complete

  • Read PR diff and changed files
  • Review security module (src/gaia/security.py)
  • Review filesystem tools (src/gaia/agents/tools/filesystem_tools.py)
  • Review browser tools and web client
  • Review scratchpad service
  • Review ChatAgent integration
  • Review CodeAgent file_io.py security fixes
  • Check CodeQL security alerts
  • Provide comprehensive review feedback

Overall Assessment: ✅ Approve with Minor Suggestions

This is a well-designed, security-conscious PR that adds significant functionality to the ChatAgent. The write security guardrails, SSRF prevention, and audit logging are particularly well-implemented. Below are my detailed findings.


🟢 Strengths

Security Module (src/gaia/security.py)

  • Excellent blocked directory coverage for both Windows and Unix systems
  • Sensitive file protection (.env, keys, certificates) is comprehensive
  • Symlink resolution via os.path.realpath() blocks symlink-based escapes from allowed directories (a check-then-write sequence still leaves a narrow TOCTOU window)
  • Fail-closed design: blocks writes when path validation fails
  • Audit logging provides valuable forensics capability

Web Client (src/gaia/web/client.py)

  • Robust SSRF prevention: validates scheme, port, and resolved IP addresses
  • Manual redirect following with validation at each hop prevents redirect-based SSRF
  • Per-domain rate limiting prevents accidental DoS
  • Content-Length validation before download prevents memory exhaustion
  • Path traversal protection in download() via save_path.resolve() check

Scratchpad Service (src/gaia/scratchpad/service.py)

  • Good SQL injection prevention via _sanitize_name()
  • Read-only query enforcement (SELECT only) is correctly implemented
  • Resource limits (100 tables, 1M rows) prevent runaway resource usage
  • WAL mode allows concurrent readers alongside a single writer
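The read-only enforcement described above (SELECT only, with keywords such as PRAGMA, VACUUM, and REINDEX rejected and string literals ignored during keyword scanning) can be approximated as follows. This is a hypothetical sketch: check_read_only_query, strip_sql_string_literals, and the keyword set are illustrative and may not match ScratchpadService.

```python
import re
from typing import Tuple

# Hypothetical denylist of statement-level keywords that must never reach SQLite.
FORBIDDEN_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
    "ATTACH", "DETACH", "PRAGMA", "VACUUM", "REINDEX",
}
# A single-quoted SQL literal; '' inside it is an escaped quote.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_WORD = re.compile(r"[A-Za-z_]+")


def strip_sql_string_literals(sql: str) -> str:
    """Blank out literal contents so data like 'UPDATE PENDING' is not scanned."""
    return _STRING_LITERAL.sub("''", sql)


def check_read_only_query(sql: str) -> Tuple[bool, str]:
    """Return (ok, reason); ok only for a single SELECT with no forbidden keywords."""
    scrubbed = strip_sql_string_literals(sql).strip().rstrip(";").strip()
    if ";" in scrubbed:
        return False, "multiple statements are not allowed"
    if not scrubbed.upper().startswith("SELECT"):
        return False, "only SELECT queries are allowed"
    hits = sorted({w.upper() for w in _WORD.findall(scrubbed)} & FORBIDDEN_KEYWORDS)
    if hits:
        return False, "forbidden keyword(s): " + ", ".join(hits)
    return True, ""
```

Stripping literals first is what lets a value such as 'a;b' or 'UPDATE PENDING' pass while a real second statement is still rejected.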

CodeAgent File I/O (src/gaia/agents/code/tools/file_io.py)

  • Fixed missing PathValidator checks on write_file and edit_file tools
  • Automatic backup creation before overwrites
  • Audit logging integrated

🟡 Minor Issues (Non-blocking)

1. CodeQL Alert: URL Substring Sanitization in Tests
tests/unit/test_browser_tools.py:155-162

The CodeQL alerts about URL substring sanitization are false positives in this context. These are unit tests validating rate limiter domain tracking, not security-sensitive URL validation. The test code:

```python
self.client._rate_limit_wait("example.com")
assert "example.com" in self.client._domain_last_request
```

is simply checking that domains are correctly stored in the rate limit tracking dictionary. No action needed.

2. CodeQL Alert: DOM Text as HTML
src/gaia/apps/jira/webui/public/js/modules/chat-ui.js:51

The sanitizeHTML() function at lines 49-64 already addresses this by removing dangerous elements (script, iframe, etc.) and stripping event handlers. However, consider adding data: and vbscript: to the URL scheme check at line 58:

```javascript
// Current check:
if (attr.name === 'href' && attr.value.trimStart().toLowerCase().startsWith('javascript:'))

// Suggested enhancement:
const dangerousSchemes = ['javascript:', 'data:', 'vbscript:'];
if (attr.name === 'href' && dangerousSchemes.some(s => attr.value.trimStart().toLowerCase().startsWith(s)))
```

3. CodeQL Alert: Polynomial Regex
src/gaia/agents/emr/dashboard/server.py:76-84

The regex patterns in sanitize_error_response() could theoretically cause ReDoS with crafted input. However, these are applied to error messages (internal content), not user input. The risk is low since error messages come from Python exceptions. If you want to harden this, consider using simpler patterns or adding a length limit before regex processing.
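The suggested length limit can be sketched as truncating before any regex runs, which bounds the worst-case matching cost by a fixed input size. sanitize_error_message, MAX_ERROR_CHARS, and both patterns are hypothetical and are not the patterns used in sanitize_error_response().

```python
import re

MAX_ERROR_CHARS = 2000  # hypothetical cap: bounds regex work however the patterns backtrack

# Illustrative patterns, not the ones in sanitize_error_response().
_TRACEBACK_FRAME = re.compile(r'^[ \t]*File "[^"\n]*", line \d+.*$', re.MULTILINE)
_ABSOLUTE_PATH = re.compile(r"(?:[A-Za-z]:\\|/)[^\s'\"]+")


def sanitize_error_message(message: str) -> str:
    """Truncate first, then drop traceback frames and mask absolute paths."""
    clipped = message[:MAX_ERROR_CHARS]
    clipped = _TRACEBACK_FRAME.sub("", clipped)
    return _ABSOLUTE_PATH.sub("<path>", clipped).strip()
```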

4. Potential Enhancement: Browser Tools Path Validation
src/gaia/agents/tools/browser_tools.py:259-266

The download_file tool checks is_path_allowed but doesn't call validate_write() to check blocked directories. Consider:

```python
# After path validation:
is_blocked, reason = mixin._path_validator.is_write_blocked(resolved_dir)
if is_blocked:
    return f"Error: {reason}"
```

🔴 Security Alerts from CodeQL (Already Addressed or Pre-existing)

The following alerts appear to be pre-existing issues or already addressed in this PR:

| Alert | File | Status |
|---|---|---|
| Server-side URL redirect | docs/server.js:295 | Fixed in this PR: added // prefix check |
| Clear-text password logging | jira/agent.py:654 | ⚠️ Pre-existing: only logs URL path, not credentials |
| Uncontrolled path expression | emr/dashboard/server.py:1659 | Fixed in this PR: added path validation |
| CDN integrity check | eval/webapp/public/index.html:9-10 | 🟡 Pre-existing: consider adding SRI hashes |

📋 Test Coverage

The test coverage is excellent:

  • 11 new test files with ~8000 lines of tests
  • Comprehensive security edge case testing (symlinks, TOCTOU, path traversal)
  • SSRF prevention tests
  • SQL injection prevention tests

Summary

This PR significantly improves GAIA's security posture by:

  1. Adding comprehensive write guardrails
  2. Implementing robust SSRF prevention for web browsing
  3. Fixing missing PathValidator checks in CodeAgent tools
  4. Adding audit logging for all file mutations

The minor suggestions above are non-blocking. The code is ready to merge.


itomek previously approved these changes May 7, 2026
…profiles

Before: monolithic ChatAgent with 13K-token system prompt caused 95s TTFT
for a simple "Hi!" on Gemma-4-E4B. Eval scenarios timed out at 610s.

After: 5 focused agents (chat, doc, file, data, web) + lite variants,
each with a lean prompt profile. TTFT drops from 95s to 0.12s (chat)
and 3-10s (doc). Eval pass rate: 89% judged (34/38), avg score 9.4/10.

Agent architecture:
- chat: conversation only, ~2K tokens, no tools
- doc: RAG + file search, ~5K tokens, hallucination prevention
- file: filesystem ops + discovery, ~4K tokens
- data: CSV/Excel analysis with scratchpad, ~3K tokens
- web: browser tools, ~2K tokens
- Each has a -lite variant using ~4B model for low-memory hardware

Eval framework updates:
- Per-scenario agent_type field in YAML (overrides --agent-type CLI)
- Latency validation: warns when TTFT > 30s
- Preserve eval sessions for review (no delete_session)
- Increased startup timeout 120s → 240s for Windows
- Fixed shutil.which("claude") for Windows .cmd resolution
- Registry: chat-lite is now a first-class agent, not an alias
- ChatAgentConfig: enable_filesystem/scratchpad/browser default to False
- Eval timeout: startup overhead increased 120s → 240s for Windows
- Prompt tier test: FILE SEARCH AND AUTO-INDEX is filesystem-gated
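The agent_type precedence described in this commit (a per-scenario YAML value overrides the --agent-type CLI default) reduces to a small lookup. A sketch, assuming scenarios are already loaded from YAML into plain objects; Scenario, resolve_agent_type, and the chat default are illustrative names, not the eval framework's API.

```python
from dataclasses import dataclass
from typing import Optional

# Agent ids from the PR description; each also has a "-lite" variant for low-memory hardware.
BASE_AGENT_TYPES = ("chat", "doc", "file", "data", "web")
KNOWN_AGENT_TYPES = set(BASE_AGENT_TYPES) | {f"{name}-lite" for name in BASE_AGENT_TYPES}


@dataclass
class Scenario:
    name: str
    agent_type: Optional[str] = None  # optional per-scenario field from the eval YAML


def resolve_agent_type(scenario: Scenario, cli_agent_type: str = "chat") -> str:
    """A per-scenario agent_type wins; otherwise fall back to the --agent-type CLI value."""
    chosen = scenario.agent_type or cli_agent_type
    if chosen not in KNOWN_AGENT_TYPES:
        raise ValueError(f"unknown agent_type {chosen!r} in scenario {scenario.name!r}")
    return chosen
```

Failing loudly on an unknown id keeps a typo in YAML from silently running the scenario against the default agent.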
itomek previously approved these changes May 7, 2026

@itomek itomek left a comment

Collaborator


Approving. The PR has had multiple rounds of automated review with substantive iteration from the author — most recently the system-prompt gating fix, the edit_file size-cap enforcement, the bookmarks per-instance fix, and the dev-server rate-limiter eviction. The remaining items from the latest bot pass (inline import sys, EMR comment polish) are appropriate for follow-up rather than blocking this merge.

Net: the security work (write guardrails, SSRF defense-in-depth, SQL injection layering, FTS5 self-healing index) is solid, test coverage is real, and the feature surface is gated correctly behind config flags. Ready to land.


Generated by Claude Code

@kovtcharov-amd

Collaborator

Posting some results from eval testing below:
[Image: eval testing results]

Ovtcharov added 2 commits May 7, 2026 14:42
subprocess.run(["claude"]) throws FileNotFoundError on Windows when
the binary isn't on PATH. Wrap in try/except so preflight_check
returns an error message instead of crashing.

Also fixes agents router: cr.provider → cr.connector_id and handles
plain strings in required_connections (connectors-demo regression).
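The Windows fix above (resolve the binary with shutil.which, catch FileNotFoundError instead of crashing) might look like this. A sketch only: this preflight_check signature, its error strings, and the assumption that the CLI accepts --version are not taken from the GAIA code.

```python
import shutil
import subprocess
from typing import Optional


def preflight_check(binary: str = "claude") -> Optional[str]:
    """Return None if the CLI runs, otherwise an error message (never raise)."""
    # On Windows, shutil.which() also tries PATHEXT suffixes, so it finds claude.cmd.
    resolved = shutil.which(binary)
    if resolved is None:
        return f"{binary!r} not found on PATH"
    try:
        # Assumes the CLI accepts --version; any cheap, side-effect-free command works.
        subprocess.run([resolved, "--version"], capture_output=True, check=True, timeout=30)
    except FileNotFoundError:
        return f"{binary!r} could not be launched from {resolved}"
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
        return f"{binary!r} preflight failed: {exc}"
    return None
```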
…le-navigation

# Conflicts:
#	src/gaia/agents/registry.py
#	src/gaia/apps/webui/package-lock.json
#	src/gaia/ui/routers/agents.py
@kovtcharov-amd

Collaborator

@claude review this PR.

github-actions Bot commented May 7, 2026

Contributor

PR Review — #495

Strong PR overall. The PathValidator write guardrails (src/gaia/security.py), SQL-injection defenses in ScratchpadService, and the macOS /private symlink handling are all well-thought-out, and the test coverage (~8000 lines across 11 files) is genuinely impressive. A few items worth tightening before merge:

🟡 Important

1. SSRF — DNS rebinding bypass in WebClient._validate_host_ip (src/gaia/web/client.py:132-156)

validate_url() resolves the hostname via socket.getaddrinfo(), then requests.Session.request() performs a second independent DNS lookup at fetch time. A hostile DNS server returning a public IP on the first query and a private IP (e.g., 169.254.169.254 AWS metadata, 127.0.0.1) on the second bypasses every SSRF check.

The standard fix is to resolve once, validate the IP, then connect to the IP directly with the original Host header preserved — or pin socket.getaddrinfo results via a custom requests.adapters.HTTPAdapter. Worth fixing now since this surface also handles user-controllable redirect URLs.
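The resolve-once-and-validate half of that fix can be sketched with the standard library. is_public_ip and resolve_pinned_ip are hypothetical helpers. The caller would then connect to the returned IP while sending the original Host header (and SNI for HTTPS); that connection-pinning part is what the PR's PinnedIPAdapter handles, and this sketch does not reproduce it.

```python
import ipaddress
import socket


def is_public_ip(address: str) -> bool:
    """True only for globally routable unicast addresses."""
    ip = ipaddress.ip_address(address.split("%", 1)[0])  # drop an IPv6 zone id like fe80::1%eth0
    # is_global rules out private, loopback, and link-local (e.g. 169.254.169.254) ranges;
    # multicast is checked separately in case is_global does not exclude it.
    return ip.is_global and not ip.is_multicast


def resolve_pinned_ip(host: str, port: int = 443) -> str:
    """Resolve host exactly once and return one validated IP to connect to.

    Every answer must be public: a mixed public/private answer set is rejected
    rather than filtered, so a hostile resolver cannot slip a private IP through.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses = [sockaddr[0] for *_ignored, sockaddr in infos]
    if not addresses:
        raise ValueError(f"{host!r} did not resolve")
    for address in addresses:
        if not is_public_ip(address):
            raise ValueError(f"{host!r} resolves to non-public address {address}")
    return addresses[0]
```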

2. download_file skips the sensitive-filename guardrail (src/gaia/agents/tools/browser_tools.py:259-275, src/gaia/web/client.py:485-609)

The download path checks is_path_allowed and is_write_blocked on the directory but never passes the final resolved file path through validate_write. So an agent that downloads to ~/Downloads/credentials.json or ~/Downloads/id_rsa succeeds, even though write_file blocks the same destination. Suggested fix: after _sanitize_filename, run the resolved save_path through path_validator.is_write_blocked (which already covers SENSITIVE_FILE_NAMES + SENSITIVE_EXTENSIONS).
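A sketch of the suggested final-path check, assuming the sensitive name and extension sets look roughly like the ones named in the review. The entries below and the is_sensitive_destination helper are illustrative, not copied from security.py.

```python
from pathlib import Path
from typing import Tuple

# Illustrative subsets; security.py's SENSITIVE_FILE_NAMES / SENSITIVE_EXTENSIONS may differ.
SENSITIVE_FILE_NAMES = {".env", "credentials.json", "id_rsa", "id_ed25519", ".npmrc", ".pypirc"}
SENSITIVE_EXTENSIONS = {".pem", ".key", ".p12", ".pfx"}


def is_sensitive_destination(save_path: Path) -> Tuple[bool, str]:
    """Check the final file name chosen for a download, not only its directory."""
    name = save_path.name.lower()
    if name in SENSITIVE_FILE_NAMES or name.startswith(".env."):
        return True, f"refusing to write sensitive file: {save_path.name}"
    if save_path.suffix.lower() in SENSITIVE_EXTENSIONS:
        return True, f"refusing to write sensitive file type: {save_path.suffix}"
    return False, ""
```

Running this after _sanitize_filename matters because a server-controlled Content-Disposition header can pick the final name.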

3. Blocked-directories list misses common Linux write targets (src/gaia/security.py:107-128)

Only /var/run is in the Unix blocklist — but /var/log, /var/lib, /var/spool, and /opt are equally problematic for an agent to scribble into. Consider expanding, or at minimum adding /var as a parent guard with explicit allowlist for safe subpaths.
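The "/var as a parent guard with an explicit allowlist" option could look like the following. Everything here is hypothetical: the blocked roots, the /var/tmp exception, and the function names are illustrative, and posixpath.normpath stands in for the os.path.realpath call that real code needs in order to resolve symlinks.

```python
import posixpath
from typing import Tuple

# Hypothetical Unix policy: guard whole parents, then carve out explicit safe subpaths.
BLOCKED_PARENTS = ("/etc", "/usr", "/bin", "/sbin", "/boot", "/var", "/opt")
ALLOWED_EXCEPTIONS = ("/var/tmp",)


def _is_within(path: str, root: str) -> bool:
    # Append the separator so "/var-data" is not mistaken for a child of "/var".
    return path == root or path.startswith(root + "/")


def is_write_blocked(path: str) -> Tuple[bool, str]:
    # normpath only collapses "." and ".."; real code must resolve symlinks with os.path.realpath.
    normalized = posixpath.normpath(path)
    if any(_is_within(normalized, ok) for ok in ALLOWED_EXCEPTIONS):
        return False, ""
    for root in BLOCKED_PARENTS:
        if _is_within(normalized, root):
            return True, f"writes under {root} are blocked"
    return False, ""
```

Blocking the parent and allowlisting children fails closed: a new /var subdirectory is blocked until someone decides it is safe.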

🟢 Minor

4. Inconsistent path_validator access in file_io.py

read_file/write_python_file use self.path_validator.is_path_allowed(...) directly (src/gaia/agents/code/tools/file_io.py:54, 224, 269), while the new write_file/edit_file use getattr(self, "path_validator", None) and gracefully no-op when missing (lines 563, 655). Pick one — silent degradation hides bugs (and the no-validator path would violate the "no silent fallback" rule in CLAUDE.md).

5. WebClient._consume_body_capped mutates private requests attrs (src/gaia/web/client.py:285-308)

Setting response._content / response._content_consumed is the right idea for capping decompression bombs, but it depends on requests internals that have changed across versions. Worth a unit test that reads response.text after the cap and asserts truncation, plus a comment that this needs revisiting when bumping requests.
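One way to cap decompressed size without touching response._content is to accumulate from response.iter_content() on a stream=True request, since requests decodes gzip/deflate inside iter_content. The helper below accepts any iterable of byte chunks so it can be unit-tested without a network; read_capped and the 5 MB cap are assumptions, not WebClient's implementation.

```python
from typing import Iterable, Tuple

MAX_BODY_BYTES = 5 * 1024 * 1024  # hypothetical cap on the decoded body


def read_capped(chunks: Iterable[bytes], max_bytes: int = MAX_BODY_BYTES) -> Tuple[bytes, bool]:
    """Accumulate chunks up to max_bytes and return (body, truncated).

    With requests, pass response.iter_content(chunk_size=65536) from a stream=True
    request and close the response afterwards; iter_content yields decoded bytes,
    so the cap bounds the decompressed size rather than the wire size.
    """
    buf = bytearray()
    for chunk in chunks:
        remaining = max_bytes - len(buf)
        if len(chunk) > remaining:
            buf.extend(chunk[:remaining])
            return bytes(buf), True  # stop reading: the rest of the body is never pulled
        buf.extend(chunk)
    return bytes(buf), False
```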

6. ScratchpadService.clear_all drops tables but doesn't reclaim disk

After DROP TABLE, SQLite holds the pages until VACUUM. With MAX_TOTAL_SIZE_BYTES = 100MB enforced via the get_size_bytes() heuristic (src/gaia/scratchpad/service.py:349-369), a heavy create/drop cycle can leave the DB at 95 MB on disk while reporting 0 bytes — eventually breaking writes against the disk-size cap rather than the row cap. Either run VACUUM after clear_all, or document that the limit is row-based by design.
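A sketch of measuring the real on-disk size with PRAGMA page_count * page_size and reclaiming space with VACUUM after dropping tables. db_size_bytes and clear_all are illustrative stand-ins for the ScratchpadService methods, and the demo uses a throwaway file.

```python
import os
import sqlite3
import tempfile


def db_size_bytes(conn: sqlite3.Connection) -> int:
    """Database size in bytes, counting free pages left behind by DROP TABLE."""
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size


def clear_all(conn: sqlite3.Connection) -> None:
    """Drop every user table, then VACUUM so freed pages are returned to the OS."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )]
    for name in names:
        quoted = name.replace('"', '""')  # quote the identifier defensively
        conn.execute(f'DROP TABLE "{quoted}"')
    conn.commit()  # VACUUM fails inside an open transaction
    conn.execute("VACUUM")


# Usage: fill a throwaway database, clear it, and compare sizes.
demo_path = os.path.join(tempfile.mkdtemp(), "scratch.db")
demo = sqlite3.connect(demo_path)
demo.execute("CREATE TABLE t (payload TEXT)")
demo.executemany("INSERT INTO t VALUES (?)", [("x" * 200,) for _ in range(2000)])
demo.commit()
size_before = db_size_bytes(demo)
clear_all(demo)
size_after = db_size_bytes(demo)
```

Without the VACUUM, page_count stays at its pre-drop value, which is exactly the gap between reported and on-disk size that the review describes.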

✅ Things that are clearly right

  • _normalize_macos_symlinks (security.py:141) — solves a real bypass that's easy to miss
  • is_path_allowed appending os.sep to defeat the /home/user/project vs /home/user/project-secrets prefix attack (security.py:303-313)
  • Manual redirect following with per-hop SSRF re-validation in _request (client.py:204-272)
  • _validate_columns with a hard-deny on ;, --, and /* plus a type-root allowlist: robust SQL DDL validation without a full parser (scratchpad/service.py:421-512)
  • _strip_sql_string_literals for keyword scanning (avoids false positives on 'UPDATE PENDING')
  • Fail-loud audit logging for write/edit operations
  • _sanitize_filename covering Windows reserved device names + leading-dot + length cap
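The column DDL validation pattern praised above (hard-deny ;, --, and /*, then check each type against an allowlist of type roots) can be sketched as follows. check_column_ddl, the allowlist, and the name/type regexes are assumptions; the real _validate_columns is longer and may accept more syntax.

```python
import re
from typing import Tuple

# Hypothetical allowlist of type roots; the real _validate_columns may accept others.
ALLOWED_TYPE_ROOTS = {"INTEGER", "INT", "REAL", "TEXT", "BLOB", "NUMERIC",
                      "VARCHAR", "BOOLEAN", "DATE", "DATETIME"}
HARD_DENY = (";", "--", "/*")
_COLUMN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")
_COLUMN_TYPE = re.compile(r"([A-Za-z]+)(?:\(\d{1,5}\))?")  # e.g. TEXT or VARCHAR(255)


def check_column_ddl(columns: str) -> Tuple[bool, str]:
    """Validate 'name TYPE, name TYPE(n), ...' without a full SQL parser."""
    for token in HARD_DENY:
        if token in columns:
            return False, f"forbidden token {token!r}"
    for part in columns.split(","):
        pieces = part.split()
        if len(pieces) != 2:
            return False, f"expected 'name TYPE', got {part.strip()!r}"
        name, col_type = pieces
        if not _COLUMN_NAME.fullmatch(name):
            return False, f"invalid column name {name!r}"
        type_match = _COLUMN_TYPE.fullmatch(col_type)
        if not type_match or type_match.group(1).upper() not in ALLOWED_TYPE_ROOTS:
            return False, f"type {col_type!r} is not allowed"
    return True, ""
```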

Suggested next steps

  1. Fix SSRF DNS rebinding (item 1) before merge; this is the only finding I'd consider blocking.
  2. Add the sensitive-filename check to download_file (item 2).
  3. Reconcile the path_validator access pattern in file_io.py (item 4).
  4. Items 3, 5, and 6 can be follow-ups if the user wants to keep this PR focused.

Tagging @kovtcharov-amd for the SSRF item — DNS rebinding is a known-tricky bypass and worth a maintainer's eye on the fix.

…le-navigation

Resolve add/add conflict in src/gaia/web/client.py: keep our
WebClient (extensively tested via agent eval) and integrate
PinnedIPAdapter from PR #979 as a DNS-rebind TOCTOU guard
mounted on the WebClient session.
@kovtcharov-amd kovtcharov-amd changed the title Enhance ChatAgent with file navigation, web browsing, scratchpad tools, and write security guardrails feat(chat): file navigation, web browsing, scratchpad tools, and write security guardrails May 7, 2026
@kovtcharov-amd kovtcharov-amd changed the title feat(chat): file navigation, web browsing, scratchpad tools, and write security guardrails feat(agents): file navigation, web browsing, scratchpad tools, and write security guardrails May 7, 2026
Ovtcharov added 5 commits May 7, 2026 16:55
Four must-fix bugs from the PR #495 code review:

1. Tool registry isolation: _TOOL_REGISTRY.pop()/.clear() corrupted
   tools for other agents in the same process. Replace with per-instance
   _snapshot_tools() that copies the global registry into self._instance_tools.
   All tool lookup methods (format, execute, resolve, schemas) now use
   self._tools_registry property which prefers the snapshot.

2. Prompt section gating: getattr(config, "enable_*", True) used True as
   fallback, injecting tool sections into prompts even when tools weren't
   registered. Fixed to check profile membership OR explicit enable flag
   with False fallback.

3. Bookmark isolation: _bookmarks class-level dict shared across all
   instances. Changed to None sentinel with per-instance init in
   register_filesystem_tools().

4. OAuth token expiry: added get_token_with_expiry() returning (str, float)
   for callers that need the wall-clock expiry alongside the access token.
The MCP tool counting block at the end of _register_tools uses
_TOOL_REGISTRY directly to measure how many tools the @tool decorator
added. The previous commit removed the local import when deleting the
.pop() loop, causing a Pylint E0602 (undefined-variable) in CI.
…miter

- PinnedIPAdapter resolves DNS once and pins the connection to that IP,
  closing the DNS rebinding SSRF bypass identified in the #495 review
- Rate limiter switched from time.time() to time.monotonic() so NTP
  adjustments cannot disable throttling
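A per-domain limiter keyed on time.monotonic(), as described in this commit, can be sketched like this. DomainRateLimiter is hypothetical (the review mentions _rate_limit_wait and _domain_last_request on WebClient, but their exact behaviour is not shown), and the injectable clock and sleep exist only so the demo is deterministic.

```python
import time
from typing import Callable, Dict


class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(
        self,
        min_interval_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock  # monotonic: NTP or manual clock changes cannot skew the interval
        self._sleep = sleep
        self._domain_last_request: Dict[str, float] = {}

    def wait(self, domain: str) -> float:
        """Sleep until domain may be contacted again; return the seconds slept."""
        now = self._clock()
        last = self._domain_last_request.get(domain)
        delay = 0.0 if last is None else max(0.0, self.min_interval_s - (now - last))
        if delay:
            self._sleep(delay)
        self._domain_last_request[domain] = now + delay  # when the request actually goes out
        return delay


# Usage with a frozen fake clock so the result is deterministic.
limiter = DomainRateLimiter(min_interval_s=1.0, clock=lambda: 100.0, sleep=lambda s: None)
first = limiter.wait("example.com")   # 0.0: first request to this domain
second = limiter.wait("example.com")  # 1.0: same instant, so the full interval applies
other = limiter.wait("example.org")   # 0.0: each domain is tracked separately
```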
…ardening

- browser_tools: validate final download file path (not just directory)
  to prevent path traversal via server-controlled Content-Disposition
- scratchpad: use PRAGMA page_count * page_size for accurate DB size
  instead of row-count estimate; fail loudly on size check errors
- security: backup timestamps include milliseconds to avoid collisions
- dev-server: improved error handling
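Timestamped backups with millisecond resolution can be sketched as below. The <name>.<YYYYmmdd-HHMMSS-mmm>.bak naming scheme and backup_before_overwrite are assumptions; the commit only states that backup timestamps now include milliseconds.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_before_overwrite(path: Path, now: Optional[datetime] = None) -> Optional[Path]:
    """Copy path to '<name>.<YYYYmmdd-HHMMSS-mmm>.bak' beside it; None if path is new."""
    if not path.exists():
        return None  # first write: nothing to preserve
    # %f is microseconds (6 digits); dropping the last 3 leaves milliseconds.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S-%f")[:-3]
    backup = path.with_name(f"{path.name}.{stamp}.bak")
    shutil.copy2(path, backup)  # copy2 also keeps modification time and permission bits
    return backup


# Usage: back up an existing file with a fixed timestamp.
demo_dir = Path(tempfile.mkdtemp())
target = demo_dir / "notes.txt"
target.write_text("v1", encoding="utf-8")
saved = backup_before_overwrite(target, now=datetime(2026, 5, 7, 16, 55, 1, 123456))
```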
@kovtcharov-amd kovtcharov-amd added this pull request to the merge queue May 8, 2026
Merged via the queue into main with commit d25d933 May 8, 2026
65 checks passed
@kovtcharov-amd kovtcharov-amd deleted the feature/chat-agent-file-navigation branch May 8, 2026 01:17
@itomek itomek mentioned this pull request May 14, 2026
6 tasks
pull Bot pushed a commit to bhardwajRahul/gaia that referenced this pull request May 15, 2026
## Why this matters

v0.18.0 ships agent memory v2 (hybrid-search second brain with LLM
extraction and observability dashboard), ChatAgent split into three
composable agents (Chat/FileIO/DocumentQA), parallel tool calls, and a
Telegram adapter scaffold — plus fixes the RAG-on-PDF timeout with Gemma
4 that broke document Q&A since v0.17.6 and adds CI gates that enforce
RAG quality baselines on every future PR.

Full notes: `docs/releases/v0.18.0.mdx`.

## What's New

- **Agent memory v2** ([amd#606](amd#606)) —
Hybrid semantic + keyword search, LLM extraction, observability
dashboard via SSE streaming
([amd#1032](amd#1032)). Per-user isolation
enforced; extraction runs async so it doesn't add latency.
- **ChatAgent split** ([amd#979](amd#979)) —
`ChatAgent`, `FileIOAgent`, and `DocumentQAAgent` replace the monolithic
class; each composable via `tools=`. Backward-compatible shim preserved.
- **Parallel tool calls** ([amd#946](amd#946))
— Multiple `tool_calls` from a single LLM turn are executed
concurrently, cutting round-trips for multi-tool workflows.
- **Telegram adapter scaffold, Phase 0**
([amd#951](amd#951)) — `gaia telegram
start|stop|status`, per-user session isolation, `[telegram]` extras.
Phase 1 (message handling + allowed-users gate) tracked in
[amd#889](amd#889).
- **Connectors: per-MCP toggle + single-writer enforcement**
([amd#1018](amd#1018),
[amd#998](amd#998)) — Disable individual MCP
servers without removing them; concurrent writes serialised with
actionable errors on contention.
- **File navigation, web browsing, and write security**
([amd#495](amd#495)) — `FileSearchToolsMixin`,
web browsing tool, and scratchpad mixin in `KNOWN_TOOLS`; write tools
check `allowed_paths` before dispatch.
- **Email UI and policy alerts**
([amd#995](amd#995),
[amd#1039](amd#1039),
[amd#952](amd#952)) — Pre-scan triage card,
in-chat Connect, policy alert cards, and durable receipts for
confirmation-gated actions.
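Concurrent execution of several tool_calls from one LLM turn, as described for amd#946, can be sketched with asyncio.gather. The sketch assumes OpenAI-style tool calls (id, function.name, JSON-encoded function.arguments) and async tool functions; run_tool_calls is hypothetical and does not reflect GAIA's actual dispatcher.

```python
import asyncio
import json
import time
from typing import Any, Awaitable, Callable, Dict, List

ToolFn = Callable[..., Awaitable[Any]]


async def run_tool_calls(tool_calls: List[Dict[str, Any]], tools: Dict[str, ToolFn]) -> List[Dict[str, str]]:
    """Run every tool call from one LLM turn concurrently; results keep the input order."""

    async def run_one(call: Dict[str, Any]) -> Dict[str, str]:
        try:
            fn = tools[call["function"]["name"]]
            args = json.loads(call["function"].get("arguments") or "{}")
            return {"tool_call_id": call["id"], "content": str(await fn(**args))}
        except Exception as exc:  # one failing tool must not cancel its siblings
            return {"tool_call_id": call["id"], "content": f"Error: {exc!r}"}

    return list(await asyncio.gather(*(run_one(c) for c in tool_calls)))


async def slow_echo(text: str) -> str:
    await asyncio.sleep(0.2)  # stands in for I/O such as a web fetch
    return text


calls = [
    {"id": f"call_{i}", "function": {"name": "echo", "arguments": json.dumps({"text": t})}}
    for i, t in enumerate(["a", "b", "c"])
]
start = time.monotonic()
results = asyncio.run(run_tool_calls(calls, {"echo": slow_echo}))
elapsed = time.monotonic() - start  # roughly 0.2 s rather than 0.6 s, because the calls overlap
```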

## Bug Fixes

- **RAG-on-PDF timeouts on Gemma 4**
([amd#1034](amd#1034), closes
[amd#1030](amd#1030)) — Prompt-size budget
check added at composition time; CI gates enforce it on every PR
([amd#1040](amd#1040)).
- **Envelope-level parse failure crashed SD recovery**
([amd#1047](amd#1047), closes
[amd#1023](amd#1023)) — Falls through to a
clean recovery path with step-1 context preserved.
- **Windows-path tool args corrupted**
([amd#1027](amd#1027)) — Backslash
normalisation now happens after argument parsing.
- **Blender `send_command` hung**
([amd#1026](amd#1026), closes
[amd#1022](amd#1022)) — Read timeout applied
to persistent-connection servers.
- **`gaia chat init` in post-install banner**
([amd#1029](amd#1029), closes
[amd#1024](amd#1024)) — Replaced with the
correct `gaia init`.
- **Keyring treated as required**
([amd#1028](amd#1028)) — Import guarded;
optional on systems without `keyring`.
- **electron-builder URLs stale**
([amd#953](amd#953)) — Three doc/installer
files updated to current download paths.

## Tooling & Docs

- **RAG eval CI gates** ([amd#1040](amd#1040),
closes [amd#1033](amd#1033)) — RAG quality
baselines + prompt-size budget enforced on every PR.
- **Fork-PR authors now receive Claude review**
([amd#932](amd#932)) —
`allowed_non_write_users: "*"` with prompt-injection mitigations
documented.
- **Eval runs mandated before merging**
([amd#1036](amd#1036)) — `CLAUDE.md` requires
`gaia eval agent` for LLM-affecting changes.
- **GAIA website** ([amd#369](amd#369)) —
[amd-gaia.ai](https://amd-gaia.ai) live.
- **Custom agent guide reorganised**
([amd#997](amd#997)), Lemonade PPA docs
([amd#801](amd#801)), broken Lemonade CLI URL
fixed ([amd#996](amd#996)), WhatsApp adapter
evaluation spec ([amd#950](amd#950)).

## Release checklist

- [x] `util/validate_release_notes.py docs/releases/v0.18.0.mdx --tag
v0.18.0` passes
- [x] `src/gaia/version.py` → `0.18.0`
- [x] `src/gaia/apps/webui/package.json` → `0.18.0`
- [x] Navbar label in `docs/docs.json` → `v0.18.0 · Lemonade 10.2.0`
- [x] All 28 commits in range (v0.17.6..HEAD) are represented in the
notes
- [ ] Review from @kovtcharov-amd addressed

Labels

agents dependencies Dependency updates devops DevOps/infrastructure changes documentation Documentation changes electron Electron app changes eval Evaluation framework changes jira Jira agent changes performance Performance-critical changes security Security-sensitive changes tests Test changes

5 participants