Commit d25d933
feat(agents): file navigation, web browsing, scratchpad tools, and write security guardrails (#495)
## Summary
Before: a monolithic ChatAgent with a 13K-token system prompt and 22 tools
caused a 95s time to first token (TTFT) on local models. Write operations
had no security checks at all. Users had to find files manually, download
web content themselves, and do data analysis outside GAIA.
After: ChatAgent is split into 5 focused agents (chat, doc, file, data,
web) with lean prompt profiles, plus a centralized write-guardrail layer
and 3 new tool groups (file navigation, web browsing, scratchpad). TTFT
drops from 95s to 0.12s (chat) and 3-10s (doc). Judged eval pass rate:
87-89%.
### Agent split
| Agent | Profile | Prompt size | Tools | Purpose |
|-------|---------|-------------|-------|---------|
| `chat` | conversation only | ~2K tokens | none | Fast greetings, general chat |
| `doc` | RAG + file search | ~5K tokens | RAG, file search | Document Q&A with hallucination prevention |
| `file` | filesystem ops | ~4K tokens | browse, tree, find, read, bookmark | File navigation and discovery |
| `data` | scratchpad + CSV | ~3K tokens | create_table, insert, query, list, drop | Multi-document structured analysis |
| `web` | browser tools | ~2K tokens | fetch_page, search_web, download_file | Web research and content extraction |
Each agent has a `-lite` variant that uses a ~4B model for low-memory
hardware. A per-scenario `agent_type` field in the eval YAML routes each
scenario to the right agent.
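The routing described above can be sketched as a lookup table keyed by `agent_type`. This is a minimal illustration, not GAIA's eval harness. It assumes three things: scenarios have already been parsed from YAML into dicts, a `-lite` suffix only swaps in the smaller model, and a missing `agent_type` falls back to `chat`. `AgentProfile`, `AGENT_PROFILES`, `resolve_agent`, and the model labels are hypothetical names. The tool strings are copied from the table for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical profile record mirroring one row of the agent table."""
    name: str
    prompt_tokens: int        # approximate system-prompt budget
    tools: Tuple[str, ...]    # tool names exposed to the model


AGENT_PROFILES: Dict[str, AgentProfile] = {
    "chat": AgentProfile("chat", 2_000, ()),
    "doc": AgentProfile("doc", 5_000, ("rag", "file_search")),
    "file": AgentProfile("file", 4_000, ("browse", "tree", "find", "read", "bookmark")),
    "data": AgentProfile("data", 3_000, ("create_table", "insert", "query", "list", "drop")),
    "web": AgentProfile("web", 2_000, ("fetch_page", "search_web", "download_file")),
}

# Placeholder model labels, not real model IDs.
DEFAULT_MODEL = "default-model"
LITE_MODEL = "lite-4b-model"


def resolve_agent(scenario: dict) -> Tuple[AgentProfile, str]:
    """Map one parsed eval scenario to an agent profile and a model label.

    Assumption: a scenario without agent_type is routed to "chat".
    """
    agent_type = scenario.get("agent_type", "chat")
    lite = agent_type.endswith("-lite")
    base = agent_type[: -len("-lite")] if lite else agent_type
    if base not in AGENT_PROFILES:
        raise ValueError(f"unknown agent_type: {agent_type!r}")
    return AGENT_PROFILES[base], (LITE_MODEL if lite else DEFAULT_MODEL)


# Usage with a scenario dict as it might come out of an eval YAML file.
profile, model = resolve_agent({"agent_type": "doc-lite"})
print(profile.name, model)  # doc lite-4b-model
```

Under this design, adding an agent is one table entry, and the eval runner never needs per-agent branching.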
### Security hardening
- **Write guardrails** (`src/gaia/security.py`): blocked directories
(incl. `/var/log`, `/var/lib`, `/var/spool`, `/opt`), sensitive file
protection, size limits, overwrite prompts, timestamped backups,
rotating audit log (10 MB x 3), symlink resolution
- **SSRF prevention** (`src/gaia/web/client.py`): `PinnedIPAdapter`
closes DNS-rebinding TOCTOU window, monotonic rate limiter, per-hop
redirect validation, blocked ports, private-IP rejection
- **SQL injection defense** (`src/gaia/scratchpad/service.py`): column
  DDL validation, `PRAGMA`/`VACUUM`/`REINDEX` blocked in queries, `VACUUM`
  on `clear_all`, per-call OOM guards
- **Download guardrail** (`browser_tools.py`): post-download
sensitive-filename check (`.env`, `credentials.json`, etc.) — deletes
and blocks if matched
- **Per-instance tool registry**: `_snapshot_tools()` prevents tool
leakage across agent instances in the same process
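The write-guardrail checks can be sketched in the order they matter. The guardrail first resolves symlinks, then tests the real path against blocked directories, then checks sensitive filenames and size. It backs up an existing file before overwriting it and records the write in a rotating audit log (10 MB × 3, using the standard `logging.handlers.RotatingFileHandler`). This is a minimal sketch, not the contents of `src/gaia/security.py`. The names `check_write`, `safe_write`, `WriteBlocked`, the sensitive-name set, and the size limit are assumptions, and the user prompt before an overwrite is omitted. A similar filename check could back the post-download guardrail.

```python
import logging
import logging.handlers
import shutil
import time
from pathlib import Path
from typing import Optional

# Illustrative values only. Resolving the blocked roots keeps the check
# correct where /var is itself a symlink (e.g. /var -> /private/var on macOS).
BLOCKED_DIRS = [Path(p).resolve() for p in ("/var/log", "/var/lib", "/var/spool", "/opt")]
SENSITIVE_NAMES = {".env", "credentials.json"}
MAX_WRITE_BYTES = 10 * 1024 * 1024  # assumed size limit


class WriteBlocked(Exception):
    """Raised when the guardrail refuses a write."""


def check_write(target: str, content: bytes) -> Path:
    # Resolve symlinks first so a link inside an allowed directory cannot
    # redirect the write into a blocked one.
    path = Path(target).resolve()
    for blocked in BLOCKED_DIRS:
        if path == blocked or blocked in path.parents:
            raise WriteBlocked(f"blocked directory: {blocked}")
    if path.name.lower() in SENSITIVE_NAMES:
        raise WriteBlocked(f"sensitive file: {path.name}")
    if len(content) > MAX_WRITE_BYTES:
        raise WriteBlocked(f"{len(content)} bytes exceeds the size limit")
    return path


def backup_if_exists(path: Path) -> Optional[Path]:
    # Timestamped copy taken before an overwrite (one-second resolution).
    if not path.exists():
        return None
    backup = path.with_name(f"{path.name}.{time.strftime('%Y%m%d-%H%M%S')}.bak")
    shutil.copy2(path, backup)
    return backup


def make_audit_logger(log_file: str) -> logging.Logger:
    # Rotating audit log: 10 MB per file, 3 rotated files kept.
    logger = logging.getLogger("write_audit")
    if not logger.handlers:
        handler = logging.handlers.RotatingFileHandler(
            log_file, maxBytes=10 * 1024 * 1024, backupCount=3
        )
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def safe_write(target: str, content: bytes, audit: logging.Logger) -> Path:
    path = check_write(target, content)
    backup = backup_if_exists(path)  # a real tool would also ask the user first
    path.write_bytes(content)
    audit.info("write %s (%d bytes, backup=%s)", path, len(content), backup)
    return path
```

Checking the resolved path rather than the string the model supplied is the key design choice. A prefix check on the raw string would accept a symlink that points into `/var/log`.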
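The core of the SSRF defense is to resolve a hostname once, reject the request if any resolved address is non-public or the port is blocked, and then connect to that exact IP. Because DNS is never consulted a second time, a rebinding attack has no window between check and use. The sketch below shows only that resolve-validate-pin step with the standard `socket` and `ipaddress` modules. It is not GAIA's `PinnedIPAdapter`, which is a transport adapter. `resolve_and_pin`, `SSRFBlocked`, and the port list are hypothetical. Under this approach, per-hop redirect validation means running every `Location` URL through the same function before following it.

```python
import ipaddress
import socket
from typing import Tuple
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
BLOCKED_PORTS = {22, 23, 25, 3306, 5432, 6379}  # illustrative list


class SSRFBlocked(Exception):
    """Raised when a URL points at a disallowed destination."""


def resolve_and_pin(url: str) -> Tuple[str, str, int]:
    """Resolve a URL once and return (hostname, pinned_ip, port).

    The caller connects to pinned_ip and sends hostname in the Host header
    (and as TLS SNI), so a second DNS lookup can never change the target.
    """
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        raise SSRFBlocked(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    if port in BLOCKED_PORTS:
        raise SSRFBlocked(f"blocked port: {port}")
    infos = socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    ips = sorted({info[4][0] for info in infos})
    # Reject if ANY record is non-public (loopback, private, link-local,
    # cloud metadata, ...); otherwise mixed records could slip through.
    for ip in ips:
        if not ipaddress.ip_address(ip.split("%")[0]).is_global:
            raise SSRFBlocked(f"{parts.hostname} resolves to non-public {ip}")
    return parts.hostname, ips[0], port
```

For example, `resolve_and_pin("http://169.254.169.254/latest/meta-data/")` raises `SSRFBlocked`. A public address returns the IP the caller should connect to.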
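The scratchpad defenses can be illustrated with the standard `sqlite3` module. Column DDL is built only from identifiers that pass a strict pattern and types from an allowlist. Queries must be a single `SELECT` with no `PRAGMA`/`VACUUM`/`REINDEX`, and results are capped per call. `clear_all` drops the tables and then runs `VACUUM` to reclaim the freed space. This is a sketch under those assumptions, not the code in `src/gaia/scratchpad/service.py`. `build_create_table`, `run_query`, `MAX_ROWS`, and the SELECT-only rule are hypothetical. The keyword regex is deliberately blunt and also rejects those words inside string literals.

```python
import re
import sqlite3
from typing import Dict, List, Sequence, Tuple

IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")
ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL", "NUMERIC", "BLOB"}
BLOCKED_RE = re.compile(r"\b(?:PRAGMA|VACUUM|REINDEX)\b", re.IGNORECASE)
MAX_ROWS = 10_000  # hypothetical per-call row cap standing in for an OOM guard


def build_create_table(table: str, columns: Dict[str, str]) -> str:
    """Build CREATE TABLE DDL from validated identifiers and allowlisted types."""
    if not IDENT_RE.fullmatch(table):
        raise ValueError(f"bad table name: {table!r}")
    if not columns:
        raise ValueError("at least one column is required")
    cols = []
    for name, col_type in columns.items():
        if not IDENT_RE.fullmatch(name):
            raise ValueError(f"bad column name: {name!r}")
        if col_type.upper() not in ALLOWED_TYPES:
            raise ValueError(f"bad column type: {col_type!r}")
        cols.append(f'"{name}" {col_type.upper()}')
    return f'CREATE TABLE "{table}" ({", ".join(cols)})'


def run_query(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[Tuple]:
    """Run one read-only SELECT and refuse oversized results."""
    if BLOCKED_RE.search(sql):
        raise ValueError("PRAGMA/VACUUM/REINDEX are not allowed")
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT queries are allowed")
    # sqlite3's execute() refuses multiple statements, so "SELECT 1; DROP ..." fails.
    rows = conn.execute(sql, params).fetchmany(MAX_ROWS + 1)
    if len(rows) > MAX_ROWS:
        raise ValueError(f"result exceeds {MAX_ROWS} rows")
    return rows


def clear_all(conn: sqlite3.Connection) -> None:
    """Drop every user table, then VACUUM to reclaim the freed space."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )]
    for name in names:
        conn.execute(f'DROP TABLE "{name}"')
    conn.commit()  # VACUUM cannot run inside an open transaction
    conn.execute("VACUUM")
```

Values always travel as bound `params`. Only identifiers are interpolated, and only after `IDENT_RE` has accepted them.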
### Follow-up issues
- #972 — Trim pre-existing system prompt bloat (~6K tokens)
- #955 — CodeAgent write tools missing blocklist guardrails
## Test plan
- [x] ~500 PR-specific unit tests pass (11 new test files, ~8K LOC)
- [x] Full unit suite passes, lint clean
- [x] Agent eval: 87-89% judged pass rate, 100% on
personality/RAG/adversarial/web scenarios
- [x] All 10 critical CI checks pass
- [x] 2 remaining CodeQL alerts documented as false positives (EMR
dashboard)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ovtcharov <kovtchar@amd.com>

1 parent 7fadc3f, commit d25d933
106 files changed
Lines changed: 18574 additions & 506 deletions