
Commit d25d933

kovtcharov, claude, and Ovtcharov authored
feat(agents): file navigation, web browsing, scratchpad tools, and write security guardrails (#495)
## Summary

Before: monolithic ChatAgent with a 13K-token system prompt and 22 tools caused 95s TTFT on local models. Write operations had zero security checks. Users had to manually find files, download web content, and do data analysis outside GAIA.

After: ChatAgent is split into 5 focused agents (chat, doc, file, data, web) with lean prompt profiles, plus a centralized write-guardrail layer and 3 new tool groups. TTFT drops from 95s to 0.12s (chat) / 3-10s (doc). Eval pass rate: 87-89% judged.

### Agent split

| Agent | Profile | Prompt size | Tools | Purpose |
|-------|---------|-------------|-------|---------|
| `chat` | conversation only | ~2K tokens | none | Fast greetings, general chat |
| `doc` | RAG + file search | ~5K tokens | RAG, file search | Document Q&A with hallucination prevention |
| `file` | filesystem ops | ~4K tokens | browse, tree, find, read, bookmark | File navigation and discovery |
| `data` | scratchpad + CSV | ~3K tokens | create_table, insert, query, list, drop | Multi-document structured analysis |
| `web` | browser tools | ~2K tokens | fetch_page, search_web, download_file | Web research and content extraction |

Each has a `-lite` variant using a ~4B model for low-memory hardware. Per-scenario `agent_type` field in eval YAML routes scenarios to the right agent.

### Security hardening

- **Write guardrails** (`src/gaia/security.py`): blocked directories (incl. `/var/log`, `/var/lib`, `/var/spool`, `/opt`), sensitive file protection, size limits, overwrite prompts, timestamped backups, rotating audit log (10 MB x 3), symlink resolution
- **SSRF prevention** (`src/gaia/web/client.py`): `PinnedIPAdapter` closes DNS-rebinding TOCTOU window, monotonic rate limiter, per-hop redirect validation, blocked ports, private-IP rejection
- **SQL injection defense** (`src/gaia/scratchpad/service.py`): column DDL validation, `PRAGMA`/`VACUUM`/`REINDEX` blocked in queries, VACUUM on clear_all, per-call OOM guards
- **Download guardrail** (`browser_tools.py`): post-download sensitive-filename check (`.env`, `credentials.json`, etc.); deletes and blocks if matched
- **Per-instance tool registry**: `_snapshot_tools()` prevents tool leakage across agent instances in the same process

### Follow-up issues

- #972: Trim pre-existing system prompt bloat (~6K tokens)
- #955: CodeAgent write tools missing blocklist guardrails

## Test plan

- [x] ~500 PR-specific unit tests pass (11 new test files, ~8K LOC)
- [x] Full unit suite passes, lint clean
- [x] Agent eval: 87-89% judged pass rate, 100% on personality/RAG/adversarial/web scenarios
- [x] All 10 critical CI checks pass
- [x] 2 remaining CodeQL alerts documented as false positives (EMR dashboard)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ovtcharov <kovtchar@amd.com>
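The write guardrails listed under Security hardening can be pictured with a minimal sketch. This is not the code in `src/gaia/security.py`: the function name `check_write`, the `MAX_WRITE_BYTES` value, and the `(allowed, reason)` return shape are assumptions made for illustration. Only the blocked directory names and the two sensitive filenames come from the commit message. Overwrite prompts, backups, and audit logging are left out.

```python
from pathlib import Path

# Directory names are taken from the commit message; everything else is illustrative.
DEFAULT_BLOCKED_DIRS = ("/var/log", "/var/lib", "/var/spool", "/opt")
SENSITIVE_NAMES = frozenset({".env", "credentials.json"})
MAX_WRITE_BYTES = 10 * 1024 * 1024  # hypothetical limit, not GAIA's actual value


def check_write(path, size_bytes, blocked_dirs=DEFAULT_BLOCKED_DIRS):
    """Return (allowed, reason) for a proposed write. A sketch, not GAIA's API."""
    # resolve() follows symlinks, so a link that points into a blocked
    # directory is judged by its real destination.
    resolved = Path(path).resolve()
    for entry in blocked_dirs:
        blocked = Path(entry).resolve()
        if resolved == blocked or blocked in resolved.parents:
            return False, f"blocked directory: {blocked}"
    if resolved.name.lower() in SENSITIVE_NAMES:
        return False, f"sensitive file: {resolved.name}"
    if size_bytes > MAX_WRITE_BYTES:
        return False, "content exceeds size limit"
    return True, "ok"
```

Resolving both the target and the blocklist entries before comparing them is what keeps a symlink from sidestepping the directory check.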
1 parent 7fadc3f commit d25d933

106 files changed

Lines changed: 18574 additions & 506 deletions


.github/workflows/test_unit.yml

Lines changed: 13 additions & 2 deletions
@@ -58,11 +58,11 @@ jobs:
 # pyfakefs is required by tests/unit/installer/test_uninstall_command.py
 # which uses the `fs` fixture to build a fake filesystem for testing
 # tiered uninstall logic cross-platform without touching the real FS.
-#
+# pytest-mock is required by the browser/filesystem tool tests.
 # keyring + httpx + respx are required by tests/unit/connections/
 # (issue #915). The in-memory keyring backend in tests/conftest.py
 # avoids the SecretService daemon prerequisite on Linux runners.
-uv pip install --system pytest pytest-cov pytest-asyncio pyfakefs \
+uv pip install --system pytest pytest-cov pytest-asyncio pytest-mock pyfakefs \
 keyring httpx respx
 uv pip install --system -e ".[api]"

@@ -140,6 +140,17 @@ jobs:
 echo " - ASR: Automatic speech recognition utilities"
 echo " - TTS: Text-to-speech utilities"
 echo " - InitCommand: gaia init profiles and installer logic"
+echo " - FileSystemIndex: Persistent file index with FTS5 search"
+echo " - FileSystemToolsMixin: browse_directory, tree, file_info, find_files, read_file, bookmark tools"
+echo " - ScratchpadService: SQLite working memory for data analysis"
+echo " - ScratchpadToolsMixin: create_table, insert_data, query_data, list_tables, drop_table tools"
+echo " - BrowserTools: WebClient SSRF prevention, HTML extraction, downloads"
+echo " - WebClient Edge Cases: parse_html fallback, extract_text, tables, links, download redirects"
+echo " - Categorizer: auto_categorize, category map completeness, extension uniqueness"
+echo " - ChatAgent Integration: filesystem, scratchpad, browser init/config/cleanup"
+echo " - File Write Guardrails: blocked dirs, sensitive files, size limits, backup, audit"
+echo " - Security Edge Cases: symlinks, audit logging, TOCTOU, prompt_overwrite"
+echo " - Service Edge Cases: DB corruption rebuild, shared DB, row limits, transaction atomicity"
 echo ""
 echo "Integration Tests:"
 echo " - DatabaseMixin + Agent: Full agent lifecycle with database"
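The ScratchpadService entries in the test list above correspond to the SQL injection defense described in the commit message (column DDL validation, `PRAGMA`/`VACUUM`/`REINDEX` blocked in queries). A rough sketch of that kind of check follows. It is not the code in `src/gaia/scratchpad/service.py`: the helper names `column_ddl` and `checked_query`, the allowed type list, and the single-`SELECT` rule are assumptions made for illustration.

```python
import re
import sqlite3

BLOCKED_KEYWORDS = ("PRAGMA", "VACUUM", "REINDEX")  # named in the commit message
ALLOWED_TYPES = frozenset({"TEXT", "INTEGER", "REAL", "BLOB", "NUMERIC"})  # assumption
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def column_ddl(name, sql_type):
    """Build one column definition, rejecting anything but a plain identifier and type."""
    if not IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid column name: {name!r}")
    if sql_type.upper() not in ALLOWED_TYPES:
        raise ValueError(f"unsupported column type: {sql_type!r}")
    return f'"{name}" {sql_type.upper()}'


def checked_query(sql):
    """Accept a single read-only statement; raise ValueError otherwise."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?i)(select|with)\b", statement):
        raise ValueError("only SELECT queries are allowed")
    for keyword in BLOCKED_KEYWORDS:
        if re.search(rf"(?i)\b{keyword}\b", statement):
            raise ValueError(f"{keyword} is not allowed in queries")
    return statement


def demo():
    """Create a table from validated columns, insert with parameters, run a checked query."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join([column_ddl("vendor", "text"), column_ddl("amount", "real")])
    conn.execute(f"CREATE TABLE invoices ({columns})")
    conn.executemany("INSERT INTO invoices VALUES (?, ?)", [("acme", 10.5), ("acme", 4.5)])
    query = checked_query("SELECT vendor, SUM(amount) FROM invoices GROUP BY vendor;")
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

The keyword check is deliberately conservative: it also rejects a harmless query whose string literal happens to contain one of the blocked words.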

docs/docs.json

Lines changed: 3 additions & 1 deletion
@@ -287,7 +287,9 @@
 "spec/prisma-tools-mixin",
 "spec/typescript-tools-mixin",
 "spec/external-tools-mixin",
-"spec/web-tools-mixin"
+"spec/web-tools-mixin",
+"spec/browser-tools",
+"spec/file-system-agent"
 ]
 },
 {
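The `spec/browser-tools` page registered above covers the web tools whose SSRF prevention the commit message summarizes (blocked ports, private-IP rejection, pinning the resolved IP). The sketch below shows one way such a pre-flight check could look. It is not the code in `src/gaia/web/client.py`: `is_public_ip`, `check_url`, the `resolve` parameter, and the contents of `BLOCKED_PORTS` are assumptions made for illustration. The resolver is injected so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

BLOCKED_PORTS = frozenset({22, 25, 3306})  # hypothetical list, not GAIA's


def is_public_ip(ip_text):
    """True only for addresses outside private, loopback, link-local and similar ranges."""
    ip = ipaddress.ip_address(ip_text)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d by its embedded IPv4 address
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def system_resolve(host):
    """Default resolver: every address the system returns for the host."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def check_url(url, resolve=system_resolve):
    """Validate a URL and return the IP the caller should connect to.

    Raises ValueError if the scheme, port, or any resolved address is rejected.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    if port in BLOCKED_PORTS:
        raise ValueError(f"port not allowed: {port}")
    addresses = resolve(parts.hostname)
    if not addresses:
        raise ValueError("host did not resolve")
    for address in addresses:
        if not is_public_ip(address):
            raise ValueError(f"non-public address: {address}")
    return addresses[0]
```

Returning the validated address, and connecting to that address rather than resolving the hostname a second time, is the idea behind closing the DNS-rebinding window. The same check would need to run again on every redirect hop.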

docs/server.js

Lines changed: 54 additions & 10 deletions
@@ -271,8 +271,35 @@ const loginLimiter = rateLimit({
   legacyHeaders: false,
 });

+// General per-IP rate limiter for all auth endpoints (not just /login).
+// Defined here so it can be applied to every auth route below, closing the
+// "missing rate-limiting" CodeQL alert on /auth/logout and
+// /auth/login-error which would otherwise accept unlimited requests.
+const rateLimitStore = new Map();
+const RATE_LIMIT_WINDOW = 60 * 1000; // 1 minute
+const RATE_LIMIT_MAX = 100; // max requests per window per IP
+
+function rateLimiter(req, res, next) {
+  const ip = req.ip || req.connection.remoteAddress;
+  const now = Date.now();
+  const record = rateLimitStore.get(ip) || { count: 0, resetAt: now + RATE_LIMIT_WINDOW };
+
+  if (now > record.resetAt) {
+    record.count = 0;
+    record.resetAt = now + RATE_LIMIT_WINDOW;
+  }
+
+  record.count++;
+  rateLimitStore.set(ip, record);
+
+  if (record.count > RATE_LIMIT_MAX) {
+    return res.status(429).send('Too Many Requests');
+  }
+  next();
+}
+
 // Login handler
-app.post('/auth/login', loginLimiter, (req, res) => {
+app.post('/auth/login', loginLimiter, rateLimiter, (req, res) => {
   const { code, nonce } = req.body;

   if (code === ACCESS_CODE) {
@@ -285,15 +312,29 @@ app.post('/auth/login', loginLimiter, (req, res) => {
       maxAge: COOKIE_MAX_AGE,
       sameSite: 'lax'
     });
-    // Retrieve redirect URL from server-side storage and validate with url.parse()
+    // Server-side redirect target. Instead of validating the user-supplied
+    // pathname and forwarding it (which CodeQL's
+    // js/server-side-unvalidated-url-redirection analyzer can't prove safe),
+    // we maintain an explicit allowlist of post-login destinations and
+    // round-trip the incoming pathname through it. Anything that doesn't
+    // exactly match a known-safe path falls back to '/'.
+    const ALLOWED_POST_LOGIN_PATHS = new Set([
+      '/',
+      '/index.html',
+    ]);
     const target = consumeRedirect(nonce);
     const parsed = url.parse(target || '');
-    // Only redirect to relative paths (no host/protocol) to prevent open redirects
-    if (!parsed.host && !parsed.protocol && parsed.pathname) {
-      res.redirect(303, parsed.pathname);
-    } else {
-      res.redirect(303, '/');
-    }
+    const pathname = parsed.pathname || '/';
+    // Block open-redirects and traversal before the allowlist check.
+    const structurallySafe =
+      !parsed.host &&
+      !parsed.protocol &&
+      pathname.startsWith('/') &&
+      !pathname.startsWith('//') &&
+      !pathname.split('/').includes('..');
+    const resolvedPath =
+      structurallySafe && ALLOWED_POST_LOGIN_PATHS.has(pathname) ? pathname : '/';
+    res.redirect(303, resolvedPath);
   } else {
     // Retrieve the original redirect URL and re-store with a new nonce for retry
     const originalRedirect = consumeRedirect(nonce);
@@ -303,7 +344,7 @@ app.post('/auth/login', loginLimiter, (req, res) => {
 });

 // Login error handler (uses nonce to retrieve redirect URL)
-app.get('/auth/login-error', (req, res) => {
+app.get('/auth/login-error', rateLimiter, (req, res) => {
   // Retrieve redirect URL from server-side storage and re-store for the form
   const originalRedirect = consumeRedirect(req.query.nonce);
   const newNonce = storeRedirect(originalRedirect);
@@ -312,11 +353,14 @@ app.get('/auth/login-error', (req, res) => {
 });

 // Logout handler
-app.get('/auth/logout', (req, res) => {
+app.get('/auth/logout', rateLimiter, (req, res) => {
   res.clearCookie(COOKIE_NAME);
   res.redirect('/');
 });

+// Apply rate limiter before auth middleware for every other route
+app.use(rateLimiter);
+
 // Apply auth middleware
 app.use(authMiddleware);
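To make concrete which redirect targets the new post-login check accepts, the logic in the `/auth/login` hunk above can be restated as a small Python function. This is an approximation for illustration only: the function name `resolve_post_login_path` is hypothetical, and Python's `urlsplit` does not parse every input the same way Node's `url.parse` does (for example, `urlsplit` treats `//evil.example/x` as having a host, while the JavaScript code relies on the explicit `//` prefix check).

```python
from urllib.parse import urlsplit

# Same two entries as ALLOWED_POST_LOGIN_PATHS in the diff.
ALLOWED_POST_LOGIN_PATHS = frozenset({"/", "/index.html"})


def resolve_post_login_path(target):
    """Return the path to redirect to after login, falling back to '/'."""
    try:
        parts = urlsplit(target or "")
    except ValueError:
        return "/"  # unparseable input is treated as untrusted
    pathname = parts.path or "/"
    # Structural checks first: no host, no scheme, rooted path, no traversal.
    structurally_safe = (
        not parts.netloc
        and not parts.scheme
        and pathname.startswith("/")
        and not pathname.startswith("//")
        and ".." not in pathname.split("/")
    )
    if structurally_safe and pathname in ALLOWED_POST_LOGIN_PATHS:
        return pathname
    return "/"
```

Because only exact allowlist matches pass, a structurally valid path such as `/docs/setup` also falls back to `/`, and any query string on the stored target is dropped. That is stricter than the previous behavior, which forwarded any relative pathname.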