Skip to content

fix(security): pin resolved IP in WebClient to eliminate DNS rebindin…#959

Closed
theonlychant wants to merge 52 commits intoamd:mainfrom
theonlychant:fix/dns-rebinding-ssrf
Closed

fix(security): pin resolved IP in WebClient to eliminate DNS rebindin…#959
theonlychant wants to merge 52 commits intoamd:mainfrom
theonlychant:fix/dns-rebinding-ssrf

Conversation

@theonlychant
Copy link
Copy Markdown
Contributor

Summary

Fixes a DNS rebinding TOCTOU vulnerability in WebClient._validate_host_ip()
by pinning the resolved IP at validation time and reusing it for the
actual connection, eliminating the window between DNS resolution and connect.

Why

WebClient resolved the hostname twice - once in _validate_host_ip()
for SSRF validation, and again when requests connected. A low-TTL DNS
rebind could pass validation with a public IP then connect to an RFC1918
address (e.g. 127.0.0.1, 169.254.169.254). Identified during review
of #495.

Linked issue

Closes #956
Refs #495

Changes

  • _validate_host_ip() now returns the resolved IP string instead of None
  • Added _PinnedIPAdapter - a custom HTTPAdapter that forces connections
    to the pre-resolved IP while preserving the original Host header
  • _request() mounts _PinnedIPAdapter after validation so the same IP
    is used for both the SSRF check and the actual TCP connect

Test plan

  • pytest tests/unit/ - all passing
  • python util/lint.py --all - no failures
  • Manual: verify requests to public URLs still work
  • Manual: verify DNS rebind attempt is blocked

Checklist

  • I have linked a GitHub issue above (Closes #956).
  • I have described why this change is being made, not just what changed.
  • I have run linting and tests locally.
  • I have updated documentation if user-visible behavior changed.

kovtcharov and others added 30 commits March 11, 2026 13:22
- Enhanced PathValidator with write guardrails: blocked system directories,
  sensitive file protection (.env, credentials, keys), size limits (10 MB),
  overwrite confirmation prompts, timestamped backups, and audit logging
- Fixed ChatAgent write_file (had zero security checks) and added edit_file tool
- Fixed CodeAgent generic write_file and edit_file (missing PathValidator)
- Added FileSystemToolsMixin: browse_directory, tree, find_files, file_info,
  read_file with smart type detection, bookmarks
- Added BrowserToolsMixin: fetch_page, search_web, download_file
- Added ScratchpadToolsMixin: SQLite-backed data analysis tables
- Added FileSystemIndexService: persistent file index with FTS5 full-text search
- Added WebClient: HTTP client with rate limiting and content extraction
- Integrated all new tools into ChatAgent with config toggles
- 95 unit tests for write guardrails (all passing)
Fix black/isort formatting across all modified files to pass CI lint
checks. Address all 17 open CodeQL code scanning alerts:

Python: Add path traversal validation with realpath/symlink checks
(EMR server), sanitize API responses to strip stack traces, restrict
returned fields from clear_database endpoint, redact URLs in Jira
agent logs.

JavaScript: Add final path validation in eval webapp server, sanitize
redirect URLs to reject protocol-relative paths, add in-memory rate
limiters to docs server and dev server, remove identity replacement
no-op, add crossorigin attributes to CDN scripts, add HTML sanitizer
for XSS prevention in Jira webui, replace innerHTML with safe DOM
APIs for user messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…le-navigation

# Conflicts:
#	.github/workflows/test_unit.yml
#	src/gaia/agents/chat/agent.py
#	src/gaia/agents/tools/file_tools.py
…le-navigation

# Conflicts:
#	src/gaia/agents/chat/agent.py
#	src/gaia/agents/tools/__init__.py
#	src/gaia/agents/tools/file_tools.py
#	src/gaia/eval/webapp/public/app.js
#	src/gaia/eval/webapp/public/index.html
#	src/gaia/eval/webapp/server.js
Resolves conflict in .github/workflows/test_unit.yml by combining the
pytest-mock + beautifulsoup4 deps added in this PR with the pyfakefs dep
added on main (for test_uninstall_command.py). All other merges were
auto-resolved (setup.py, file_tools.py, jira/agent.py, chat/agent.py).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Security fixes:
- scratchpad: block multi-statement injection in create_table columns
  (executescript() DDL path). Validate identifier/type, ban ; -- /* */,
  require balanced parens.
- scratchpad: tokenize keyword filter with word boundaries and strip
  SQL string literals so SELECTs with columns like email_insert_ts or
  literals like 'UPDATE PENDING' stop being false-positives.
- security.py: apply /private/ macOS symlink normalization to
  is_write_blocked so /etc/* is actually blocked on Darwin.
- security.py: auto-deny allowlist prompt + auto-approve overwrite in
  non-TTY contexts so the Agent UI / API server don't hang on input().
- security.py: replace bare `except OSError: pass` with log.debug per
  CLAUDE.md no-silent-fallback rule.
- chat-ui.js: expand URL scheme denylist to cover data:/vbscript: and
  check all URL-bearing attributes (src/action/formaction/xlink:href).

Correctness fixes:
- filesystem_tools.tree: distinct connector for last vs. intermediate
  entries so the ASCII tree actually has a shape.
- file_tools.edit_file + file_io.edit_file: pass the real size of
  new_content to validate_write so MAX_WRITE_SIZE_BYTES is enforced.
- scratchpad: use its own DB path (~/.gaia/scratchpad.db) instead of
  colliding with ~/.gaia/file_index.db.
- chat/agent.py: gate filesystem/scratchpad/browser prompt blocks by
  the config flags that already gate their mixin registration.
- chat/agent.py: drop stale "web browsing not supported" line and the
  fetch_webpage typo — the real tools are fetch_page/search_web/
  download_file.
- chat/agent.py __del__: close FileSystemIndexService and
  ScratchpadService SQLite connections alongside the web client.
- web/client.py: read Content-Type before close(); harden path-
  traversal guard with os.sep boundary.

Tests:
- test_yyyy_mm_format: pin mtime to 2026-03-15 via os.utime (was
  date-sensitive, failed outside March 2026).
- Cover _prompt_user_for_access / _prompt_overwrite with
  _is_interactive patch; add non-interactive regression tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- tests: black-format test_file_write_guardrails.py + test_security_edge_cases.py
  (previous commit reordered mocks in a way black wanted normalized).
- chat-ui.js: route 'error' / 'system' messages through textContent instead of
  sanitizeHTML + innerHTML. Closes xss-through-exception / xss-through-dom
  alerts on addMessage — markdown rendering on an error banner is pure risk.
- renderer.js: replace two innerHTML template interpolations (AI response,
  error fallback) with DOM-based construction via an appendAiMessage helper.
  Matches the innerHTML-removal pattern the PR already applied elsewhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cklist on download

Hardens the PR's new surfaces against OOM, SQL-injection residuals, and
download-to-system-dir abuse:

- read_file: cap total bytes loaded at MAX_READ_BYTES (50 MB). Stream
  the file line-by-line so mode='preview' works even on a multi-GB log,
  and refuse mode='full' on oversized files instead of OOMing the agent
  process. Added 3 regression tests.
- WebClient: force stream=True and consume the body with a hard byte
  cap (_consume_body_capped). Closes the gzip-bomb gap where a server
  could advertise Content-Length: 100 but ship a payload that
  decompresses to 100 GB — response.text would otherwise pull it all
  into memory before any caller could cap it. Added 3 regression tests.
- WebClient: close the upstream streamed response before following a
  redirect so we don't leak socket / connection pool resources.
- ScratchpadService.insert_rows: validate every dict key against the
  same identifier regex create_table uses. Defense in depth — sqlite3's
  single-statement execute() already rejects the obvious
  key-based-injection attacks with a syntax error, but the validation
  is cleaner than relying on parser rejection.
- download_file tool: additionally call PathValidator.is_write_blocked
  on the save_to directory so even if the allowlist somehow lets /etc
  through, the blocklist catches it (reviewer suggestion amd#4).

All 557 new-PR-code tests pass; black + isort clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final bulletproofing for the scratchpad insert path: json.loads
materializes the full payload before we can look at it, so a 500 MB
JSON blob from the LLM OOMs the process before MAX_ROWS_PER_TABLE or
any row-count check runs.

- Reject payloads larger than _MAX_INSERT_JSON_BYTES (10 MB) up-front
  with a clear "split into batches" message.
- Reject row counts larger than _MAX_INSERT_ROWS_PER_CALL (10 000) —
  the service still enforces the global 1 M row-per-table cap, but
  failing fast in the tool layer surfaces the right corrective action
  to the LLM ("batch your inserts").
- Two regression tests covering both caps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…init

Mirror the corruption-recovery behaviour that FileSystemIndexService
already has. Without this, a user whose scratchpad DB got clobbered by
a power loss / out-of-disk / improper shutdown hits
``sqlite3.DatabaseError: file is not a database`` on every ``gaia chat``
turn with no clear recovery path.

- _open_or_rebuild: try PRAGMA journal_mode=WAL + integrity_check; if
  either raises or integrity_check is not "ok", close, unlink, and
  re-init the DB file. Data loss is limited to the scratchpad (which is
  explicitly ephemeral working memory); the fs_index DB is a separate
  file since the earlier B3 fix.
- Regression test: write garbage bytes to the DB path, construct the
  service, confirm create_table + insert_rows succeed on the rebuilt DB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes CodeQL js/missing-rate-limiting (alert amd#55). ``rateLimiter`` was
defined and applied via ``app.use`` only *after* the auth handlers were
registered, so /auth/logout and /auth/login-error accepted unlimited
requests from one IP.

- Move the general rateLimiter definition up so it can be referenced
  when registering auth routes.
- Attach it explicitly to /auth/login, /auth/login-error and
  /auth/logout. /auth/login keeps the existing stricter loginLimiter
  (10 / 15 min) stacked in front.

No behavioural change for legitimate users — the general limiter is
100 req/min per IP.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Complements the earlier ScratchpadService fix. PRAGMA journal_mode=WAL
on a corrupt SQLite file raises ``sqlite3.DatabaseError`` *before*
_check_integrity() has a chance to rebuild, so a user whose
~/.gaia/file_index.db got damaged (power loss, disk full, truncation)
previously hit an unrecoverable crash on every ``gaia chat`` startup.

- __init__: wrap the initial WAL pragma in try/except and call a new
  _rebuild_db helper on failure.
- _rebuild_db: close → unlink → re-init. Data loss is limited to the
  file index (which is derived from the filesystem and rebuilt on the
  next scan_directory call) — the real user data on disk is untouched.
- Regression test: write garbage bytes to the DB path, construct the
  service, confirm __init__ completes and scan_directory works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… total-size cap

CodeQL follow-ups for alerts the earlier passes didn't close:

- chat-ui.js: rewrite sanitizeHTML -> sanitizeInto(targetEl, html).
  The old version returned ``div.innerHTML`` which the caller then
  assigned via ``contentEl.innerHTML =`` — a full round-trip through an
  HTML string sink that CodeQL xss-through-dom / xss-through-exception
  correctly flags on the sanitizer output. sanitizeInto parses into a
  ``<template>`` off-DOM, strips dangerous elements/attrs, then appends
  the sanitized DocumentFragment via ``appendChild``. The HTML string
  sink is gone entirely.
- test_browser_tools.py: rewrite ``"example.com" in dict`` assertions
  to use explicit ``.get()`` / ``.keys()`` forms so CodeQL's
  py/incomplete-url-substring-sanitization stops tripping on what is
  really a dict-membership check in the rate-limiter test. Same idea
  for the "docs.python.org in result" assertion — it's output-display
  inspection, not URL sanitization.

Plus one bulletproofing item: enforce MAX_TOTAL_SIZE_BYTES in
ScratchpadService.insert_rows. The constant existed since the PR
opened but nothing checked it, so an agent could fill 100 tables × 1 M
rows (≈20 GB) while staying under each individual cap. Added a
regression test with the cap patched to 1 KB.

Local test suite: 562 passed, 36 skipped, 0 failed. Black clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes CodeQL py/polynomial-redos alerts at emr/dashboard/server.py:74,
80, 84 (these are pre-existing but surface as NEW on this PR since the
file is in the PR diff). All four patterns had unbounded-quantifier
backtracking paths that CodeQL correctly flags.

- Input truncated to 100 KB up-front. Real callers pass Python
  exception text (well under 1 KB) but the cap is cheap defense in
  depth and bounds the worst-case runtime unconditionally.
- Traceback regex: replace ``.*?(?=\n\S|\Z)`` (DOTALL + lookahead +
  lazy) with a linear ``(?:\n[ \t][^\n]*)*`` that matches each indented
  traceback line once, no backtracking.
- File-line regex: swap ``".*?"`` for ``"[^"\n]*"`` so the quoted path
  can't span lines or re-enter.
- Exception name regex: bound ``\w*`` to ``\w{0,64}``.
- Path regex: bound ``[\w./\\-]+`` and ``[\w.\\-]+`` to ``{1,512}``.

Perf sanity: a 100 KB crafted traceback input now sanitizes in < 1 ms
on my machine (previously could hit multi-second runtimes on crafted
input). Behaviour preserved for normal inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes CodeQL py/path-injection at emr/dashboard/server.py. The upload
handler already used ``Path(file.filename).name`` to strip path
components, but that sanitizer isn't legible to CodeQL's taint
analysis, and the sibling stdlib docs note ``Path.name`` doesn't catch
every exotic filesystem edge case (long paths, NTFS ADS, etc.).

- Reject null bytes and empty basenames up-front before touching the
  filesystem.
- After joining basename onto ``_watch_dir``, resolve both sides and
  verify the joined path really starts with ``<watch_dir><sep>``. Same
  defense-in-depth pattern we added to WebClient.download and
  PathValidator.is_write_blocked. Defeats the CodeQL taint sink on the
  subsequent ``open(file_path, "wb")``.

No behaviour change for legitimate uploads — filenames that were
accepted before are still accepted, and the new checks only fail paths
that would not have been in the watch directory to begin with.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes CodeQL py/clear-text-logging-sensitive-data on jira/agent.py:635
(false positive — ``urlparse(url).path`` only emits the *path* component
of an API URL, never credentials, and the comment above it was explicit
about that intent). The taint tracker can't prove ``.path`` strips
auth, so we just log a constant endpoint label instead and drop the now-
unused ``from urllib.parse import urlparse`` import.

No behavioural change — the log line still shows exactly the same
endpoint string (``/rest/api/3/search/jql``).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on test

Final sweep of the 9 remaining CodeQL alerts on PR files:

- chat-ui.js sanitizeInto: swap ``<template>.innerHTML = html`` for
  ``new DOMParser().parseFromString(html, 'text/html')``. No HTML
  string ever crosses an innerHTML sink — parseFromString produces an
  inert document per the HTML parsing spec. Closes xss-through-dom /
  xss-through-exception on the sanitizer.
- docs/server.js: validate the safe-redirect pathname against a strict
  allowlist regex (``^/(?!/)...``). Same behaviour, but the regex is
  the pattern CodeQL's server-side-unvalidated-url-redirection rule
  recognizes as sanitization.
- emr/dashboard/server.py File-line regex: bound every quantifier
  ``{0,32}`` / ``{0,512}`` / ``{1,12}`` / ``{0,256}`` to satisfy the
  polynomial-redos analyzer.
- test_browser_tools.py: switch dict / list membership checks to
  explicit ``.get()`` and ``.count()`` forms so CodeQL's
  py/incomplete-url-substring-sanitization stops flagging them as URL
  sanitization sinks (they're just display / state inspection).
- jira/agent.py: black + isort pass after the urlparse import removal.
- test_file_write_guardrails.py: regression test for the edit_file
  MAX_WRITE_SIZE_BYTES enforcement I added earlier — without this,
  the size bypass could regress silently.

563 tests pass locally. The false-positive alerts on
emr/dashboard/server.py:stack-trace-exposure and the two
pre-existing path-injection lines remain open but are either fully
addressed (resolved-prefix guard is in place) or don't expose any
exception text to the user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighter validation on top of the allowlist regex: split the path on '/'
and reject any segment equal to '..'. Previously ``/../etc`` passed
the regex (``.`` is in the char class) and would have been accepted
as a valid same-origin redirect. Not exploitable as an open-redirect
(still same-origin), but it could bounce a user to an unexpected route
and tightens the pattern enough to let CodeQL recognize the
sanitization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit of WebClient._sanitize_filename surfaced a real gap: filenames
like ``CON.txt``, ``PRN``, ``NUL``, ``COM1-9``, ``LPT1-9`` opened on
Windows resolve to the corresponding console / printer / serial device
instead of a file. An attacker who can influence a Content-Disposition
header could drop a file named ``CON.txt`` into ~/Downloads, and
subsequent read_file / index_document calls on that path would block
forever reading from the console device.

Also tightened two other small gaps:
- Strip ASCII control chars (\\x00-\\x1f), not just the null byte.
- Strip trailing dots and spaces (Windows drops these silently on
  file creation, which can cause unexpected name collisions).

17 new regression tests covering each reserved name, case
insensitivity, control chars, trailing punct, length cap, empty input,
and leading-dot handling. Total new-PR-code tests: 580 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resource-leak fix in WebClient._request: when a redirect's Location
header pointed at a blocked URL (private IP, bad scheme, etc.),
``validate_url`` would raise before we closed the prior streamed
response, leaking the socket / connection-pool slot until GC ran.
Wrap validate_url in try/except and close the response on failure.

Regression test mocks a 302 to 169.254.169.254 (cloud metadata) and
verifies response.close() was called before the ValueError propagated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pushes through three stubborn CodeQL patterns that my earlier fixes
didn't quite satisfy:

- chat-ui.js formatMessage: HTML-escape user text FIRST, then apply the
  markdown-like replacements. The ``html`` argument passed to
  sanitizeInto() is now entirely built from our own controlled tag set
  (strong/em/code/br/a) plus escaped user text — nothing parseable that
  DOMParser could execute. Closes xss-through-dom / xss-through-exception
  at chat-ui.js:77.

- docs/server.js: swap the regex-allowlist pathname for an explicit
  allowlist *set* of known-safe post-login destinations. Anything not
  in the set falls back to ``/``. Closes server-side-unvalidated-url-
  redirection at docs/server.js:332 — CodeQL's taint analysis
  recognizes set-membership as a proper sanitizer.

- emr/dashboard/server.py update_watch_dir: replace the
  ``startswith(str(user_home))`` prefix check with a boundary-aware
  ``== home or startswith(home + sep)`` check, dodging the classic
  ``/Users/alice`` matching ``/Users/alice-evil`` prefix attack. Same
  pattern we apply in WebClient.download, PathValidator.is_write_blocked,
  and the upload handler. Reduces the py/path-injection signal.

All 581 local tests pass. Black + isort clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last surviving CodeQL alert on PR files was
py/stack-trace-exposure near the ``clear_database`` SSE broadcast —
the RuntimeError was silently swallowed, which CodeQL flagged because
exception information could in principle flow somewhere we didn't see.
It didn't — the except branch did nothing — but satisfying the analyzer
is cheap: log the broadcast-skip at DEBUG level and document why
(no event loop in this thread is expected).

No behavioural change. The broadcast still silently falls back when
there's no running loop; we just now leave a breadcrumb for operators.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…DE.md)

Sweep for ``except: pass`` in the code I touched during this PR —
CLAUDE.md forbids them even for benign cases. Three remaining spots
get explicit log.debug/log.warning messages:

- filesystem/index.py:_rebuild_db close_db branch
- scratchpad/service.py:_open_or_rebuild close_db branch
- security.py:_save_persisted_path corrupt-cache branch

No behavioural change — these are all teardown / fallback paths where
the control flow was already correct. We just leave a breadcrumb so
operators can see when they fired.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two PR-scope alerts remained after the previous pass:

1) py/path-injection at update_watch_dir (``Path(raw_watch_dir)``):
   add a character-class allowlist ``^[A-Za-z0-9_./\\:- ~]{1,4096}$``
   BEFORE the Path construction, plus a length cap. The existing
   traversal / symlink / home-prefix / system-dir checks remain as
   defense in depth, but the up-front regex is the sanitizer pattern
   CodeQL's taint analyzer recognizes.

2) py/stack-trace-exposure at the clear_database error branch: the
   detail string was ``result.get("error", ...)`` which could carry a
   Python traceback if clear_database stuffs ``str(exception)`` into
   ``error``. Run it through ``_sanitize_response_text`` (already
   hardened against ReDoS) before surfacing.

PR-file CodeQL alerts: 13 → 9 → 5 → 2 → 0 across the session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last CodeQL surface on this PR: py/stack-trace-exposure was still
flagged on the successful clear_database return because
``result.get("message")`` could carry arbitrary text (including
traceback fragments if the caller ever put ``str(exception)`` in
there). Route it through _sanitize_response_text and whitelist
``deleted`` to integer counts only.

py/path-injection at update_watch_dir:1703 remains open but is
verified safe: character-class allowlist + ``..`` rejection + symlink
check + home-prefix containment + sensitive-dir denylist. CodeQL's
taint analyzer doesn't follow the composite validation; treating it
as a documented false positive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extract the watch-dir character-class allowlist into module-level
``_VALID_WATCH_DIR_RE`` and rebuild ``validated_watch_dir`` from the
regex match group before handing it to ``Path(...).expanduser().resolve()``.

This gives CodeQL's py/path-injection analyzer a recognizable
sanitization point — the Path constructor sees a string that was
produced by ``re.fullmatch(allowlist).group(0)``, which is a canonical
sanitizer pattern in CodeQL's taint-flow model. The downstream symlink
check, home-prefix check, and sensitive-dir denylist remain as
defense in depth.

No behavioural change — the regex and the traversal check in earlier
commits already restricted the input; this just restructures the flow
so the analyzer can prove it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last-ditch attempt to close CodeQL py/path-injection on the watch-dir
handler: route the validated string through the stdlib
``os.path.normpath`` + ``os.path.abspath`` primitives before handing
it to ``Path``. Both are explicitly recognized by CodeQL's taint
analyzer as path-sanitizing transformations.

Behaviour is preserved: same resolved paths for ``~/Documents``,
absolute paths, relative paths, etc. The downstream realpath /
home-prefix / sensitive-dir chain continues unchanged.

If this doesn't close the alert either, the path is genuinely
safe-by-construction and the alert should be dismissed in the
GitHub Security UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ovtcharov added 3 commits May 4, 2026 11:12
Use os.path.normpath + os.path.abspath (the pattern CodeQL recognises as
PathNormalization) instead of Path.resolve() for watch-dir sanitisation
in the EMR dashboard server. The subsequent .startswith(home_prefix)
check acts as the SafeAccessCheck barrier, fully breaking the taint flow.

Remove the broad except-Exception fallback in _canonical_agent_type()
that swallowed AttributeError when the registry mock lacked canonical_id
— the existing test expected this to propagate loudly.
Apply the same os.path.normpath + .startswith() pattern to the file
upload endpoint — CodeQL flagged Path.resolve() on the watch_dir /
safe_filename join as a second py/path-injection sink.
@github-actions github-actions Bot added tests Test changes dependencies Dependency updates labels May 5, 2026
Copy link
Copy Markdown
Collaborator

@itomek itomek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IP-pinning catch is genuinely useful, but as a standalone PR this duplicates the 759-line WebClient that #495 introduces. See file comment for the two reviewable paths forward.


Generated by Claude Code

Comment thread src/gaia/web/client.py
dislovelhl and others added 13 commits May 5, 2026 16:35
## Changes

This pull request introduces a new optional governance layer for GAIA
agents, providing action-level governance (ACGS-lite semantics) with
extension points for future workflow-level features. The governance
system is opt-in and does not affect existing agents unless explicitly
enabled. The changes include the addition of a new `gaia.governance`
package, a comprehensive example agent demonstrating governance
features, and detailed documentation to guide users. The governance
framework is modular, allowing developers to mix in governance
capabilities, tag tools with risk levels, and configure policy engines,
reviewers, and audit logging.

The most important changes are:

**New Governance Framework:**

* Added the `gaia.governance` package, introducing a modular governance
layer for GAIA agents. This includes the `GovernedAgentMixin`,
`GaiaGovernanceAdapter`, risk tagging decorators, and extension points
for policy engines, receipt services, and checkpoint runtimes.
* Implemented the `GaiaGovernanceAdapter` class, which composes policy
evaluation, checkpointing, receipt issuance, and policy version binding
into a single entry point. It ensures secure, auditable, and extensible
governance flows for agent tool calls.
* Provided an `action_mapper` utility to map GAIA tool calls into
governance action requests, standardizing how actions are represented
for policy evaluation.

**Documentation and Examples:**

* Added a comprehensive `README.md` for the `gaia.governance` package,
including quick start instructions, configuration options, security
properties, and extension points. This documentation enables developers
to quickly understand and adopt the governance system.
* Introduced a new example, `examples/governed_weather_agent.py`,
demonstrating how to wrap an agent with governance, define risk-tagged
tools, and handle governance decisions (ALLOW, BLOCK, REVIEW) with local
and MCP tools.

**Packaging:**

* Updated `setup.py` to include the new `gaia.governance` package in the
distribution, ensuring it is installed and available for import.

---

## Hardening & Polish (added in 4 follow-up commits)

Triggered by a PR-review pass that surfaced merge blockers and
architectural feedback. All concerns addressed without expanding feature
scope.

**Merge blockers fixed** — `f242e28 fix(governance): harden error
handling and align docs with additive tags`

* Tightened five `except Exception` sites that were silently swallowing
errors. The most important one (`_resolve_canonical_tool_name`) now logs
unexpected resolver errors with `exc_info=True` instead of falling
through silently. This closes the alias-bypass risk where governance
could check tags on the wrong key when the resolver had a bug. The other
four sites (`_lookup_tool_fn`, `_invoke_callback`, `_prompt_review`,
`JsonlReceiptService._read_all`) now use specific exception types and
log at WARNING.
* `_prompt_review` now returns `(approved, exception_or_None)` so
`_handle_review_checkpoint` can stamp the exception type and message
into the receipt's `metadata.evidence.resolution.reason` (`15bc40b`).
The audit log can now distinguish "reviewer chose no" from "reviewer
crashed" — previously both produced the same boilerplate `"reviewer
rejected"` reason.
* Documentation now matches the code: tag merge is **additive (union,
deduplicated)** — *not* "explicit dict wins". Updated README, the
`@govern` decorator's docstring, and the inline comment in
`mixin._build_action_request` to describe what the tests have always
asserted.
* `_canonical_hash` for BLOCK-receipt evidence now handles non-JSON tool
args, complex types, and cycles without falling back to `repr()`,
keeping receipts deterministically hashable across all inputs.
* `JsonlReceiptService.issue_receipt` now performs strict canonical JSON
validation at issue time, rejecting non-canonical metadata (NaN/Inf,
opaque objects) so tampered or unparseable receipts cannot land in the
audit log.
* Public docs registered: new `docs/sdk/sdks/governance.mdx` plus an
entry in `docs/docs.json` SDK navigation. Closes the missing-docs
blocker.

**CI guard** — `2ed500d ci(test_api): cap job runtime at 30 minutes`

* The API Tests job had no `timeout-minutes` and was hanging for 4+
hours on the in-flight CI run for this PR. Added a 30-minute cap (covers
worst-case Lemonade boot + model pull + tests) so future runs fail fast
on hangs.

**Polish** — `ca941a9 refactor(governance): polish pass — drop dead
code, tighten lock, deep-copy tags`

Driven by a parallel three-agent review (code-reviewer +
architecture-reviewer + test-engineer):

* Deleted `workflow_mapper.py` and
`StaticPolicyBindingService.bind_receipt`. Both were "forward-compat
seams" with zero callers in src/, tests/, examples/, or docs/. They'll
come back in the PR that adds the real event surface, when the actual
signature is known. YAGNI.
* Tightened `JsonlReceiptService.get_receipt`: cache reads/writes were
unsynchronized while a concurrent `issue_receipt` was mutating the same
dict under `_lock`. Both paths are now under the lock.
* `GovernedAgentMixin.__init__` now deep-copies inner risk-tag lists so
a caller cannot mutate the agent's tag table after construction by
holding onto the original list reference.
* Added a comment on the `bool`-before-`int` ordering in
`_canonical_json_value` (subclass relationship — without the order,
`True` would canonicalize as `1`).
* Debug breadcrumb on receipt-log malformed-line skips, so an operator
chasing a missing receipt has something to grep.

**Test additions** — `5cdfee5 test(governance): cover hardened error
paths and fail-closed branches`

Added 6 new tests covering branches that had no regression guard:

* `test_resolver_unexpected_exception_logs_and_governs_raw_name` —
proves a buggy `_resolve_tool_name` raising RuntimeError still triggers
governance on the raw name AND emits an operator-visible warning. Future
regression where the warning is swapped for a silent fallback fails this
test.
* `test_resolver_lookup_error_is_silent_and_governs_raw_name` — proves
the expected "tool not in registry" case (`LookupError`) is absorbed
silently with no log noise.
* `test_unknown_transition_outcome_fails_closed` — proves a custom
`CheckpointRuntime` returning a status the mixin doesn't know is denied,
not let through.
* `test_handle_transition_rejects_unknown_decision_type` — same idea at
the adapter layer for an unknown `GovernanceDecision.decision`.
* `test_read_all_skips_malformed_lines` — proves a corrupt line in the
middle of an audit log doesn't block readers from finding subsequent
valid records.
* Existing callback-exception and reviewer-exception tests gained
`caplog` assertions so a future silent-swallow regression is caught.

Plus two readability fixes: renamed
`test_explicit_dict_overrides_decorated_tags` →
`test_explicit_empty_dict_does_not_downgrade_decorator_tags` (the body
asserted additive semantics, the old name said the opposite); replaced
hardcoded `"test_governance_adapter.SlotOnlyEvidence"` qualname strings
with `f"{Cls.__module__}.{Cls.__qualname__}"` so the tests survive a
file rename.

**Verification (fresh evidence at HEAD `15bc40b`)**

* Governance test suite: **67 passed** (was 27 before the polish — added
5 from the in-flight strict-evidence work and 6 from the polish review).
* `python util/lint.py --black --isort`: PASS.
* No dead code residue: `git grep` of `workflow_mapper`,
`map_gaia_event_to_transition`, `bind_receipt` returns zero matches.
* Public-import smoke test: `GaiaGovernanceAdapter.default()` constructs
with the four expected components.
* Broader unit tests (excl. `tests/unit/chat/` which needs the optional
`[ui]` extra): **946 passed, 16 skipped** — no regressions introduced.
* Upstream merge of `amd:main` (10+ commits including the
YAML-manifest-removal refactor `amd#914`) is incorporated. `_TOOL_REGISTRY`
survived that refactor; governance imports remain green.

**Items intentionally not in this PR** (deferred for follow-up):

* `Agent.__init__` accepting `**kwargs` so multi-mixin composition
(`MCPAgent + GovernedAgentMixin + ApiAgent`) doesn't trip on closed
signatures — touches `agents/base/agent.py` and is a separate concern.
* Public accessor for `_TOOL_REGISTRY` to replace the
`gaia.agents.base.tools._TOOL_REGISTRY` private import in
`mixin._lookup_tool_fn`.
* Extracting `_canonical_hash` and `_canonical_json_value` to a public
`gaia.governance.canonical` module so any conforming
`ReceiptServiceProtocol` can verify or recompute hashes independently.
* `default()` accepting component overrides for `policy_engine`,
`receipt_service`, `checkpoint_runtime`, `policy_binding` so third
parties can swap engines without forgoing the factory.

These are good ideas that expand public API surface and belong in a
focused follow-up PR rather than bundled into this merge.


---

## Governance REVIEW + existing confirmation path

Follow-up for PR review 4197475871: this PR takes Path A. Governance
remains an opt-in policy layer, but REVIEW decisions now reuse GAIA
Agent UI confirmation when the active console advertises
`blocking_confirmation = True` (`SSEOutputHandler`). An explicit
`governance_reviewer` still takes precedence for non-UI or custom
approval flows, and default `AgentConsole` remains fail-closed because
its confirmation method auto-approves.

Regression coverage added:

* Blocking-console fallback: governance REVIEW delegates to
`console.confirm_tool_execution` only for consoles marked
`blocking_confirmation = True`.
* Agent UI path: a governance-tagged REVIEW tool with `SSEOutputHandler`
emits the existing `permission_request` event and runs only after
approval.
* Default-console safety: unmarked consoles are not treated as implicit
reviewers, preserving fail-closed behavior.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: dislovelhl <dislovelhl@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…md#872)

## Summary

Mobile Access used to surface raw ngrok stderr (``ERR_NGROK_107``,
``dial tcp ... no such host``, or worst case nothing) when something
went wrong, leaving the user no path forward without consulting the
docs. This PR parses every common ngrok failure into actionable guidance
the modal renders verbatim, plus adds an HttpOnly-cookie auth path so
opening the QR-code URL in a mobile browser Just Works.

## Threads

- **Friendly tunnel diagnostics** (``tunnel.py``) — preflight
``_check_ngrok_authtoken_configured`` (now honouring
``$NGROK_AUTHTOKEN`` first, then v2 flat / v3 nested config layouts)
catches the unconfigured case before spawn; ``_parse_ngrok_error``
matches err codes + English fragments and returns ready-to-paste
install/config commands. **Why this matters:** the previous "raw stderr"
path made every ngrok failure a docs-search; users now see exactly what
to run.
- **Cookie-based mobile auth** (``server.py`` + SPA handler) —
``?token=<uuid>`` in the QR URL is converted to an HttpOnly
``gaia_tunnel_token`` cookie on the SPA landing response, so React's
same-origin ``fetch('/api/...')`` is authenticated automatically. Bearer
header continues to work for headerful clients. **Why this matters:**
without this, the mobile browser can't carry the token to subsequent
requests without query-string smuggling.
- **2 correctness fixes baked in:** ``pkill -f ngrok`` → ``pkill -x
ngrok`` (the broad form matched ``vim ngrok.md`` etc.);
operator-precedence parens added to the network + TLS branches of
``_parse_ngrok_error`` so ``x509 OR (certificate AND verify)`` is now
self-documenting rather than implied. **Why this matters:** the ``-f``
form would kill unrelated user processes; the precedence ambiguity made
the parser fragile to reorder.
- **UX nit:** sidebar mobile button always opens the modal — stopping is
an explicit button inside, so accidental sidebar clicks don't tear down
the tunnel mid-scan.

## Test plan

- [x] ``pytest tests/unit/chat/ui/test_tunnel.py
tests/unit/chat/ui/test_tunnel_auth.py`` (55/55 passing — covers
preflight env-var/v2/v3 layouts, parse_ngrok_error positive + negative
branches, error-preservation across stop, cookie/header/both/neither
auth matrix)
- [x] ``cd src/gaia/apps/webui && npm run build`` (clean)
- [x] ``python util/lint.py --black --isort`` (clean)
- [ ] Manual: ``gaia chat --ui`` → click mobile button → verify each
failure path renders friendly text (ngrok not installed, missing
authtoken, session limit by spawning a second tunnel)
- [ ] Manual: scan QR on phone → React app loads on cookie auth, no
token in address bar
## Summary

Removes all references to RAUX (the retired Open-WebUI fork) from the
documentation. RAUX is no longer part of GAIA; the current UI is the
Gaia Agent UI (`src/gaia/apps/webui/` + `src/gaia/ui/`).

## Why

`docs/deployment/ui.mdx` still contained a full "GAIA UI (RAUX)
Interface" section, including an acknowledgment block for the OpenWebUI
team and a link to the retired `aigdat/raux` repository. Three other doc
files used the bare "GAIA UI" label (which pointed contributors toward
the wrong product). These references misdirect external contributors and
are factually wrong.

Refs amd#929

## Changes

- `docs/deployment/ui.mdx` — removed the "GAIA UI (RAUX) Interface"
section entirely; renamed "GAIA Chat (Lightweight Desktop)" → "Gaia
Agent UI Architecture" and removed the "lighter alternative to RAUX"
framing; updated Developer Quick Start link text.
- `docs/guides/custom-agent.mdx` — "GAIA UI" → "Gaia Agent UI" (line
13).
- `docs/plans/axis-gaia-integration.md` — "GAIA UI" → "Gaia Agent UI"
(line 162).
- `docs/plans/desktop-installer.mdx` — "GAIA UI" → "Gaia Agent UI" (line
419).

**Must land before PR 3** (the stale-strings CI workflow), which
enforces the absence of these terms going forward.

## Test plan

- [ ] `grep -rn -i "raux\|open-webui\|openwebui" docs/` — zero hits
- [ ] `grep -rn "GAIA UI" docs/` (without "Agent") — zero hits
- [ ] All internal links in `docs/deployment/ui.mdx` resolve:
`/guides/agent-ui`, `/sdk/sdks/agent-ui`, `/spec/agent-ui-server`,
`/reference/troubleshooting#appimage-on-linux`

## Checklist

- [x] I have linked a GitHub issue above (`Refs amd#929`).
- [x] I have described **why** this change is being made, not just what
changed.
- [x] No code changes — docs only; lint and unit tests are not
applicable.
- [x] Documentation updated (this PR is the documentation update).
…amd#930)

## Summary

Refresh the GitHub issue templates, PR template, and `CONTRIBUTING.md`
so external contributors arrive with the structure maintainers already
enforce in review. First of three PRs for the contributor-onboarding
refresh tracked in amd#929.

## Why

The current templates are stale and under-enforced. Both issue templates
ask whether the bug relates to "GAIA UI (Open-WebUI)" — a UI that no
longer exists; the PR template is a four-line `## Changes` stub with no
required fields, no linked-issue prompt, and no statement of *why*; and
`CONTRIBUTING.md` never states the rule that every PR must reference an
issue. The result is the same review-time coaching ("please file an
issue first", "please describe why") happening on every external PR.
This PR moves those rules into the templates and the contributing guide
so reviewers can point at the docs instead of restating them.

The PR-template structure mirrors the rules already in [`CLAUDE.md`'s
"PR Descriptions — Tight and
Value-Focused"](https://github.com/amd/gaia/blob/main/CLAUDE.md#important-pr-descriptions--tight-and-value-focused)
section so the rule and the form stay in sync.

## Linked issue

Refs amd#929 — first of three PRs. PR 2 (docs cleanup) and PR 3 (regression
workflow) follow.

## Changes

- **`.github/ISSUE_TEMPLATE/bug_report.yaml`** — streamlined from 13
fields to 6. Replaced "GAIA UI (Open-WebUI)" with "Gaia Agent UI".
Consolidated 7 hardware dropdowns/inputs into one freeform Environment
textarea (uses `placeholder:` so untouched submissions stay clean).
Added optional Acceptance Criteria. `What happened?` is now required.
Added "redact tokens/credentials before pasting logs" guidance.
- **`.github/ISSUE_TEMPLATE/feature_request.yaml`** — streamlined.
Replaced "GAIA UI (Open-WebUI)". Renamed primary field to "What problem
are you trying to solve?" (required) so contributors lead with the
problem, not a solution. Added optional Proposed solution and Acceptance
Criteria fields.
- **`.github/pull_request_template.md`** — replaced 4-line stub with
structured template: Summary / Why / Linked issue (`Closes #N`) /
Changes / Test plan / Checklist. Linked-issue placeholder is visibly
`Closes #N` so an unfilled template renders as obvious placeholder
rather than a silently-empty `Closes #`.
- **`CONTRIBUTING.md`** — revamped as the canonical general guide. Adds
a prominent "Before you open a pull request — open an issue first"
section that explicitly states the rule, with the rare exceptions (typo
fixes, doc-only changes under ~10 lines) called out.
- **`docs/reference/contributing-docs.mdx`** — added a Mintlify `<Note>`
at the top pointing readers to `CONTRIBUTING.md` for code/issue
contributions; the docs-taxonomy guide is otherwise unchanged.

## Test plan

- [x] YAML syntax valid: `python3 -c "import yaml;
yaml.safe_load(open(...))"` passes for both issue templates.
- [x] Stale-string sweep on this PR's surface: `grep -rEn -i
"open[-]?webui|GAIA UI" .github/ CONTRIBUTING.md` returns zero matches.
(The remaining `docs/` hits are PR 2's scope, tracked in amd#929.)
- [x] `code-reviewer` agent reviewed the diff; two suggestions applied
(dead `SECURITY.md` link removed; raw-YAML template links swapped for
rendered `?template=` URLs).
- [x] Adversarial multi-agent reflection (`/reflect-plan`) completed;
three auto-amendments applied: bug-report Environment field switched
from `value:` → `placeholder:` (HTML-comment template was being
submitted into issue bodies), `Closes #` → `Closes #N` for
visible-placeholder UX, removed duplicate lint/test code block in
`CONTRIBUTING.md` (already linked to `dev.mdx`).
- [ ] **Manual render check after merge**: open
`https://github.com/amd/gaia/issues/new?template=bug_report.yaml` and
`?template=feature_request.yaml` — confirm "Gaia Agent UI" text, no
"Open-WebUI", required fields render with red asterisks.
- [ ] **Manual PR template check after merge**: next PR opened against
`main` should pre-fill with the new structure.

## Checklist

- [x] I have linked a GitHub issue above (`Refs amd#929` — `Closes` will be
on PR 3 of the series).
- [x] I have described **why** this change is being made, not just what
changed.
- [x] I have run linting and tests locally — N/A for templates/markdown;
YAML validity verified directly.
- [x] I have updated documentation if user-visible behavior changed
(`CONTRIBUTING.md` and `docs/reference/contributing-docs.mdx` updated;
`docs/docs.json` nav unchanged because the page IDs are stable).
Fixes amd#934 — GAIA fails to start after a clean install of 0.17.4.

## What was the bug

Three layered failures, each masking the next:

**1. `ERR_STREAM_WRITE_AFTER_END` → bare Electron crash dialog.** The
log-tee stream (`electron-main.log`) was created early in startup. On
app exit, `process.on('exit')` called `stream.end()`. If anything then
called `console.log()`, the stream emitted an async `'error'` event.
Because no `'error'` listener was attached, Node promoted it to
`uncaughtException`, which Electron surfaced as a raw JS error dialog
with no GAIA branding.

**2. `ERR_FAILED (-2)` loading `index.html`.** Even after adding an
`'error'` listener to absorb the stream error, the app still crashed
with a navigation failure. The URL format was wrong: Node's
`url.format()` (used internally by Electron's `loadFile()`) produces
`file:///C:\path` with backslashes on Windows. Chromium 130+ (Electron
40) rejects backslash file URLs → `ERR_FAILED (-2)`.

**3. `ERR_FAILED (-2)` on a clean install even with correct URL.** After
switching to `pathToFileURL()` (which produces forward-slash URLs), the
crash still happened on a *truly* fresh install where `~/.gaia` didn't
exist. The root cause: after the backend installer's progress dialog is
destroyed on install completion, Electron fires `window-all-closed`. At
that point `trayManager` hadn't been created yet, so the handler called
`app.quit()`. The async cleanup finished instantly (nothing to tear
down), fired a second `app.quit()` that `will-quit` didn't prevent, and
Electron began tearing down Chromium — right as the startup sequence
called `createWindow()` then `loadURL()`. The renderer process was
invalidated mid-navigation → `ERR_FAILED (-2)`.

## What we changed

- **`main-safety-net.cjs`** (new) — extracted `installSafetyNet` and
`installLogTee` from `main.cjs`. `installSafetyNet` registers
`uncaughtException`/`unhandledRejection` handlers that write a FATAL
entry to the log and show a GAIA-branded error dialog. `installLogTee`
attaches a `'error'` listener to the log-tee stream, absorbing
write-after-end errors before they reach the global handler.
- **`main.cjs`** — switched `loadFile()` → `loadURL(pathToFileURL(...))`
for correct forward-slash file URLs on Windows. Added `isBootstrapping`
flag (true until `createWindow()` runs) that makes `window-all-closed` a
no-op during the install phase, preventing the premature quit race.
- **`electron-builder.yml`** — added `main-safety-net.cjs` to the ASAR
files list.
- **`tests/electron/test_main_error_handling.js`** — 12 Jest tests
covering `installSafetyNet` and `installLogTee` (re-entry guard,
pre/post-ready dialog branch, crash counter, stream error absorption,
etc.).

## How we tested

Manual fresh-install test on Windows 11 (uninstall → delete `~/.gaia` →
reinstall from the built NSIS `.exe`):
- Before: "GAIA crashed" dialog appeared immediately after the backend
installer completed; `~/.gaia/electron-main.log` showed `FATAL
ERR_FAILED (-2) loading file:///C:/...index.html`.
- After: app launched normally, backend connected, chat UI loaded.

---------

Co-authored-by: Kalin Ovtcharov <kalin@extropolis.ai>
## Summary

Release prep for **v0.17.5** — patch over v0.17.4 covering 27 commits.
Bumps `__version__`, syncs the webui `package.json`, adds the release
notes, and registers the page in `docs.json`. Lemonade pin remains
`10.2.0`.

The full notes are in
[`docs/releases/v0.17.5.mdx`](docs/releases/v0.17.5.mdx) — highlights:
Gemma 4 default with native `tool_calls`, Chat Lite for low-memory
hardware, semantic code search via CodeAgent, optional governance layer,
Agent UI bundled in the PyPI wheel, friendly ngrok tunnel diagnostics,
and a VLM C++ SDK.

## Changes

- `docs/releases/v0.17.5.mdx` — new release notes (9 What's New, 7 Bug
Fixes, 3 Release/CI, 8 Docs)
- `docs/docs.json` — added `releases/v0.17.5` to Releases tab; bumped
navbar to `v0.17.5 · Lemonade 10.2.0`
- `src/gaia/version.py` — `__version__` `0.17.4` → `0.17.5`
- `src/gaia/apps/webui/package.json` — synced to `0.17.5` via
`installer/version/bump-ui-version.mjs`

## Test plan

- [x] `python util/validate_release_notes.py docs/releases/v0.17.5.mdx`
— passes
- [x] `node installer/version/bump-ui-version.mjs` — webui package
version matches `version.py`
- [ ] CI green (lint, unit tests, docs build)
- [ ] Reviewer reads the release notes once for tone/accuracy
- [ ] On merge: pre-tag verification (Phase 3 of `gaia-release` skill)
before tag push

---------

Co-authored-by: Tomasz Iniewicz <infancy_shred.0d@icloud.com>
…amd#949)

## Summary
Fixes GAIA ignoring the new Gemma default model and falling back to Qwen
on Windows 11, causing the wrong model to load in the frontend.

## Why
After commit 5d37771 made Gemma-4-E4B the default model, Windows users
reported that GAIA still attempts to load Qwen instead. This left the
new default model effectively unreachable on Windows, making the
frontend
unusable for anyone who hadn't manually configured a model.

## Linked issue
Closes amd#948

## Changes
- Fixed model selection logic to correctly resolve the new Gemma default
  on Windows instead of falling back to Qwen

## Test plan
- [x] `pytest tests/unit/` - passing locally
- [x] `python util/lint.py --all` - no failures
- [ ] Manual: launch `gaia chat --ui` on Windows and verify Gemma loads
      instead of Qwen

## Checklist
- [x] I have linked a GitHub issue above (`Closes amd#948`).
- [x] I have described **why** this change is being made, not just what
changed.
- [x] I have run linting and tests locally.
- [ ] I have updated documentation if user-visible behavior changed.

---------

Signed-off-by: theonlychant <sacehenry@gmail.com>
…ctors framework) (amd#926)

Closes amd#915 (when promoted from draft and merged).

Self-contained `gaia.connections` module — any GAIA caller (SDK, CLI,
AgentUI) can drive the OAuth 2.0 PKCE flow for Google. Refresh tokens
land in the OS keychain (macOS Keychain, Windows DPAPI, Linux
SecretService); per-agent grants live in
`~/.gaia/connections/grants.json`; an agent can only `get_access_token`
for scopes the user explicitly granted it.

This PR ships as **draft** because it is also the **baseline commit for
a larger Connectors framework** (parent issue forthcoming) — the scope
expanded after a meeting to make GAIA host many connectors with a
Claude-style tile UI. The plan is at
`~/.claude/plans/floating-discovering-gray.md`. Keeping this PR draft so
reviewers can pull the baseline (157 tests green) before the framework
refactor renames `gaia.connections` → `gaia.connectors` and unifies the
MCP catalog into the same surface.

- **`src/gaia/connections/`** — provider-agnostic core: errors,
providers (Google), pkce, store (keyring with backend allowlist +
tripwire), grants ledger, async token cache (double-checked locking, 60
s expiry buffer, refresh-token rotation), aiohttp loopback flow, events
Protocol, public api, CLI.
- **`src/gaia/agents/base/agent.py`** — `REQUIRED_CONNECTIONS` ClassVar;
`process_query` wraps tool execution in a private `_agent_context`
contextvar so every tool body knows its agent identity.
- **`src/gaia/agents/registry.py`** — namespaced agent ids (`builtin:*`
/ `custom:<sha256>:*`); reserved-id check blocks custom agents from
claiming a built-in's id.
- **`src/gaia/ui/routers/connections.py`** — thin presentation layer:
`/api/connections/{catalog,configure,test,authorize,grants,events,_debug}`.
SSE event emitter with bounded queue. `/_debug` gated by `GAIA_DEBUG=1`.
- **`src/gaia/cli.py`** + **`src/gaia/connections/cli.py`** — `gaia
connections {connect,status,disconnect,grants ...}`.
- **`src/gaia/apps/webui/src/components/ConnectionsSection.{tsx,css}`**
+ supporting store/hook/types — Settings → Connections panel with
Connect/Disconnect + per-agent grant toggle. SSE updates the UI within
~2 s of OAuth completion.
- **`docs/security/connections.mdx`** — threat model.
- **`docs/sdk/infrastructure/connections.mdx`** — SDK reference with
three equal-weight sections (SDK / CLI / agent author).
- **`docs/runbooks/google-oauth-client.md`** — internal client-id
rotation procedure.
- **`docs/local-test/`** — recipe + custom test agent for end-to-end
Google OAuth verification against a personal account (gated until env
var is set).

- Refresh tokens never leave the OS keychain. Backend allowlist refuses
`PlaintextKeyring` / `EncryptedKeyring` so a Linux user without
SecretService gets an actionable error rather than silent plaintext
storage.
- Per-agent grants prevent prompt-injection-driven scope escalation:
even a malicious tool body cannot `get_access_token` for a connector the
user did not explicitly grant *that* agent.
- `client_id_hash` tripwire invalidates stored tokens after an OAuth
client rotation; users reconnect cleanly instead of using stale
credentials.
- Same primitives serve SDK, CLI, and AgentUI — proven by an explicit
multi-caller equivalence integration test.

- [x] `python -m pytest tests/unit/connections/
tests/unit/test_agent_required_connections.py
tests/integration/test_multi_caller_equivalence.py` — 157 passing.
- [x] `python -m black --check src/gaia/connections/
src/gaia/ui/routers/connections.py tests/unit/connections/` — clean.
- [x] `python -m isort --check-only src/gaia/connections/ ...` — clean.
- [x] `cd src/gaia/apps/webui && npx tsc --noEmit` — zero errors.
- [ ] Local Google OAuth E2E against a personal account (deferred to
after the connectors framework refactor, per
`docs/local-test/README.md`).
- [ ] Linux CI keyring matrix (in-memory backend autouse fixture covers
the unit suite; `gnome-keyring` integration job is a follow-up).

- The library is self-contained — every test runs without the AgentUI
server and without a real keyring (in-memory backend in
`tests/unit/connections/conftest.py`).
- `_agent_context` is intentionally **private** (not re-exported in
`gaia.connections.__init__`). A tool body cannot import it to forge an
agent identity. Custom agents get an origin-hashed namespaced id so a
custom agent declaring a built-in's `AGENT_ID` does not inherit prior
grants.
- Code-reviewer agent ran on the diff during development; 5 findings
reported, 4 fixed (asyncio-run-in-running-loop guard on the sync
wrapper, consent-denied response now serves the rejection page instead
of the success page, `connected_at` populated from `time.time()` not
from the absent token-response field, BuilderAgent / CodeAgent overrides
updated so they cannot bypass the agent-context binding). The 5th was
the v1 single-account-per-provider intentional limit — strengthened the
docstring instead of changing behavior.
…work (amd#906)

Adds a third issue template (alongside `bug_report.yaml` and
`feature_request.yaml`) specifically for team-internal feature work and
tasks intended for coding-agent assignment.

## Why
The existing templates are user-facing. With AGENTS.md (PR amd#904)
establishing "spec-before-PR" as a rule for consumer-critical work,
internal issues need a template that prompts authors to capture: Goal,
Scope, Acceptance criteria, Attribution / prior art, Dependencies,
Failure modes, plus capability domain and product track selection.

## What it captures
- **Goal** + **Scope** + **Acceptance criteria** sections (required)
matching the depth of amd#887/amd#888/amd#890 specs
- **Attribution** section (per CLAUDE.md attribution rule)
- **Failure modes** section (per CLAUDE.md no-silent-fallback rule)
- **Domain dropdown** — 8 options matching the new `domain:*` label
taxonomy
- **Track dropdown** — 3 options matching `track:*` labels (consumer-app
/ oem-pc / platform)
- **Priority dropdown** with explicit definitions (p0=4 weeks, p1=2
milestones, etc.)
- **Consumer-critical** checkbox

## Cross-references
- AGENTS.md (PR amd#904) establishes the rules this template enforces in
practice
- Mobile design-system spec (PR amd#905) is an example of the spec depth
required for consumer-critical work

## Test plan
- [ ] Template renders correctly in the GitHub "New issue" picker
- [ ] All dropdowns work
- [ ] Required fields enforce on submit
- [ ] No conflict with existing bug_report or feature_request templates
The existing PR-description guidance in CLAUDE.md was directionally
right ("tight and value-focused") but loose enough that a recent PR
(amd#946 / amd#944) still shipped with a "What changed" enumeration the diff
already showed and a "Summary" section that buried the user-observable
impact behind implementation details. Future agents reading the file
would do the same.

After this change the default shape is just two sections — "Why this
matters" (with required before/after framing) and "Test plan" — with a
"user-observable impact in <30s without reading the diff" litmus check,
and three new anti-patterns lifted directly from the patterns the prior
PR exhibited.

## Test plan

- [x] No code changed; doc-only edit
- [x] Re-read the new section against amd#946 to confirm the prior
description fails the new rules
…g TOCTOU

Signed-off-by: theonlychant <sacehenry@gmail.com>
Signed-off-by: theonlychant <sacehenry@gmail.com>
@theonlychant theonlychant force-pushed the fix/dns-rebinding-ssrf branch from 2ee1d7b to 1e3ad62 Compare May 5, 2026 21:39
@github-actions github-actions Bot added documentation Documentation changes devops DevOps/infrastructure changes jira Jira agent changes mcp MCP integration changes cli CLI changes electron Electron app changes security Security-sensitive changes agents labels May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents cli CLI changes dependencies Dependency updates devops DevOps/infrastructure changes documentation Documentation changes electron Electron app changes jira Jira agent changes mcp MCP integration changes security Security-sensitive changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(security): WebClient DNS rebinding TOCTOU in SSRF check

5 participants