
chore: sync upstream paradigmxyz/centaur (63 commits) #25

Merged
jamalavedra merged 65 commits into main from sync/upstream-2026-06-07
Jun 7, 2026


@jamalavedra

Summary

Merges 63 commits from upstream/main (paradigmxyz/centaur) since our last sync (#21, cf14309). Range: cf14309b..6f1f2b71.

Notable upstream changes pulled in

  • Tools added: CloudWatch (read-only via iron-proxy aws_auth), Laminar tool plugin, Google Drive ETL, Linear sync workflow, Google Calendar ETL
  • Slack ETL hardening: reseeded widened lookback gaps, reduced rate-limit pressure, decoupled client from tool, gated app deep-link rewriting behind SLACK_TEAM_ID, share-verification for upload_file
  • Tracing: fuller OTLP/Otel surfaces in API, Slackbot, Codex
  • Sandbox: tool-server health gating, DB pool retry on iron-proxy startup race, overlay tool deps installed in sidecar, kubectl in sandbox image, ChatGPT-login auth mode for Codex, descriptive sandbox branches
  • Chart: flag-gated iron-control module, observability bypass for proxy
  • Misc fixes: SimilarWeb defaults, attribute PR prompts to Slack requester, removed Paradigm-specific workflow, company-context Linear + date filters

Conflict resolution

7 substantive conflicts were resolved with -X theirs (favoring upstream), per the directive to stay as close to upstream as possible:

  • services/api/api/agent.py — dropped our generic _PLATFORM_FORMATTING_RULES dict; took upstream's hardcoded Slack rules
  • services/api/api/runtime_control.py — took upstream's fuller otel/laminar tracing (depends on PR #24, "fix: restore laminar chart templates and api tracing", to provide the laminar services/imports it expects)
  • services/api/tests/test_workflow_engine_title.py — took upstream's test-name change
  • tools/infra/sentry/{.env.example,client.py,pyproject.toml} — replaced our Typer-CLI / rich-flavored sentry tool with upstream's stripped-down read-only version (intentional loss of features in favor of consistency)
  • tools/productivity/slack/tests/test_client.py — took upstream's test

Ordering note

PR #24 ("fix: restore laminar chart templates and api tracing") should land before this PR if possible — upstream's runtime_control.py and the new tools/infra/laminar/ plugin both expect laminar services to exist. If this lands first, the API may log import or call failures until #24 is also merged and the chart is upgraded.

Test plan

  • CI green
  • After merge: just deploy on centaur-vps; verify api/slackbot/chatbot still come up
  • Verify Slack @centaur round-trip still works end-to-end
  • If sentry tool is in active use, confirm the upstream API still satisfies callers

🤖 Generated with Claude Code

Zygimantass and others added 30 commits May 27, 2026 16:31
* fix: load secret env in slackbot

* chore: bump centaur chart
Mirror the CODEX_AUTH_MODE flip from paradigmxyz#223 for Claude Code: a new
CLAUDE_CODE_AUTH_MODE env var selects between api_key (default,
ANTHROPIC_API_KEY) and access_token (Claude.ai Pro or Max subscription
brokered through iron-token-broker). The entrypoint plants a dummy
~/.claude/.credentials.json under access_token mode so the CLI emits
OAuth-shaped requests; iron-proxy injects the real Bearer at egress.

Along the way, consolidate the harness credential wiring into one place.
Previously, each harness had a credentials-only tool (tools/infra/codex,
tools/infra/claude) that registered its brokered_token via the
tool-loading machinery for every sandbox, and the per-sandbox iron-proxy
filtered out the now-unused API key after the fact. Now ToolManager
holds a single _HARNESS_SECRETS table keyed by (engine, auth_mode) and
exposes secrets_for_sandbox(engine, auth_modes) that returns exactly
the right credential set. The per-sandbox iron-proxy calls that;
collect_secrets() still returns the union for the shared API-side proxy
and the token broker. tools/infra/{codex,claude} are deleted.

Operator interface is unchanged: same *_AUTH_MODE env vars, same
1Password items (CLAUDE_CODE_CLIENT_ID, CLAUDE_CODE_BLOB,
OPENAI_CODEX_CLIENT_ID, OPENAI_CODEX_BLOB, OPENAI_CODEX_ACCOUNT_ID),
same claude login / codex login bootstrap. Docs flag the
refresh-token-reuse foot-gun: do not run codex or claude locally with
the same account whose refresh token is in the broker.
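
A minimal sketch of the table-driven lookup this commit describes. Only the names `_HARNESS_SECRETS`, `secrets_for_sandbox`, and `collect_secrets` come from the message; the assignment of secret names to each `(engine, auth_mode)` pair, the `OPENAI_API_KEY` entry, and the shape of `auth_modes` are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Mapping, Tuple

# Hypothetical contents: which secret belongs to which (engine, auth_mode)
# pair is assumed here, not copied from the upstream table.
_HARNESS_SECRETS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("claude", "api_key"): frozenset({"ANTHROPIC_API_KEY"}),
    ("claude", "access_token"): frozenset(
        {"CLAUDE_CODE_CLIENT_ID", "CLAUDE_CODE_BLOB"}
    ),
    ("codex", "api_key"): frozenset({"OPENAI_API_KEY"}),  # assumed name
    ("codex", "access_token"): frozenset(
        {"OPENAI_CODEX_CLIENT_ID", "OPENAI_CODEX_BLOB", "OPENAI_CODEX_ACCOUNT_ID"}
    ),
}


def secrets_for_sandbox(engine: str, auth_modes: Mapping[str, str]) -> FrozenSet[str]:
    """Return only the credentials one sandbox needs for its engine and mode."""
    mode = auth_modes.get(engine, "api_key")  # api_key is the stated default
    try:
        return _HARNESS_SECRETS[(engine, mode)]
    except KeyError:
        raise ValueError(f"unknown harness/auth mode: {engine}/{mode}") from None


def collect_secrets() -> FrozenSet[str]:
    """Return the union, as used by the shared API-side proxy and token broker."""
    return frozenset().union(*_HARNESS_SECRETS.values())
```

Because the per-sandbox lookup is exact, there is no unused API key to filter out after the fact.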
feat: split tool execution and workflow execution out of the API process

Adds two new uvicorn / python targets in the api package; no code is
moved out, just additional entrypoints on the same image.

Tool server (api.tool_server_app)
  - FastAPI app that mounts only the /tools/* router.
  - Runs as an opt-in sidecar container in each sandbox Pod
    (KUBERNETES_TOOL_SERVER_IMAGE, controlled by Helm toolServer.enabled).
  - The sandbox harness's call.sh prefers CENTAUR_TOOLS_URL — tool calls
    now resolve via http://localhost:8001/tools/... instead of routing
    back to the API. Egress flows through the existing per-sandbox
    iron-proxy via HTTPS_PROXY.
  - The API drops its /tools/* router include; the in-process ToolManager
    stays only for identity and persona-metadata lookups.

Workflow execution (api.workflow_executor)
  - python -m api.workflow_executor --run-id <id> entrypoint that fetches
    the claimed row, drives the existing _run_handler logic, and exits.
  - KubernetesExecutorBackend gains spawn_workflow_run / wait_workflow_run_terminal
    / cleanup_workflow_run_pod. Pods are restartPolicy=Never, reuse the
    API image, and route egress through the API-self iron-proxy.
  - workflow_engine._dispatch_run wraps _run_handler: on K8s it spawns
    a per-run sandbox and blocks until terminal; on other backends (or
    WORKFLOW_RUN_SANDBOX_ENABLED=0) it falls back to in-process execution.
  - The API's worker remains the claimer/scheduler
    (api.workflowWorkerEnabled stays true). No standalone runner pod.
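
The dispatch decision above can be sketched as follows. This is an illustration only: `choose_execution_mode`, the `backend` string, and the two callables are hypothetical stand-ins for `_dispatch_run`, the executor backend, and `_run_handler`. Treating an unset `WORKFLOW_RUN_SANDBOX_ENABLED` as enabled is also an assumption about this commit.

```python
from typing import Callable, Mapping


def choose_execution_mode(backend: str, env: Mapping[str, str]) -> str:
    """Pick per-run pod execution on Kubernetes unless explicitly disabled."""
    enabled = env.get("WORKFLOW_RUN_SANDBOX_ENABLED", "1") != "0"
    if backend == "kubernetes" and enabled:
        return "pod"
    return "in_process"


def dispatch_run(
    run_id: str,
    backend: str,
    env: Mapping[str, str],
    spawn_and_wait: Callable[[str], str],
    run_in_process: Callable[[str], str],
) -> str:
    """Spawn a pod and block until terminal, or fall back to in-process."""
    if choose_execution_mode(backend, env) == "pod":
        return spawn_and_wait(run_id)
    return run_in_process(run_id)
```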

Helm
  - toolServer section (opt-in; default off, on in values.dev.yaml).
  - KUBERNETES_WORKFLOW_RUN_IMAGE / _IMAGE_PULL_POLICY on the API container
    so workflow-run pods reuse the api image.
  - Chart version 0.1.43.

Cleanup
  - Removes two helpers from routers/agent.py (_extract_attachments,
    _resolve_urls) that were never wired into a production code path.
    The live inline-attachment path is extract_inline_attachments in
    runtime_control.py.
Co-authored-by: Centaur AI <ai@centaur.local>
* fix: stop rendering Slack context blocks

* chore: bump Slackbot log UUID

* chore: align API log UUID

* refactor: remove Slack context block guard
Co-authored-by: Amp <amp@ampcode.com>
…aradigmxyz#249)

On fast turns the whole answer can arrive in one burst after the tool
calls and sit queued in pendingText without crossing a flush threshold,
so nothing reaches Slack as a durable live chunk. The finalize-time
appendStream then races chat.stopStream's composed layout and is dropped,
leaving the thread ending after the last tool output. Worse,
streamedTextSourceChars still counted those chars, so the control plane's
slackbot_live_delivery coverage check (_slackbot_live_delivery_covers_result
in services/api) treated the answer as fully delivered and skipped its
fallback repost — the answer was lost with no recovery.

When a segment carries a live task plan and the answer never streamed
live before finalize, fold the answer into the durable stopStream blocks
instead of relying on the racing live chunk, and credit the absorbed
chars so the coverage check stays honest. Text-only turns keep streaming
live as before.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* Add Centaur tool directory docs

* docs: add tool directory credential names

---------

Co-authored-by: Georgios Konstantopoulos <me@gakonst.com>
)

feat(websearch): reimplement on Parallel (free MCP search + Task API deep research)

Replace the Exa + Claude implementation with Parallel Web Systems:

- search: free hosted Parallel Search MCP with no credentials; Parallel
  Search REST (domain/recency filters, num_results, mode) when
  PARALLEL_API_KEY is set; optional Claude reviewer->writer->citation-repair
  synthesis when ANTHROPIC_API_KEY is set
- deep_research: Parallel Task API (auto schema) with streamed progress and
  model-grounded, 1-based citations that match the report's Sources block;
  gated to the pro/ultra processor family
- add meta.estimated_cost_usd (best-effort from published list prices)
- fix the Sources-section regex so bulleted "- [N] url" lines are matched,
  eliminating spurious citation-validation failures
- accept-and-warn on removed Exa kwargs/CLI flags; retain meta.exa_request_ids
  alias and DeepResearchResponse.iterations for response-shape compatibility
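
To illustrate the Sources-section fix, here is a sketch of a pattern that accepts both plain `[N] url` lines and bulleted `- [N] url` lines. The regex and the `parse_sources` helper are assumptions written for this example, not the upstream implementation.

```python
import re
from typing import Dict

# Optional leading bullet ("-" or "*"), then a 1-based "[N]" marker and a URL.
_SOURCE_LINE = re.compile(r"^\s*(?:[-*]\s+)?\[(\d+)\]\s+(\S+)", re.MULTILINE)


def parse_sources(block: str) -> Dict[int, str]:
    """Map citation numbers to URLs for every matching line in the block."""
    return {int(num): url for num, url in _SOURCE_LINE.findall(block)}
```

A validator can then check that every `[N]` cited in the report body has a key in the returned mapping.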
* Fix sandbox tool routing

* Bump chart version

---------

Co-authored-by: Centaur AI <ai@centaur.local>
Mirrors the non-AI subset of the Sentry MCP: list/search issues with native
Sentry query syntax, read issue details, list an issue's events, pull a single
event's full stacktrace/breadcrumbs, and read issue tag value distributions.
Auth is a Sentry user auth token (Bearer) declared as an iron-proxy HTTP secret;
defaults to SaaS sentry.io with an optional SENTRY_URL override.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: load secret env in slackbot

* chore: bump centaur chart

* fix: preserve sandbox state on execution silence recovery
* Revert "fix: hydrate workflow executor app db pool (paradigmxyz#265)"

This reverts commit 400ab42.

* Revert "fix: pass workflow db pool to system message insert (paradigmxyz#264)"

This reverts commit 8755465.

* Revert "fix: run sandbox tool server without database (paradigmxyz#263)"

This reverts commit 7c75537.

* Revert "fix: preserve sandbox state when executions go silent (paradigmxyz#259)"

This reverts commit 2d7932f.
…n-proxy (paradigmxyz#253)

* fix(workflows): clone API env into workflow-run pod; drop per-run iron-proxy

Workflows are trusted code (not arbitrary agent harnesses), so the per-run
iron-proxy and per-run NetworkPolicy bring-up added in paradigmxyz#216 was overkill and
also flaky: pods could exit before writing terminal state, leaving Slack
turns silently abandoned. Replace it with a much simpler model:

- The workflow-run pod inherits the API container's env (env, envFrom,
  volumeMounts, volumes) by reading the running API pod at spawn time, so it
  shares the API's view of secrets, HTTPS_PROXY (shared API iron-proxy), and
  CA mount.
- Override only what must differ: disable background loops, set
  WORKFLOW_RUN_SANDBOX_ENABLED=0 to prevent recursion, inject the run id.
- spawn_workflow_run no longer creates ConfigMaps / Services / NetworkPolicies
  / iron-proxy pods; cleanup_workflow_run_pod no longer tears them down.
- workflow_executor populates app.state.db_pool before invoking _run_handler
  so handler code that reads from FastAPI app state finds a live pool.
- Add NetworkPolicy rules permitting workflow-run pods to reach postgres,
  slackbot, the shared API iron-proxy, and the API itself.

* fix(workflows): default WORKFLOW_RUN_SANDBOX_ENABLED to off

Keep the per-run pod path opt-in until it's been baked in production. Defaulting to in-process execution matches the post-incident hotfix state and avoids surprising existing deployments on upgrade.

* chore(chart): bump version to 0.1.45
…ring with renderers (paradigmxyz#250)

Codex source code shows fanout can produce more than one root
final_answer agentMessage item in a single turn. Centaur currently
expects one assistant answer per execution, so disable the experimental
fanout flag in the Codex harness config as a temporary mitigation.
…d live (paradigmxyz#249)" (paradigmxyz#269)

This reverts commit a285db1.

We are seeing a small number of characters at the end of some replies get
posted a second time. Reverting for now.
…#282)

Operator-supplied sandbox.extraEnv was applied last via _set_env, which
replaces any computed entry of the same name. A NO_PROXY/no_proxy override
that omitted the API host clobbered the computed value, routing sandbox
to-API traffic (e.g. attachment downloads) through iron-proxy, which 405s
the plain-HTTP forward.

Pin the proxy/CA wiring vars so extraEnv cannot replace them, and merge
NO_PROXY/no_proxy instead of replacing so operators can add bypass hosts
without dropping the firewall and API hosts.
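
A minimal sketch of the pin-and-merge behavior described above. The function name `apply_extra_env` and the exact set of pinned variables are assumptions; only the `NO_PROXY`/`no_proxy` merge and the idea of pinning the proxy/CA wiring come from the commit message.

```python
from typing import Dict, Mapping

# Hypothetical pinned set: the real list of proxy/CA wiring vars may differ.
_PINNED = frozenset({"HTTP_PROXY", "HTTPS_PROXY", "SSL_CERT_FILE"})
_MERGED = frozenset({"NO_PROXY", "no_proxy"})


def _merge_csv(base: str, extra: str) -> str:
    """Join two comma-separated host lists, keeping order and dropping repeats."""
    seen = []
    for host in (base + "," + extra).split(","):
        host = host.strip()
        if host and host not in seen:
            seen.append(host)
    return ",".join(seen)


def apply_extra_env(computed: Mapping[str, str], extra: Mapping[str, str]) -> Dict[str, str]:
    """Apply operator extraEnv without clobbering computed proxy wiring."""
    result = dict(computed)
    for name, value in extra.items():
        if name in _PINNED:
            continue  # operators cannot replace pinned wiring
        if name in _MERGED:
            result[name] = _merge_csv(result.get(name, ""), value)
        else:
            result[name] = value
    return result
```

With this shape, an operator who sets only `NO_PROXY=internal.example` adds a bypass host while the computed API and firewall hosts stay in the list.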
…n-proxy (paradigmxyz#286)

* fix(tool-server): route the sandbox sidecar's DB pool through the iron-proxy

The per-sandbox tool-server sidecar opened an asyncpg pool and ran dbmate
migrations against the core DB directly. On any cluster that enforces the
sandbox NetworkPolicy (which denies sandbox pods direct Postgres egress) the
sidecar crashed at startup with `dbmate ... connect: connection refused`, so
every agent tool call failed. It only worked where NetworkPolicy is not
enforced (the sandbox reaches Postgres directly).

The sidecar genuinely needs the pool: the shared tool-invoke handler uses it
for attachment offloading, Slack live-delivery capture, and trace-span
parenting. So instead of dropping the DB, give the sidecar a path that
respects the sandbox's isolation: add a core-DB listener to the per-sandbox
iron-proxy and point the sidecar's DATABASE_URL at it. The real DB credentials
stay in the proxy pod (the proxy resolves the upstream from its own env); the
sandbox only ever holds an app_user proxied DSN, and that DSN is injected into
the tool-server container only, never the agent container. The sidecar opens
the pool with apply_migrations=False (the API owns migrations).

The core listener is added only when the tool-server sidecar runs, and only to
the per-sandbox proxy (the API reaches Postgres directly), so deployments
without the sidecar are unaffected.

Supersedes the skip-DB approach in paradigmxyz#260 by @0xdiid, whose report and root-cause
analysis identified this bug.

Co-authored-by: Will Drach <drach@splits.org>

* chore(just): add k3s flag to import local images into containerd

k3s uses containerd, not the Docker daemon, so locally-built images are
invisible to it. `just up k3s` now imports centaur-{api,iron-proxy,slackbot,agent}
into k3s's containerd after building. The ctr command is overridable via
CENTAUR_K3S_CTR for rootless/remote setups.

* fix(chart): allow API egress to the k8s API server under enforced NetworkPolicy

The API manages sandboxes/proxies via the k8s API, but the chart's default-deny
egress policy only allowed :443. The kubernetes.default ClusterIP (:443) is
DNAT'd to the real API-server endpoint (6443 on k3s/kubeadm), and enforcing CNIs
match egress post-DNAT, so the API could not reach the control plane on a
NetworkPolicy-enforcing cluster (ensure_api_proxy_pod failed with 'Cannot connect
to 10.43.0.1:443', so no iron-proxy came up). Add an egress rule for the
API-server port, configurable via networkPolicy.apiServerPort (default 6443).

* fix(chart): allow iron-proxy egress to Postgres for the sidecar's proxied pool

Option B routes the tool-server sidecar's DB pool through the per-sandbox
iron-proxy, making the proxy a new Postgres client. The Postgres ingress policy
only allowed api/slackbot/workflow-run pods, so under an enforcing NetworkPolicy
the proxy's upstream connection was refused ('postgres: upstream connect failed
... connection refused') and the sidecar crashed. Add iron-proxy pods to the
Postgres ingress allow-list. Sandboxes still cannot reach Postgres directly.

* chore(chart): bump version to 0.1.46

---------

Co-authored-by: Will Drach <drach@splits.org>
)

* docs(tool-directory): reflect Parallel-backed websearch (no required credentials)

The websearch tool was reimplemented on Parallel Web Systems in paradigmxyz#228:
search works credential-free via the free hosted Search MCP, with
PARALLEL_API_KEY unlocking the REST path and deep_research and
ANTHROPIC_API_KEY enabling cited synthesis. Update both directory rows
to drop the stale EXA_API_KEY requirement and signal the optional keys.

* docs(tool-directory): clarify only search is free; deep_research needs PARALLEL_API_KEY

The previous wording ("free with no credentials, with optional REST and
synthesis paths") implied deep_research was free too. It is not: the free
hosted Search MCP only covers `search`. `deep_research` is Task-API-only
and requires PARALLEL_API_KEY.

* docs(tool-directory): tighten websearch row wording
…nloads (paradigmxyz#289)

* feat(api): let sandbox tokens read any thread, and fix attachment downloads

Sandbox tokens were scoped to a single thread for every operation, so an
agent holding a link to another thread got 'Sandbox token is scoped to a
different thread' when trying to view its messages, status, or attachments.
Centralize the scope policy in deps.enforce_sandbox_thread_scope and split
read vs write: reads are allowed across threads by default (knowing the
thread key is the capability), writes stay confined to the token's own
thread. Gated by SANDBOX_CROSS_THREAD_READS (default on).
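
The read/write split can be sketched like this. `enforce_sandbox_thread_scope` is the name from the commit message, but its signature here, the `ScopeError` type, and the `sandbox_scope_allows` helper are hypothetical.

```python
class ScopeError(PermissionError):
    """Raised when a sandbox token is used outside its allowed scope."""


def sandbox_scope_allows(
    token_thread: str,
    requested_thread: str,
    *,
    write: bool,
    cross_thread_reads: bool = True,  # mirrors SANDBOX_CROSS_THREAD_READS (default on)
) -> bool:
    """Writes stay on the token's own thread; reads may cross threads if enabled."""
    if token_thread == requested_thread:
        return True
    return (not write) and cross_thread_reads


def enforce_sandbox_thread_scope(
    token_thread: str,
    requested_thread: str,
    *,
    write: bool,
    cross_thread_reads: bool = True,
) -> None:
    """Raise ScopeError when the operation falls outside the token's scope."""
    if not sandbox_scope_allows(
        token_thread, requested_thread, write=write, cross_thread_reads=cross_thread_reads
    ):
        raise ScopeError("Sandbox token is scoped to a different thread")
```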

While fixing the slack download_file 401s this exposed two more breaks in
the sandbox attachment path:

- The per-sandbox tool-server sidecar set CENTAUR_API_URL but never
  CENTAUR_API_KEY, so its callbacks to /agent/attachments/upload were
  unauthenticated and the API returned 401. Mint a thread-scoped sandbox
  token for the sidecar, mirroring the agent container.
- The system prompt told the agent to download attachments from a
  hardcoded http://api:8000, but the in-cluster service is
  centaur-centaur-api and NO_PROXY is derived from CENTAUR_API_URL, so the
  request was routed through iron-proxy and failed. Use $CENTAUR_API_URL.

Also bump iron-proxy to 0.42.0-rc.3.

* fix(slack): parse team-scoped thread keys when inferring upload destination

Slack thread keys are slack:<team>:<channel>:<thread_ts> (4 parts, emitted
by the slackbot), but the slack tool's _current_slack_destination only
handled the legacy 3-part slack:<channel>:<thread_ts> form. On every live
key it fell through to (None, None), so upload_file with no explicit
channel raised 'channel is required', and when only channel_id was passed
the missing inferred thread_ts meant files posted to the channel root
instead of the thread — so they never rendered in the conversation.

Parse the 4-part form (channel=parts[2], thread_ts=parts[3]) and keep the
3-part form for backward compat with old persisted keys. Apply the same
fix to runtime_control._slack_thread_metadata, which mislabeled the team
as the channel for 4-part keys.
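
A sketch of the two key shapes described above. The helper name `parse_slack_thread_key` is hypothetical (the real function is `_current_slack_destination`); the part indices follow the commit message.

```python
from typing import Optional, Tuple


def parse_slack_thread_key(key: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (channel, thread_ts) for team-scoped and legacy thread keys."""
    parts = key.split(":")
    if not parts or parts[0] != "slack":
        return (None, None)
    if len(parts) == 4:
        # slack:<team>:<channel>:<thread_ts>
        return (parts[2], parts[3])
    if len(parts) == 3:
        # legacy slack:<channel>:<thread_ts>, kept for old persisted keys
        return (parts[1], parts[2])
    return (None, None)
```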
mslipper and others added 27 commits May 29, 2026 18:53
…corn (paradigmxyz#304)

The tool-server sidecar overrides the image ENTRYPOINT to run uvicorn
directly, so it skipped the overlay tool-dep install the API gets via
entrypoint.sh. Overlay tools are imported in-process, so deps like
pynacl/psycopg2 were never importable in the sidecar and overlay tools
failed at load with ModuleNotFoundError.

The sidecar runs as a non-root user and cannot write the root-owned
/app/.venv, so tool-server-startup.sh installs overlay deps into a
writable --target dir that the pod spec puts on PYTHONPATH, then execs
uvicorn. The API entrypoint is unchanged.
…esolves (paradigmxyz#305)

Tool code runs in the tool-server sidecar, but pg_dsn secrets were only
set on the agent container env. _resolve_secrets delivers PgDsnSecret via
the environment rather than ToolContext, so secret("<PG_DSN_NAME>") in the
sidecar fell through to the placeholder. Wire sandbox_pg_dsns into
_build_tool_server_container so the sidecar sees the same proxied DSNs as
the agent container.

Also fixes a stale test that omitted the now-required thread_key/
container_name args.
…aradigmxyz#307)

* fix(sandbox): poll tool-server /healthz before signalling readiness

The tool-server sidecar boots independently of the agent container and
installs overlay tool deps before listening, while the API's readiness
check treats the pod as ready as soon as .ready exists. That let the
harness fire its first tool call before the sidecar was up. Poll
/healthz (bounded, default 10s) before touching .ready to close the
race; warn-and-continue if the sidecar never comes up.
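
The bounded wait can be sketched in Python as below. The upstream change presumably lives in the sandbox entrypoint, so treat this as a model of the logic rather than the actual code; `wait_for_healthz`, `http_probe`, and the poll interval are assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True when the URL answers 200; any connection error means not ready."""
    try:
        with urllib.request.urlopen(url, timeout=1.0) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_healthz(
    probe: Callable[[], bool],
    timeout_s: float = 10.0,
    interval_s: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the probe succeeds or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False  # caller warns and continues, as the commit describes
        sleep(interval_s)
```

A caller would run something like `wait_for_healthz(lambda: http_probe("http://localhost:8001/healthz"))` before touching `.ready`, logging a warning when it returns `False`. The port is taken from the tool-server URL mentioned earlier in this PR and is an assumption for the health endpoint.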

* docs(sandbox): trim tool-server wait comment
Fix local smoke auth and broker token bootstrap
paradigmxyz#311)

Slack's files.completeUploadExternal intermittently returns ok:true but
never shares the uploaded file, leaving an orphan whose permalink nobody
can open — so upload_file would report a phantom success while nothing
appeared in the thread.

- upload bytes via file= (binary) instead of content= (snippet text)
- stop forwarding alt_txt, which slack_sdk's files_upload_v2 mishandles
  (slackapi/python-slack-sdk#1818)
- verify the share landed via files.info, polling with a 0/1/2/4/8s
  backoff and parsing the real shares.{public,private}[channel] schema
- log the full files.info as JSON (structlog) when a share is dropped so
  the share state is visible
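
A sketch of the share verification step, assuming the `files.info` response has already been fetched as a dict. `share_landed`, `verify_share`, and the `fetch_info` callable are hypothetical names; the backoff schedule and the `shares.{public,private}[channel]` layout come from the commit message.

```python
import time
from typing import Any, Callable, Mapping, Sequence

BACKOFF_S: Sequence[float] = (0, 1, 2, 4, 8)


def share_landed(file_info: Mapping[str, Any], channel: str) -> bool:
    """True when files.info lists at least one share for the channel."""
    shares = (file_info.get("file") or {}).get("shares") or {}
    for visibility in ("public", "private"):
        if (shares.get(visibility) or {}).get(channel):
            return True
    return False


def verify_share(
    fetch_info: Callable[[], Mapping[str, Any]],
    channel: str,
    sleep: Callable[[float], None] = time.sleep,
    backoff: Sequence[float] = BACKOFF_S,
) -> bool:
    """Poll files.info on the backoff schedule until the share shows up."""
    for delay in backoff:
        sleep(delay)
        if share_landed(fetch_info(), channel):
            return True
    return False  # caller should report failure instead of a phantom success
```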
…igmxyz#330)

docs: improve local setup and Slack quickstart guidance

The existing local setup docs do not work smoothly on a macOS laptop, especially on Apple Silicon: the currently published Centaur images are x86-only, so the GHCR path fails on local arm64 clusters. I spent a lot of time debugging image pull failures, kind image loading, local Slack webhook access, and required Slack scopes before getting a working end-to-end smoke test.

Document the reproducible path so other developers can test Centaur quickly without rediscovering the same issues:
- add macOS/kind local image loading guidance to the Mac Mini setup doc
- explain why local Docker images must be loaded into kind/containerd
- make Slack quickstart setup more generic around the public webhook URL
- document minimal Slack app event/scopes for channel mention testing
- add local tunnel guidance for exposing Slackbot during laptop testing
* feat: add google drive docs etl

* feat: add google drive etl observability

* chore: bump chart version
refactor: extract gsuite integration helpers
…paradigmxyz#338)

* Route Slack archive links through app deep links

* fix(slackbot): gate Slack app deeplink rewriting behind SLACK_TEAM_ID

- Add SLACK_TEAM_ID config knob on the slackbot for Slack app deep links
- Rewrite archive URLs only when SLACK_TEAM_ID is set; otherwise pass through unchanged
- Drop implicit team-id inference from delivery metadata and thread key
- Fix latent bug: original logic used recipient_team_id, producing slack://channel?team=T_EXTERNAL&id=C_IN_HOME deep links that point a Slack Connect recipient at a channel that does not exist in their workspace
- Document the new env var in the slackbot configuration reference
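
The gating rule can be sketched as below. The slackbot's own code is not shown in this PR, so the function name, the archive-URL regex, and the decision to emit only a channel-level deep link (no message anchor) are all assumptions; the `slack://channel?team=...&id=...` form is taken from the commit message.

```python
import re
from typing import Optional

# Assumed archive URL shape: https://<workspace>.slack.com/archives/<channel>[/p<ts>]
_ARCHIVE_URL = re.compile(
    r"^https://[A-Za-z0-9-]+\.slack\.com/archives/([A-Z0-9]+)(?:/p\d+)?(?:\?.*)?$"
)


def rewrite_archive_url(url: str, slack_team_id: Optional[str]) -> str:
    """Rewrite to an app deep link only when SLACK_TEAM_ID is configured."""
    if not slack_team_id:
        return url  # no team id configured: pass through unchanged
    match = _ARCHIVE_URL.match(url)
    if match is None:
        return url
    return f"slack://channel?team={slack_team_id}&id={match.group(1)}"
```

Using only the configured home team id avoids building a link from a recipient's team id, which is the Slack Connect bug the commit fixes.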

Closes paradigmxyz#219

Amp-Thread-ID: https://ampcode.com/threads/T-019e8409-b73d-71fe-bb6b-dfe982926cfd
Co-authored-by: Amp <amp@ampcode.com>

---------

Co-authored-by: Centaur AI <ai@centaur.local>
Co-authored-by: Georgios Konstantopoulos <me@gakonst.com>
Co-authored-by: Amp <amp@ampcode.com>
…z#332)

Update the production deployment guide to match the current auth model. PR #23 removed the localhost/IP-based auth bypass, so operator and agent verification commands now need an API key even when run through kubectl exec.

Document the existing LOCAL_DEV_API_KEY bootstrap path for the first admin key, then use X-Api-Key on /health/tools, /admin/api-keys, and agent smoke-test requests. This aligns the guide with the service-key bootstrap behavior and the auth-aware smoke flow fixed in paradigmxyz#309.

Follow-up suggestion: rename LOCAL_DEV_API_KEY to BOOTSTRAP_ADMIN_API_KEY. The current name is misleading because this key is not local-dev-specific; it is the bootstrap admin API key used to create the first operator key in production deployments too.
fix(opentable): import browser_use lazily to avoid tool_load_failed

browser_use touches ~/.config on import, which raises PermissionError in the
non-root sandbox tool-server and crashes tool discovery for the opentable tool
on every sandbox start. Move the import into the function that uses it so
merely loading the tool no longer triggers it.

Co-authored-by: arjunblj <arjunblj@users.noreply.github.com>
Boot iron-control (Rails control plane) as an opt-in chart module behind
ironControl.enabled (off by default). Runs against a dedicated
iron_control_production database on the bundled Postgres, created idempotently
by an init container, so its Rails schema_migrations table never collides with
the API's dbmate table. The connection URL carries no database path, so Rails
resolves each connection's db name from the image's database.yml.

DATABASE_URL, the bootstrap user/API-key, the three ActiveRecord encryption
keys, and SECRET_KEY_BASE are injected from centaur-infra-env via explicit
secretKeyRefs (never envFrom, which would leak the API's ai_v2 DSN) and seeded
by bootstrap-secrets (generated when absent, never rotated in place).
fix: reduce slack etl rate limit pressure
feat: project Linear issues into company context
* refactor: centralize attachment processing

* fix: reseed slack etl widened lookback gaps
…aradigmxyz#287)

* feat(tools): add read-only CloudWatch tool via iron-proxy aws_auth

Add a `cloudwatch` infra tool mirroring the AWS CloudWatch MCP's read-only
surface: log groups, filter log events, Logs Insights queries, metrics, and
alarms (boto3-backed, JSON-safe responses, lazy client so discovery needs no
credentials or network).

AWS auth rides iron-proxy's `aws_auth` transform rather than holding real
credentials in the tool process. SigV4 can't be swapped on the wire like a
bearer token, but iron-proxy re-signs: boto3 signs each request with throwaway
placeholder credentials, and iron-proxy reads the region/service from the
signature scope and re-signs with the real read-only IAM keys it resolves from
the secrets backend. The keys never enter the workload — the SigV4 analogue of
the `secrets` placeholder swap. (aws_auth landed in iron-proxy v0.40.0; Centaur
pins 0.42.0-rc.2, which includes it.)

- tool_manager: AwsAuthSecret type + parser (access_key_id/secret_access_key/
  session_token refs, allowed_regions/services, hosts)
- proxy_config: render the aws_auth transform; add to _MANAGED_TRANSFORMS
- iron-proxy base configs: allowlist x-amz-* so x-amz-target (the CloudWatch
  operation header) survives egress filtering
- cloudwatch tool: declare the aws_auth secret; sign with placeholders; region
  is the only real value (non-secret, read from env, defaults us-east-1)
- kubernetes: expose only AWS_REGION (non-secret, optional) to the tool-server
  sidecar — no AWS credentials in-process
- tests: cloudwatch client, aws_auth parser + renderer, sidecar (creds absent)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(cloudwatch): allow AWS SDK signed headers through the egress filter

aws_auth re-signs CloudWatch requests with the AWS SDK v4 signer, whose
signed-headers set includes the SDK's amz-sdk-request, amz-sdk-invocation-id,
and (for CloudWatch's query-JSON protocol) x-amzn-query-mode headers.
header_allowlist runs after aws_auth and was stripping them, so AWS rebuilt
the canonical request without them and rejected every call with
InvalidSignatureException.

Allow /^amz-sdk-.*$/ and /^x-amzn-.*$/ so the signed headers reach AWS. Pairs
with the /^x-amz-.*$/ allowance already added for the SigV4 headers.
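
For illustration, the three patterns named above behave like this when applied to header names. This is a Python model of the allowlist check, not iron-proxy's implementation, and a real base config would also allow ordinary headers such as `host` and `content-type`.

```python
import re
from typing import Dict, Mapping

# The AWS-related patterns from the commit message; matching is done on
# lower-cased header names (an assumption about how the filter compares).
_AWS_HEADER_PATTERNS = [
    re.compile(r"^x-amz-.*$"),
    re.compile(r"^amz-sdk-.*$"),
    re.compile(r"^x-amzn-.*$"),
]


def filter_aws_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Keep only headers whose names match one of the allowed patterns."""
    return {
        name: value
        for name, value in headers.items()
        if any(p.match(name.lower()) for p in _AWS_HEADER_PATTERNS)
    }
```

If a signed header such as `amz-sdk-invocation-id` is stripped after signing, AWS rebuilds the canonical request without it and the signature no longer matches, which is the `InvalidSignatureException` described above.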

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(iron-proxy): bump base image to 0.42.0-rc.4 for the awsauth CONNECT fix

0.42.0-rc.4 is the first release containing ironsh/iron-proxy#167, which lets
the synthetic CONNECT through the tunnel transform-policy check so aws_auth
signs the post-MITM inner request instead of rejecting the CONNECT. Required
for the CloudWatch tool's aws_auth path.

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Centaur AI <ai@centaur.local>
# Conflicts:
#	contrib/chart/templates/_helpers.tpl
#	contrib/chart/values.dev.yaml
@jamalavedra jamalavedra merged commit 389cdaf into main Jun 7, 2026
7 of 9 checks passed