---
layout: default
title: Security Gateway
---

Status: Experimental
The security gateway checks agent traffic that actually passes through Calciforge. That phrase matters. Calciforge can inspect model calls, tool requests, fetched pages, and provider traffic only when those requests use a Calciforge-controlled path. It is not a spell cast over every process on the machine.
Calciforge treats coverage as a support-tier question:
- First-class agents should have tested ingress and egress contracts. If a first-class adapter can receive user messages or send model/tool traffic around Calciforge in a protected profile, treat that as a Calciforge bug or an upstream limitation that needs a documented workaround.
- Recipe, generic command-line, and generic ACP agents are best effort unless the recipe documents a tested network boundary. ACP means Agent Client Protocol, a way to run persistent agent sessions. Calciforge can give these adapters safer defaults, wrapper scripts, command-line helpers, and proxy environment variables, but it cannot prove an arbitrary agent runtime will not open another network path.
- Hardened deployments should be able to reject or disable adapters whose
  ingress, egress, or instruction path cannot be verified. That is the target
  shape for release-hardening work; for now, `calciforge doctor` reports the
  gaps it can see.
For stronger guarantees, route model calls through Calciforge's model gateway, give agents explicit Calciforge fetch/tool wrappers, or run the agent under a host/container boundary that prevents bypass.
Outbound traffic from protected agents can be routed through the gateway by a
specific supported integration. Calciforge's own provider calls, health
checks, and LAN control-plane traffic should not use ambient
HTTP_PROXY/HTTPS_PROXY; proxying Calciforge itself can send model-gateway
requests and internal webhooks through the security proxy unnecessarily or
recursively.
Outbound pipeline:
- Manual credential check: Before Calciforge substitutes any secrets,
  IronClaw checks the original agent-supplied URL and non-transport headers
  for raw credentials such as `api_key=sk-...`. Transport-auth headers, such
  as `Authorization`, `Cookie`, and provider API-key headers, are sanitized
  before this check; otherwise normal model/provider sessions and local
  gateways would look suspicious. Exact proxy-managed explicit references
  such as `{% raw %}{{secret:NAME}}{% endraw %}` and
  `Bearer {% raw %}{{secret:NAME}}{% endraw %}` are safe control syntax;
  mixed manual-plus-reference values still remain visible to the check.
- Optional exfiltration scan: When `scan_outbound = true`, outgoing request
  bodies are analyzed by the `adversary-detector` for exfiltration language,
  credential-harvest phrasing, and adversarial patterns. This scan is opt-in
  because provider/tool transcripts often include benign prompt-injection
  examples and opaque IDs.
- Secret substitution and credential injection: When the request is visible
  to Calciforge, the gateway can substitute explicit references such as
  `{% raw %}{{secret:NAME}}{% endraw %}` in URLs, headers, and supported
  bodies, and inject provider `Authorization` headers from the configured
  env/fnox resolver. The staged placeholder path will use this same step to
  replace registered opaque credentials such as
  `cfg_OPENAI_API_KEY_<random>` once lifecycle wiring is enabled.
- Control header strip and forwarding: Calciforge strips `X-Calciforge-*`
  control headers, then forwards the request to the destination.
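
The ordering in the manual credential check (sanitize transport-auth headers first, then look for raw credentials in the URL and remaining headers) can be illustrated with a minimal Python sketch. The header set, the regular expression, and the function name are assumptions made for illustration; they are not IronClaw's actual rules, and the sketch models "sanitized" simply as "skipped".

```python
import re
from typing import Dict, List

# Assumed set of transport-auth headers that are skipped before the check.
TRANSPORT_AUTH = {"authorization", "cookie", "x-api-key"}

# Assumed shape of a raw credential, e.g. "api_key=sk-..." in a query string.
RAW_CREDENTIAL = re.compile(r"(?i)\b(api[_-]?key|token|secret)=[A-Za-z0-9_\-]{8,}")


def manual_credential_findings(url: str, headers: Dict[str, str]) -> List[str]:
    """Return the locations where a raw credential shape was seen."""
    findings = []
    if RAW_CREDENTIAL.search(url):
        findings.append("url")
    for name, value in headers.items():
        if name.lower() in TRANSPORT_AUTH:
            continue  # transport auth is not treated as a manual credential
        if RAW_CREDENTIAL.search(value):
            findings.append(f"header:{name}")
    return findings
```

A non-empty result would correspond to a manual credential block in this model; a normal `Authorization` header alone produces no finding.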
Inbound pipeline:
- Injection scan: Incoming text-like response bodies are scanned for prompt
  injection or adversarial payloads. This remains default-on.
- Optional response secret-leak scan: When `scan_response_secrets = true`,
  response bodies are also checked for high-entropy and secret-shaped values.
  This scan is opt-in because provider APIs commonly return opaque IDs and
  hashes as normal transport data.
- Enforcement: If the response is deemed `unsafe`, the gateway blocks the
  content and returns `403 Forbidden` to the agent.
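
To make "high-entropy" concrete, here is a small Python sketch of a Shannon-entropy test over a single token. The thresholds and function names are assumptions chosen for illustration; the actual response secret-leak scan may use different signals. It also shows why the scan is opt-in: opaque IDs and hashes pass the same test.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_secret_shaped(token: str, min_length: int = 20, min_entropy: float = 3.5) -> bool:
    # Assumed thresholds: long tokens with many distinct characters.
    return len(token) >= min_length and shannon_entropy(token) >= min_entropy
```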
The gateway has several enforcement modes. They are not interchangeable; pick the strongest mode the target agent can actually run under, then verify that the selected agent adapter actually uses it.
| Mode | Level | Status | Description |
|---|---|---|---|
| Model gateway | API | Working | Route OpenAI-compatible model calls through Calciforge's gateway. This is the most reliable path for providers and local dispatcher routes because Calciforge owns the HTTP request. |
| Explicit tools/fetch | App | Working/expanding | Give agents Calciforge-provided fetch, MCP, or recipe wrappers for network actions that need scanning or secret substitution. MCP means Model Context Protocol: a structured way to expose tools to an agent. |
| Cooperative HTTP proxy | App | Limited | Set HTTP_PROXY only for agents and tools that have been tested with the proxy. This is useful for plaintext HTTP and simple HTTP clients. |
| HTTPS inspecting proxy | App/host trust | Experimental | Trust a Calciforge CA and terminate CONNECT traffic for clients that support custom trust stores. CA means certificate authority: a local certificate issuer your runtime agrees to trust. The hudsucker-backed prototype runs the existing scan/substitution pipeline over decrypted requests and responses. |
| OS redirect | Host | Roadmap | Use firewall rules such as Linux iptables/nftables or macOS pf to redirect outbound traffic from a controlled UID/process group to the gateway. |
| Container or VM isolation | Runtime | Roadmap | Run the agent in Docker, a Linux namespace, LXC, or a VM where egress is denied except through Calciforge-managed gateways. This is the likely path for agents that ignore proxy env or use complex transports. |
| Placeholder injection | Secret boundary | Staged primitives | Give off-the-shelf agents fake env credentials or managed credential files and substitute real secrets only at the gateway. This keeps raw secrets out of agent memory but still needs agent lifecycle wiring, live request rewriting, and a network enforcement path. |
Calciforge now has two related secret-use shapes:
- Explicit secret references are the working path:
  `{% raw %}{{secret:NAME}}{% endraw %}`. Agents that know about Calciforge
  can ask `calciforge-secrets ref NAME` or use the MCP tool, place that
  reference into a visible outbound request, and let the gateway resolve it.
- Opaque placeholder credentials are the staged next path:
  `cfg_<NAME>_<random>`. Calciforge generates the token, registers the full
  token against an authoritative secret name for one agent, then provides the
  token through a supervised surface such as an environment variable, wrapper,
  or managed credential file. The embedded `<NAME>` is only a hint for humans;
  policy must resolve the full token through the per-agent registry.
The second path exists because many agents and tools do not know Calciforge's
mustache-style syntax. They expect OPENAI_API_KEY, a credentials directory,
or a provider config file. For example, an OpenClaw lane may already have a
plaintext credentials folder. In a managed placeholder setup, Calciforge should
write placeholder values there instead of real keys, register those values with
the security proxy, and retire them when that managed runtime stops or rotates.
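
The placeholder lifecycle described above (generate, register per agent, resolve by full token, retire) can be sketched in Python as follows. `PlaceholderRegistry` and its methods are hypothetical names for illustration only; the staged Calciforge primitives may be shaped differently.

```python
import secrets
from typing import Dict, Optional, Tuple


class PlaceholderRegistry:
    """Hypothetical per-agent registry of opaque placeholder credentials."""

    def __init__(self) -> None:
        # (agent_id, full_token) -> authoritative secret name
        self._entries: Dict[Tuple[str, str], str] = {}

    def issue(self, agent_id: str, secret_name: str) -> str:
        # The embedded name is only a human hint; the registry entry is
        # what policy trusts, and the random suffix keeps tokens unguessable.
        token = f"cfg_{secret_name}_{secrets.token_urlsafe(16)}"
        self._entries[(agent_id, token)] = secret_name
        return token

    def resolve(self, agent_id: str, token: str) -> Optional[str]:
        # Only an exact, registered token for this agent resolves.
        return self._entries.get((agent_id, token))

    def retire_agent(self, agent_id: str) -> int:
        # Drop every token when the managed runtime stops or rotates.
        stale = [key for key in self._entries if key[0] == agent_id]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

In this sketch a forged token that merely embeds the right name does not resolve, which is the point of looking up the full token rather than parsing `<NAME>` out of it.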
Do not mark explicit references deprecated yet. They remain the only fully wired path, they are simple to audit, and they work for agents that can follow Calciforge's CLI/MCP guidance. Placeholder credentials may become the default for some supervised first-class agents once generation, delivery, registration, live replacement, and retirement are all end-to-end tested. Even then, both mechanisms may remain supported: explicit references are clearer for agent-aware workflows, while opaque placeholders are better for ordinary tools that expect env vars or credential files.
There is also a scanner compatibility reason to keep both. Opaque placeholders are deliberately random and secret-shaped. If IronClaw-style exfiltration detection is enabled, those stand-ins may look like credentials unless the scanner learns Calciforge's placeholder registry or allowlist. That is solvable, but it means placeholder injection and aggressive exfil detection should be treated as separate knobs until the integration is proven.
The unified installer starts security-proxy, but it does not put
HTTP_PROXY/HTTPS_PROXY on the Calciforge service itself. Do not assume
command-line or exec-backed agents can be protected by generic proxy
environment variables.
Codex, Claude, ACPX, npm-backed adapters, and streaming clients may use
CONNECT, WebSockets, or browser-backed authentication flows that the current
proxy cannot inspect and may break. Keep those agents unproxied unless you
have a tested wrapper for that specific runtime, and prefer OpenAI-compatible
gateway routes or explicit fetch/tool integrations for traffic that must be
scanned.
By default security-proxy binds to 127.0.0.1. Keep that default for a
single-host install. For a trusted LAN deployment where other agent hosts must
use one shared proxy, set SECURITY_PROXY_BIND=0.0.0.0 for the local installer
run, or add "security_proxy_bind": "0.0.0.0" to that host's node entry in
deploy/nodes.json. Pair a LAN bind with host firewall rules or equivalent
network restrictions when the LAN is not fully trusted.
Ambient HTTPS_PROXY is not a complete protection story unless it points at a
Calciforge inspecting proxy and the client trusts the Calciforge CA. Standard
HTTPS proxying uses CONNECT tunnels; without inspection, a proxy can only see
the destination host and encrypted bytes. Current security-proxy uses
hudsucker to terminate CONNECT traffic, mint per-host certificates from the
configured CA, and run the existing request/response substitution and scanner
pipeline over the decrypted HTTP messages. Prefer Calciforge-owned model
gateway routes, explicit fetch/tool integration, or audited recipe wrappers for
runtimes that cannot use this trust setup.
Externally managed agent daemons are different. OpenClaw, ZeroClaw, Claude
Code, opencode, Dirac, or any custom process started by a separate service
manager must be launched with a tested proxy configuration in that service
manager, or enforced with an OS/network tier. Registering Calciforge webhooks
lets those agents talk back to Calciforge, but it does not by itself prove
their outbound HTTP is going through security-proxy.
For a manually started daemon that uses plaintext HTTP:
export HTTP_PROXY=http://127.0.0.1:8888
export NO_PROXY=localhost,127.0.0.1,::1

Use service-manager environment blocks for persistent daemons, and validate by
checking security-proxy logs while the agent makes a known outbound request.
calciforge doctor warns if the Calciforge daemon itself has ambient proxy
environment, flags explicit subprocess proxy env for verification, and warns
when configured HTTP/native agent daemons need separate validation.
Calciforge did not remove proxy support; it narrowed where proxy env is treated as a reliable security mechanism.
- `HTTP_PROXY` remains useful for tested plaintext HTTP clients. The OpenClaw
  installer path can write service proxy env via `proxy_endpoint`, after
  checking that the configured `security-proxy` is reachable from the
  OpenClaw host.
- `HTTPS_PROXY` should only be set for agent runtimes that have been tested
  with Calciforge's inspecting-proxy mode and trust the configured CA. Setting
  it globally can break streaming clients, WebSockets, browser/OAuth flows,
  and npm-backed adapters.
- Browser-backed tools usually need runtime-specific wiring. Managed OpenClaw
  gets `browser.extraArgs = ["--proxy-server=..."]`; relying on ambient env is
  not enough because OpenClaw strips Chrome proxy env and otherwise starts
  Chrome with `--no-proxy-server`.
- Ambient proxy env on the Calciforge daemon itself is avoided because it can
  route Calciforge provider calls, channel callbacks, health checks, and local
  control-plane traffic through its own proxy boundary.
- Secret injection works when the request reaches Calciforge in a visible form: model-gateway/provider routes, explicit fetch/MCP/tool wrappers, audited recipes, plaintext HTTP intercept mode, or HTTPS inspecting-proxy mode. It does not happen for an external daemon's direct HTTPS egress unless that daemon is configured to use Calciforge's inspecting proxy or another Calciforge-owned tool path.
The installer now starts security-proxy with the hudsucker-backed inspecting
listener enabled by default and generates a persistent local CA if one does not
already exist. On macOS, the installer explains why the trust step is needed
before it asks the system to add that CA to the login keychain. This is required
for any tested browser, tool, or agent runtime that sends HTTPS traffic through
security-proxy and expects inspected pages without certificate errors. Set
SECURITY_PROXY_TRUST_MITM_CA=false to skip the keychain prompt. These installer
defaults make inspected HTTPS the default available proxy mode, but they do not
automatically make every runtime trust that CA.
To run the binary manually, use:
SECURITY_PROXY_CA_CERT=/etc/calciforge/mitm-ca.pem \
SECURITY_PROXY_CA_KEY=/etc/calciforge/mitm-ca-key.pem \
SECURITY_PROXY_PORT=8888 \
security-proxy

Then configure the target agent process, not the Calciforge daemon itself:
export HTTP_PROXY=http://127.0.0.1:8888
export HTTPS_PROXY=http://127.0.0.1:8888
export NO_PROXY=localhost,127.0.0.1,::1

The agent runtime must trust mitm-ca.pem. Depending on the runtime, that can
mean the system trust store, SSL_CERT_FILE, REQUESTS_CA_BUNDLE,
NODE_EXTRA_CA_CERTS, browser trust settings, or tool-specific configuration.
The current prototype covers explicit proxy mode; OS-level transparent
redirects and installer-managed per-runtime trust setup are next.
Practical tiers:
- Direct Mac Mini/Studio OpenClaw: use the Calciforge bridge plugin for
  inbound chat, point provider/model calls at Calciforge's model gateway where
  possible, and use `proxy_endpoint` plus inspecting-proxy CA trust for tested
  HTTP/HTTPS egress. This is convenient but cooperative; OpenClaw can still
  bypass Calciforge if it opens its own direct connections outside the
  configured proxy environment.
- Linux service host: add systemd drop-ins, dedicated service users, and later
  `iptables`/`nftables` rules so the agent process has fewer unmanaged egress
  paths.
- Container, LXC, or VM: deny external egress except to Calciforge services.
  This is the likely preferred profile for agents that use complex transports
  or ignore proxy environment.
For agents Calciforge launches as subprocesses, start with direct channel routing plus conservative CLI flags. Add gateway coverage only through a path that has been tested for that specific runtime:
- use `kind = "openai-compat"` or the model gateway when the work is really a model call;
- use artifact or recipe wrappers when the network action is a known command Calciforge can run and audit;
- use MCP/fetch tools when the agent can delegate web access to Calciforge;
- use container or VM isolation when the agent has broad network behavior that cannot be reliably proxied.
For externally managed daemons, Calciforge can authenticate inbound callbacks and gate channel access, but it cannot prove outbound network policy unless the daemon is launched in a controlled environment. The practical future path is a local-lab profile that can run selected agents inside a container or VM with egress limited to Calciforge services.
The gateway is configured via GatewayConfig:
- `scan_outbound`: Toggle outbound adversary/exfiltration detection. Defaults
  off while this policy matures; enable only for deployments that have tuned
  false positives on provider/tool transcripts.
- `scan_inbound`: Toggle injection detection.
- `scan_response_secrets`: Toggle high-entropy/secret-pattern response leak
  detection. Defaults off independently from prompt-injection scanning.
- `inject_credentials`: Toggle automatic API key injection.
- `manual_credential_override_requires_operator_approval`: Require an operator
  token for `ironclaw.manual_credential` override headers. Default: `true`.
- `bypass_domains`: List of domains that skip scanning (e.g., internal
  services).
- `scanner_checks`: Ordered adversary-detector checks. Empty means the
  built-in default Starlark scanner policy.
Manual credential blocks return an agent-readable explanation plus structured headers:
- `X-Calciforge-Policy: ironclaw.manual_credential`
- `X-Calciforge-Operator-Approval: required`
- `X-Calciforge-Override-Supported: operator_scoped`
- `X-Calciforge-Override-Header: X-Calciforge-Override`
The operator override header is request-side control metadata, not upstream API input:
X-Calciforge-Override: ironclaw.manual_credential:<token>

With the default configuration, <token> must match
SECURITY_PROXY_MANUAL_CREDENTIAL_OVERRIDE_TOKEN. Operators can explicitly
allow self-asserted overrides by setting
manual_credential_override_requires_operator_approval = false in
security-proxy.toml, or
SECURITY_PROXY_MANUAL_CREDENTIAL_OVERRIDE_REQUIRES_OPERATOR_APPROVAL=false
in the service environment. Calciforge strips X-Calciforge-* headers before
forwarding, so override metadata is never sent to the upstream server.
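
The override check and the control-header strip can be sketched together in Python. This is an illustration under assumptions: the function names are hypothetical, the constant-time comparison via `hmac.compare_digest` is a choice made here rather than a documented Calciforge behavior, and the sketch assumes a self-asserted override still has to name the `ironclaw.manual_credential` policy.

```python
import hmac
from typing import Dict, Optional

CONTROL_PREFIX = "x-calciforge-"
OVERRIDE_HEADER = "x-calciforge-override"
POLICY = "ironclaw.manual_credential"


def override_allowed(headers: Dict[str, str], operator_token: Optional[str],
                     requires_operator_approval: bool = True) -> bool:
    """Decide whether a manual-credential block may be overridden."""
    value = next((v for k, v in headers.items() if k.lower() == OVERRIDE_HEADER), None)
    if value is None:
        return False
    policy, _, token = value.partition(":")
    if policy != POLICY:
        return False  # the override is scoped to one policy only
    if not requires_operator_approval:
        return True  # operator explicitly allowed self-asserted overrides
    if not operator_token or not token:
        return False
    # Constant-time comparison (an assumption of this sketch).
    return hmac.compare_digest(token.encode(), operator_token.encode())


def strip_control_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Drop X-Calciforge-* headers so control metadata never goes upstream."""
    return {k: v for k, v in headers.items() if not k.lower().startswith(CONTROL_PREFIX)}
```

Here `operator_token` stands for the value of `SECURITY_PROXY_MANUAL_CREDENTIAL_OVERRIDE_TOKEN`; the strip step runs regardless of whether the override was accepted.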
Calciforge's security checks are an ordered pipeline:
- Built-in default Starlark policy: runs when `scanner_checks` is empty. It
  implements the default hidden-payload, prompt-injection, PII-harvest, and
  exfiltration checks in editable policy code.
- `starlark`: in-process operator policy. This is the low-latency path for
  site-specific rules that do not need network calls. Policies can call
  `regex_match(pattern, content)` and
  `base64_decoded_regex_match(pattern, content)` for bounded Rust-backed
  matching.
- `remote_http`: optional custom policy service. This is where operators can
  add a model-based classifier, heavier data-loss prevention checks, or
  organization-specific threat modeling that belongs outside the proxy
  process.
Not every gateway denial should be equally overrideable. Recommended defaults:
| Policy / block class | Configurable? | Overrideable? | Default approval |
|---|---|---|---|
| `ironclaw.manual_credential`: raw credential supplied by the agent | Yes | Yes, scoped header | Operator required |
| Secret substitution destination denied by `secret_destination_allowlist` or dynamic `allowed_destinations` metadata | Yes, via operator config or secret metadata | Not by agent header | Operator config/metadata change required |
| Malformed or unresolved `{% raw %}{{secret:NAME}}{% endraw %}` | No | No | Fix request or secret store |
| `agent_web.forbid_search_engines` | Yes | Prefer config only | Operator config change required |
| `agent_web.preflight_message_urls` destination denial | Yes | Prefer config only | Operator config change required |
| `agent_web.scan_search_responses` blocked result | Yes | Prefer config only | Operator config change required |
| Provider-side browsing tool stripped/blocked | Yes | Prefer config only | Operator config change required |
| Inbound prompt-injection / unsafe response scan | Yes, scanner policy | Not by agent header | Operator policy/config change required |
| Outbound exfiltration scan | Yes, scanner policy; default off | Not by agent header | Operator policy/config change required |
| Response secret-leak scan | Yes; default off | Not by agent header | Operator policy/config change required |
The reason for the split is blast radius. Manual-credential detection can be a false positive for legacy APIs that use unfortunate parameter names, so a scoped override is useful. Transport authentication is not governed by a provider-host whitelist; known auth headers are sanitized before the manual-credential scanner, and real secret movement is governed by placeholder resolution plus destination allowlists. Destination allowlists, prompt-injection blocks, and opt-in exfiltration/secret-leak blocks are higher-risk policy boundaries; an agent should receive a clear explanation and ask for operator help rather than self-override.
Calciforge can still make these policies configurable for operators. The key
rule is that configuration changes should happen in security-proxy.toml,
service environment, or policy files, while request-carried override metadata
stays narrowly scoped and is stripped before forwarding upstream.
Calciforge intentionally has both local and remote adversary detectors. The local Starlark policy is for deterministic prefiltering: hidden page text, encoding, obvious exfiltration language, and concrete tool-policy bypass patterns. The remote HTTP/model check is for semantic judgment: foreign language, poetry or other style-shift attacks, fictional framing, coercion, multi-step decomposition, and intent that would be brittle or overbroad as regex. The remote pass adds latency and still asks one model to defend another model, so Calciforge keeps Starlark as the default and makes model review explicitly configurable.
No remote service is required for the default gateway. The localhost HTTP hop is small, but a model classifier call is not; enable it only when the extra security pass is worth the added latency.
On a local release build, the built-in Starlark default scanner measured about
299µs per warm scan for ordinary small content. Treat that as a sanity check,
not a universal latency guarantee: large bodies, cold starts, extra configured
policies, proxy I/O, and remote LLM checks dominate real end-to-end latency.
The example prompt covers more than classic prompt injection: credential
exfiltration, malicious tool-use instructions, false authority claims, identity
spoofing, cross-agent propagation, denial-of-service attempts, destructive
cleanup, unbounded resource use, and other governance failures described by
agent red-team work such as
Agents of Chaos.
For the standalone security-proxy binary, the fastest way to add a custom
remote check is:
SECURITY_PROXY_REMOTE_SCANNER_URL=http://127.0.0.1:9801 \
SECURITY_PROXY_REMOTE_SCANNER_FAIL_CLOSED=true \
security-proxy

For Calciforge channel-message scanning, use:
CALCIFORGE_REMOTE_SCANNER_URL=http://127.0.0.1:9801 \
CALCIFORGE_REMOTE_SCANNER_FAIL_CLOSED=true \
calciforge

The unified installer can also host the example scanner as a managed local service:
CALCIFORGE_REMOTE_SCANNER_ENABLED=1 \
REMOTE_SCANNER_API_KEY_FILE=~/.config/calciforge/secrets/remote-scanner-api-key \
REMOTE_SCANNER_PROMPT_FILE=~/.config/calciforge/remote-llm-scanner-prompt.txt \
bash scripts/install.sh

When enabled, the installer starts remote-llm-scanner on
127.0.0.1:9801 and sets SECURITY_PROXY_REMOTE_SCANNER_URL plus
CALCIFORGE_REMOTE_SCANNER_URL for the managed services. The API key can be
provided through REMOTE_SCANNER_API_KEY_FILE or REMOTE_SCANNER_API_KEY; the
file path is preferred so service definitions do not contain the key. The
classifier prompt is also editable: set REMOTE_SCANNER_PROMPT_FILE to a text
file or REMOTE_SCANNER_PROMPT to an inline override. The installer seeds a
default prompt file when it manages the example service.
Or configure checks directly in config.toml:
[security]
profile = "balanced"
scan_outbound = false
scan_response_secrets = false
# Empty scanner_checks uses the built-in Starlark default:
# builtin:calciforge/default-scanner.star
#
# To customize it, copy
# crates/adversary-detector/policies/default-scanner.star to
# /etc/calciforge/scanner-policies/default-scanner.star, edit it, then
# configure it explicitly:
#
[[security.scanner_checks]]
kind = "starlark"
path = "/etc/calciforge/scanner-policies/default-scanner.star"
fail_closed = true
max_callstack = 64
[[security.scanner_checks]]
kind = "starlark"
path = "/etc/calciforge/scanner.star"
fail_closed = true
max_callstack = 64
[[security.scanner_checks]]
kind = "remote_http"
url = "http://127.0.0.1:9801"
fail_closed = true

Checks are evaluated in order. A `clean` result continues to the next check.
A `review` result is retained while later checks continue, so a later
`unsafe` result can still block; `unsafe` stops the pipeline immediately.
`fail_closed` controls scanner errors or outages only: with `false`, an
unavailable optional check is skipped; successful `review` or `unsafe`
verdicts still enforce.
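
The evaluation order above can be modeled in a few lines of Python. This is a sketch, not the `adversary-detector` implementation: `Check` and `evaluate` are hypothetical names, and treating a fail-closed error or an unrecognized verdict as `unsafe` is an assumption of this model.

```python
from dataclasses import dataclass
from typing import Callable, List

VERDICTS = {"clean", "review", "unsafe"}


@dataclass
class Check:
    name: str
    run: Callable[[str], str]  # returns a verdict string; may raise on outage
    fail_closed: bool = True


def evaluate(checks: List[Check], content: str) -> str:
    verdict = "clean"
    for check in checks:
        try:
            result = check.run(content)
            if result not in VERDICTS:
                raise ValueError(f"{check.name}: unknown verdict {result!r}")
        except Exception:
            if check.fail_closed:
                return "unsafe"  # assumption: a fail-closed error blocks
            continue  # optional check unavailable: skip it
        if result == "unsafe":
            return "unsafe"  # stops the pipeline immediately
        if result == "review":
            verdict = "review"  # retained while later checks still run
    return verdict
```

Note that `fail_closed = False` only affects the error branch; a check that answers `review` or `unsafe` is enforced either way.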
Starlark checks run in-process with load() disabled and a bounded call stack.
The policy file must define scan(input) and return "clean", "review",
"unsafe", or a dict with verdict and optional reason:
def scan(input):
    # Lowercase once so the keyword match is case-insensitive.
    content = input["content"].lower()
    if input["context"] == "api" and "wire money" in content:
        return {
            "verdict": "unsafe",
            "reason": "operator policy blocks wire-transfer instructions",
        }
    # No rule matched: let later checks in the pipeline decide.
    return "clean"

Starlark policies receive url, content, context,
discussion_ratio_threshold, and min_signals_for_ratio. They also have
helpers backed by Rust's regex crate with compiled-pattern caching:
regex_match(pattern, content) for direct matching and
base64_decoded_regex_match(pattern, content) for bounded inspection of
base64-encoded text tokens. See
crates/adversary-detector/policies/default-scanner.star for the default
policy, examples/security-scanner.star for a minimal starter policy, and
examples/scanner-policies/ for reusable examples covering destination
allowlists, destructive command patterns, and credential-language review.
calciforge doctor --no-network validates Starlark policy files and remote
scanner URL syntax without calling remote scanner services.
Remote checks receive the same content that would otherwise be allowed or blocked by the local scanner:
POST /scan
Content-Type: application/json

{"url":"https://api.example.com","content":"...","context":"api"}

They return:

{"verdict":"clean|review|unsafe","reason":"short reason"}

scripts/remote-llm-scanner.py is a built-in example. It exposes /scan and
uses the local Calciforge model boundary by default with a strict
security-classifier prompt:
REMOTE_SCANNER_API_KEY_FILE=~/.config/calciforge/secrets/model-gateway-client-key \
REMOTE_SCANNER_API_BASE=http://127.0.0.1:18083/v1 \
REMOTE_SCANNER_MODEL=adversary/default \
REMOTE_SCANNER_PROMPT_FILE=./scripts/remote-llm-scanner-prompt.txt \
./scripts/remote-llm-scanner.py

Use fail_closed = true when the remote check is part of your enforcement
boundary. Use fail_closed = false for advisory classifiers where local checks
must continue to work if the remote service is unavailable.
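
For readers who want to write their own `remote_http` check, the request/response contract above is small enough to serve with the Python standard library. The following is a minimal sketch, not the bundled `scripts/remote-llm-scanner.py`: the `classify` rule is a toy placeholder, and the `ScanHandler` and `serve` names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(payload):
    """Toy stand-in for a real classifier; returns the documented shape."""
    if not isinstance(payload, dict):
        return {"verdict": "review", "reason": "payload is not a JSON object"}
    content = str(payload.get("content", "")).lower()
    if "ignore previous instructions" in content:
        return {"verdict": "unsafe", "reason": "instruction-override phrase"}
    return {"verdict": "clean", "reason": "no rule matched"}


class ScanHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/scan":
            self.send_error(404)
            return
        try:
            length = int(self.headers.get("Content-Length", "0"))
            payload = json.loads(self.rfile.read(length))
        except ValueError:
            self.send_error(400, "body must be JSON")
            return
        body = json.dumps(classify(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=9801):
    """Blocks forever; call explicitly to run the sketch as a local service."""
    HTTPServer((host, port), ScanHandler).serve_forever()
```

Calling `serve()` binds the same address used in the examples above, so a `remote_http` check with `url = "http://127.0.0.1:9801"` could point at it during local experiments.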
There are three extension paths today:
- Rust integrations that embed `adversary-detector` can implement the
  `ScannerCheck` trait and compose their own in-process pipeline.
- Deployed Calciforge and `security-proxy` instances can load Starlark policy
  files for low-latency operator-owned logic without a sidecar service.
- Deployed Calciforge and `security-proxy` instances can load arbitrary custom
  logic through the `remote_http` contract above. That keeps heavyweight code
  outside the trusted proxy process and lets users write checks in Python,
  Rust, Go, Lua, shell, or any other runtime.
Scanner code is operator-owned configuration-layer policy, so the sandbox is
not about treating the operator as hostile. It is about reliability and
blast-radius reduction: accidental recursion, dependency behavior, or unexpected
file and network access should not weaken the gateway. Starlark is the default
in-process scanner layer because it is already used by Calciforge policy code,
has no ambient filesystem or network access in this integration, supports
editable branching logic, and can use cached Rust regexes through
regex_match(). WebAssembly remains a possible future plugin layer when
stronger fuel and memory controls are needed. Use Starlark for local rules,
including regexes, keyword lists, size limits, allowed-language checks, or
context-specific branching; use remote_http when the rule needs networked
services or heavyweight dependencies.
Starter Starlark policies live under examples/scanner-policies/:
| Policy | Purpose |
|---|---|
| `allowed-destinations.star` | Review or block credential-shaped content sent outside an allowed destination list. |
| `command-denylist.star` | Block destructive shell-command patterns and review network download commands. |
| `credential-language.star` | Review or block credential disclosure, forwarding, and exfiltration language. |
Copy these into /etc/calciforge/scanner-policies/, edit the constants at the
top of each file, then add one or more starlark checks to config.toml.
Integration tests are located in crates/security-proxy/tests/. They verify:
- Interception of adversarial content.
- Blocking of unsafe responses.
- Successful credential injection for known providers.
The scanner also has a contributor-friendly red-team fixture suite:
cargo run -p adversary-detector --example red-team

Fixtures live in examples/red-team/adversary-fixtures.json. Add cases there
when you find a bypass or false positive. Useful categories include encoded
payloads, foreign-language prompt injection, Unicode obfuscation, benign
security research, and GTFOBins/LOLBins-style instructions where a legitimate
tool is used to bypass a higher-level policy. Some fixtures can intentionally
document current gaps by expecting clean; hardening work should update the
fixture expectation in the same PR that improves the policy.
Good sources for new fixture families include:
- GTFOBins and LOLBAS-style tool-policy bypasses.
- Agent-governance threat taxonomies such as Agents of Chaos.
- Adversarial-poetry and other style-shift jailbreak research.
- Agent Arena hidden web-content cases: comments, hidden DOM nodes, microtext, ARIA, data attributes, alt text, off-screen content, and zero-width text.
- scurl-style sanitized-fetch middleware; see the sanitized fetch roadmap.
[security.secret_access] gates which secret names an identified agent,
user, or channel may discover, reference, and substitute. This is an
identity gate; secret_destination_allowlist and dynamic
allowed_destinations metadata still apply independently as destination
gates.
[security.secret_access]
[[security.secret_access.rules]]
agents = ["research-*"]
users = ["owner"]
channels = ["signal"]
secrets = ["BRAVE_*", "SEARCH_*"]

Rule selectors are conjunctive. Empty agents, users, or channels
lists are wildcards for that selector type; configured selectors must
match the active identity. secrets must be non-empty and supports *
wildcards.
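
The selector semantics can be sketched in Python as follows. This is an illustrative model only: `Rule` and `allowed` are hypothetical names, the sketch assumes `*` wildcards apply to identity selectors as well as to `secrets` (the `research-*` example suggests this for agents), and it covers only the case where at least one rule is configured.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    secrets: List[str]  # must be non-empty
    agents: List[str] = field(default_factory=list)
    users: List[str] = field(default_factory=list)
    channels: List[str] = field(default_factory=list)


def _glob(pattern: str, value: str) -> bool:
    # Only "*" is special; every other character matches literally.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def _selector(patterns: List[str], value: Optional[str]) -> bool:
    if not patterns:
        return True  # an empty list is a wildcard for that selector type
    return value is not None and any(_glob(p, value) for p in patterns)


def allowed(rules: List[Rule], secret: str, agent: Optional[str] = None,
            user: Optional[str] = None, channel: Optional[str] = None) -> bool:
    for rule in rules:
        if (_selector(rule.agents, agent) and _selector(rule.users, user)
                and _selector(rule.channels, channel)
                and any(_glob(p, secret) for p in rule.secrets)):
            return True
    return False  # no rule allows the secret
```

With the rule from the example above, `research-1` acting for `owner` on `signal` can use `BRAVE_API_KEY`, while a missing channel identity or a different agent name is refused.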
Identity sources:
- MCP and `calciforge-secrets`: `CALCIFORGE_AGENT_ID`, `CALCIFORGE_USER_ID`,
  `CALCIFORGE_CHANNEL_ID`, or `CALCIFORGE_CHANNEL`.
- API-backed `calciforge-secrets` wrappers forward those identities to the
  central secret-control API; managed installs set `CALCIFORGE_AGENT_ID` to
  the claw name in the generated wrapper.
- security proxy: `x-calciforge-agent-id`, legacy `x-agent-id`,
  `x-calciforge-user-id`, `x-calciforge-channel-id`, or
  `x-calciforge-channel`.
Secret access rules fail closed: if no rule allows a secret, list_secrets
and calciforge-secrets list hide it, reference creation rejects it, and
security-proxy substitution refuses to resolve it. Unknown identities preserve
process-scoped compatibility only when no secret access rules are configured.
The proxy strips Calciforge identity headers, including legacy x-agent-id,
before forwarding upstream.
This ACL is a read/use policy. The central GET /control/secrets/list and
GET /control/secrets/ref/* helper endpoints use the read-only
secret_discovery_api_key. The central POST /control/secrets/set helper
remains a privileged operator path guarded by the secret_control_api_key
and, when allowed_destinations are supplied, refuses to store the secret
value unless destination metadata is stored first. It does not currently
grant per-identity write permissions; treat
that as separate secret-integrity hardening before exposing write-capable
helpers broadly.
Calciforge's inspecting gateway can scan outbound HTTPS when the runtime uses
the trusted proxy path, but the highest-likelihood leak path for blocked
content is not a direct egress to a denied host. It is the search-API
response that contains pre-indexed snippets of the same denied host, or a
provider-side browsing tool that the model invokes from inside an allowed
api.openai.com session.
[security.agent_web] adds four configurable defenses against this class of leak. All default to safe values; operators opt into stricter modes.
This complements but does not replace secret_destination_allowlist or
dynamic allowed_destinations secret metadata. Those allowlists gate
secrets-into-hosts, while agent_web gates content: search snippets,
provider browsing tool definitions, and URLs in large-language-model request
bodies. Static TOML policy and dynamic metadata are intersected; metadata read
failures fail closed when substitution needs a destination policy decision.
Block all egress to known search APIs entirely. When forbid_search_engines is true, requests to any host matching search_engine_patterns are denied.
[security.agent_web]
forbid_search_engines = true
# Override the default curated list (api.search.brave.com, duckduckgo.com,
# api.tavily.com, serpapi.com, serper.dev, api.firecrawl.dev, api.you.com,
# api.exa.ai, api.kimi.com, api.minimax.com).
search_engine_patterns = ["api.search.brave.com", "api.tavily.com"]

Scan responses from search APIs for prompt injection and for URLs that match the url_destination_denylist.

- `search_response_strategy = "block"` (default): replace the entire response with the standard block page.
- `search_response_strategy = "strip"`: parse the JSON and drop only the offending result entries; falls back to `"block"` if the JSON can't be parsed.
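
The `strip` strategy with its fallback can be sketched in Python. The response shape (a top-level `results` list whose entries carry a `url`) is an assumption made for illustration, since each search API uses its own schema; the function names and the exact-or-subdomain host match are likewise hypothetical.

```python
import json
from typing import List, Optional
from urllib.parse import urlsplit


def host_denied(url: str, denylist: List[str]) -> bool:
    try:
        host = (urlsplit(url).hostname or "").lower()
    except ValueError:
        return True  # assumption: an unparseable URL is treated as denied
    # Assumed matching rule: exact host or any subdomain of a denied host.
    return any(host == d or host.endswith("." + d) for d in denylist)


def strip_results(body: str, denylist: List[str]) -> Optional[str]:
    """Return filtered JSON text, or None to signal fallback to "block"."""
    try:
        doc = json.loads(body)
        results = doc["results"]
        if not isinstance(results, list):
            raise TypeError("results is not a list")
    except (ValueError, KeyError, TypeError):
        return None
    doc["results"] = [
        r for r in results
        if not (isinstance(r, dict) and host_denied(str(r.get("url", "")), denylist))
    ]
    return json.dumps(doc)
```

A `None` return corresponds to the documented fallback: when the body cannot be parsed into the expected shape, the caller replaces the whole response with the block page.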
[security.agent_web]
forbid_search_engines = false
scan_search_responses = true
search_response_strategy = "strip"
url_destination_denylist = ["leaked-corp-docs.example.com", "intranet.acme.local"]

Inspect outbound LLM API request bodies and either strip or block known provider-side browsing tools (`web_search`, `web_search_preview`, `web_search_20250305`, `computer_use_*`, `google_search`, `google_search_retrieval`, `browser`, `browser_use`, …).
Always-search models (`gpt-4o-search-preview*`) cannot be stripped; they are always blocked when `forbid_provider_browsing` is on.

- `provider_browsing_strategy = "strip"` (default): rewrite the request body to drop the tool definitions.
- `provider_browsing_strategy = "block"`: refuse the request entirely.
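
The two strategies can be illustrated with a short Python sketch. It assumes a request body with a top-level `model` string and a `tools` list whose entries identify themselves by `type` or `name`; real provider schemas vary, and the function names and abbreviated lists below are hypothetical.

```python
import json
from fnmatch import fnmatchcase
from typing import Optional, Tuple

# Abbreviated lists for illustration; the curated defaults are longer.
FORBIDDEN_TOOLS = ["web_search", "web_search_20250305", "google_search", "computer_use_*"]
FORBIDDEN_MODELS = ["gpt-4o-search-preview*"]


def _is_forbidden(tool) -> bool:
    if not isinstance(tool, dict):
        return False
    labels = [str(tool.get("type", "")), str(tool.get("name", ""))]
    return any(fnmatchcase(label, p) for label in labels for p in FORBIDDEN_TOOLS)


def apply_provider_browsing_policy(body: str, strategy: str = "strip") -> Tuple[str, Optional[str]]:
    """Return (decision, new_body); decision is "allow", "strip", or "block"."""
    request = json.loads(body)
    model = str(request.get("model", ""))
    if any(fnmatchcase(model, p) for p in FORBIDDEN_MODELS):
        return "block", None  # always-search models cannot be stripped
    tools = request.get("tools") or []
    kept = [t for t in tools if not _is_forbidden(t)]
    if len(kept) == len(tools):
        return "allow", body
    if strategy == "block":
        return "block", None
    request["tools"] = kept  # drop only the browsing tool definitions
    return "strip", json.dumps(request)
```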
[security.agent_web]
forbid_provider_browsing = true
provider_browsing_strategy = "strip"
# Override the curated tool / model lists if needed.
forbidden_browsing_tools = ["web_search", "web_search_20250305", "google_search"]
forbidden_browsing_models = ["gpt-4o-search-preview"]
known_llm_apis = [
"api.openai.com",
"chatgpt.com",
"chat.openai.com",
"api.anthropic.com",
"openrouter.ai",
"generativelanguage.googleapis.com",
"api.groq.com",
]

Extract `https?://...` URLs from outbound LLM request bodies sent to hosts in `known_llm_apis`, and test each against `url_destination_denylist`. The scanner covers common shapes such as `messages[].content`, Anthropic content arrays, OpenAI Responses input, provider-specific nested JSON envelopes, and `tools[].description` when `preflight_tool_descriptions = true`.
If any URL would be blocked at fetch time, the LLM request is refused before forwarding to the provider. This is separate from content scanning: response scanners still inspect raw content that crosses the gateway, while URL preflight prevents opaque provider-side browsing from fetching denied origins where the gateway would otherwise only see a synthesized model summary.
[security.agent_web]
preflight_message_urls = true
preflight_tool_descriptions = true
url_destination_denylist = ["leaked-corp-docs.example.com", "ref.jock.pl"]

Each policy hit emits a tracing INFO event with structured fields (`policy = "agent_web.<feature>"`, `dest_host`, `decision`, plus `tool`/`model`/`denied_host` when relevant). These events flow into the existing Calciforge audit pipeline.
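
The URL preflight step described above can be sketched in Python. This is a simplification under stated assumptions: instead of walking the specific provider shapes, it scans every string in the JSON body (so tool descriptions are always included), and the function names, the URL regular expression, and the exact-or-subdomain host match are hypothetical.

```python
import json
import re
from typing import Iterator, List
from urllib.parse import urlsplit

# Assumed URL shape: stop at whitespace, quotes, and common closing brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>)\[\]]+")


def iter_strings(node) -> Iterator[str]:
    """Yield every string value nested anywhere in a decoded JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def denied_hosts_in_request(body: str, denylist: List[str]) -> List[str]:
    hits: List[str] = []
    for text in iter_strings(json.loads(body)):
        for url in URL_RE.findall(text):
            try:
                host = (urlsplit(url).hostname or "").lower()
            except ValueError:
                continue  # malformed URL; this sketch skips it
            denied = any(host == d or host.endswith("." + d) for d in denylist)
            if denied and host not in hits:
                hits.append(host)
    return hits
```

In this model a non-empty result means the LLM request would be refused before it is forwarded to the provider.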