Summary
Agents should never see real bearer tokens. AuthBridge should replace the inbound Authorization header with an opaque placeholder after validation, and resolve that placeholder back to an exchanged token on the outbound path. The agent only ever sees a meaningless reference, so credential handling stays transparent to agent code.
Inspired by NVIDIA OpenShell's credential isolation model, where the agent process never has access to any real credential.
Current Flow
User -> [Authorization: Bearer <user-token>] -> AuthBridge sidecar (validates JWT)
  -> [Authorization: Bearer <user-token>] -> Agent container   <-- agent sees the token
  -> Agent makes outbound call [Authorization: Bearer <user-token>]
  -> AuthBridge sidecar (token exchange) -> [Authorization: Bearer <exchanged-token>] -> Tool
The agent receives and can read the user's bearer token. A compromised agent (prompt injection, malicious tool, dependency vulnerability) can exfiltrate or misuse it.
Proposed Flow
Inbound:
  User -> [Authorization: Bearer <user-token>]
    -> AuthBridge sidecar validates JWT, extracts claims
    -> generates opaque placeholder: kagenti:ref:<uuid>
    -> stores uuid -> {claims, act chain, expiry} in shared memory
    -> replaces header: [Authorization: Bearer kagenti:ref:<uuid>]
    -> forwards to Agent container   <-- agent only sees placeholder
Outbound:
  Agent makes call -> [Authorization: Bearer kagenti:ref:<uuid>]
    -> AuthBridge sidecar intercepts
    -> looks up uuid in shared memory -> retrieves cached claims
    -> performs token exchange using claims + agent SPIFFE identity
    -> replaces header: [Authorization: Bearer <exchanged-token>]
    -> forwards to Tool
The agent never touches a real token. It receives and propagates an opaque placeholder naturally (most HTTP frameworks propagate Authorization headers), and the sidecar resolves it on the way out.
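To make the placeholder format concrete, here is a minimal Go sketch of minting and recognizing a kagenti:ref:<uuid> value using only the standard library. The function names (newUUID, placeholderHeader, parsePlaceholder) are illustrative and not part of authlib, and the exact-case "Bearer " match is a simplifying assumption.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"strings"
)

// placeholderPrefix is the reference scheme proposed in this issue.
const placeholderPrefix = "kagenti:ref:"

// newUUID returns a random (version 4) UUID string built from crypto/rand.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// placeholderHeader builds the Authorization value handed to the agent.
func placeholderHeader(id string) string {
	return "Bearer " + placeholderPrefix + id
}

// parsePlaceholder extracts the UUID from an Authorization header value.
// ok is false for real tokens, other auth schemes, or an empty reference.
// Assumption: the scheme is matched case-sensitively as "Bearer ".
func parsePlaceholder(header string) (id string, ok bool) {
	token, found := strings.CutPrefix(header, "Bearer ")
	if !found {
		return "", false
	}
	id, found = strings.CutPrefix(token, placeholderPrefix)
	if !found || id == "" {
		return "", false
	}
	return id, true
}
```

The inbound path would call placeholderHeader(id) when rewriting the header, and the outbound path would call parsePlaceholder on whatever the agent sends. Because the reference is pure randomness, nothing about the user or the token can be derived from it.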
Why Placeholders (Not Just Strip-and-Cache)
With concurrent requests from multiple users (or multiple requests from the same user), there is no reliable way to correlate an outbound request back to the inbound request that triggered it. The placeholder solves this:
- Each inbound request gets a unique placeholder UUID
- The agent framework naturally propagates it in the Authorization header to outbound calls
- The outbound path uses the placeholder to look up the exact claims for that specific request
- No ambiguity, no race conditions, no single-user assumption
This also resolves the agent framework propagation concern from #174: frameworks already propagate Authorization headers, and with this change those headers carry a placeholder instead of a real token.
What Changes in authlib
Inbound (HandleInbound)
After validation succeeds:
- Generate a placeholder: kagenti:ref:<uuid>
- Store uuid -> {original claims, act chain, expiry} in a shared in-memory store
- Return a new action (e.g., ActionReplaceAndAllow) signaling the listener to replace the Authorization header value with the placeholder
Outbound (HandleOutbound)
When the Authorization header contains a kagenti:ref:* value:
- Extract the UUID, look up claims in the shared store
- Perform token exchange using cached claims + agent SPIFFE identity (preserving act chain)
- Replace the placeholder with the exchanged token via existing ActionReplaceToken
- If the UUID is not found or expired, deny the request (fail-closed)
When no Authorization header is present, fall back to existing noTokenPolicy behavior (unchanged).
Shared Memory Store
A sync.Map (or equivalent concurrent map) in the authlib Go process, keyed by placeholder UUID, storing validated claims and expiry. Both the inbound and outbound code paths in auth.Auth already share the same struct instance, so the store is naturally shared.
Open Design Questions
1. Where does the shared store live in envoy-sidecar mode?
In proxy-sidecar mode, a single Go process handles both directions — a sync.Map works directly. In envoy-sidecar mode, inbound and outbound are separate ext_proc filter invocations. If the same go-processor binary handles both, the in-memory store still works. If they are separate processes, a shared Unix socket or pod-local gRPC service would be needed.
2. Placeholder TTL
The placeholder must live long enough for the agent to process the request and make outbound calls. Options:
- Fixed TTL (e.g., 60s) — simple but may be too short for complex chains
- Tied to inbound request lifetime — evict when the inbound response completes
- Configurable per-agent — longer for agents that do multi-step reasoning
3. Streaming and long-running agent calls
If the agent takes minutes to process (complex LLM chain with multiple tool calls), the placeholder must live at least that long. A request-lifetime-scoped eviction may be more robust than a fixed TTL.
4. Multiple outbound calls per inbound request
An agent may make several tool calls for a single inbound request, all carrying the same placeholder. The store must support multiple lookups per UUID (read-many, not pop-on-read). Eviction should happen after the inbound request completes, not after the first outbound resolution.
5. Placeholder leakage
The UUID is opaque — no claims, scopes, or original token can be derived from it. Even if exfiltrated, it is only resolvable within the sidecar's in-memory store inside that specific pod. However, if the agent logs or persists the placeholder, it could theoretically be replayed within the TTL window. Consider whether the store should also bind to source IP or other request attributes.
How This Differs from #174
Issue #174 investigates getting agent frameworks to explicitly propagate the inbound token to outbound calls. This proposal makes that unnecessary — frameworks already propagate the Authorization header naturally, so the placeholder flows through without any framework-specific integration. These are complementary: #174 is for frameworks that want explicit token awareness; this solves it transparently at the infrastructure layer.
Acceptance Criteria
- Authorization header is replaced with an opaque placeholder after validation