| 1 | +# Tier 3 — Kubernetes (pod-per-session) |
| 2 | + |
| 3 | +> Part of the [Agent SDK hosting cookbook](../../07_Hosting_the_agent.ipynb). |
| 4 | +> If you haven't picked a hosting tier yet, start there — it covers when a |
| 5 | +> managed option is the better fit and when you actually need this. |
| 6 | + |
| 7 | +Run the agent on a Kubernetes cluster where every session gets its own |
| 8 | +isolated pod, with network-level controls ensuring agent pods can only reach |
| 9 | +the Anthropic API. |
| 10 | + |
| 11 | +``` |
| 12 | + ┌──────────────────────────────────────────────────┐ |
| 13 | + │ Kubernetes │ |
| 14 | + │ │ |
| 15 | + curl / SDK ──────► Gateway (FastAPI) │ |
| 16 | + │ ├─ creates/deletes agent pods via K8s API │ |
| 17 | + │ ├─ routes /sessions/{id}/messages to right pod │ |
| 18 | + │ └─ session → pod mapping stored in Redis │ |
| 19 | + │ │ |
| 20 | + ┌──────┴──────┐ │ |
| 21 | + │ │ │ |
| 22 | + Agent Pod Agent Pod ──► Egress Proxy ──► api.anthropic.com │ |
| 23 | + (session A) (session B) ▲ │ |
| 24 | + │ │ │ │ |
| 25 | + │ NetworkPolicy: pods can ONLY reach egress-proxy │ |
| 26 | + │ │ |
| 27 | + Redis (session → pod-IP mapping) │ |
| 28 | + │ │ |
| 29 | + └──────────────────────────────────────────────────┘ |
| 30 | +``` |
| 31 | + |
| 32 | +The agent image is the **same one** Tier 1 builds from |
| 33 | +[`hosting/Dockerfile`](../Dockerfile). Same image, different machinery: instead |
| 34 | +of a single container or a Modal sandbox, the gateway gives each session its |
| 35 | +own pod and the cluster enforces what that pod can reach. |
| 36 | + |
| 37 | +> **Before you self-host:** if you just want a hosted agent without running |
| 38 | +> infrastructure, use Anthropic's managed option — see the |
| 39 | +> [Hosting overview](../README.md). This guide is for teams that need the |
| 40 | +> agent on their own Kubernetes cluster (regulated environments, existing |
| 41 | +> platform, custom networking). |
| 42 | + |
| 43 | + |
| 44 | +## Why each piece exists |
| 45 | + |
| 46 | +**Gateway** — Each user session gets its own agent pod. Something has to create |
| 47 | +those pods on demand, route traffic to the right one, and clean them up when |
| 48 | +sessions go idle. That's the gateway. It talks to the Kubernetes API to manage |
| 49 | +pod lifecycles and uses Redis to remember which session maps to which pod IP. |
| 50 | + |
| 51 | +**Egress proxy + NetworkPolicy** — Agents run arbitrary code. This pair ensures |
| 52 | +agent pods can reach `api.anthropic.com` and *nothing else*. The NetworkPolicy |
| 53 | +blocks all outbound traffic except to the egress proxy (port 443) and DNS |
| 54 | +(port 53). The egress proxy terminates TLS from the agent, then re-encrypts the |
| 55 | +request to Anthropic's API. Any attempt to reach the internet, other services, |
| 56 | +or other namespaces is dropped at the network level. |
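As a rough illustration of the policy described above, the sketch below builds an egress-only `NetworkPolicy` as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The `claude-agent` namespace and the `role=agent` label appear elsewhere in this README; the policy name and the `app=egress-proxy` label are assumptions made for the example, and the real `network-policy.yaml` may differ.

```python
import json


def agent_egress_policy(namespace="claude-agent"):
    """Build a NetworkPolicy that limits agent pods to the egress proxy and DNS.

    Hypothetical names: the policy name and the app=egress-proxy label are
    illustrative and are not taken from the repository's manifests.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "agent-egress-lockdown", "namespace": namespace},
        "spec": {
            # Applies to every pod labelled role=agent in the namespace.
            "podSelector": {"matchLabels": {"role": "agent"}},
            # Only egress is restricted; ingress rules are left untouched.
            "policyTypes": ["Egress"],
            "egress": [
                {
                    # HTTPS to the egress proxy pods only.
                    "to": [{"podSelector": {"matchLabels": {"app": "egress-proxy"}}}],
                    "ports": [{"protocol": "TCP", "port": 443}],
                },
                {
                    # DNS lookups over UDP and TCP, to any destination.
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ],
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(agent_egress_policy(), indent=2))
```

Because no rule matches any other destination, traffic outside these two rules is denied once a CNI that enforces `NetworkPolicy` is in place.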
| 57 | + |
| 58 | +**Redis** — The gateway needs to remember which pod is handling which session. |
| 59 | +When a request arrives, it looks up the session ID in Redis to find the pod IP |
| 60 | +and routes traffic there. The mapping lives outside the gateway process, so it |
| 61 | +survives gateway restarts, and Redis persists it to disk. |
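The lookup described above might be sketched as follows. This is not the gateway's actual code: `InMemoryStore` is a dict-backed stand-in that mimics only the `get`/`set`/`delete` subset of a Redis client so the example runs without a server, and the `session:` key prefix, `resolve_pod_ip`, and `end_session` are hypothetical names.

```python
class InMemoryStore:
    """Dict-backed stand-in exposing the get/set/delete subset of a Redis client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


def session_key(session_id):
    # Hypothetical key layout; the real gateway may name its keys differently.
    return f"session:{session_id}"


def resolve_pod_ip(store, session_id, create_pod):
    """Return the pod IP for a session, creating and recording a pod if needed."""
    key = session_key(session_id)
    pod_ip = store.get(key)
    if pod_ip is None:
        # First request on this session: get a pod and remember where it lives.
        pod_ip = create_pod(session_id)
        store.set(key, pod_ip)
    return pod_ip


def end_session(store, session_id):
    """Drop the mapping so later requests do not route to a deleted pod."""
    store.delete(session_key(session_id))
```

With a real Redis client the same three calls exist, though values come back as bytes unless the client is created with `decode_responses=True`.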
| 62 | + |
| 63 | +**Standby pool** — Pods take 10–30 seconds to start (image pull + container |
| 64 | +boot). The gateway pre-warms a configurable number of standby pods so new |
| 65 | +sessions can claim one instantly instead of waiting. After a pod is claimed, |
| 66 | +the pool replenishes in the background. |
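A minimal sketch of the claim-then-replenish behaviour, assuming a synchronous `spawn_pod` callable that returns a pod name. The class and method names are hypothetical, and the real `k8s.py` replenishes in the background rather than inline.

```python
from collections import deque


class StandbyPool:
    """Keeps up to target_size pre-started pods ready to be claimed."""

    def __init__(self, target_size, spawn_pod):
        self.target_size = target_size
        self._spawn_pod = spawn_pod  # callable returning a new pod name
        self._ready = deque()

    def __len__(self):
        return len(self._ready)

    def replenish(self):
        """Spawn pods until the pool is back at its target size."""
        while len(self._ready) < self.target_size:
            self._ready.append(self._spawn_pod())

    def claim(self):
        """Return (pod, was_warm): a warm pod if available, else a fresh one."""
        if self._ready:
            return self._ready.popleft(), True
        # Pool is empty: the caller pays the cold-start cost.
        return self._spawn_pod(), False
```

The second element of the tuple tells the caller whether the session started warm, which is useful to log when tuning `STANDBY_POOL_SIZE`.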
| 67 | + |
| 68 | +## Prerequisites |
| 69 | + |
| 70 | +| Tool | What it's for | |
| 71 | +|------|---------------| |
| 72 | +| [kind](https://kind.sigs.k8s.io/) | Local Kubernetes cluster in Docker | |
| 73 | +| [kubectl](https://kubernetes.io/docs/tasks/tools/) | Applying manifests, inspecting the cluster | |
| 74 | +| [docker](https://docs.docker.com/get-docker/) | Building container images | |
| 75 | +| `openssl` | Generating the egress proxy's TLS certificate | |
| 76 | +| `ANTHROPIC_API_KEY` | Set as env var | |
| 77 | + |
| 78 | +## Quickstart (local, with kind) |
| 79 | + |
| 80 | +```bash |
| 81 | +cd hosting/kubernetes |
| 82 | +export ANTHROPIC_API_KEY=sk-ant-... |
| 83 | +./kind-quickstart.sh |
| 84 | +``` |
| 85 | + |
| 86 | +This builds the three images, loads them into a local `kind` cluster, applies |
| 87 | +every manifest, and port-forwards the gateway to `localhost:8080`. |
| 88 | + |
| 89 | +## Talk to it |
| 90 | + |
| 91 | +Same path and shape as Tier 1/2 — only the base URL changes: |
| 92 | + |
| 93 | +```bash |
| 94 | +curl -N -X POST http://localhost:8080/sessions/demo/messages \ |
| 95 | + -H 'Content-Type: application/json' \ |
| 96 | + -d '{"prompt": "What tools do you have?"}' |
| 97 | +``` |
| 98 | + |
| 99 | +The first request on a new `session_id` claims a standby pod (or spawns one if |
| 100 | +the pool is empty). Subsequent requests with the same `session_id` route to the |
| 101 | +same pod, so the agent sees a continuous conversation. |
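For callers that prefer Python over curl, the same request can be sketched with the standard library. The path and JSON body mirror the curl command above; the function names are hypothetical, and the reply is treated as a plain stream of text lines without assuming a particular event format.

```python
import json
import urllib.request


def build_message_request(base_url, session_id, prompt):
    """Build the POST request for /sessions/{id}/messages without sending it."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/sessions/{session_id}/messages",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_reply(request):
    """Send the request and yield decoded lines as the gateway streams them."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8").rstrip("\n")


if __name__ == "__main__":
    req = build_message_request("http://localhost:8080", "demo", "What tools do you have?")
    for line in stream_reply(req):
        print(line)
```

Reusing the same `session_id` on a later call routes to the same pod, as described above.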
| 102 | + |
| 103 | +Watch the machinery work: |
| 104 | + |
| 105 | +```bash |
| 106 | +kubectl -n claude-agent get pods -w |
| 107 | +# you'll see agent-standby-* pods appear, then one flip to active when you curl |
| 108 | +``` |
| 109 | + |
| 110 | +To end a session, go through the gateway so the Redis mapping is cleaned up: |
| 111 | + |
| 112 | +```bash |
| 113 | +curl -X DELETE http://localhost:8080/sessions/demo |
| 114 | +``` |
| 115 | + |
| 116 | +(`kubectl delete pod` works too, but leaves a stale `session → pod-IP` entry |
| 117 | +in Redis until the next request on that session 502s.) |
| 118 | + |
| 119 | +## Verify the egress lockdown |
| 120 | + |
| 121 | +The agent runs code the model decides to run. The egress proxy + NetworkPolicy |
| 122 | +mean a prompt-injected agent still can't reach arbitrary hosts. Prove it: |
| 123 | + |
| 124 | +> `kind-quickstart.sh` installs Calico because kind's default CNI (kindnet) |
| 125 | +> doesn't enforce NetworkPolicy. On any cluster whose CNI enforces it (Calico, |
| 126 | +> Cilium, or GKE/EKS/AKS with policy enforcement enabled), this section works unchanged. |
| 127 | + |
| 128 | +```bash |
| 129 | +AGENT_POD=$(kubectl -n claude-agent get pods -l role=agent \ |
| 130 | + -o jsonpath='{.items[0].metadata.name}') |
| 131 | + |
| 132 | +# This should FAIL: the NetworkPolicy drops traffic to anything except egress-proxy. |
| 133 | +# (The agent image is slim and has no curl, so we use Python's socket.) |
| 134 | +kubectl -n claude-agent exec "$AGENT_POD" -- python3 -c \ |
| 135 | + "import socket; socket.setdefaulttimeout(5); socket.create_connection(('example.com',443)); print('REACHED — policy NOT enforcing')" |
| 136 | +``` |
| 137 | + |
| 138 | +Expected: `OSError: [Errno 101] Network is unreachable` (or a timeout) and a |
| 139 | +non-zero exit. The positive control — that the egress-proxy path *is* open — |
| 140 | +was already proven by the curl above returning model output. |
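If you want the check as a reusable script instead of a one-liner, a sketch along these lines returns a boolean instead of raising. `can_connect` is a hypothetical helper, not part of the repository; it treats any `OSError` (timeout, refused connection, unreachable network, DNS failure) as "not reachable".

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, unreachable networks,
        # and name-resolution failures.
        return False


if __name__ == "__main__":
    reachable = can_connect("example.com", 443)
    print("REACHED: policy NOT enforcing" if reachable else "blocked, as expected")
    # Exit non-zero when the lockdown is not working, so CI can fail on it.
    raise SystemExit(1 if reachable else 0)
```

Note the exit code is inverted relative to the one-liner above: here a blocked connection is the passing case.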
| 141 | + |
| 142 | +## Standby pool |
| 143 | + |
| 144 | +`STANDBY_POOL_SIZE` (in the `agent-config` ConfigMap) controls how many warm |
| 145 | +pods the gateway keeps ready. Check current state: |
| 146 | + |
| 147 | +```bash |
| 148 | +curl http://localhost:8080/api/pool |
| 149 | +``` |
| 150 | + |
| 151 | +## Persistence |
| 152 | + |
| 153 | +`server.py` persists transcripts (and its caller-ID → SDK-ID map) to |
| 154 | +`CLAUDE_CONFIG_DIR=/data`. In this tier that's the pod's ephemeral filesystem, |
| 155 | +so: |
| 156 | + |
| 157 | +- **While the pod is alive** (within the idle-timeout window), follow-up |
| 158 | + messages resume the conversation exactly as in Tiers 1 and 2. |
| 159 | +- **After the pod is reaped**, `/data` is gone. The next message on that |
| 160 | + `session_id` gets a fresh pod with no history. |
| 161 | + |
| 162 | +For a cookbook demo this is fine — sessions outlive the curl, not the cluster. |
| 163 | +For production you need durable storage that survives pod recycle. Two options: |
| 164 | + |
| 165 | +1. **Mount a PersistentVolumeClaim** at `/data` instead of the pod's local |
| 166 | + disk, and have the gateway reattach the same PVC when a session returns. |
| 167 | + Works with `server.py` as-is, but couples each session to a volume in one |
| 168 | + zone. |
| 169 | +2. **Mirror `/data` to external storage** with the SDK's |
| 170 | + [`SessionStore`](https://code.claude.com/docs/en/agent-sdk/session-storage): |
| 171 | + the local-disk write still happens first; the store is a mirror, and |
| 172 | + `mirror_error` is non-fatal. This is the approach the notebook's |
| 173 | + *Making it production-ready* section describes — it needs a small hook in |
| 174 | + `server.py` that the cookbook hasn't grown yet. |
| 175 | + |
| 176 | +## Deploying to your own cluster |
| 177 | + |
| 178 | +`kind` proves the topology; the manifests are cloud-agnostic. To run on EKS, |
| 179 | +AKS, GKE, OpenShift, or bare metal, swap the image registry and the front door: |
| 180 | + |
| 181 | +```bash |
| 182 | +REG=your.registry.example.com/claude-agent # ECR, ACR, GHCR, Artifact Registry, ... |
| 183 | + |
| 184 | +# 1. Build and push the three images |
| 185 | +docker build -t $REG/agent:latest -f ../Dockerfile .. |
| 186 | +docker build -t $REG/gateway:latest ./gateway |
| 187 | +docker build -t $REG/egress-proxy:latest ./egress-proxy |
| 188 | +for img in agent gateway egress-proxy; do docker push "$REG/$img:latest"; done |
| 189 | + |
| 190 | +# 2. TLS certs for the egress proxy |
| 191 | +./generate-certs.sh |
| 192 | + |
| 193 | +# 3. Namespace + secrets + config |
| 194 | +kubectl apply -f manifests/namespace.yaml |
| 195 | +kubectl -n claude-agent create secret generic anthropic-api-key \ |
| 196 | + --from-literal=ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" |
| 197 | +kubectl -n claude-agent create secret generic egress-proxy-tls \ |
| 198 | + --from-file=ca.crt=certs/ca.crt \ |
| 199 | + --from-file=proxy.crt=certs/proxy.crt \ |
| 200 | + --from-file=proxy.key=certs/proxy.key |
| 201 | +kubectl -n claude-agent create configmap agent-config \ |
| 202 | + --from-literal=AGENT_IMAGE=$REG/agent:latest \ |
| 203 | + --from-literal=STANDBY_POOL_SIZE=2 |
| 204 | + |
| 205 | +# 4. Apply manifests with your registry substituted |
| 206 | +for f in manifests/*.yaml; do |
| 207 | + sed "s|REGISTRY_URL|$REG|g" "$f" | kubectl apply -f - |
| 208 | +done |
| 209 | +``` |
| 210 | + |
| 211 | +Then expose the `gateway` Service through whatever your cluster uses for |
| 212 | +ingress — a cloud LoadBalancer, an Ingress controller, or a service mesh |
| 213 | +gateway. Three things vary by environment: |
| 214 | + |
| 215 | +- **Registry auth** — your nodes need pull credentials for `$REG` |
| 216 | + (`imagePullSecrets`, IRSA/Workload Identity, or a public registry). |
| 217 | +- **NetworkPolicy enforcement** — the egress lockdown only works if your CNI |
| 218 | + enforces `NetworkPolicy` (Cilium, Calico, GKE Dataplane V2, EKS with the |
| 219 | + VPC CNI policy add-on). On a CNI that ignores it, agent pods can reach the |
| 220 | + internet. |
| 221 | +- **TLS + auth in front of the gateway** — `GATEWAY_AUTH_TOKEN` is a |
| 222 | + placeholder. Put your IdP / API gateway in front before exposing this |
| 223 | + publicly. |
| 224 | + |
| 225 | +## What this doesn't give you |
| 226 | + |
| 227 | +- Real authentication or multi-tenancy (the `authenticate()` stub returns one |
| 228 | + hard-coded tenant) |
| 229 | +- Durable session storage (see [Persistence](#persistence)) |
| 230 | +- Gateway autoscaling or multi-region routing |
| 231 | +- Observability beyond what |
| 232 | + [`OTEL_EXPORTER_OTLP_ENDPOINT`](../README.md#observability) gives you for free |
| 233 | + |
| 234 | +## Teardown |
| 235 | + |
| 236 | +```bash |
| 237 | +./teardown.sh # kind delete cluster + remove certs/ |
| 238 | +``` |
| 239 | + |
| 240 | +## Layout |
| 241 | + |
| 242 | +``` |
| 243 | +kubernetes/ |
| 244 | +├── README.md |
| 245 | +├── kind-quickstart.sh # local end-to-end on kind |
| 246 | +├── teardown.sh |
| 247 | +├── generate-certs.sh # self-signed CA + proxy cert for egress-proxy |
| 248 | +├── gateway/ |
| 249 | +│ ├── main.py # FastAPI: route + reap |
| 250 | +│ ├── k8s.py # pod lifecycle + standby pool |
| 251 | +│ ├── proxy.py # SSE relay |
| 252 | +│ ├── requirements.txt |
| 253 | +│ └── Dockerfile |
| 254 | +├── egress-proxy/ |
| 255 | +│ ├── nginx.conf |
| 256 | +│ └── Dockerfile |
| 257 | +└── manifests/ |
| 258 | + ├── namespace.yaml |
| 259 | + ├── redis.yaml |
| 260 | + ├── egress-proxy.yaml |
| 261 | + ├── gateway.yaml # SA + RBAC + Deployment + Service |
| 262 | + └── network-policy.yaml |
| 263 | +``` |
| 264 | + |
| 265 | +--- |
| 266 | + |
| 267 | +The pod lifecycle management (`k8s.py`), egress proxy, and network policy are |
| 268 | +adapted from Anthropic's internal `create-claude-agent` harness by Joe Shamon |
| 269 | +and Ben Lehrburger. |