feat(controller): credential-injector follow-ups: TLS, GH_TOKEN signal, CD fixes #356

Closed
pilartomas wants to merge 9 commits into main from feat/experimental-credential-injector

@pilartomas
Contributor

Summary

Follow-up work on the experimental Envoy credential injector merged in #346.

  • fix(deploy): bump Envoy sidecar to envoyproxy/envoy:distroless-v1.37.2. The previous tag (envoyproxy/envoy-distroless:v1.32.0, 2024-10-15) tripped the StackRox >1y stale-image policy and blocked StatefulSet creation in the IBM Cloud cluster. Upstream stopped pushing to envoyproxy/envoy-distroless; distroless variants now ship under envoyproxy/envoy:distroless-*.
  • feat(helm): new controller.agentPodAnnotations value, propagated via AGENT_POD_ANNOTATIONS (JSON) and stamped on every agent pod. Lets operators attach admission-webhook break-glass annotations (e.g. admission.stackrox.io/break-glass) without a code change next time a cluster policy fires.
  • feat(controller): emit humr.ai/gh-token-available annotation + HUMR_GH_TOKEN_AVAILABLE env var on agents using the experimental path, so wrapper scripts can short-circuit instead of probing for a 401.
  • feat(controller): TLS interception for the experimental Envoy injector — per-instance leaf certs minted by cert-manager so Envoy can terminate the agent's TLS and inject credential headers (ADR-033).
  • fix(controller): wrap credential_injector in a composite filter so per-host dispatch works.
  • fix(controller,api-server): address review findings from #346 (feat(controller,ui): experimental Envoy credential injector behind per-instance opt-in flag).

Test plan

  • mise run check passes
  • mise run test passes
  • CD upgrade succeeds against IBM Cloud cluster (StackRox no longer blocks the StatefulSet)
  • Create an instance with experimentalCredentialInjector: true and confirm Envoy sidecar pulls and runs
  • kubectl get pod ... -o yaml shows humr.ai/gh-token-available annotation

First slice of the ADR-033 rollout. Add a per-instance opt-in flag
(`experimentalCredentialInjector`) that, when enabled, replaces OneCLI's
egress path for that pod with an Envoy sidecar.

- Controller: branch BuildStatefulSet on the flag — render a per-instance
  Envoy bootstrap ConfigMap, mount owner-scoped credential Secrets into
  the sidecar only, drop the agent's `ONECLI_ACCESS_TOKEN`, point
  `HTTP(S)_PROXY` at `127.0.0.1:<EnvoyPort>`, hard-code
  `automountServiceAccountToken: false` and `shareProcessNamespace: false`
  per ADR-033 threat model. NetworkPolicy drops the OneCLI peer and
  allows TCP 443/80 egress when the flag is on.
- API server: dual-write user-typed secrets (generic + Anthropic) to K8s
  Secrets labelled with the owner's sub. OneCLI write path unchanged;
  existing OneCLI-only secrets are not migrated — flagged instances only
  see secrets created after this lands.
- UI: checkbox in Add Agent dialog (configure step) and a new
  Experimental section in the configuration panel for toggling on
  existing instances.
- Helm: `controller.envoyImage` / `controller.envoyPort` defaults.
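
As a rough illustration of the agent-side env change described in the Controller bullet (not the controller's actual Go code), the sketch below drops `ONECLI_ACCESS_TOKEN` and repoints both proxy variables at the sidecar. The `EnvVar` shape, the `agentEnvForInjector` name, the `http://` scheme, and port 10000 are assumptions made for the example.

```typescript
// Hypothetical shape of a container env entry.
interface EnvVar {
  name: string;
  value: string;
}

// Returns the agent env for the flag-on path: the OneCLI token is removed
// and HTTP_PROXY / HTTPS_PROXY point at the local Envoy sidecar.
function agentEnvForInjector(base: EnvVar[], envoyPort: number): EnvVar[] {
  // The proxy URL scheme is an assumption; the PR only gives host and port.
  const proxy = `http://127.0.0.1:${envoyPort}`;
  const managed = new Set(["ONECLI_ACCESS_TOKEN", "HTTP_PROXY", "HTTPS_PROXY"]);
  const kept = base.filter((e) => !managed.has(e.name));
  return [
    ...kept,
    { name: "HTTP_PROXY", value: proxy },
    { name: "HTTPS_PROXY", value: proxy },
  ];
}

const agentEnv = agentEnvForInjector(
  [
    { name: "ONECLI_ACCESS_TOKEN", value: "redacted" },
    { name: "HTTPS_PROXY", value: "http://onecli.example:8080" },
    { name: "HOME", value: "/home/agent" },
  ],
  10000,
);
console.log(agentEnv.map((e) => `${e.name}=${e.value}`).join("\n"));
```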

OAuth app connections, HITL, refresh-token loop, gVisor enforcement,
and OneCLI removal stay out of scope per the issue.

Closes #337

Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com>
The K8s mirror used a non-existent `headerPrefix` field on `InjectionConfig`
(the real field is `valueFormat: "Bearer {value}"`). Two consequences:

1. Anthropic api-key secrets were mirrored with `Authorization: Bearer <key>`
   instead of `x-api-key: <key>`, so the upstream would reject them.
2. Generic secrets with a custom `valueFormat` (e.g. `Token {value}`) were
   ignored — every mirrored secret got `Bearer <value>` regardless.

Fix:
- Replace `headerPrefix` with the actual `valueFormat` template, applied
  via `{value}` substitution before writing the credential file.
- Special-case Anthropic in `resolveInjection`: read OneCLI's
  `metadata.authMode` from the create response and pick `x-api-key`
  (api-key) or `Authorization: Bearer` (oauth, default).
- Persist `humr.ai/auth-mode` and `humr.ai/injection-value-format`
  annotations so updates can recompute correctly without re-fetching the
  injection config.
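
A minimal sketch of the two pieces of the fix, assuming simplified types: `renderHeaderValue` applies the `{value}` substitution, and `resolveAnthropicInjection` picks the header by auth mode. The function signatures and the `Injection` interface are hypothetical; only the field name `valueFormat`, the `{value}` placeholder, and the two header choices come from the commit message.

```typescript
type AuthMode = "api-key" | "oauth";

// Hypothetical, trimmed-down view of an injection config.
interface Injection {
  headerName: string;
  valueFormat: string;
}

// Substitute the secret into the template, e.g. "Bearer {value}".
function renderHeaderValue(valueFormat: string, secret: string): string {
  // split/join replaces every literal "{value}" and sidesteps the special
  // "$" replacement patterns of String.prototype.replace.
  return valueFormat.split("{value}").join(secret);
}

// Anthropic special case: api-key mode uses x-api-key with the bare key,
// anything else (oauth, or a missing authMode) falls back to a Bearer token.
function resolveAnthropicInjection(authMode?: AuthMode): Injection {
  return authMode === "api-key"
    ? { headerName: "x-api-key", valueFormat: "{value}" }
    : { headerName: "Authorization", valueFormat: "Bearer {value}" };
}

const anthropicApiKey = resolveAnthropicInjection("api-key");
console.log(
  `${anthropicApiKey.headerName}: ${renderHeaderValue(anthropicApiKey.valueFormat, "sk-test")}`,
);
console.log(renderHeaderValue("Token {value}", "abc123"));
```

Persisting the auth mode and the format as annotations, as the last bullet describes, means an update only needs the new secret value to call the equivalent of `renderHeaderValue` again.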

`mise run check` does not run tsc on api-server; per-package CI catches
this. Add a unit test suite for the K8s port so the contract is locked
in.

Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com>
…idecar

The standard envoyproxy/envoy image runs as root by default, which
conflicts with the sidecar's runAsNonRoot: true security context — the
container fails to start. envoyproxy/envoy-distroless ships with USER
set to a non-root account, so it satisfies the policy.

Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com>
…-host dispatch

Envoy's credential_injector filter rejects virtual-host/route specific
config ("doesn't support virtual host or route specific configurations"),
which the previous bootstrap depended on. Move per-Secret injection into
an envoy.filters.http.composite filter wrapped with ExtensionWithMatcher,
dispatching by :authority. Each Secret becomes one entry in the matcher
map, selecting its own credential_injector instance.
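
The dispatch can be pictured as a lookup table keyed by `:authority`. The sketch below models only that idea and is not Envoy configuration: the `CredentialSecret` shape, the `cred-<name>` injector label, exact-match lookup, and the first-Secret-wins rule for duplicate hosts are all assumptions, and port suffixes on the authority are not handled.

```typescript
// Hypothetical view of a credential Secret: its name and the host it targets.
interface CredentialSecret {
  name: string;
  hostPattern: string;
}

// One matcher entry per Secret: host -> label of its own injector instance.
function buildAuthorityMap(secrets: CredentialSecret[]): Map<string, string> {
  const byHost = new Map<string, string>();
  for (const s of secrets) {
    const host = s.hostPattern.toLowerCase();
    // Assumption: if two Secrets target the same host, the first one wins.
    if (!byHost.has(host)) byHost.set(host, `cred-${s.name}`);
  }
  return byHost;
}

// Returns the injector for a request, or undefined when no Secret matches
// (the request is then forwarded without any injected header).
function selectInjector(byHost: Map<string, string>, authority: string): string | undefined {
  return byHost.get(authority.toLowerCase());
}

const injectors = buildAuthorityMap([
  { name: "gh", hostPattern: "api.github.com" },
  { name: "anthropic", hostPattern: "api.anthropic.com" },
]);
console.log(selectInjector(injectors, "api.github.com"));
console.log(selectInjector(injectors, "httpbin.org"));
```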

Also add a node id/cluster — the SDS path_config_source for the per-route
credential file requires both, even for file-based sources.

Validated with 'envoy --mode validate' against a rendered bootstrap.

Note: HTTPS-via-CONNECT traffic is encrypted inside the tunnel, so
header injection is currently a no-op for HTTPS upstreams. Adding TLS
interception is tracked as the next slice.

Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com>
…injector

Closes the two follow-ups from the previous slice:

1. HTTPS injection is no longer a no-op. Envoy now terminates the agent's
   TLS using a per-instance leaf cert signed by a cluster-wide MITM CA, runs
   credential_injector on the plaintext HTTP, and re-originates upstream TLS
   to the real host. SNI-miss requests pass through unmodified via
   sni_dynamic_forward_proxy.

2. The fetch-ca-cert init container is no longer required on the experimental
   path. The agent's CA volume is now projected from the leaf Secret (only
   ca.crt is exposed; tls.key stays in the sidecar — the credential boundary
   between agent and sidecar is preserved).

Mechanics:

- Helm: adds a self-signed bootstrap ClusterIssuer, an isCA Certificate that
  produces the humr-mitm-ca Secret in cert-manager's
  cluster-resource-namespace, and a CA ClusterIssuer that signs leaves.
  Gated behind controller.envoyMitm.enabled (default true).
- Controller: cert-manager.io/v1 types vendored; reconciler builds a
  per-instance Certificate (DNSNames = deduped Secret host-patterns, signed by
  humr-mitm-ca-issuer) and applies it via the dynamic client. cert-manager
  produces the {instance}-envoy-tls Secret asynchronously.
- Envoy bootstrap: outer CONNECT listener tunnels into an internal listener
  via envoy.bootstrap.internal_listener; tls_inspector + per-SNI filter
  chains terminate TLS using files mounted from the leaf Secret. HCM inside
  each chain runs credential_injector + dynamic_forward_proxy, with upstream
  TLS validated against the system CA bundle shipped in envoy-distroless.
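
The "deduped Secret host-patterns" step for the leaf Certificate's DNSNames could look roughly like the following. This is a TypeScript sketch of logic that lives in the Go controller; the function name, the lowercasing and trimming, and the sorted output (chosen here so repeated reconciles produce an identical spec) are assumptions.

```typescript
// Collapse the host patterns of all attached credential Secrets into the
// DNS names requested on the per-instance leaf Certificate.
function leafDnsNames(hostPatterns: string[]): string[] {
  const seen = new Set<string>();
  const names: string[] = [];
  for (const raw of hostPatterns) {
    const host = raw.trim().toLowerCase();
    // Skip blanks and anything already collected.
    if (host === "" || seen.has(host)) continue;
    seen.add(host);
    names.push(host);
  }
  // Sorted so the rendered Certificate does not change between reconciles
  // just because Secrets were listed in a different order.
  return names.sort();
}

console.log(
  leafDnsNames(["github.com", "api.github.com", "API.github.com", " github.com ", ""]),
);
```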

Verified end-to-end on local k3s:

- Pod boots cleanly (no fetch-ca-cert hang).
- curl https://api.anthropic.com/v1/models from the agent shows MITM cert
  issuer 'CN=humr MITM CA' with SAN api.anthropic.com, and the request reaches
  Anthropic with the injected Authorization header (Anthropic returns 401
  with a token-type-specific error, not a 'no key' error).
- curl https://httpbin.org/anything (no Secret) passes through unmodified;
  no Authorization header in the echoed request.

The credential file path now points at the SDS DiscoveryResponse the
api-server already writes (sds.yaml key) instead of the raw 'value' file —
path_config_source expects an SDS resource, not bare bytes.
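
For reference, a file-based SDS source is read as a DiscoveryResponse whose resource is a `Secret` carrying a `generic_secret`. The sketch below renders such a document; it is not the api-server's actual writer, and the `renderSdsYaml` name and the resource name `credential` are hypothetical.

```typescript
// Render a minimal SDS DiscoveryResponse holding one generic secret.
// JSON.stringify is used for quoting because a JSON string literal is also
// a valid YAML double-quoted scalar.
function renderSdsYaml(resourceName: string, headerValue: string): string {
  return [
    "resources:",
    '- "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret"',
    `  name: ${JSON.stringify(resourceName)}`,
    "  generic_secret:",
    "    secret:",
    `      inline_string: ${JSON.stringify(headerValue)}`,
    "",
  ].join("\n");
}

const sdsYaml = renderSdsYaml("credential", "Bearer example-token");
console.log(sdsYaml);
```

Pointing `path_config_source` at the raw `value` file fails because that file holds only the secret bytes, with no `resources` wrapper or `@type` for Envoy to decode.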

Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com>
Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com>
Six findings from #346 review:

* secrets-service: replace ad-hoc `console.warn` in mirrorToK8s with a
  stable token (`k8s-mirror-failed`) and structured payload (op,
  secretId, error). Log scrapers can now alert on broken K8s mirroring
  (which silently breaks Envoy injection on the experimental path)
  without parsing free-form text.
* k8s-secrets-port: drop the defensive `.toLowerCase()` from
  k8sSecretName and validate the ID up-front against RFC 1123. Two IDs
  differing only in case can no longer silently overwrite each other; an
  invalid ID throws (caught by mirrorToK8s, not propagated to OneCLI).
* controller: when ExperimentalCredentialInjector is on, log a warning
  if no GitHub credential Secret is attached. The OneCLI GH_TOKEN
  sentinel is dropped on this path, so without a BYO credential
  gh/octokit silently lose auth — this surfaces it in operator logs.
* resources_test: TestBuildStatefulSet_FlagOn_AddsEnvoySidecar now uses
  a non-empty credentialSecrets slice and asserts that volume + mount
  names match what the bootstrap template references (`cred-<name>`,
  `/etc/envoy/credentials/<name>`, `/etc/envoy/tls`).
* secrets-service.test (new): unit tests verifying create/update/delete
  resolve successfully when the K8s mirror throws, that the failure is
  logged with the structured payload, and that the mirror is skipped
  entirely when k8sPort is undefined.
* platform-topology.md: rewrite the credential-isolation invariant to
  cover both paths (OneCLI MITM, Envoy sidecar with per-instance leaf)
  and add ADR-033 to Motivated by. The previous wording asserted agents
  never hold upstream credentials, but elided that the experimental
  path achieves this differently.
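
The first two findings interact: name validation throws, and the mirror wrapper turns that into a structured log line instead of failing the caller. A minimal sketch under stated assumptions: the ID is checked as an RFC 1123 DNS label (the PR does not say label versus subdomain), the Secret name is taken to be the ID itself, and the `Logger` and `write` callback signatures are hypothetical.

```typescript
// RFC 1123 DNS label: lowercase alphanumerics and "-", at most 63 chars,
// starting and ending with an alphanumeric.
const DNS_LABEL = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

// No .toLowerCase(): an ID that is not already valid is rejected, so two IDs
// differing only in case can never map to the same Secret.
function k8sSecretName(secretId: string): string {
  if (secretId.length > 63 || !DNS_LABEL.test(secretId)) {
    throw new Error(`secret id is not a valid RFC 1123 label: ${secretId}`);
  }
  return secretId; // Assumption: the real naming scheme is not shown in the PR.
}

type MirrorOp = "create" | "update" | "delete";
type Logger = (token: string, payload: { op: MirrorOp; secretId: string; error: string }) => void;

// Best-effort mirror: any failure is logged under a stable token and swallowed.
async function mirrorToK8s(
  op: MirrorOp,
  secretId: string,
  write: (name: string) => Promise<void>,
  log: Logger,
): Promise<void> {
  try {
    await write(k8sSecretName(secretId));
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    log("k8s-mirror-failed", { op, secretId, error });
  }
}

const mirrorLog: string[] = [];
void mirrorToK8s("create", "Bad_ID", async () => {}, (token, payload) => {
  mirrorLog.push(`${token} ${JSON.stringify(payload)}`);
});
console.log(mirrorLog.join("\n"));
```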

Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com>
Follow-up to the previous review pass — operator-side log warnings weren't
enough; the agent itself needs a signal so GH_TOKEN-aware tooling (gh CLI,
octokit, wrapper scripts) can short-circuit instead of failing on a
mid-request 401.

When ExperimentalCredentialInjector=true:

- Set HUMR_GH_TOKEN_AVAILABLE="true"|"false" on the agent container env.
  "true" iff the owner has a credential Secret with host-pattern
  github.com or api.github.com (Envoy will inject Authorization on the
  wire); "false" otherwise.
- Mirror the same value to a pod annotation
  humr.ai/gh-token-available — operators can grep for the missing case
  via 'kubectl get pods -o jsonpath="{...annotations.humr\.ai/gh-token-available}"'
  without poking inside the container.
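
Both sides of the signal fit in a few lines. This is a sketch, not the controller's Go code or any shipped wrapper: `ghTokenAvailable` mirrors the host-pattern rule above, and `shouldSkipGitHub` is a hypothetical wrapper-side check that takes the environment as a plain object so it stays runtime-neutral.

```typescript
// Host patterns that count as a GitHub credential, per the rule above.
const GITHUB_HOSTS = ["github.com", "api.github.com"];

// Controller side: value for HUMR_GH_TOKEN_AVAILABLE and the pod annotation.
function ghTokenAvailable(hostPatterns: string[]): "true" | "false" {
  return hostPatterns.some((h) => GITHUB_HOSTS.includes(h)) ? "true" : "false";
}

// Agent side: skip GitHub calls only on an explicit "false". An unset
// variable means the pod is off the experimental path, where the OneCLI
// sentinel mechanism still applies.
function shouldSkipGitHub(env: Record<string, string | undefined>): boolean {
  return env["HUMR_GH_TOKEN_AVAILABLE"] === "false";
}

console.log(ghTokenAvailable(["api.anthropic.com", "api.github.com"]));
console.log(shouldSkipGitHub({ HUMR_GH_TOKEN_AVAILABLE: "false" }));
console.log(shouldSkipGitHub({}));
```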

Off the experimental path, neither is set (the OneCLI sentinel mechanism
is unchanged). Tests cover both flag-on cases and confirm flag-off stays
clean. security-and-credentials.md documents the signal.

Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com>
…otations

- Envoy sidecar image bumped from envoyproxy/envoy-distroless:v1.32.0
  (2024-10-15, blocked by StackRox >1y policy) to envoyproxy/envoy:distroless-v1.37.2
  (2026-04-10). Upstream stopped publishing to envoyproxy/envoy-distroless;
  distroless variants now live under envoyproxy/envoy:distroless-* tags.
- New controller.agentPodAnnotations Helm value, propagated to the controller
  via AGENT_POD_ANNOTATIONS (JSON) and stamped on every agent pod. Lets
  operators attach admission-webhook break-glass annotations (e.g.
  admission.stackrox.io/break-glass) without a code change next time a
  cluster policy fires.
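
A sketch of how the JSON in `AGENT_POD_ANNOTATIONS` might be parsed and stamped, written in TypeScript although the controller is Go. The function names, the strict string-only validation, the precedence rule (controller-set keys win on conflict), and the annotation value `ticket-123` are assumptions made for illustration.

```typescript
// Parse the env var into a flat string map; unset or blank means "none".
function parseAgentPodAnnotations(raw: string | undefined): Record<string, string> {
  if (raw === undefined || raw.trim() === "") return {};
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("AGENT_POD_ANNOTATIONS must be a JSON object");
  }
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(parsed)) {
    // Kubernetes annotation values are strings, so reject anything else.
    if (typeof value !== "string") throw new Error(`annotation ${key} must be a string`);
    out[key] = value;
  }
  return out;
}

// Assumption: annotations the controller sets itself take precedence over
// operator-supplied ones with the same key.
function stampAnnotations(
  controllerSet: Record<string, string>,
  operatorSet: Record<string, string>,
): Record<string, string> {
  return { ...operatorSet, ...controllerSet };
}

const podAnnotations = stampAnnotations(
  { "humr.ai/gh-token-available": "false" },
  parseAgentPodAnnotations('{"admission.stackrox.io/break-glass":"ticket-123"}'),
);
console.log(JSON.stringify(podAnnotations, null, 2));
```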

Signed-off-by: Tomas Pilar <thomas7pilar@gmail.com>
@pilartomas
Contributor Author

Reopening on a clean branch off main.

@pilartomas pilartomas closed this Apr 28, 2026
@pilartomas pilartomas deleted the feat/experimental-credential-injector branch April 28, 2026 13:07