Skip to content

Commit b274ef2

Browse files
committed
fix(nri): address pre-ship review findings (CRIT-1, CRIT-2, IMP-1, IMP-2, IMP-5)
CRIT-1 — Pod identity is now verified by UID, not just (namespace, name). Closes a name-reuse race where an attacker could force-delete a pod between admission and CreateContainer, recreate one with the same name+SA, and have the plugin fetch credentials for the recreated pod whose UID doesn't match the NRI sandbox UID. fetchAndBuildMapping now refuses if NRI sandbox UID != kube-apiserver pod.UID. CRIT-2 — Removed dead wrap/unwrap path. The pull-not-push refactor (76c2074) replaced the wrap_token-as-bearer-credential pattern, but left behind: - pkg/vault/vault.go: WrapValues / UnwrapValues (deleted) - pkg/vault/wrap_test.go (deleted) - pkg/k8smutator/k8smutator_test.go: stubWrapper helper (deleted) - pkg/config/config.go: NRIConfig.WrapTokenTTL field (deleted) - helm/values.yml, configmaps.yaml, deployment-injector.yaml: nri.wrapTokenTTL knob and INJECTOR_NRI_WRAP_TOKEN_TTL env var (deleted) Operators no longer see a knob that does nothing. Tests no longer exercise dead code paths. IMP-1 — Rewrote docs/how-it-works/nri-mode.md to describe the schema v2 pull-not-push design. Old text described the wrap/unwrap flow which no longer exists. New version: architecture diagram for v2, the three-layer identity attestation defense (UID + namespace + SA), schema upgrade path, hardening checklist, accurate trust posture. IMP-2 — Kyverno policy now blocks hostPath mounts of /run/vault-db-injector in addition to /var/run/nri and /opt/nri. The credential cache lives there and was previously not covered by the policy (PSA baseline already covered it, but defense-in-depth). IMP-5 — Deleted leaked pkg/nri/.tmp file (saveCache test artifact). 174 unit tests pass. K3D edge suite: 9/9 (A_substitution, B_uri_mode, C_multi_container, D_init, E_empty_placeholder, F_malformed_json, G_cache_persistence, H_v1_rejected, I_identity_forge_blocked).
1 parent 55b013a commit b274ef2

11 files changed

Lines changed: 121 additions & 224 deletions

File tree

docs/how-it-works/nri-mode.md

Lines changed: 98 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -5,28 +5,59 @@ substitutes them at container creation time via a node-local DaemonSet
55
plugin. Credentials never appear in any persisted Kubernetes resource
66
(PodSpec, etcd, audit logs, GitOps captures).
77

8+
## Architecture (schema v2 — pull-not-push)
9+
10+
```
11+
kube-apiserver
12+
↓ pod admitted with annotation
13+
↓ db-creds-injector.numberly.io/nri-mapping = {
14+
↓ schema:2, db_path, db_role, placeholders, request_id,
15+
↓ pod_namespace, pod_service_account
16+
↓ } (NO Vault token, NO bearer credential)
17+
kubelet
18+
19+
containerd (NRI enabled)
20+
↓ /var/run/nri/nri.sock
21+
[vault-db-injector NRI plugin DaemonSet]
22+
↓ on CreateContainer:
23+
↓ 1. read pod-sandbox annotation, parse NRIMapping
24+
↓ 2. GET pod from kube-apiserver — verify UID, namespace, SA
25+
↓ match annotation (defense vs annotation forgery)
26+
↓ 3. authenticate to Vault as the plugin's OWN SA token
27+
↓ (k8s auth method)
28+
↓ 4. CanIGetRoles for the actual pod identity → confirms the
29+
↓ Vault auth role binds this (namespace, SA)
30+
↓ 5. GetDbCredentials — dynamic credential issued, lease tagged
31+
↓ with pod UID for renewer/revoker correlation
32+
↓ 6. emit ContainerAdjustment{env: substituted}
33+
runc
34+
↓ execve with real envp
35+
app
36+
```
37+
838
## Components
939

10-
- **Webhook** — generates placeholders, wraps the credential payload as a
11-
single-use Vault wrap token (5min TTL by default), attaches both as a
12-
pod annotation `db-creds-injector.numberly.io/nri-mapping`.
13-
- **DaemonSet** — node-local NRI plugin. On `CreateContainer`: reads the
14-
annotation, unwraps the token, substitutes placeholders in env vars,
15-
returns a `ContainerAdjustment` to containerd. Runs **before runc**, so
16-
the new process is born with the real envp.
40+
- **Webhook** — generates placeholders, stamps the
41+
`db-creds-injector.numberly.io/nri-mapping` annotation. Calls Vault
42+
CanIGetRoles to fail-fast at admission if the pod's SA isn't bound to
43+
the requested role. **Does not** call Vault sys/wrapping/wrap; no
44+
credential or token is placed in the PodSpec.
45+
- **DaemonSet (NRI plugin)** — node-local. Authenticates to Vault using
46+
its own ServiceAccount token. Verifies pod identity against the K8s
47+
API. Creates the dynamic credential. Substitutes placeholders in the
48+
container env at `CreateContainer` (before runc).
1749
- **Cache** — per-node tmpfs at `/run/vault-db-injector/nri/cache.json`
1850
persists unwrapped credentials so they survive plugin pod restart but
19-
not node reboot. Without persistence, a CrashLoop pod whose plugin DS
20-
restarts in the meantime would see the literal placeholder string in
21-
env (the wrap token is single-use).
51+
not node reboot. A pod whose plugin DS restarts mid-CrashLoop continues
52+
to receive the substituted env on retry instead of the placeholder.
2253

2354
## Failure modes and detection
2455

2556
The plugin emits Prometheus metrics:
2657

2758
- `vdbi_nri_substitutions_total` — successful adjustments emitted
2859
- `vdbi_nri_unwrap_failures_total{reason}` — labels: `malformed_annotation`,
29-
`unwrap_error`
60+
`fetch_error` (covers identity mismatch, Vault errors, missing pod, etc.)
3061

3162
### What can still go wrong
3263

@@ -56,24 +87,62 @@ The plugin emits Prometheus metrics:
5687
summary: NRI plugin not ready on node {{ $labels.node }} — pods with credentials are starting unsubstituted
5788
```
5889
59-
2. **Wrap token expired before CreateContainer.** Slow image pull, long
60-
pod scheduling delay (> 5 min) → unwrap fails. Plugin logs the error
61-
and increments `vdbi_nri_unwrap_failures_total{reason="unwrap_error"}`.
62-
Container starts unsubstituted, app crashes with bad credentials,
63-
visible. Increase `nri.wrapTokenTTL` if this is recurring on a slow
64-
cluster.
90+
2. **Annotation forgery** (closed by Hunter finding #H6). An attacker
91+
with `pods.create` or `pods.update` RBAC can craft an annotation
92+
claiming any `pod_namespace`/`pod_service_account`. The plugin
93+
defends against this in three layers:
94+
- The pod's actual UID (from NRI sandbox) must match
95+
`pod.metadata.uid` recorded by kube-apiserver.
96+
- The pod's actual namespace and `spec.serviceAccountName` (from
97+
kube-apiserver) must match the annotation's claim.
98+
- Vault `CanIGetRoles` is called with the K8s-attested identity, not
99+
the annotation's, so a mismatched claim fails authorization.
65100

66101
3. **Plugin DS pod and main container restart in the wrong order.** The
67102
on-disk cache covers this: the second CreateContainer attempt for the
68-
same pod UID reuses the stored mapping.
103+
same pod UID reuses the stored credential, no second Vault round-trip.
104+
105+
4. **Force-deleted pod cache leak** — a pod deleted with
106+
`--grace-period=0 --force` does not fire NRI's `RemovePodSandbox`
107+
event. The plugin runs a periodic 5-minute sweep that lists pods on
108+
its node via the K8s API and evicts cache entries whose UIDs no
109+
longer exist.
110+
111+
## Schema versioning
112+
113+
The plugin only accepts annotations with `"schema":2`. Schema 1 (the
114+
legacy `wrap_token` design) is rejected with a clear error so an
115+
operator never silently runs in an inconsistent state during upgrade.
116+
117+
**Upgrade path** — when moving from a v1 webhook + v1 plugin
118+
deployment to v2:
119+
120+
1. Set `nri.enabled: false` in helm values and apply. New pods now
121+
inject literal credentials in PodSpec (legacy mode, byte-identical
122+
to pre-NRI behavior).
123+
2. Upgrade the webhook and plugin Deployment/DaemonSet images together.
124+
3. Set `nri.enabled: true` and apply.
125+
126+
If you upgrade hot (without disabling NRI), pods admitted by an old
127+
webhook just before the upgrade will hit the new plugin and be rejected
128+
with `unsupported nri-mapping schema version 1`. Container starts with
129+
placeholder, app crashes with bad cred, kubelet restarts it. Within a
130+
few seconds the new webhook is admitting v2 annotations and recovery
131+
is automatic — but expect ~30 seconds of pod CrashLoop noise during the
132+
window. Cleaner to drain.
69133

70134
## Hardening checklist
71135

72136
- Set resource requests on the DS so it is not OOM-killed on memory pressure
73137
- Use `priorityClass: system-node-critical` to make eviction less likely
74-
- Monitor the alert above
75-
- Set `nri.wrapTokenTTL` higher than the worst-case scheduling + image pull
76-
time on your cluster (default 5min is fine for most clusters)
138+
- Monitor `NRIPluginMissingOnNode` (above) and
139+
`vdbi_nri_unwrap_failures_total{reason="fetch_error"}`
140+
- Apply the Kyverno policy at
141+
[helm/policies/kyverno-restrict-nri-socket.yaml](../../helm/policies/kyverno-restrict-nri-socket.yaml)
142+
to block hostPath mounts of `/var/run/nri` and `/opt/nri` outside the
143+
plugin's namespace
144+
- On RHEL/CoreOS leave SELinux enforcing; do not run user pods with
145+
`seLinuxOptions.type: spc_t`
77146

78147
## Trust posture
79148

@@ -89,3 +158,11 @@ A root-on-node attacker can already read `/proc/<pid>/environ` of every
89158
container, so the cache adds no new attack surface beyond what root already
90159
has. The cache is **never on persistent disk** (tmpfs) and **never in
91160
backups** (`/run` is excluded by every node backup tool).
161+
162+
A pod that mounts hostPath `/run` AND runs as UID 0 (root) can read the
163+
cache. PSA `restricted` and `baseline` profiles forbid hostPath mounts
164+
entirely, which is the recommended baseline for user namespaces. The
165+
Kyverno policy referenced above does not currently include
166+
`/run/vault-db-injector` because PSA covers it; if you must keep
167+
`baseline` off and root-on-pod allowed, extend the Kyverno policy
168+
manually.

helm/policies/kyverno-restrict-nri-socket.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -63,9 +63,14 @@ spec:
6363
- /var/run/nri/nri.sock
6464
- /opt/nri
6565
- /opt/nri/plugins
66+
- /run/vault-db-injector
67+
- /run/vault-db-injector/nri
6668
- key: "{{ regex_match('^/var/run/nri/', element.hostPath.path || '') }}"
6769
operator: Equals
6870
value: true
6971
- key: "{{ regex_match('^/opt/nri/', element.hostPath.path || '') }}"
7072
operator: Equals
7173
value: true
74+
- key: "{{ regex_match('^/run/vault-db-injector/', element.hostPath.path || '') }}"
75+
operator: Equals
76+
value: true

helm/templates/configmaps.yaml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -71,7 +71,6 @@ data:
7171
sentryDsn: {{ .Values.vaultDbInjector.configuration.sentryDsn }}
7272
nri:
7373
enabled: true
74-
wrapTokenTTL: {{ .Values.nri.wrapTokenTTL }}
7574
socketPath: /var/run/nri/nri.sock
7675
cachePath: /run/vault-db-injector/nri/cache.json
7776
{{- end }}

helm/templates/deployment-injector.yaml

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -29,8 +29,6 @@ spec:
2929
env:
3030
- name: INJECTOR_NRI_ENABLED
3131
value: "true"
32-
- name: INJECTOR_NRI_WRAP_TOKEN_TTL
33-
value: {{ .Values.nri.wrapTokenTTL | quote }}
3432
{{- end }}
3533
ports:
3634
- containerPort: 8443

helm/values.yml

Lines changed: 1 addition & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -82,8 +82,4 @@ nri:
8282
cpu: 200m
8383
memory: 256Mi
8484
tolerations: []
85-
nodeSelector: {}
86-
# Vault response-wrapping TTL for the wrap tokens that travel through
87-
# the PodSpec. Must outlive the longest expected scheduling + image-pull
88-
# delay on the cluster.
89-
wrapTokenTTL: 5m
85+
nodeSelector: {}

pkg/config/config.go

Lines changed: 4 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,6 @@ package config
22

33
import (
44
"os"
5-
"time"
65

76
"github.com/cockroachdb/errors"
87

@@ -27,9 +26,8 @@ const (
2726
// behavior). When true, the webhook wraps every credential and the NRI
2827
// DaemonSet substitutes placeholders at CreateContainer time.
2928
type NRIConfig struct {
30-
Enabled bool `yaml:"enabled" envconfig:"nri_enabled"`
31-
WrapTokenTTL time.Duration `yaml:"wrapTokenTTL" envconfig:"nri_wrap_token_ttl"`
32-
SocketPath string `yaml:"socketPath" envconfig:"nri_socket_path"`
29+
Enabled bool `yaml:"enabled" envconfig:"nri_enabled"`
30+
SocketPath string `yaml:"socketPath" envconfig:"nri_socket_path"`
3331
// CachePath is the on-disk JSON cache that persists unwrapped credentials
3432
// across plugin DS restarts (hostPath tmpfs). Survives DS pod restart;
3533
// cleared on node reboot. Must be writable by the plugin user.
@@ -79,9 +77,8 @@ func NewConfig(configFile string) (*Config, error) {
7977
DefaultEngine: "databases",
8078
VaultRateLimit: 30,
8179
NRI: NRIConfig{
82-
WrapTokenTTL: 5 * time.Minute,
83-
SocketPath: "/var/run/nri/nri.sock",
84-
CachePath: "/run/vault-db-injector/nri/cache.json",
80+
SocketPath: "/var/run/nri/nri.sock",
81+
CachePath: "/run/vault-db-injector/nri/cache.json",
8582
},
8683
}
8784
if configFile != "" {

pkg/config/config_test.go

Lines changed: 2 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,6 @@ package config
33
import (
44
"os"
55
"testing"
6-
"time"
76

87
"github.com/sirupsen/logrus"
98
"github.com/stretchr/testify/assert"
@@ -397,8 +396,8 @@ vaultSecretPrefix: prefix
397396

398397
func TestNRIConfig_Defaults(t *testing.T) {
399398
cfg := newConfigForNRITest(t)
400-
assert.Equal(t, 5*time.Minute, cfg.NRI.WrapTokenTTL)
401399
assert.Equal(t, "/var/run/nri/nri.sock", cfg.NRI.SocketPath)
400+
assert.Equal(t, "/run/vault-db-injector/nri/cache.json", cfg.NRI.CachePath)
402401
}
403402

404403
func TestNRIConfig_LoadsExplicitValues(t *testing.T) {
@@ -413,7 +412,6 @@ vaultSecretName: secret
413412
vaultSecretPrefix: prefix
414413
nri:
415414
enabled: true
416-
wrapTokenTTL: 10m
417415
socketPath: /custom/nri.sock
418416
`
419417
_, err = tmpfile.WriteString(y)
@@ -423,7 +421,7 @@ nri:
423421
for _, k := range []string{
424422
"INJECTOR_MODE", "INJECTOR_VAULT_ADDRESS", "INJECTOR_VAULT_AUTH_PATH",
425423
"INJECTOR_KUBE_ROLE", "INJECTOR_VAULT_SECRET_NAME", "INJECTOR_VAULT_SECRET_PREFIX",
426-
"INJECTOR_NRI_ENABLED", "INJECTOR_NRI_WRAP_TOKEN_TTL", "INJECTOR_NRI_SOCKET_PATH",
424+
"INJECTOR_NRI_ENABLED", "INJECTOR_NRI_SOCKET_PATH",
427425
} {
428426
t.Setenv(k, "")
429427
os.Unsetenv(k)
@@ -432,6 +430,5 @@ nri:
432430
cfg, err := NewConfig(tmpfile.Name())
433431
require.NoError(t, err)
434432
assert.True(t, cfg.NRI.Enabled)
435-
assert.Equal(t, 10*time.Minute, cfg.NRI.WrapTokenTTL)
436433
assert.Equal(t, "/custom/nri.sock", cfg.NRI.SocketPath)
437434
}

pkg/k8smutator/k8smutator_test.go

Lines changed: 0 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,8 @@
11
package k8smutator
22

33
import (
4-
"context"
54
"strings"
65
"testing"
7-
"time"
86

97
"github.com/numberly/vault-db-injector/pkg/k8s"
108
"github.com/numberly/vault-db-injector/pkg/placeholder"
@@ -293,13 +291,6 @@ func findEnvVar(envs []corev1.EnvVar, key string) string {
293291
return ""
294292
}
295293

296-
// stubWrapper implements vaultWrapper for tests — no real Vault needed.
297-
type stubWrapper struct{ wrapToken string }
298-
299-
func (s *stubWrapper) WrapValues(_ context.Context, _ map[string]string, _ time.Duration) (string, error) {
300-
return s.wrapToken, nil
301-
}
302-
303294
func TestApplyEnvToContainers_NRIEnabled_Classic(t *testing.T) {
304295
pod := &corev1.Pod{
305296
Spec: corev1.PodSpec{

pkg/nri/vault.go

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,17 @@ func fetchAndBuildMapping(ctx context.Context, cfg *config.Config, m k8s.NRIMapp
4444
if err != nil {
4545
return nil, nil, errors.Wrapf(err, "get pod %s/%s", podNamespace, podName)
4646
}
47+
// UID equality closes a name-reuse race: if the original pod was
48+
// force-deleted between admission and CreateContainer, an attacker who
49+
// can recreate a pod with the same name+namespace would otherwise
50+
// hijack the credential fetch. NRI's contextID is the sandbox UID;
51+
// kube-apiserver's pod.UID is the API-recorded UID. They must match.
52+
if string(pod.UID) != contextID {
53+
return nil, nil, errors.Newf(
54+
"pod UID mismatch: NRI sandbox UID %s != API pod UID %s for %s/%s — refusing to fetch credentials",
55+
contextID, string(pod.UID), podNamespace, podName,
56+
)
57+
}
4758
actualSA := pod.Spec.ServiceAccountName
4859
if actualSA == "" {
4960
actualSA = "default"

pkg/vault/vault.go

Lines changed: 0 additions & 62 deletions
Original file line numberDiff line numberDiff line change
@@ -244,65 +244,3 @@ func sliceToStrings(slice []any) ([]string, error) {
244244
return stringSlice, nil
245245
}
246246

247-
// WrapValues wraps the given key/value payload in Vault and returns the
248-
// resulting wrap token. The token is single-use: the next caller of
249-
// sys/wrapping/unwrap with this token gets the payload and the token dies.
250-
func (c *Connector) WrapValues(ctx context.Context, payload map[string]string, ttl time.Duration) (string, error) {
251-
data := make(map[string]any, len(payload))
252-
for k, v := range payload {
253-
data[k] = v
254-
}
255-
// SetWrappingLookupFunc mutates state on the shared *api.Client. The
256-
// deferred reset to nil restores the client's default wrap behaviour and
257-
// runs on every exit path, including panic unwinds — Go's defer fires
258-
// during stack unwinding before the panic propagates.
259-
//
260-
// Concurrency caveat: two WrapValues calls running in parallel on the
261-
// same *Connector would race on this client-level state. The current
262-
// architecture (per-request *Connector clone with its own login) makes
263-
// this safe in practice. If the invariant ever changes (Connector
264-
// shared across goroutines), serialize WrapValues with a mutex or
265-
// switch to a Connector pool.
266-
c.client.SetWrappingLookupFunc(func(operation, path string) string {
267-
return ttl.String()
268-
})
269-
defer c.client.SetWrappingLookupFunc(nil)
270-
271-
secret, err := c.client.Logical().WriteWithContext(ctx, "sys/wrapping/wrap", data)
272-
if err != nil {
273-
return "", errors.Wrap(err, "vault wrap")
274-
}
275-
if secret == nil || secret.WrapInfo == nil || secret.WrapInfo.Token == "" {
276-
return "", errors.New("vault wrap returned no token")
277-
}
278-
return secret.WrapInfo.Token, nil
279-
}
280-
281-
// UnwrapValues consumes the given wrap token and returns the previously
282-
// wrapped payload. The token is consumed in Vault on the first call
283-
// regardless of whether this method returns an error: a parse failure
284-
// (non-string field, missing data) leaves the caller without recourse
285-
// because Vault has already burned the token. Callers must therefore
286-
// treat any UnwrapValues error as a permanent loss of those credentials
287-
// and trigger a fresh credential issuance flow.
288-
//
289-
// Calling unwrap twice with the same token always fails on the second
290-
// call.
291-
func (c *Connector) UnwrapValues(ctx context.Context, token string) (map[string]string, error) {
292-
secret, err := c.client.Logical().UnwrapWithContext(ctx, token)
293-
if err != nil {
294-
return nil, errors.Wrap(err, "vault unwrap")
295-
}
296-
if secret == nil || secret.Data == nil {
297-
return nil, errors.New("vault unwrap returned no data")
298-
}
299-
out := make(map[string]string, len(secret.Data))
300-
for k, v := range secret.Data {
301-
s, ok := v.(string)
302-
if !ok {
303-
return nil, errors.Newf("vault unwrap returned non-string for %q", k)
304-
}
305-
out[k] = s
306-
}
307-
return out, nil
308-
}

0 commit comments

Comments
 (0)