feat: prevent containers from running in NRI if we fails to apply policy on it#392
Conversation
e5c558c to
45cfd00
Compare
As part of error handling, when we failed to apply protection on a container, we will fail the container creation flow by default. Users can use NRI_FAILOPEN envvar to change this behavior. Signed-off-by: Sam Wang (holyspectral) <sam.wang@suse.com>
45cfd00 to
1f93cdd
Compare
| "namespace", pod.GetNamespace(), | ||
| "error", err, | ||
| ) | ||
| return nil, fmt.Errorf("failed to add pod container from NRI: %w", err) |
There was a problem hiding this comment.
If we return an error from StartContainer, the container won't start. What happens if we return an error here in Synchronize?
| p := &plugin{ | ||
| logger: logger.With("component", "nri-plugin"), | ||
| resolver: resolver, | ||
| failOpen: os.Getenv("NRI_FAILOPEN") == "true", |
There was a problem hiding this comment.
How should the final user should set this environment variable? I would expect a helm field in values.yaml, WDYT?
There was a problem hiding this comment.
yeah this is something we can talk about. Today we already can specify the environment variable via agent.env but we can also make it more specific by providing a separate option. WDYT?
There was a problem hiding this comment.
yeah this seems an important feature for users, so I would probably add a new helm field
| "containerName", container.GetName(), | ||
| ) | ||
|
|
||
| handleError := func(reason string, err error) error { |
There was a problem hiding this comment.
Can we simplify this a little bit to avoid duplication?
handleError := func(reason string, err error) error {
// fail open defaults
var errNRI error
msg := "container is starting WITHOUT enforcement due to NRI_FAILOPEN"
if !p.failOpen {
errNRI = fmt.Errorf("%s: %w. Runtime-enforcer has prevented the container '%s/%s' from starting. To change this behavior, set environment variable NRI_FAILOPEN to true",
reason, err, pod.GetName(), container.GetName())
msg = errNRI.Error()
}
p.logger.ErrorContext(
ctx,
msg,
"containerID", container.GetId(),
"containerName", container.GetName(),
"podName", pod.GetName(),
"podID", pod.GetUid(),
)
return errNRI
}There was a problem hiding this comment.
In the end, we can use error verbosity in both cases since a container is not protected
| func (r *Resolver) applyPolicyToPodIfPresent(state *podEntry) error { | ||
| policyName := state.policyName() | ||
|
|
||
| // if the policy doesn't have the label we do nothing |
There was a problem hiding this comment.
| // if the pod doesn't have the label we do nothing |
| state.podName(), | ||
| state.podNamespace(), | ||
| policyName, | ||
| // This can happen when the pod runs before the policy is created/reconciled when using GitOps to deploy. |
There was a problem hiding this comment.
It's really up to us what we want to do. I'm also fine with returning an error since the pod will start without protection, but the user might think this is protected...
What this PR does / why we need it:
As part of error handling, when we failed to apply protection on a container, we will fail the container creation flow by default. Users can override this behavior via NRI_FAILOPEN env var.
When a container is prevented from starting, logs can be seen in these places:
In our log:
containerd:
kubernetes:
Which issue(s) this PR fixes
fixes #262
Special notes for your reviewer:
Checklist: