Skip to content

feat: add container and pod security context support for kubernetes workloads#3101

Open
npow wants to merge 1 commit intomasterfrom
npow/k8s-security-context-clean
Open

feat: add container and pod security context support for kubernetes workloads#3101
npow wants to merge 1 commit intomasterfrom
npow/k8s-security-context-clean

Conversation

@npow
Copy link
Copy Markdown
Collaborator

@npow npow commented Apr 13, 2026

Summary

  • Add security_context and pod_security_context parameters to the @kubernetes decorator
  • Supports setting container-level security context (runAsUser, runAsGroup, capabilities, etc.) and pod-level security context (fsGroup, runAsNonRoot, etc.)
  • Supported in direct K8s jobs, jobsets, and Argo Workflows
  • Configurable via METAFLOW_KUBERNETES_SECURITY_CONTEXT and METAFLOW_KUBERNETES_POD_SECURITY_CONTEXT

Test plan

  • test/unit/test_kubernetes_security_context.py — unit tests for security context parameter handling

🤖 Generated with Claude Code

…orkloads

Add security_context (container-level) and pod_security_context (pod-level)
parameters to the @kubernetes decorator, with environment variable fallbacks
METAFLOW_KUBERNETES_SECURITY_CONTEXT and METAFLOW_KUBERNETES_POD_SECURITY_CONTEXT
for org-wide defaults. Supports all standard Kubernetes security context fields
including runAsUser, runAsGroup, fsGroup, capabilities, etc.

Closes #1262
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 13, 2026

Greptile Summary

This PR adds security_context and pod_security_context parameters to the @kubernetes decorator, wiring them through the direct K8s job, JobSet, and Argo Workflows paths, with env-var fallbacks (METAFLOW_KUBERNETES_SECURITY_CONTEXT / METAFLOW_KUBERNETES_POD_SECURITY_CONTEXT).

  • Pod security context is silently ignored for non-parallel Argo steps. V1PodSecurityContext.to_dict() returns Python snake_case attribute names (e.g. fs_group, run_as_non_root), but podSpecPatch is applied as raw JSON against the Kubernetes API, which requires camelCase (e.g. fsGroup, runAsNonRoot). The parallel path is unaffected because it goes through sanitize_for_serialization. The fix is to wrap the to_dict() result with to_camelcase(), already imported in the file.

Confidence Score: 4/5

Not safe to merge as-is — pod security context is a no-op for non-parallel Argo Workflow steps due to a snake_case/camelCase mismatch in podSpecPatch serialisation.

One P1 bug (Argo non-parallel path ignores pod_security_context silently) needs to be fixed before merge. Direct K8s job, JobSet, and Argo parallel paths are all correct. Remaining findings are P2.

metaflow/plugins/argo/argo_workflows.py — the pod_spec_patch call for the non-parallel template path.

Important Files Changed

Filename Overview
metaflow/plugins/argo/argo_workflows.py Adds pod security context support for both parallel and non-parallel Argo steps; non-parallel path uses to_dict() which emits snake_case keys incompatible with podSpecPatch, silently ignoring the setting. Also introduces dead-code _pod_security_context variable.
metaflow/plugins/kubernetes/kubernetes_job.py Correctly unpacks V1PodSecurityContext into V1PodSpec via **_pod_security_context; serialisation is handled by the Kubernetes SDK so camelCase conversion is not needed here.
metaflow/plugins/kubernetes/kubernetes_jobsets.py Mirrors the kubernetes_job.py pattern for pod security context; the object is passed through sanitize_for_serialization so camelCase conversion is correct.
metaflow/plugins/kubernetes/kubernetes_decorator.py Adds pod_security_context attribute and env-var fallback; truthiness check means an explicit security_context={} from the decorator could be overridden by the env var, contradicting the stated precedence rule.
metaflow/metaflow_config.py Adds KUBERNETES_SECURITY_CONTEXT and KUBERNETES_POD_SECURITY_CONTEXT env vars; defaults to "" rather than None, inconsistent with adjacent config vars.
metaflow/plugins/kubernetes/kubernetes_cli.py Adds --pod-security-context CLI option and threads it through to Kubernetes.execute(); looks correct.
metaflow/plugins/kubernetes/kubernetes.py Propagates pod_security_context through both the single-job and jobset code paths cleanly.
test/unit/test_kubernetes_security_context.py Good coverage for KubernetesJob container and pod security context combinations; includes env-var override tests. The mock_kubernetes_client fixture is defined but never used by any test.

Reviews (1): Last reviewed commit: "feat: add container and pod security con..." | Re-trigger Greptile

Comment on lines +2818 to +2826
.pod_spec_patch(
{
"securityContext": kubernetes_sdk.V1PodSecurityContext(
**pod_security_context
).to_dict()
}
if pod_security_context
else None
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 to_dict() produces snake_case keys — pod security context silently ignored in Argo

V1PodSecurityContext.to_dict() returns Python attribute names in snake_case (e.g. fs_group, run_as_non_root), but podSpecPatch is applied as a raw JSON strategic-merge patch against the Kubernetes API, which requires camelCase (e.g. fsGroup, runAsNonRoot). Kubernetes ignores unrecognised keys, so the pod security context will be accepted without error but have no effect for all non-parallel Argo Workflow steps.

The existing code already imports to_camelcase from metaflow.util and uses it for the container spec. Apply the same treatment here:

Suggested change
.pod_spec_patch(
{
"securityContext": kubernetes_sdk.V1PodSecurityContext(
**pod_security_context
).to_dict()
}
if pod_security_context
else None
)
.pod_spec_patch(
{
"securityContext": to_camelcase(
kubernetes_sdk.V1PodSecurityContext(
**pod_security_context
).to_dict()
)
}
if pod_security_context
else None
)

Comment on lines +2584 to +2591
pod_security_context = resources.get("pod_security_context", None)
_pod_security_context = {}
if pod_security_context is not None and len(pod_security_context) > 0:
_pod_security_context = {
"security_context": kubernetes_sdk.V1PodSecurityContext(
**pod_security_context
)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unused _pod_security_context variable — dead code

_pod_security_context is computed here but never referenced. For the parallel path, the raw pod_security_context dict is forwarded to KubernetesArgoJobSet (which handles it via JobSetSpec). For the non-parallel path, the pod security context is applied via pod_spec_patch() further down. This block can be removed.

Comment on lines 439 to +441
KUBERNETES_QOS = from_conf("KUBERNETES_QOS", "burstable")
# Default container security context (JSON) for kubernetes pods
KUBERNETES_SECURITY_CONTEXT = from_conf("KUBERNETES_SECURITY_CONTEXT", "")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inconsistent default: "" instead of None

All adjacent config vars (KUBERNETES_MEMORY, KUBERNETES_DISK, etc.) default to None. Using "" works because the falsiness check in the decorator handles it, but it's inconsistent and slightly harder to reason about. Consider defaulting to None to match the established pattern.

Suggested change
KUBERNETES_QOS = from_conf("KUBERNETES_QOS", "burstable")
# Default container security context (JSON) for kubernetes pods
KUBERNETES_SECURITY_CONTEXT = from_conf("KUBERNETES_SECURITY_CONTEXT", "")
KUBERNETES_SECURITY_CONTEXT = from_conf("KUBERNETES_SECURITY_CONTEXT", None)
# Default pod security context (JSON) for kubernetes pods
KUBERNETES_POD_SECURITY_CONTEXT = from_conf("KUBERNETES_POD_SECURITY_CONTEXT", None)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +327 to +338
# Security context: decorator takes precedence over env var
if not self.attributes["security_context"] and KUBERNETES_SECURITY_CONTEXT:
self.attributes["security_context"] = json.loads(
KUBERNETES_SECURITY_CONTEXT
)
if (
not self.attributes["pod_security_context"]
and KUBERNETES_POD_SECURITY_CONTEXT
):
self.attributes["pod_security_context"] = json.loads(
KUBERNETES_POD_SECURITY_CONTEXT
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Explicit security_context={} would be overridden by the env var

not {} evaluates to True, so a user who explicitly sets @kubernetes(security_context={}) would have the env var silently take precedence, contradicting the "decorator takes precedence" comment. The same caveat applies to pod_security_context. Consider comparing to None instead of using truthiness, which is also the pattern used for non-JSON options like port and shared_memory.

Comment on lines +9 to +16
@pytest.fixture
def mock_kubernetes_client():
"""Create a mock Kubernetes client that tracks calls to V1SecurityContext and V1PodSecurityContext."""
with patch("metaflow.plugins.kubernetes.kubernetes_job.KubernetesJob") as _:
from kubernetes import client

yield client

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 mock_kubernetes_client fixture is never used

This fixture is defined but not requested by any test class or test function. It can be removed to avoid confusion about its intent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant