42 changes: 42 additions & 0 deletions .github/resources/manifests/base/driver-plugin-cm-path.yaml

This is the fifth instance of this file. Please use a kustomize component instead, for example in the argo folder.

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to define it once in the common base rather than patching it in ten different places.

@@ -0,0 +1,42 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: ml-pipeline-driver-agent
+data:
+  sidecar.container: |
+    name: driver-plugin
+    image: kind-registry:5000/driver:ci
+    imagePullPolicy: IfNotPresent
+    env:
+      - name: LOG_ACCESS_KEY
+        valueFrom:
+          secretKeyRef:
+            name: mlpipeline-minio-artifact
+            key: accesskey
+      - name: LOG_SECRET_KEY
+        valueFrom:
+          secretKeyRef:
+            name: mlpipeline-minio-artifact
+            key: secretkey
+    ports:
+      - containerPort: 8080
+    resources:
+      requests:
+        cpu: "0.1"
+        memory: "64Mi"
+      limits:
+        cpu: "0.5"
+        memory: "0.5Gi"
+    securityContext:
+      runAsNonRoot: true
+      runAsUser: 65534
+      allowPrivilegeEscalation: false
+      capabilities:
+        drop:
+          - ALL
+      seccompProfile:
+        type: RuntimeDefault
+    volumeMounts:
+      - name: var-run-argo
+        mountPath: /kfp/log
+        readOnly: false

@juliusvonkohout juliusvonkohout May 16, 2026

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to define it once in the common base rather than patching it everywhere.

@@ -42,6 +42,10 @@ patches:
     target:
       kind: Deployment
       name: ml-pipeline
+  - path: ../../base/driver-plugin-cm-path.yaml
+    target:
+      kind: ConfigMap
+      name: ml-pipeline-driver-agent
   - path: ../../base/grpc-specs.yaml
     target:
       kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to define it once in the common base rather than patching it everywhere.

@@ -46,6 +46,10 @@ patches:
     target:
       kind: Deployment
       name: ml-pipeline
+  - path: ../../base/driver-plugin-cm-path.yaml
+    target:
+      kind: ConfigMap
+      name: ml-pipeline-driver-agent
   - path: ../../base/grpc-specs.yaml
     target:
       kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to define it once in the common base rather than patching it everywhere.

@@ -46,6 +46,10 @@ patches:
     target:
       kind: Deployment
       name: ml-pipeline
+  - path: ../../base/driver-plugin-cm-path.yaml
+    target:
+      kind: ConfigMap
+      name: ml-pipeline-driver-agent
   - path: cache-env.yaml
     target:
       kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to define it once in the common base rather than patching it everywhere.

@@ -46,6 +46,10 @@ patches:
     target:
       kind: Deployment
       name: ml-pipeline
+  - path: ../../base/driver-plugin-cm-path.yaml
+    target:
+      kind: ConfigMap
+      name: ml-pipeline-driver-agent
   - path: ../../base/grpc-specs.yaml
     target:
       kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to define it once in the common base rather than patching it everywhere.

@@ -9,3 +9,7 @@ patches:
     target:
       kind: Deployment
       name: ml-pipeline
+  - path: ../../base/driver-plugin-cm-path.yaml
+    target:
+      kind: ConfigMap
+      name: ml-pipeline-driver-agent

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to define it once in the common base rather than patching it everywhere.

@@ -42,6 +42,10 @@ patches:
     target:
       kind: Deployment
       name: ml-pipeline
+  - path: ../../base/driver-plugin-cm-path.yaml
+    target:
+      kind: ConfigMap
+      name: ml-pipeline-driver-agent
   - path: cache-env.yaml
     target:
       kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to define it once in the common base rather than patching it everywhere.

@@ -42,6 +42,10 @@ patches:
     target:
       kind: Deployment
       name: ml-pipeline
+  - path: ../../base/driver-plugin-cm-path.yaml
+    target:
+      kind: ConfigMap
+      name: ml-pipeline-driver-agent
   - path: ../../base/grpc-specs.yaml
     target:
       kind: Deployment

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to define it once in the common base rather than patching it everywhere.

@@ -46,6 +46,10 @@ patches:
     target:
       kind: Deployment
       name: ml-pipeline
+  - path: ../../base/driver-plugin-cm-path.yaml
+    target:
+      kind: ConfigMap
+      name: ml-pipeline-driver-agent
   - path: ../../base/grpc-specs.yaml
     target:
       kind: Deployment
@@ -0,0 +1,45 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: ml-pipeline-driver-agent
+data:
+  sidecar.container: |
+    name: driver-plugin
+    image: kind-registry:5000/driver:ci
+    imagePullPolicy: IfNotPresent
+    ports:
+      - containerPort: 8080
+    env:
+      - name: LOG_ACCESS_KEY
+        valueFrom:
+          secretKeyRef:
+            name: mlpipeline-minio-artifact
+            key: accesskey
+      - name: LOG_SECRET_KEY
+        valueFrom:
+          secretKeyRef:
+            name: mlpipeline-minio-artifact
+            key: secretkey
+    resources:
+      requests:
+        cpu: "0.1"
+        memory: "64Mi"
+      limits:
+        cpu: "0.5"
+        memory: "0.5Gi"
+    securityContext:
+      runAsNonRoot: true
+      runAsUser: 65534
+      allowPrivilegeEscalation: false
+      capabilities:
+        drop:
+          - ALL
+      seccompProfile:
+        type: RuntimeDefault
+    volumeMounts:
+      - name: argo-workflows-agent-ca-certificates
+        mountPath: /kfp/certs
+        readOnly: true
+      - name: var-run-argo
+        mountPath: /kfp/log
+        readOnly: false

Shouldn't this patch live in the argo folder or somewhere similar? The goal is to define it once in the common base rather than patching it everywhere.

@@ -46,6 +46,10 @@ patches:
     target:
       kind: Deployment
       name: ml-pipeline
+  - path: driver-plugin-cm-path.yaml
+    target:
+      kind: ConfigMap
+      name: ml-pipeline-driver-agent
   - path: ../../base/grpc-specs.yaml
     target:
       kind: Deployment
39 changes: 38 additions & 1 deletion .github/resources/scripts/collect-logs.sh
@@ -29,6 +29,36 @@ function check_namespace {
     return 0
 }

+function describe_argo_workflows {
+    local NAMESPACE=$1
+    echo "===== Argo Workflows Inspection ====="
+    for wf in $(kubectl get wf -n "$NAMESPACE" -o json | jq -r '.items[] | select(.status.phase=="Failed" or .status.phase=="Running") | .metadata.name'); do

I recommend long, expressive, human-readable, and pronounceable variable names.

+        echo "Inspected workflow: $wf"
+        kubectl get wf "$wf" -n "$NAMESPACE" || true
+        pods=$(kubectl get po -n "$NAMESPACE" -l "workflows.argoproj.io/workflow=$wf" -o jsonpath='{.items[*].metadata.name}')
+        for pod in $pods; do
+            phase=$(kubectl get po "$pod" -n "$NAMESPACE" -o jsonpath='{.status.phase}')
+            echo "Inspect Pod: $pod, Status: $phase"
+            if [[ "$phase" != "Pending" && "$phase" != "Succeeded" ]]; then
+                echo " ---> $pod Logs:"
+                if [[ "$pod" == *-agent ]]; then
+                    kubectl logs "$pod" -n "$NAMESPACE" -c driver-plugin || true
+                else
+                    kubectl logs "$pod" -n "$NAMESPACE" || true
+                fi
+            fi
+            echo " ---> Describe $pod:"
+            if [[ "$phase" != "Succeeded" ]]; then
+                echo " ---> Describe:"
+                kubectl describe po "$pod" -n "$NAMESPACE"
+            fi
+        done
+    done
+    echo "===== Argo Workflows data ====="
+    kubectl get events -n "${NAMESPACE}" --field-selector involvedObject.kind=Workflow --sort-by='.metadata.creationTimestamp'
+    echo "==============================="
+}
+
 function display_pod_info {
     local NAMESPACE=$1

@@ -52,7 +82,13 @@ function display_pod_info {
         kubectl describe pod "${POD_NAME}" -n "${NAMESPACE}" | grep -A 100 Events || echo "No events found for pod ${POD_NAME}."

         echo "----- LOGS -----"
-        kubectl logs "${POD_NAME}" -n "${NAMESPACE}" || echo "No logs found for pod ${POD_NAME}."
+        if [[ "${POD_NAME}" == *-agent* ]]; then
+            kubectl logs "${POD_NAME}" -n "${NAMESPACE}" -c driver-plugin || \
+                echo "No logs found for pod ${POD_NAME}."
+        else
+            kubectl logs "${POD_NAME}" -n "${NAMESPACE}" || \
+                echo "No logs found for pod ${POD_NAME}."
+        fi

         echo "==========================="
         echo ""
@@ -64,6 +100,7 @@ function display_pod_info {

 if check_namespace "$NS"; then
     display_pod_info "$NS"
+    describe_argo_workflows "$NS"
 else
     exit 0
 fi
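
The `jq` expression in `describe_argo_workflows` keeps only workflows whose `.status.phase` is `Failed` or `Running`. As a minimal sketch of that predicate, here is the same selection in Python, applied to JSON shaped like the output of `kubectl get wf -o json`. The function name `select_workflows_to_inspect` and the sample workflow names are hypothetical and not part of this PR.

```python
import json

# Phases selected by the jq expression in describe_argo_workflows.
PHASES_TO_INSPECT = {"Failed", "Running"}


def select_workflows_to_inspect(workflow_list: dict) -> list[str]:
    """Return the names of workflows whose status.phase is Failed or Running.

    Workflows without a status block are skipped, which matches jq comparing
    a null phase against the two strings.
    """
    names = []
    for item in workflow_list.get("items", []):
        phase = item.get("status", {}).get("phase")
        if phase in PHASES_TO_INSPECT:
            names.append(item["metadata"]["name"])
    return names


# Hypothetical sample data shaped like a WorkflowList.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "pipeline-a"}, "status": {"phase": "Succeeded"}},
  {"metadata": {"name": "pipeline-b"}, "status": {"phase": "Failed"}},
  {"metadata": {"name": "pipeline-c"}, "status": {"phase": "Running"}},
  {"metadata": {"name": "pipeline-d"}}
]}
""")
print(select_workflows_to_inspect(sample))
```

Succeeded and not-yet-reconciled workflows are left out, so the log collection only spends time on workflows that can explain a CI failure.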
3 changes: 2 additions & 1 deletion .github/resources/scripts/kfp-readiness/wait_for_pods.py
@@ -30,7 +30,8 @@ def get_pod_statuses():
     statuses = {}
     for pod in pods.items:
         pod_name = pod.metadata.name
-        if "system" not in pod_name:
+        # This filter is safe: 'ml-pipeline-persistenceagent-<guid>' will not be excluded and will be processed.
+        if not ("system" in pod_name or pod_name.endswith("-agent")):
             pod_status = pod.status.phase
             container_statuses = pod.status.container_statuses or []
             ready = 0
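
The new condition in `get_pod_statuses` skips a pod when its name contains `system` or ends with `-agent`. The sketch below isolates that condition so the claim in the code comment can be checked directly. The helper name `should_check_pod` and the pod names are hypothetical examples.

```python
def should_check_pod(pod_name: str) -> bool:
    """Mirror the readiness filter: skip names containing "system" or ending in "-agent"."""
    return not ("system" in pod_name or pod_name.endswith("-agent"))


# Hypothetical pod names chosen to exercise each branch of the filter.
example_pod_names = [
    "ml-pipeline-persistenceagent-7d9c5b6f4-abcde",  # "agent" is not the suffix
    "my-workflow-1234-agent",                        # ends with "-agent"
    "system-dag-driver-987654321",                   # contains "system"
    "ml-pipeline-5f7c9d8b6-xyz12",                   # ordinary pod
]

for name in example_pod_names:
    print(f"{name}: {'check' if should_check_pod(name) else 'skip'}")
```

Because the test is a suffix match rather than a substring match, the persistence agent pod is still waited on, while Argo agent pods, whose names end in `-agent`, are ignored.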
2 changes: 1 addition & 1 deletion .github/workflows/api-server-tests.yml
@@ -114,7 +114,7 @@ jobs:
         shell: bash
         if: ${{ matrix.pod_to_pod_tls_enabled == 'true'}}
         run: |
-          kubectl get secret kfp-api-tls-cert -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
+          kubectl get secret argo-workflows-agent-ca-certificates -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
           echo "CA_CERT_PATH=${{ github.workspace }}/ca.crt" >> "$GITHUB_ENV"
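
The TLS step reads the `ca.crt` key of a Secret and pipes it through `base64 -d`, because Kubernetes stores Secret `data` values base64-encoded. Here is a minimal Python sketch of the same decoding step, assuming the Secret has already been fetched as JSON. The function name `extract_ca_certificate` and the placeholder certificate bytes are hypothetical.

```python
import base64
import json


def extract_ca_certificate(secret_json: str) -> bytes:
    """Decode the base64-encoded `ca.crt` entry of a Secret manifest."""
    secret = json.loads(secret_json)
    # The key literally contains a dot, which is why the jsonpath
    # expression in the workflow escapes it as `ca\.crt`.
    return base64.b64decode(secret["data"]["ca.crt"])


# Hypothetical stand-in for real certificate bytes.
fake_certificate = b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"
sample_secret = json.dumps({
    "kind": "Secret",
    "metadata": {"name": "argo-workflows-agent-ca-certificates"},
    "data": {"ca.crt": base64.b64encode(fake_certificate).decode("ascii")},
})

decoded = extract_ca_certificate(sample_secret)
print(decoded.decode("ascii"), end="")
```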
2 changes: 1 addition & 1 deletion .github/workflows/e2e-test.yml
@@ -118,7 +118,7 @@ jobs:
         shell: bash
         if: ${{ matrix.pod_to_pod_tls_enabled == 'true'}}
         run: |
-          kubectl get secret kfp-api-tls-cert -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
+          kubectl get secret argo-workflows-agent-ca-certificates -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
           echo "CA_CERT_PATH=${{ github.workspace }}/ca.crt" >> "$GITHUB_ENV"
       - name: Configure Input Variables
         shell: bash
2 changes: 1 addition & 1 deletion .github/workflows/legacy-v2-api-integration-tests.yml
@@ -77,7 +77,7 @@ jobs:
         shell: bash
         if: ${{ matrix.pod_to_pod_tls_enabled == 'true' }}
         run: |
-          kubectl get secret kfp-api-tls-cert -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
+          kubectl get secret argo-workflows-agent-ca-certificates -n kubeflow -o jsonpath='{.data.ca\.crt}' | base64 -d > "${{ github.workspace }}/ca.crt"
           echo "CA_CERT_PATH=${{ github.workspace }}/ca.crt" >> "$GITHUB_ENV"

       - name: Forward MLMD port
2 changes: 1 addition & 1 deletion backend/Dockerfile.driver
@@ -27,7 +27,7 @@ RUN GO111MODULE=on go mod download

 COPY . .

-RUN GO111MODULE=on CGO_ENABLED=0 GOOS=linux go build -tags netgo -gcflags="${GCFLAGS}" -ldflags '-extldflags "-static"' -o /bin/driver ./backend/src/v2/cmd/driver/*.go
+RUN GO111MODULE=on CGO_ENABLED=0 GOOS=linux go build -tags netgo -gcflags="${GCFLAGS}" -ldflags '-extldflags "-static"' -o /bin/driver ./backend/src/driver/*.go

 FROM alpine:3.21
6 changes: 3 additions & 3 deletions backend/src/apiserver/plugins/mlflow/config.go
@@ -233,13 +233,13 @@ func SelectMLflowExperiment(input *MLflowPluginInput, settings *commonmlflow.MLf
 	return "", DefaultExperimentName
 }

-// InjectMLflowRuntimeEnv sets KFP_MLFLOW_CONFIG on driver and launcher
-// containers.
+// InjectMLflowRuntimeEnv passes KFP_MLFLOW_CONFIG to driver plugins through
+// runtime args and to launcher containers through env vars.
 func InjectMLflowRuntimeEnv(executionSpec util.ExecutionSpec, env map[string]string) error {
 	if len(env) == 0 || executionSpec == nil {
 		return nil
 	}
-	return executionSpec.UpsertRuntimeEnvVars(env,
+	return executionSpec.UpsertRuntimeConfig(env,
 		util.ExecutionRuntimeRoleDriver,
 		util.ExecutionRuntimeRoleLauncher,
 	)
18 changes: 9 additions & 9 deletions backend/src/apiserver/plugins/mlflow/config_test.go
@@ -588,13 +588,8 @@ func TestInjectMLflowRuntimeEnv(t *testing.T) {
 		Spec: workflowapi.WorkflowSpec{
 			Templates: []workflowapi.Template{
 				{
-					Name: "system-dag-driver",
-					Metadata: workflowapi.Metadata{
-						Annotations: map[string]string{
-							util.AnnotationKeyRuntimeRole: string(util.ExecutionRuntimeRoleDriver),
-						},
-					},
-					Container: &corev1.Container{Args: []string{"--type", "DAG"}},
+					Name:   "system-dag-driver",
+					Plugin: &workflowapi.Plugin{Object: workflowapi.Object{Value: []byte(`{"driver-plugin":{"args":{"type":"DAG"}}}`)}},
 				},
 				{
 					Name: "system-container-impl",
@@ -620,8 +615,13 @@ func TestInjectMLflowRuntimeEnv(t *testing.T) {

 	expectedEnv := corev1.EnvVar{Name: commonmlflow.EnvMLflowConfig, Value: env[commonmlflow.EnvMLflowConfig]}

-	// Driver container gets the env var.
-	assert.Contains(t, workflow.Spec.Templates[0].Container.Env, expectedEnv)
+	var pluginConfig map[string]map[string]map[string]interface{}
+	require.NoError(t, json.Unmarshal(workflow.Spec.Templates[0].Plugin.Value, &pluginConfig))
+	runtimeArgsJSON, ok := pluginConfig["driver-plugin"]["args"]["runtime_args"].(string)
+	require.True(t, ok)
+	var runtimeArgs map[string]string
+	require.NoError(t, json.Unmarshal([]byte(runtimeArgsJSON), &runtimeArgs))
+	assert.Equal(t, env[commonmlflow.EnvMLflowConfig], runtimeArgs[commonmlflow.EnvMLflowConfig])

 	// Launcher main container (template with --copy init container) gets the env var.
 	assert.Contains(t, workflow.Spec.Templates[1].Container.Env, expectedEnv)
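
The updated test decodes JSON twice: the plugin template value is a JSON object, and its `runtime_args` entry is itself a JSON-encoded string holding the key/value pairs. A minimal Python sketch of that two-step decode follows. The helper name `read_runtime_arg` and the sample MLflow config value are hypothetical, and the key `KFP_MLFLOW_CONFIG` is assumed to be the value of `commonmlflow.EnvMLflowConfig`, inferred from the doc comment on `InjectMLflowRuntimeEnv`.

```python
import json


def read_runtime_arg(plugin_value: bytes, key: str):
    """Look up `key` in the JSON-encoded runtime_args of a driver-plugin template."""
    plugin_config = json.loads(plugin_value)
    runtime_args_json = plugin_config["driver-plugin"]["args"]["runtime_args"]
    # First decode yields a string, not an object; the test asserts the same.
    if not isinstance(runtime_args_json, str):
        raise TypeError("runtime_args must be a JSON-encoded string")
    runtime_args = json.loads(runtime_args_json)
    return runtime_args.get(key)


# Hypothetical config payload; only the nesting mirrors the test.
mlflow_config = '{"tracking_uri": "http://mlflow.example:5000"}'
plugin_value = json.dumps({
    "driver-plugin": {
        "args": {
            "type": "DAG",
            "runtime_args": json.dumps({"KFP_MLFLOW_CONFIG": mlflow_config}),
        }
    }
}).encode("utf-8")

print(read_runtime_arg(plugin_value, "KFP_MLFLOW_CONFIG"))
```

Keeping `runtime_args` as a single string leaves the plugin argument map flat, at the cost of a second unmarshal on the consumer side.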
2 changes: 1 addition & 1 deletion backend/src/common/util/consts.go
@@ -58,7 +58,7 @@ const (

 	// AnnotationKeyRuntimeRole is set on compiled Argo Workflow templates to
 	// identify the logical role of the pod (driver, launcher, etc.). It is
-	// used by UpsertRuntimeEnvVars to target the right containers.
+	// used by UpsertRuntimeConfig to target the right containers.
 	AnnotationKeyRuntimeRole = "pipelines.kubeflow.org/runtime-role"

 	// LabelKeyCacheEnabled is a workflow label key.