### Problem
Currently, when a Kubeflow Pipeline fails due to pod lifecycle issues (such as CrashLoopBackOff, OOMKilled, or ImagePullBackOff), the UI does not clearly indicate the reason for failure.
Instead, the pipeline often appears to be stuck or not progressing, which creates confusion for users. To understand the issue, users need to manually inspect Kubernetes resources using kubectl, which breaks the abstraction that Kubeflow aims to provide.
### Proposed Improvement
I would like to propose enhancements to improve failure visibility and handling in the Kubeflow Pipelines UI:
- Detect and classify pod lifecycle failures into categories:
  - Provisioning failures (ImagePullBackOff, Unschedulable)
  - Runtime failures (CrashLoopBackOff, OOMKilled)
  - Node-level failures (NodeLost, Preempted)
- Display clear failure reasons directly in the UI:
  - Show the error type and message
  - Highlight failed pipeline nodes visually
- Introduce timeout handling:
  - Prevent pipelines from appearing stuck indefinitely
  - Allow a configurable timeout based on failure type
- Improve the user experience:
  - Provide human-readable explanations
  - Optionally suggest possible fixes (e.g., increase memory for OOMKilled)
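To make the classification idea concrete, here is a minimal, hypothetical sketch of how a backend could map Kubernetes failure reasons to the proposed categories and suggestions. The reason strings are the standard values surfaced in pod/container status fields; the function and mapping names are illustrative only, not an existing KFP API:

```python
# Hypothetical sketch: map Kubernetes pod/container status "reason" values
# to the failure categories proposed above. Names are illustrative.

PROVISIONING = "Provisioning failure"
RUNTIME = "Runtime failure"
NODE_LEVEL = "Node-level failure"
UNKNOWN = "Unknown failure"

# Reasons as reported by the Kubernetes API in container statuses
# and pod conditions.
REASON_CATEGORIES = {
    "ImagePullBackOff": PROVISIONING,
    "ErrImagePull": PROVISIONING,
    "Unschedulable": PROVISIONING,
    "CrashLoopBackOff": RUNTIME,
    "OOMKilled": RUNTIME,
    "NodeLost": NODE_LEVEL,
    "Preempted": NODE_LEVEL,
}

# Optional human-readable hints for the UI (illustrative wording).
SUGGESTED_FIXES = {
    "OOMKilled": "Increase the component's memory request/limit.",
    "ImagePullBackOff": "Check the image name, tag, and registry credentials.",
    "Unschedulable": "Check node resources, taints, and affinity rules.",
}


def classify_failure(reason: str) -> dict:
    """Return a UI-friendly summary for a pod failure reason."""
    return {
        "reason": reason,
        "category": REASON_CATEGORIES.get(reason, UNKNOWN),
        "suggestion": SUGGESTED_FIXES.get(reason),  # None if no hint known
    }
```

A lookup table like this keeps the UI decoupled from Kubernetes internals: unrecognized reasons fall through to an "Unknown failure" bucket instead of leaving the pipeline looking stuck.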
### Expected Impact
- Improved debugging experience for users
- Reduced dependency on Kubernetes CLI tools
- Better alignment with Kubeflow’s goal of abstracting infrastructure complexity
### Additional Context
I am exploring contributing to this area as part of GSoC and would love feedback from maintainers on feasibility and design direction.