feat: classify Kueue GPU admission failures with user-facing messages #732

Draft
nbs-rh wants to merge 1 commit into trustyai-explainability:main from nbs-rh:provider_gpu_config

nbs-rh (Contributor) commented May 12, 2026

Related to https://redhat.atlassian.net/browse/RHAIRFE-2171

When a Kueue workload is inadmissible (its QuotaReserved condition has status False and reason Inadmissible), the reconciler now distinguishes GPU quota exhaustion from generic queue errors by inspecting the Job's pod spec and the Kueue condition message.

GPU jobs that can't be admitted get message_code=gpu_unavailable with a human-readable explanation; all other admission failures use queue_error.

This avoids surfacing raw cluster internals through the eval-hub API.
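
As a rough illustration of the classification described above, here is a minimal Python sketch. It is not the PR's implementation: the function names, the list of GPU resource names, the user-facing message texts, and the rule that both signals (pod spec and condition message) must agree are all assumptions made for this example. Only the condition values and the two message codes come from the description.

```python
from typing import Optional

# Assumption: resource names treated as GPUs. The PR's actual list is not shown.
GPU_RESOURCE_NAMES = ("nvidia.com/gpu", "amd.com/gpu")

# Hypothetical user-facing texts; the PR's real wording is not shown.
GPU_MESSAGE = "No GPU capacity is currently available for this job."
QUEUE_MESSAGE = "The job could not be admitted to the queue."


def job_requests_gpu(job: dict) -> bool:
    """Return True if any container in the Job's pod spec requests or limits a GPU."""
    pod_spec = job.get("spec", {}).get("template", {}).get("spec", {})
    containers = (pod_spec.get("containers") or []) + (pod_spec.get("initContainers") or [])
    for container in containers:
        resources = container.get("resources") or {}
        for section in ("requests", "limits"):
            if any(name in GPU_RESOURCE_NAMES for name in (resources.get(section) or {})):
                return True
    return False


def classify_admission_failure(job: dict, conditions: list) -> Optional[tuple]:
    """Map an inadmissible workload to (message_code, message); None if it is not inadmissible."""
    for cond in conditions:
        inadmissible = (
            cond.get("type") == "QuotaReserved"
            and cond.get("status") == "False"
            and cond.get("reason") == "Inadmissible"
        )
        if not inadmissible:
            continue
        raw = cond.get("message", "")
        # Assumption: both the pod spec and the condition message must point at a GPU.
        mentions_gpu = any(name in raw for name in GPU_RESOURCE_NAMES)
        if job_requests_gpu(job) and mentions_gpu:
            return ("gpu_unavailable", GPU_MESSAGE)
        # The raw condition message is deliberately not returned to the caller.
        return ("queue_error", QUEUE_MESSAGE)
    return None
```

Called with a Job manifest whose container limits include `nvidia.com/gpu` and a condition whose message names that resource, the sketch returns the `gpu_unavailable` code; any other inadmissible workload gets `queue_error`. In both cases the raw Kueue message stays out of the result, which is the point made in the description about not exposing cluster internals.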

… messages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

openshift-ci Bot commented May 12, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci Bot commented May 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

coderabbitai Bot (Contributor) commented May 12, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1791a271-b3b8-4460-bdb3-17048615310e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.
