update: validation by zdtsw · Pull Request #24 · opendatahub-io/rhaii-on-xks

zdtsw · 2026-02-25T13:04:46Z

Description

- simplify the image build (smaller ubi9): no need to build a Python wheel, just call the script; run as a non-root user at runtime
- simplify make targets: rename "container" to "image", add an environment variable to pass in "suite" so that only the "run" target is needed, work with both podman and docker, add support for config, and install flake8/autopep8 if they are not present locally
- remove the cloud_provider check, as it does nothing but was left in the suite
- update README
- rename llmd_xks_check.py to llmd_xks_preflight.py
- fix lint errors (indentation)

How Has This Been Tested?

Merge criteria:

The commits are squashed in a cohesive manner and have meaningful messages.
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

New Features
- Added SUITE variable to run specific validation test suites (all, cluster, operators)
- Added CONFIG option for custom configuration files during validation
Improvements
- Made LWS operator checks optional for improved validation flexibility
- Updated build and run commands for simplified usage
- Simplified documentation with clearer deployment instructions
Chores
- Updated container infrastructure and build tooling
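
The SUITE variable described above takes one of three values. The following is a minimal sketch, not taken from the PR: `check_suite` is a hypothetical helper that rejects any other suite name before it is handed to `make run`. The `SUITE=... make run` form in the comment is an assumption based on the description's mention of passing the suite as an environment variable.

```shell
# Hypothetical helper (not part of the PR): accept only the suite names
# listed in the summary (all, cluster, operators).
check_suite() {
    case "$1" in
        all|cluster|operators) return 0 ;;
        *) echo "unknown suite: $1" >&2; return 1 ;;
    esac
}

# Assumed usage, run from the validation/ directory:
#   check_suite cluster && SUITE=cluster make run
```

Validating the name up front gives a clear error on the host instead of a failure inside the container.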

- simplify image build: no need to build a Python wheel, just call the script
- simplify make targets, add an environment variable to pass in "suite"
- remove the cloud_provider check, as it does nothing but was left in the suite
- update README
- rename llmd_xks_check.py to llmd_xks_preflight.py

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

coderabbitai · 2026-02-25T13:05:06Z

Walkthrough

This PR refactors the validation container infrastructure by switching the base image to Red Hat UBI minimal with microdnf, renames the build target from "container" to "image", introduces a SUITE parameter for test selection, adds optional configuration mounting, updates all related documentation, and renames the validation module from llmd_xks_checks to llmd_xks_preflight.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Container Base Image & Package Management `validation/Containerfile` | Updated base image from Fedora-based ARG to Red Hat UBI minimal (ubi-minimal:9.5), replaced dnf with microdnf, consolidated package installation into a single RUN step, removed conditional EPEL logic, added non-root user (UID 1001), and updated ENTRYPOINT to execute Python script directly. |
| Build System & Configuration `validation/Makefile` | Added dynamic CONTAINER_TOOL detection (podman/docker preference), introduced SUITE, VOLUME_OPTS, CONFIG, CONFIG_MOUNT, and CONFIG_ARG variables, renamed container target to image, consolidated run targets into single run target parameterized by SUITE, expanded run command with kubeconfig mounting and optional CONFIG support, and added flake8/autopep8 linting setup. |
| Documentation `validation/README.md` | Updated build target references from "container" to "image", simplified base image description, replaced Python script invocation with make-based CONFIG usage, added SUITE parameter documentation with possible values (all, cluster, operators), and commented out CoreWeave/CKS as coming soon. |
| Test Configuration & Module Rename `validation/llmd_xks_preflight.py`, `validation/pyproject.toml` | Restructured cluster and operators readiness test blocks with explicit optional flags for crd_lwsoperator and operator_lws; updated pyproject.toml to reference llmd_xks_preflight.py instead of llmd_xks_checks.py and updated console script entry point accordingly; minor formatting adjustments throughout test file. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'update: validation' is too generic and vague to clearly convey the main changes in this multi-faceted refactoring. | Use a more specific title that captures the primary change, such as 'Refactor validation: simplify container image and Make targets' or 'Validation: switch to UBI9 base and consolidate build targets'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


zdtsw · 2026-02-25T13:06:13Z

cc @kwozyman

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

validation/Makefile (2)

56-57: Auto-installing packages via bare pip install may pollute the system Python.

If run outside a virtualenv, pip install flake8 / pip install autopep8 installs globally, which may conflict with system-managed packages (especially on systems using externally-managed-environment in Python 3.12+). Consider using pip install --user or python3 -m venv.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@validation/Makefile` around lines 56 - 57, The Makefile currently
auto-installs flake8 with a bare pip install which can pollute system Python;
update the installation step to avoid global installs by either invoking the
installer via the active Python interpreter (e.g., python3 -m pip install --user
...) or by detecting/creating a virtualenv (python3 -m venv ...) before
installing; change the lines around the flake8 invocation (the rule that runs
the shell check for flake8 and calls pip install) to use "python3 -m pip install
--user flake8" or to create/activate a venv and install into it so running
flake8 --max-line-length=$(MAX_LINE_LENGTH) --exclude=build . no longer requires
a global pip install.
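
As a rough sketch of the suggestion above (an assumption, not the change that was merged), the install command can be chosen according to whether a virtualenv is active. `pip_install_cmd` is a hypothetical helper. `VIRTUAL_ENV` is the variable exported by a venv's `activate` script.

```shell
# Hypothetical helper: print the pip command to use for lint tools.
# Inside an active virtualenv a plain install is already isolated, and pip
# refuses --user there; outside one, fall back to a per-user install
# instead of a global one.
pip_install_cmd() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "python3 -m pip install"
    else
        echo "python3 -m pip install --user"
    fi
}

# Example: $(pip_install_cmd) flake8 autopep8
```

On distributions that mark the system Python as externally managed, a `--user` install may still be refused, so creating a dedicated venv with `python3 -m venv` is the more robust option.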

42-44: The -it flags will fail in non-interactive environments (e.g., CI).

The run target uses --rm -it, which requires an interactive TTY. If this is ever invoked from a CI pipeline or a non-interactive shell, it will error or hang. Consider -i only, or conditionally adding -t.

Suggested fix

-	$(CONTAINER_TOOL) run --rm -it --volume $(HOST_KUBECONFIG):/tmp/kubeconfig$(VOLUME_OPTS) $(CONFIG_MOUNT) -e KUBECONFIG=/tmp/kubeconfig $(CONTAINER_REPO):$(CONTAINER_TAG) -s $(SUITE) $(CONFIG_ARG)
+	$(CONTAINER_TOOL) run --rm -i --volume $(HOST_KUBECONFIG):/tmp/kubeconfig$(VOLUME_OPTS) $(CONFIG_MOUNT) -e KUBECONFIG=/tmp/kubeconfig $(CONTAINER_REPO):$(CONTAINER_TAG) -s $(SUITE) $(CONFIG_ARG)

Or make it conditional:

INTERACTIVE := $(shell [ -t 0 ] && echo "-it" || echo "-i")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@validation/Makefile` around lines 42 - 44, The run target currently invokes
$(CONTAINER_TOOL) with the interactive flags "-it", which will fail in
non-interactive CI; change the invocation in the run target to avoid forcing a
TTY by either using "-i" only or making the flags conditional (e.g., introduce
an INTERACTIVE variable computed with a shell test like [ -t 0 ] to produce
"-it" when a TTY exists and "-i" or empty otherwise), then replace the hardcoded
"-it" in the run target invocation with that INTERACTIVE variable so the command
works in both local and CI environments.
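
The conditional variant can be tried outside of make as well. This is a minimal sketch under the same idea as the `INTERACTIVE` variable suggested above; `interactive_flags` is a hypothetical name, and the commented usage line reuses `CONTAINER_TOOL` from the Makefile discussion with a placeholder image name.

```shell
# Hypothetical helper: pick container run flags depending on whether
# stdin (file descriptor 0) is attached to a terminal.
interactive_flags() {
    if [ -t 0 ]; then
        echo "-it"   # terminal present: keep stdin open and allocate a TTY
    else
        echo "-i"    # no terminal (e.g. CI): keep stdin open, no TTY
    fi
}

# Example (image name is a placeholder):
#   "$CONTAINER_TOOL" run --rm $(interactive_flags) some-image:tag
```

Redirecting stdin, for example `interactive_flags < /dev/null`, simulates the non-interactive case and yields `-i`.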

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@validation/README.md`:
- Around line 28-36: Update the Validations table in validation/README.md to
remove or update the stale "cloud_provider" row under "Suite: cluster" so it no
longer promises a PASSED/FAILED test; reference that this check is performed
automatically during initialization in the code path that modifies
self.tests["cluster"] (i.e., remove the cloud_provider entry or annotate it as
"auto-detected during init, not a reported test") and ensure the table text and
any headings match the current behavior of self.tests["cluster"] in the Python
script.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between ecf7dab and 7a5fee7.

📒 Files selected for processing (5)

validation/Containerfile
validation/Makefile
validation/README.md
validation/llmd_xks_preflight.py
validation/pyproject.toml

coderabbitai · 2026-02-25T13:08:52Z

validation/README.md

+make image
 ```

-By default, the container is built on top of latest Fedora container image. If you have an **entitled Red Hat Enterprise Linux system**, you can use UBI9 (Universal Basic Image) as the base:
+The container is built on top of UBI9 (Universal Base Image 9.5).

-```bash
-FROM=registry.access.redhat.com/ubi9:latest make container
-```
-
-Notes:
-  * currently, only UBI version 9 (based on Red Hat Enterprise Linux 9) is supported
-  * while the base image itself can be pulled without registration, the container image will not build without a valid Red Hat entitlement -- if you are running a registered RHEL system, the entitlement is automatically passed to the container at build time
-
-Regardless of base image, the resulting container image repository (name) and tag can be customized by using `CONTAINER_REPO` and `CONTAINER_TAG` environment variables:
+The resulting container image repository (name) and tag can be customized by using `CONTAINER_REPO` and `CONTAINER_TAG` environment variables:

 ```bash
-CONTAINER_REPO=quay.io/myusername/llm-d-xks-preflight CONTAINER_TAG=mytag make container
-FROM=registry.access.redhat.com/ubi9:latest CONTAINER_REPO=quay.io/myusername/llm-d-xks-preflight CONTAINER_TAG=mytag make container
+CONTAINER_REPO=quay.io/myusername/llm-d-xks-preflight CONTAINER_TAG=mytag make image


⚠️ Potential issue | 🟡 Minor

Stale cloud_provider test entry in the Validations table (Line 67).

The cloud_provider row is still listed as a test under "Suite: cluster" in the Validations section (line 67), but this PR removes it from the self.tests["cluster"] dict in the Python script. Users reading this doc will expect a cloud_provider PASSED/FAILED result that will never appear. Consider removing or updating that row to clarify it's automatic detection during initialization, not a reported test.


kwozyman · 2026-02-25T15:03:28Z

Hi @zdtsw
Just a couple of comments:

- Does the wheel building actually hurt anything? It came as a requirement from @weaton and I tend to agree: it keeps things somewhat contained and tidy regardless of later packaging.
- If we change the base container image from Fedora to UBI, will users building from non-RHEL hosts still be able to build the image?
- "remove cloud_provider check as it does nothing but left in suite": maybe this was bugged, but we need a "cloud test" if we're to add other cloud providers.

zdtsw · 2026-02-25T16:12:58Z

> Hi @zdtsw Just a couple of comments:
>
> - Does the wheel building actually hurt anything? It came as a requirement from @weaton and I tend to agree: it keeps things somewhat contained and tidy regardless of later packaging
> - If we change base container image from fedora to ubi, will users building from non-RHEL hosts still be able to build the image?
> - remove cloud_provider check as it does nothing but left in suite: maybe this was bugged, but we need a "cloud test" if we're to add other cloud providers

Some thoughts from me:

> Does the wheel building actually hurt anything? It came as a requirement from @weaton and I tend to agree: it keeps things somewhat contained and tidy regardless of later packaging

No, I do not think building a wheel does any harm, except that it makes the build a bit more complicated.
I was not aware of the wheel requirement, but I would imagine the plan with a wheel is to publish it, not just to use it as an internal build step. We can have two make targets; it is not necessary to build a wheel in the image and install it there. Users who only work with the image do not really care whether there is a Python script or a wheel inside it. Alternatively, they could pip install the wheel without needing the image. For the latter case, I would guess we need to publish the wheel somewhere, because if it is only an internal build step, it gets lost after the build.

> If we change base container image from fedora to ubi, will users building from non-RHEL hosts still be able to build the image?

Yes, I think UBI is free to use; a subscription is only required when users want to enable EPEL to install RH packages. Otherwise, it should work. Most upstream projects use ubi9, e.g. https://github.com/llm-d/llm-d-kv-cache/blob/main/Dockerfile#L68

> remove cloud_provider check as it does nothing but left in suite: maybe this was bugged, but we need a "cloud test" if we're to add other cloud providers

Yes, correct. We need to support it eventually. I removed it for now because I do not see it being called. But I agree with you that we need to get that part in place, so either we remove it for now and add the implementation later along with support for CKS, or we do the implementation now and keep it.

kwozyman · 2026-03-04T15:45:50Z

I just tested it: you can indeed build it on top of UBI. I really thought a subscription was required for that.
I think we can merge this, and we'll re-add wheel building if we ever get that requirement again.

aneeshkp · 2026-03-04T16:04:20Z

/lgtm

openshift-ci · 2026-03-04T16:04:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aneeshkp, zdtsw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [aneeshkp,zdtsw]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label Feb 25, 2026

zdtsw requested a review from aneeshkp February 25, 2026 13:06

coderabbitai bot reviewed Feb 25, 2026

View reviewed changes

openshift-ci bot assigned aneeshkp Mar 4, 2026

aneeshkp approved these changes Mar 4, 2026

View reviewed changes

openshift-ci bot added the lgtm label Mar 4, 2026

zdtsw merged commit 787292c into opendatahub-io:main Mar 4, 2026
2 of 3 checks passed

zdtsw merged 1 commit into opendatahub-io:main from zdtsw-forking:chore_1
