update: validation#24

Merged
zdtsw merged 1 commit into opendatahub-io:main from zdtsw-forking:chore_1
Mar 4, 2026

@zdtsw
Member

@zdtsw zdtsw commented Feb 25, 2026

Description

  • Simplify the image build: use a smaller UBI9 base, call the script directly instead of building a Python wheel, and run as a non-root user at runtime
  • Simplify the make targets: rename "container" to "image", add an environment variable to pass in the "suite" so that only the "run" target is needed, support both podman and docker, add support for a config file, and install flake8/autopep8 if they are not available locally
  • Remove the cloud_provider check, as it did nothing but was left in the suite
  • Update the README
  • Rename llmd_xks_check.py to llmd_xks_preflight.py
  • Fix lint errors (indentation)

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • New Features

    • Added SUITE variable to run specific validation test suites (all, cluster, operators)
    • Added CONFIG option for custom configuration files during validation
  • Improvements

    • Made LWS operator checks optional for improved validation flexibility
    • Updated build and run commands for simplified usage
    • Simplified documentation with clearer deployment instructions
  • Chores

    • Updated container infrastructure and build tooling

- simplify image build: no need to make a Python wheel, just call the script
- simplify make targets: add an env variable to pass in "suite"
- remove cloud_provider check, as it did nothing but was left in the suite
- update README
- rename llmd_xks_check.py to llmd_xks_preflight.py

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
@coderabbitai
Contributor

coderabbitai bot commented Feb 25, 2026

📝 Walkthrough

This PR refactors the validation container infrastructure by switching the base image to Red Hat UBI minimal with microdnf, renames the build target from "container" to "image", introduces a SUITE parameter for test selection, adds optional configuration mounting, updates all related documentation, and renames the validation module from llmd_xks_checks to llmd_xks_preflight.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Container Base Image & Package Management: validation/Containerfile | Updated base image from Fedora-based ARG to Red Hat UBI minimal (ubi-minimal:9.5), replaced dnf with microdnf, consolidated package installation into a single RUN step, removed conditional EPEL logic, added non-root user (UID 1001), and updated ENTRYPOINT to execute Python script directly. |
| Build System & Configuration: validation/Makefile | Added dynamic CONTAINER_TOOL detection (podman/docker preference), introduced SUITE, VOLUME_OPTS, CONFIG, CONFIG_MOUNT, and CONFIG_ARG variables, renamed container target to image, consolidated run targets into single run target parameterized by SUITE, expanded run command with kubeconfig mounting and optional CONFIG support, and added flake8/autopep8 linting setup. |
| Documentation: validation/README.md | Updated build target references from "container" to "image", simplified base image description, replaced Python script invocation with make-based CONFIG usage, added SUITE parameter documentation with possible values (all, cluster, operators), and commented out CoreWeave/CKS as coming soon. |
| Test Configuration & Module Rename: validation/llmd_xks_preflight.py, validation/pyproject.toml | Restructured cluster and operators readiness test blocks with explicit optional flags for crd_lwsoperator and operator_lws; updated pyproject.toml to reference llmd_xks_preflight.py instead of llmd_xks_checks.py and updated console script entry point accordingly; minor formatting adjustments throughout test file. |
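
To illustrate what an explicit optional flag could mean for a suite's overall result, here is a minimal Python sketch. It is not the code from llmd_xks_preflight.py: the `Check` tuple, the `run_suite` function, the "WARNING" status, and the `some_required_operator` check are all hypothetical, and only the test name `operator_lws` is taken from the summary above.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Check(NamedTuple):
    """One validation test (hypothetical structure, for illustration only)."""
    name: str
    run: Callable[[], bool]   # returns True when the check passes
    optional: bool = False    # a failing optional check does not fail the suite


def run_suite(checks: List[Check]) -> Tuple[Dict[str, str], bool]:
    """Run every check; return per-check statuses and the overall verdict."""
    statuses: Dict[str, str] = {}
    suite_ok = True
    for check in checks:
        if check.run():
            statuses[check.name] = "PASSED"
        elif check.optional:
            # Assumed behaviour: report the problem but keep the suite green.
            statuses[check.name] = "WARNING"
        else:
            statuses[check.name] = "FAILED"
            suite_ok = False
    return statuses, suite_ok


# Stand-in checks; real ones would query the cluster.
operators_suite = [
    Check("operator_lws", lambda: False, optional=True),
    Check("some_required_operator", lambda: True),
]
statuses, suite_ok = run_suite(operators_suite)
print(statuses, suite_ok)
```

In this sketch a missing LWS operator is reported but does not change the overall verdict, which is one plausible reading of "optional" here.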

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'update: validation' is too generic and vague to clearly convey the main changes in this multi-faceted refactoring. | Use a more specific title that captures the primary change, such as 'Refactor validation: simplify container image and Make targets' or 'Validation: switch to UBI9 base and consolidate build targets'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@zdtsw
Member Author

zdtsw commented Feb 25, 2026

cc @kwozyman

@zdtsw zdtsw requested a review from aneeshkp February 25, 2026 13:06
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
validation/Makefile (2)

56-57: Auto-installing packages via bare pip install may pollute the system Python.

If run outside a virtualenv, pip install flake8 / pip install autopep8 installs globally, which may conflict with system-managed packages (especially on systems using externally-managed-environment in Python 3.12+). Consider using pip install --user or python3 -m venv.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@validation/Makefile` around lines 56 - 57, The Makefile currently
auto-installs flake8 with a bare pip install which can pollute system Python;
update the installation step to avoid global installs by either invoking the
installer via the active Python interpreter (e.g., python3 -m pip install --user
...) or by detecting/creating a virtualenv (python3 -m venv ...) before
installing; change the lines around the flake8 invocation (the rule that runs
the shell check for flake8 and calls pip install) to use "python3 -m pip install
--user flake8" or to create/activate a venv and install into it so running
flake8 --max-line-length=$(MAX_LINE_LENGTH) --exclude=build . no longer requires
a global pip install.
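
The choice between a plain install and `--user` depends on whether a virtualenv is active, since pip rejects `--user` inside a virtualenv that does not expose user site-packages. A minimal Python sketch of that decision follows; the helper names `in_virtualenv` and `pip_install_command` are hypothetical and are not part of the Makefile.

```python
import sys
from typing import List


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def pip_install_command(package: str) -> List[str]:
    """Build a pip command that avoids a system-wide install."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if not in_virtualenv():
        # Outside a venv, install into the user site instead of system Python.
        cmd.append("--user")
    cmd.append(package)
    return cmd


print(" ".join(pip_install_command("flake8")))
```

The function only builds the command; whether to run it automatically from a lint target is a separate decision.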

42-44: The -it flags will fail in non-interactive environments (e.g., CI).

The run target uses --rm -it, which requires an interactive TTY. If this is ever invoked from a CI pipeline or a non-interactive shell, it will error or hang. Consider -i only, or conditionally adding -t.

Suggested fix
-	$(CONTAINER_TOOL) run --rm -it --volume $(HOST_KUBECONFIG):/tmp/kubeconfig$(VOLUME_OPTS) $(CONFIG_MOUNT) -e KUBECONFIG=/tmp/kubeconfig $(CONTAINER_REPO):$(CONTAINER_TAG) -s $(SUITE) $(CONFIG_ARG)
+	$(CONTAINER_TOOL) run --rm -i --volume $(HOST_KUBECONFIG):/tmp/kubeconfig$(VOLUME_OPTS) $(CONFIG_MOUNT) -e KUBECONFIG=/tmp/kubeconfig $(CONTAINER_REPO):$(CONTAINER_TAG) -s $(SUITE) $(CONFIG_ARG)

Or make it conditional:

INTERACTIVE := $(shell [ -t 0 ] && echo "-it" || echo "-i")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@validation/Makefile` around lines 42 - 44, The run target currently invokes
$(CONTAINER_TOOL) with the interactive flags "-it", which will fail in
non-interactive CI; change the invocation in the run target to avoid forcing a
TTY by either using "-i" only or making the flags conditional (e.g., introduce
an INTERACTIVE variable computed with a shell test like [ -t 0 ] to produce
"-it" when a TTY exists and "-i" or empty otherwise), then replace the hardcoded
"-it" in the run target invocation with that INTERACTIVE variable so the command
works in both local and CI environments.
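
The same TTY test can be sketched in Python, where `os.isatty(0)` plays the role of `[ -t 0 ]`. This is only an illustration of the conditional-flag idea: `build_run_command` and the image name `example/preflight:latest` are hypothetical, and the sketch omits the `VOLUME_OPTS` and `CONFIG` handling of the real target.

```python
import os
from typing import List, Optional


def interactive_flags(stdin_is_tty: bool) -> List[str]:
    """Allocate a TTY (-t) only when stdin actually is one."""
    return ["-it"] if stdin_is_tty else ["-i"]


def build_run_command(tool: str, image: str, suite: str, kubeconfig: str,
                      stdin_is_tty: Optional[bool] = None) -> List[str]:
    """Assemble a container run command similar to the Makefile's run target."""
    if stdin_is_tty is None:
        # Equivalent of the shell test `[ -t 0 ]`.
        stdin_is_tty = os.isatty(0)
    return [
        tool, "run", "--rm", *interactive_flags(stdin_is_tty),
        "--volume", f"{kubeconfig}:/tmp/kubeconfig",
        "-e", "KUBECONFIG=/tmp/kubeconfig",
        image, "-s", suite,
    ]


# Simulate a CI run, where stdin is not a terminal.
ci_cmd = build_run_command("podman", "example/preflight:latest", "cluster",
                           "/home/user/.kube/config", stdin_is_tty=False)
print(" ".join(ci_cmd))
```

Passing `stdin_is_tty` explicitly keeps the function testable without a real terminal.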

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between ecf7dab and 7a5fee7.

📒 Files selected for processing (5)
  • validation/Containerfile
  • validation/Makefile
  • validation/README.md
  • validation/llmd_xks_preflight.py
  • validation/pyproject.toml

Comment on lines +28 to +36
+make image
 ```
 
-By default, the container is built on top of latest Fedora container image. If you have an **entitled Red Hat Enterprise Linux system**, you can use UBI9 (Universal Basic Image) as the base:
+The container is built on top of UBI9 (Universal Base Image 9.5).
 
-```bash
-FROM=registry.access.redhat.com/ubi9:latest make container
-```
-
-Notes:
-* currently, only UBI version 9 (based on Red Hat Enterprise Linux 9) is supported
-* while the base image itself can be pulled without registration, the container image will not build without a valid Red Hat entitlement -- if you are running a registered RHEL system, the entitlement is automatically passed to the container at build time
-
-Regardless of base image, the resulting container image repository (name) and tag can be customized by using `CONTAINER_REPO` and `CONTAINER_TAG` environment variables:
+The resulting container image repository (name) and tag can be customized by using `CONTAINER_REPO` and `CONTAINER_TAG` environment variables:
 
 ```bash
-CONTAINER_REPO=quay.io/myusername/llm-d-xks-preflight CONTAINER_TAG=mytag make container
-FROM=registry.access.redhat.com/ubi9:latest CONTAINER_REPO=quay.io/myusername/llm-d-xks-preflight CONTAINER_TAG=mytag make container
+CONTAINER_REPO=quay.io/myusername/llm-d-xks-preflight CONTAINER_TAG=mytag make image

⚠️ Potential issue | 🟡 Minor

Stale cloud_provider test entry in the Validations table (Line 67).

The cloud_provider row is still listed as a test under "Suite: cluster" in the Validations section (line 67), but this PR removes it from the self.tests["cluster"] dict in the Python script. Users reading this doc will expect a cloud_provider PASSED/FAILED result that will never appear. Consider removing or updating that row to clarify it's automatic detection during initialization, not a reported test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@validation/README.md` around lines 28 - 36, Update the Validations table in
validation/README.md to remove or update the stale "cloud_provider" row under
"Suite: cluster" so it no longer promises a PASSED/FAILED test; reference that
this check is performed automatically during initialization in the code path
that modifies self.tests["cluster"] (i.e., remove the cloud_provider entry or
annotate it as "auto-detected during init, not a reported test") and ensure the
table text and any headings match the current behavior of self.tests["cluster"]
in the Python script.

@kwozyman
Contributor

Hi @zdtsw
Just a couple of comments:

  • Does the wheel building actually hurt anything? It came as a requirement from @weaton and I tend to agree: it keeps things somewhat contained and tidy regardless of later packaging
  • If we change base container image from fedora to ubi, will users building from non-RHEL hosts still be able to build the image?
  • remove cloud_provider check as it does not nothing but left in suite: maybe this was bugged, but we need a "cloud test" if we're to add other cloud providers

@zdtsw
Member Author

zdtsw commented Feb 25, 2026

Some thoughts from me:

  • Does the wheel building actually hurt anything? It came as a requirement from @weaton and I tend to agree: it keeps things somewhat contained and tidy regardless of later packaging

No, I do not think building a wheel does any harm, except that it makes the build a bit more complicated.
I was not aware of the wheel requirement, but I would imagine that with a wheel the plan is to publish it, not just to use it as an internal build step. We can have two make targets; it is not necessary to build a wheel in the image and then install it. For users who only work with the image, it does not really matter whether there is a Python script or a wheel inside the image. Alternatively, they could pip install the wheel without needing the image. For the latter case, I would guess we need to publish it somewhere, because if it is just an internal build step, it gets lost after the build.

  • If we change base container image from fedora to ubi, will users building from non-RHEL hosts still be able to build the image?

Yes, I think UBI is free to use; users need a subscription only when they want to enable EPEL to install RH packages. Otherwise, it should work. Most upstream projects use ubi9, e.g. https://github.com/llm-d/llm-d-kv-cache/blob/main/Dockerfile#L68

  • remove cloud_provider check as it does not nothing but left in suite: maybe this was bugged, but we need a "cloud test" if we're to add other cloud providers

Yes, correct. We need to support it eventually. I removed it for now because I did not see it being called. But I agree with you that we need to get that part in place, so we can either remove it for now and add the implementation later along with support for CKS, or do the implementation now and keep it.

@kwozyman
Contributor

kwozyman commented Mar 4, 2026

I just tested it: you can indeed build it on top of UBI. I really thought a subscription was required for that.
I think we can merge this, and we'll re-add wheel building if we ever get to that requirement again.

@aneeshkp
Contributor

aneeshkp commented Mar 4, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Mar 4, 2026
@openshift-ci

openshift-ci bot commented Mar 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aneeshkp, zdtsw

@zdtsw zdtsw merged commit 787292c into opendatahub-io:main Mar 4, 2026
2 of 3 checks passed