Skip to content

KFLUXINFRA-3694 Add CI to validate tekton-kueue CEL expressions#343

Open
manish-jangra wants to merge 1 commit into
redhat-appstudio:mainfrom
manish-jangra:kueue-github-ci-workflow
Open

KFLUXINFRA-3694 Add CI to validate tekton-kueue CEL expressions#343
manish-jangra wants to merge 1 commit into
redhat-appstudio:mainfrom
manish-jangra:kueue-github-ci-workflow

Conversation

@manish-jangra

Copy link
Copy Markdown
Contributor

Summary

  • Add GitHub Actions workflow (.github/workflows/test-tekton-kueue-config.yaml) that runs on PRs touching components/kueue/**
  • Add Python test script (hack/test-tekton-kueue-config.py) that validates tekton-kueue CEL expressions using the actual container via Podman

Test scenarios

The test covers all boundary conditions of the CEL configuration:

Test case What it validates
signing_pipeline Full match: rate-limited=true + group=signing-server gets resource annotation
non_signing_pipeline Empty labels: no resource annotation added
no_labels_pipeline No labels field: has(labels) check fails gracefully
rate_limited_false rate-limited=false: value check rejects correctly
partial_labels_pipeline Wrong group value: group check rejects correctly

All tests also validate that pipelines-queue and konflux-default priority are unconditionally assigned.

Test plan

  • Verified locally with Podman: signing pipeline gets correct annotations/labels
  • Verified locally: non-signing pipeline does NOT get resource annotation
  • Python syntax validated
  • Pre-commit hooks pass
  • CI runs successfully on this PR (workflow self-triggers on components/kueue/** changes)

Note

The internal-production config path will be available once #341 merges. Until then, the CI prerequisite check will correctly flag its absence.

Made with Cursor

@openshift-ci

openshift-ci Bot commented May 8, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: manish-jangra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label May 8, 2026
@qodo-code-review

Copy link
Copy Markdown

Review Summary by Qodo

Add CI workflow and tests for tekton-kueue CEL expressions

🧪 Tests ✨ Enhancement

Grey Divider

Walkthroughs

Description
• Add GitHub Actions workflow to validate tekton-kueue CEL expressions
• Create comprehensive Python test script with 5 test scenarios
• Test mutations using actual tekton-kueue container via Podman
• Validate queue-name labels, priority classes, and resource annotations
• Support both internal-staging and internal-production configurations
Diagram
flowchart LR
  PR["PR triggers on kueue changes"]
  WF["GitHub Actions Workflow"]
  PY["Python Test Script"]
  PREREQ["Check Prerequisites"]
  PODMAN["Run Podman Container"]
  VALIDATE["Validate Mutations"]
  REPORT["Report Results"]
  
  PR --> WF
  WF --> PREREQ
  PREREQ --> PY
  PY --> PODMAN
  PODMAN --> VALIDATE
  VALIDATE --> REPORT
Loading

Grey Divider

File Changes

1. .github/workflows/test-tekton-kueue-config.yaml ⚙️ Configuration changes +45/-0

GitHub Actions workflow for tekton-kueue CI validation

• New GitHub Actions workflow triggered on PRs modifying components/kueue/**
• Sets up Python 3, installs PyYAML and Podman dependencies
• Runs prerequisite checks and executes tekton-kueue configuration tests
• Reports success upon completion of all CEL expression validations

.github/workflows/test-tekton-kueue-config.yaml


2. hack/test-tekton-kueue-config.py 🧪 Tests +493/-0

Python test script for tekton-kueue mutation validation

• Comprehensive Python test suite with 5 PipelineRun test scenarios covering boundary conditions
• Validates CEL expressions by running actual tekton-kueue container via Podman
• Tests both internal-staging and internal-production configurations (10 total test combinations)
• Verifies queue-name labels, priority classes, and resource annotations for signing pipeline
 rate-limiting
• Includes prerequisite validation for Podman and configuration file availability

hack/test-tekton-kueue-config.py


Grey Divider

Qodo Logo

@qodo-code-review

qodo-code-review Bot commented May 8, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (6) 📘 Rule violations (0)

Grey Divider


Action required

1. Missing prod config blocks CI 🐞 Bug ☼ Reliability
Description
The workflow runs --check-setup, and check_prerequisites() hard-fails if *any* entry in
CONFIG_COMBINATIONS points to a missing config/kustomization file. Because CONFIG_COMBINATIONS
includes internal-production paths, this CI job will fail until those files land, preventing the
new check from ever going green.
Code

hack/test-tekton-kueue-config.py[R302-313]

+CONFIG_COMBINATIONS: Dict[str, ConfigCombination] = {
+    "internal-staging": {
+        "name": "Internal Staging",
+        "config_file": "components/kueue/internal-staging/tekton-kueue/config.yaml",
+        "kustomization_file": "components/kueue/internal-staging/tekton-kueue/kustomization.yaml"
+    },
+    "internal-production": {
+        "name": "Internal Production",
+        "config_file": "components/kueue/internal-production/tekton-kueue/config.yaml",
+        "kustomization_file": "components/kueue/internal-production/tekton-kueue/kustomization.yaml"
+    }
+}
Evidence
validate_config_combination() raises when a referenced config/kustomization path does not exist,
and check_prerequisites() iterates over all CONFIG_COMBINATIONS and converts any such failure
into a fatal RuntimeError. The workflow always executes --check-setup before running tests, so
the job fails as soon as one config combination is missing.

hack/test-tekton-kueue-config.py[110-123]
hack/test-tekton-kueue-config.py[132-153]
hack/test-tekton-kueue-config.py[302-364]
.github/workflows/test-tekton-kueue-config.yaml[34-40]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The CI workflow calls `python hack/test-tekton-kueue-config.py --check-setup`, but `check_prerequisites()` currently fails the entire run if any configured environment is missing on disk. Since `CONFIG_COMBINATIONS` includes `internal-production`, the workflow will fail until production files exist.

### Issue Context
This script is intended to validate CEL expressions for the environments present in-repo. Right now it treats `internal-production` as mandatory.

### Fix Focus Areas
- hack/test-tekton-kueue-config.py[110-159]
- hack/test-tekton-kueue-config.py[302-365]
- .github/workflows/test-tekton-kueue-config.yaml[34-40]

### Suggested fix
- Make `check_prerequisites()` tolerate missing combinations:
 - If `validate_config_combination()` fails with `FileNotFoundError`, record a warning message and **continue**, rather than raising.
 - If **no** configs are available after filtering, fail with a clear error.
- Ensure tests only run for available configs:
 - Either generate `TEST_COMBINATIONS` dynamically from `processed_configs`, or skip combinations whose `config_key` wasn’t processed.
- (Optional) Keep `--check-setup` as a non-blocking informational step if you still want to report missing future environments, but do not fail the job until the files are present.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Checkout version inconsistency 🐞 Bug ⚙ Maintainability ⭐ New
Description
The new workflow uses actions/checkout@v6 while all existing workflows in this repo use
actions/checkout@v4, creating an inconsistent baseline and increasing the chance the job fails or
behaves differently than the rest of CI.
Code

.github/workflows/test-tekton-kueue-config.yaml[R17-18]

+      - name: Checkout code
+        uses: actions/checkout@v6
Evidence
The newly added workflow checks out with actions/checkout@v6, but the repo’s other workflows
consistently reference actions/checkout@v4, indicating this workflow is an outlier and may not
match the CI environment assumptions used elsewhere.

.github/workflows/test-tekton-kueue-config.yaml[17-18]
.github/workflows/chainsaw-tests.yaml[18-20]
.github/workflows/yamllint.yaml[12-14]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The workflow uses a different major version of `actions/checkout` than the rest of the repository’s workflows, which can lead to inconsistent CI behavior.

### Issue Context
Other workflows in this repo use `actions/checkout@v4`.

### Fix Focus Areas
- .github/workflows/test-tekton-kueue-config.yaml[17-18]

### Suggested change
Update the workflow to use the same checkout major version as other workflows (e.g. `actions/checkout@v4`) unless there is a specific reason to diverge (in which case, document why).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Workflow timeout mismatch 🐞 Bug ☼ Reliability ⭐ New
Description
The GitHub Actions job times out after 15 minutes, but the Python test runner allows each Podman
mutation run to block for up to 300 seconds and executes multiple combinations, so a few hung runs
can cause the job to hit the global timeout before the script reports which test timed out.
Code

.github/workflows/test-tekton-kueue-config.yaml[R13-16]

+    name: Test Tekton-Kueue CEL expressions
+    runs-on: ubuntu-latest
+    timeout-minutes: 15
+    steps:
Evidence
The workflow sets a 15-minute job timeout. The test runner runs multiple test combinations and each
podman run has a 300s timeout; in a degraded/hung scenario, cumulative time can exceed the job
timeout and the job will be killed without the script’s per-test timeout context.

.github/workflows/test-tekton-kueue-config.yaml[13-16]
hack/test-tekton-kueue-config.py[329-373]
hack/test-tekton-kueue-config.py[421-427]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The workflow-level timeout (15 minutes) can terminate the job before the Python test runner’s per-test timeout and failure reporting can identify which mutation run hung.

### Issue Context
- The workflow sets `timeout-minutes: 15`.
- The test runner can execute multiple combinations.
- Each Podman run can block for up to 300 seconds.

### Fix Focus Areas
- .github/workflows/test-tekton-kueue-config.yaml[13-16]
- hack/test-tekton-kueue-config.py[329-373]
- hack/test-tekton-kueue-config.py[421-427]

### Suggested change
Either:
1) Increase the workflow `timeout-minutes` to comfortably exceed the worst-case runtime, or
2) Reduce the per-mutation Podman timeout (and/or number of combinations), so the script can fail fast and report the exact failing test before the workflow kills the job.

Prefer an approach that ensures per-test timeout errors are surfaced in logs.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


4. No timeout on podman run 🐞 Bug ☼ Reliability
Description
run_mutation_test() invokes podman run via subprocess.run(...) without a timeout, so a stuck
image pull or hung container can block the job until GitHub’s job-level timeout is hit. This can
waste runner time and make the check flaky.
Code

hack/test-tekton-kueue-config.py[R403-413]

+            cmd = [
+                "podman", "run", "--rm",
+                "-v", f"{temp_dir}:/workspace:z",
+                test_config.image,
+                "mutate",
+                "--pipelinerun-file", "/workspace/pipelinerun.yaml",
+                "--config-dir", "/workspace"
+            ]
+
+            result = subprocess.run(cmd, capture_output=True, text=True)
+
Evidence
The external podman run call is executed without a timeout=... parameter, so Python will wait
indefinitely for it to return.

hack/test-tekton-kueue-config.py[403-413]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The test harness can hang indefinitely waiting for `podman run` to finish because `subprocess.run()` is called without a timeout.

### Issue Context
CI should fail fast with a clear message when image pulls or container execution stalls.

### Fix Focus Areas
- hack/test-tekton-kueue-config.py[403-419]

### Suggested fix
- Add a reasonable timeout, e.g.:
 - `subprocess.run(cmd, capture_output=True, text=True, timeout=300)`
- Catch `subprocess.TimeoutExpired` and fail the test with a helpful message that includes the command and partial output.
- (Optional) Also set a job-level `timeout-minutes` in the workflow as a second safety net.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (1)
5. Mutated YAML shape unchecked 🐞 Bug ⚙ Maintainability
Description
After yaml.safe_load(result.stdout), the code assumes the parsed value is a dict and later calls
mutated.get(...). If the tool outputs empty YAML or a non-mapping YAML structure, the test crashes
with AttributeError instead of producing a clear assertion failure.
Code

hack/test-tekton-kueue-config.py[R421-446]

+            try:
+                mutated = yaml.safe_load(result.stdout)
+            except yaml.YAMLError as e:
+                self.fail(f"Failed to parse mutated YAML: {e}\nOutput was: {result.stdout}")
+
+            return mutated
+
+    def validate_mutation_result(self, test_key: str, test_combination: TestCombination) -> None:
+        """Validate mutation results for a single test combination."""
+        with self.subTest(test=test_key):
+            config_key = test_combination["config_key"]
+            test_config = self.processed_configs[config_key]
+            pipelinerun_key = test_combination["pipelinerun_key"]
+            pipelinerun_data = PIPELINERUN_DEFINITIONS[pipelinerun_key]
+
+            print(f"  Running '{test_key}': {pipelinerun_data['name']} with {CONFIG_COMBINATIONS[config_key]['name']}")
+
+            mutated = self.run_mutation_test(test_combination)
+
+            expected = pipelinerun_data["expected"]
+            original_metadata = pipelinerun_data["pipelinerun"].get("metadata", {})
+            original_annotations = original_metadata.get("annotations", {}) or {}
+            original_labels = original_metadata.get("labels", {}) or {}
+
+            annotations = mutated.get("metadata", {}).get("annotations", {}) or {}
+            expected_annotations = expected["annotations"]
Evidence
run_mutation_test() returns the raw yaml.safe_load(...) result without validating its type, and
validate_mutation_result() immediately calls .get() on it.

hack/test-tekton-kueue-config.py[421-426]
hack/test-tekton-kueue-config.py[438-446]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The test assumes the container always prints a single YAML mapping to stdout. If stdout is empty or not a mapping, later code fails with an `AttributeError` that obscures the real problem.

### Issue Context
This script is executing an external container; defensive parsing improves CI debuggability.

### Fix Focus Areas
- hack/test-tekton-kueue-config.py[421-427]
- hack/test-tekton-kueue-config.py[438-456]

### Suggested fix
- After `yaml.safe_load`, validate:
 - `if not isinstance(mutated, dict): self.fail(f"Expected YAML mapping, got {type(mutated).__name__}: {result.stdout!r}")`
- Optionally also validate the presence of `metadata` and that it’s a dict, and fail with an actionable message if not.
- If multiple YAML documents are expected, switch to `yaml.safe_load_all` and validate exactly one document is produced.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

6. Mixed pip invocation 🐞 Bug ⚙ Maintainability ⭐ New
Description
The workflow upgrades pip via python -m pip but installs PyYAML via bare pip, which is
inconsistent and makes it less clear which interpreter/environment the dependency is being installed
into.
Code

.github/workflows/test-tekton-kueue-config.yaml[R25-28]

+      - name: Install Python dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install PyYAML
Evidence
The dependency installation step mixes python -m pip and pip, which reduces clarity and can be
avoided by consistently using python -m pip for both commands.

.github/workflows/test-tekton-kueue-config.yaml[25-28]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The workflow uses two different pip invocation styles in the same step.

### Issue Context
This is a CI workflow; being explicit about the interpreter improves readability and reduces ambiguity.

### Fix Focus Areas
- .github/workflows/test-tekton-kueue-config.yaml[25-28]

### Suggested change
Replace `pip install PyYAML` with `python -m pip install PyYAML` (and optionally add `--no-cache-dir` if desired).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Previous review results

Review updated until commit b67578a

Results up to commit 549ecab


🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0)


Action required
1. Missing prod config blocks CI 🐞 Bug ☼ Reliability
Description
The workflow runs --check-setup, and check_prerequisites() hard-fails if *any* entry in
CONFIG_COMBINATIONS points to a missing config/kustomization file. Because CONFIG_COMBINATIONS
includes internal-production paths, this CI job will fail until those files land, preventing the
new check from ever going green.
Code

hack/test-tekton-kueue-config.py[R302-313]

+CONFIG_COMBINATIONS: Dict[str, ConfigCombination] = {
+    "internal-staging": {
+        "name": "Internal Staging",
+        "config_file": "components/kueue/internal-staging/tekton-kueue/config.yaml",
+        "kustomization_file": "components/kueue/internal-staging/tekton-kueue/kustomization.yaml"
+    },
+    "internal-production": {
+        "name": "Internal Production",
+        "config_file": "components/kueue/internal-production/tekton-kueue/config.yaml",
+        "kustomization_file": "components/kueue/internal-production/tekton-kueue/kustomization.yaml"
+    }
+}
Evidence
validate_config_combination() raises when a referenced config/kustomization path does not exist,
and check_prerequisites() iterates over all CONFIG_COMBINATIONS and converts any such failure
into a fatal RuntimeError. The workflow always executes --check-setup before running tests, so
the job fails as soon as one config combination is missing.

hack/test-tekton-kueue-config.py[110-123]
hack/test-tekton-kueue-config.py[132-153]
hack/test-tekton-kueue-config.py[302-364]
.github/workflows/test-tekton-kueue-config.yaml[34-40]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The CI workflow calls `python hack/test-tekton-kueue-config.py --check-setup`, but `check_prerequisites()` currently fails the entire run if any configured environment is missing on disk. Since `CONFIG_COMBINATIONS` includes `internal-production`, the workflow will fail until production files exist.

### Issue Context
This script is intended to validate CEL expressions for the environments present in-repo. Right now it treats `internal-production` as mandatory.

### Fix Focus Areas
- hack/test-tekton-kueue-config.py[110-159]
- hack/test-tekton-kueue-config.py[302-365]
- .github/workflows/test-tekton-kueue-config.yaml[34-40]

### Suggested fix
- Make `check_prerequisites()` tolerate missing combinations:
 - If `validate_config_combination()` fails with `FileNotFoundError`, record a warning message and **continue**, rather than raising.
 - If **no** configs are available after filtering, fail with a clear error.
- Ensure tests only run for available configs:
 - Either generate `TEST_COMBINATIONS` dynamically from `processed_configs`, or skip combinations whose `config_key` wasn’t processed.
- (Optional) Keep `--check-setup` as a non-blocking informational step if you still want to report missing future environments, but do not fail the job until the files are present.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended
2. No timeout on podman run 🐞 Bug ☼ Reliability
Description
run_mutation_test() invokes podman run via subprocess.run(...) without a timeout, so a stuck
image pull or hung container can block the job until GitHub’s job-level timeout is hit. This can
waste runner time and make the check flaky.
Code

hack/test-tekton-kueue-config.py[R403-413]

+            cmd = [
+                "podman", "run", "--rm",
+                "-v", f"{temp_dir}:/workspace:z",
+                test_config.image,
+                "mutate",
+                "--pipelinerun-file", "/workspace/pipelinerun.yaml",
+                "--config-dir", "/workspace"
+            ]
+
+            result = subprocess.run(cmd, capture_output=True, text=True)
+
Evidence
The external podman run call is executed without a timeout=... parameter, so Python will wait
indefinitely for it to return.

hack/test-tekton-kueue-config.py[403-413]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The test harness can hang indefinitely waiting for `podman run` to finish because `subprocess.run()` is called without a timeout.

### Issue Context
CI should fail fast with a clear message when image pulls or container execution stalls.

### Fix Focus Areas
- hack/test-tekton-kueue-config.py[403-419]

### Suggested fix
- Add a reasonable timeout, e.g.:
 - `subprocess.run(cmd, capture_output=True, text=True, timeout=300)`
- Catch `subprocess.TimeoutExpired` and fail the test with a helpful message that includes the command and partial output.
- (Optional) Also set a job-level `timeout-minutes` in the workflow as a second safety net.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Mutated YAML shape unchecked 🐞 Bug ⚙ Maintainability
Description
After yaml.safe_load(result.stdout), the code assumes the parsed value is a dict and later calls
mutated.get(...). If the tool outputs empty YAML or a non-mapping YAML structure, the test crashes
with AttributeError instead of producing a clear assertion failure.
Code

hack/test-tekton-kueue-config.py[R421-446]

+            try:
+                mutated = yaml.safe_load(result.stdout)
+            except yaml.YAMLError as e:
+                self.fail(f"Failed to parse mutated YAML: {e}\nOutput was: {result.stdout}")
+
+            return mutated
+
+    def validate_mutation_result(self, test_key: str, test_combination: TestCombination) -> None:
+        """Validate mutation results for a single test combination."""
+        with self.subTest(test=test_key):
+            config_key = test_combination["config_key"]
+            test_config = self.processed_configs[config_key]
+            pipelinerun_key = test_combination["pipelinerun_key"]
+            pipelinerun_data = PIPELINERUN_DEFINITIONS[pipelinerun_key]
+
+            print(f"  Running '{test_key}': {pipelinerun_data['name']} with {CONFIG_COMBINATIONS[config_key]['name']}")
+
+            mutated = self.run_mutation_test(test_combination)
+
+            expected = pipelinerun_data["expected"]
+            original_metadata = pipelinerun_data["pipelinerun"].get("metadata", {})
+            original_annotations = original_metadata.get("annotations", {}) or {}
+            original_labels = original_metadata.get("labels", {}) or {}
+
+            annotations = mutated.get("metadata", {}).get("annotations", {}) or {}
+            expected_annotations = expected["annotations"]
Evidence
run_mutation_test() returns the raw yaml.safe_load(...) result without validating its type, and
validate_mutation_result() immediately calls .get() on it.

hack/test-tekton-kueue-config.py[421-426]
hack/test-tekton-kueue-config.py[438-446]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The test assumes the container always prints a single YAML mapping to stdout. If stdout is empty or not a mapping, later code fails with an `AttributeError` that obscures the real problem.

### Issue Context
This script is executing an external container; defensive parsing improves CI debuggability.

### Fix Focus Areas
- hack/test-tekton-kueue-config.py[421-427]
- hack/test-tekton-kueue-config.py[438-456]

### Suggested fix
- After `yaml.safe_load`, validate:
 - `if not isinstance(mutated, dict): self.fail(f"Expected YAML mapping, got {type(mutated).__name__}: {result.stdout!r}")`
- Optionally also validate the presence of `metadata` and that it’s a dict, and fail with an actionable message if not.
- If multiple YAML documents are expected, switch to `yaml.safe_load_all` and validate exactly one document is produced.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Qodo Logo

Comment thread .github/workflows/test-tekton-kueue-config.yaml Outdated
Comment on lines +308 to +312
"internal-production": {
"name": "Internal Production",
"config_file": "components/kueue/internal-production/tekton-kueue/config.yaml",
"kustomization_file": "components/kueue/internal-production/tekton-kueue/kustomization.yaml"
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have internal prod

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will skip now, it doesn't find it

Comment on lines +302 to +313
CONFIG_COMBINATIONS: Dict[str, ConfigCombination] = {
"internal-staging": {
"name": "Internal Staging",
"config_file": "components/kueue/internal-staging/tekton-kueue/config.yaml",
"kustomization_file": "components/kueue/internal-staging/tekton-kueue/kustomization.yaml"
},
"internal-production": {
"name": "Internal Production",
"config_file": "components/kueue/internal-production/tekton-kueue/config.yaml",
"kustomization_file": "components/kueue/internal-production/tekton-kueue/kustomization.yaml"
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Missing prod config blocks ci 🐞 Bug ☼ Reliability

The workflow runs --check-setup, and check_prerequisites() hard-fails if *any* entry in
CONFIG_COMBINATIONS points to a missing config/kustomization file. Because CONFIG_COMBINATIONS
includes internal-production paths, this CI job will fail until those files land, preventing the
new check from ever going green.
Agent Prompt
### Issue description
The CI workflow calls `python hack/test-tekton-kueue-config.py --check-setup`, but `check_prerequisites()` currently fails the entire run if any configured environment is missing on disk. Since `CONFIG_COMBINATIONS` includes `internal-production`, the workflow will fail until production files exist.

### Issue Context
This script is intended to validate CEL expressions for the environments present in-repo. Right now it treats `internal-production` as mandatory.

### Fix Focus Areas
- hack/test-tekton-kueue-config.py[110-159]
- hack/test-tekton-kueue-config.py[302-365]
- .github/workflows/test-tekton-kueue-config.yaml[34-40]

### Suggested fix
- Make `check_prerequisites()` tolerate missing combinations:
  - If `validate_config_combination()` fails with `FileNotFoundError`, record a warning message and **continue**, rather than raising.
  - If **no** configs are available after filtering, fail with a clear error.
- Ensure tests only run for available configs:
  - Either generate `TEST_COMBINATIONS` dynamically from `processed_configs`, or skip combinations whose `config_key` wasn’t processed.
- (Optional) Keep `--check-setup` as a non-blocking informational step if you still want to report missing future environments, but do not fail the job until the files are present.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Add a GitHub Actions workflow and Python test script that validates the
tekton-kueue CEL configuration for internal-staging and
internal-production. Tests run the actual tekton-kueue container via
Podman to verify mutations produce correct queue-name labels, priority
classes, and resource annotations for signing pipeline rate-limiting.

Addresses review feedback:
- Use actions/checkout@v6 (filariow)
- Tolerate missing config environments gracefully (qodo)
- Add 300s timeout to podman subprocess (qodo)
- Validate YAML output type before accessing fields (qodo)
- Add job-level timeout-minutes: 15 as safety net

Co-authored-by: Cursor <cursoragent@cursor.com>
@manish-jangra manish-jangra force-pushed the kueue-github-ci-workflow branch from 549ecab to b67578a Compare May 8, 2026 11:36
@manish-jangra

Copy link
Copy Markdown
Contributor Author

/review

@qodo-code-review

Copy link
Copy Markdown

PR Reviewer Guide 🔍

Warning

/review is deprecated. Use /agentic_review instead (removed after 2026-05-31).

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 Security concerns

Supply chain / unpinned image:
The test runs a container image resolved from kustomization (test_config.image) and executes it in CI via podman run. If the image tag is mutable (e.g., latest or a frequently-updated tag), the workflow could execute different code over time. Consider pinning to an immutable digest in the referenced kustomization to reduce the risk of executing a compromised or unexpected image in CI.

⚡ Recommended focus areas for review

Action Version

The workflow references actions/checkout@v6, which is not a valid major version for that action (commonly v4). This will cause the workflow to fail at runtime unless an internal/custom fork is intended.

- name: Checkout code
  uses: actions/checkout@v6

- name: Set up Python 3
  uses: actions/setup-python@v4
  with:
    python-version: '3.x'
Brittle Assertions

The tests assert full equality for metadata.annotations and metadata.labels after mutation. If the tekton-kueue container adds additional labels/annotations (or preserves/normalizes fields differently), these tests may fail even when the intended CEL behavior is correct. Consider asserting that expected keys exist with expected values (subset match) rather than full dict equality.

annotations = mutated.get("metadata", {}).get("annotations", {}) or {}
expected_annotations = expected["annotations"]
expected_annotations_full = {**original_annotations, **expected_annotations}
self.assertDictEqual(
    annotations,
    expected_annotations_full,
    f"Annotations mismatch for '{test_key}': expected {expected_annotations_full}, got {annotations}"
)

labels = mutated.get("metadata", {}).get("labels", {}) or {}
expected_labels = expected["labels"]
expected_labels_full = {**original_labels, **expected_labels}
self.assertDictEqual(
    labels,
    expected_labels_full,
    f"Labels mismatch for '{test_key}': expected {expected_labels_full}, got {labels}"
)
YAML Serialization

The script writes the PipelineRun YAML using yaml.dump (not safe_dump) which can emit YAML tags/representations depending on data types. While current test data is simple, using yaml.safe_dump would reduce risk of producing unexpected YAML that the container rejects.

with open(pipelinerun_path, 'w') as f:
    yaml.dump(pipelinerun, f, default_flow_style=False)

@manish-jangra

Copy link
Copy Markdown
Contributor Author

/agentic_review

@qodo-code-review

qodo-code-review Bot commented May 8, 2026

Copy link
Copy Markdown

Persistent review updated to latest commit b67578a

@openshift-ci

openshift-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants