
chore(evalhub): preserve upstream YAML verbatim in provider/collectio…#733

Open
gnaulak-redhat wants to merge 1 commit into trustyai-explainability:main from gnaulak-redhat:chore-sync-raw

Conversation


@gnaulak-redhat gnaulak-redhat commented May 13, 2026

…n ConfigMaps

Switch sync script to embed raw upstream content instead of yaml.dump round-trip. This preserves block scalar styles (|-), UTF-8 characters (em dashes, etc.), key ordering, and all formatting exactly as authored upstream.
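A minimal sketch of the difference (illustrative only, not the actual hack/sync-evalhub-providers.py code): a load/dump round-trip re-serializes with the dumper's defaults, while embedding the raw upstream string keeps it byte-identical.

```python
# Hypothetical sketch: why a yaml.safe_load/yaml.dump round-trip cannot
# preserve upstream formatting, while embedding the raw string can.
import yaml

upstream = """\
description: |-
  Multi-line text with an em dash — preserved verbatim.
zeta: 1
alpha: 2
"""

# Round-trip: scalar style, key order, and unicode escaping are at the
# mercy of the dumper's defaults (sort_keys, allow_unicode, ...).
round_tripped = yaml.dump(yaml.safe_load(upstream), sort_keys=True)

# Raw embed: the text fetched from upstream is written out unchanged.
raw_embedded = upstream

print(round_tripped != upstream)  # the round-trip alters the text
print(raw_embedded == upstream)   # the raw embed is byte-identical
```

Note that the round-trip is still *semantically* equivalent (both parse to the same data), which is exactly why the formatting drift went unnoticed until the CI check compared text.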


Sync the eval-hub/eval-hub main config provider YAML files.
These files are auto-synced by the hack/sync-evalhub-providers.py script.

This also partly fixes the CI pipeline on the eval-hub/eval-hub repository side:

Run python scripts/check_configmap_sync.py
Fetching ConfigMap listing from trustyai-explainability/trustyai-service-operator...
Found 9 ConfigMap(s) to check.

OK collection-leaderboard-v2.yaml <-> config/collections/leaderboard-v2.yaml
OK collection-safety-and-fairness-v1.yaml <-> config/collections/safety-and-fairness-v1.yaml
OK collection-toxicity-and-ethical-principles.yaml <-> config/collections/toxicity-and-ethical-principles.yaml

Drift detected:

provider-garak-kfp.yaml vs config/providers/garak-kfp.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-garak.yaml vs config/providers/garak.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-guidellm.yaml vs config/providers/guidellm.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-ibm-clear.yaml vs config/providers/ibm-clear.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-lighteval.yaml vs config/providers/lighteval.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-lm-evaluation-harness.yaml vs config/providers/lm_evaluation_harness.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
Error: Process completed with exit code 1.
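The drift lines above follow a simple pattern. A hypothetical sketch of that kind of check (the real check_configmap_sync.py implementation may differ; names and output format here are illustrative) recursively compares two parsed YAML mappings and reports differing values by dotted key path:

```python
# Hypothetical sketch of a ConfigMap drift check: walk two parsed YAML
# mappings in parallel and report values that differ at the same dotted
# key path, in the spirit of the "~ runtime.local.command: ..." lines above.
def diff_paths(remote: dict, local: dict, prefix: str = "") -> list[str]:
    drift = []
    for key in sorted(set(remote) | set(local)):
        path = f"{prefix}{key}"
        r, l = remote.get(key), local.get(key)
        if isinstance(r, dict) and isinstance(l, dict):
            # Recurse into nested mappings, extending the dotted path.
            drift.extend(diff_paths(r, l, prefix=f"{path}."))
        elif r != l:
            drift.append(f"~ {path}: remote={r!r} local={l!r}")
    return drift

remote = {"runtime": {"local": {"command": "true"}}}
local = {"runtime": {"local": {"command": "python tests/features/test_data/runtime/main.py"}}}
for line in diff_paths(remote, local):
    print(line)
```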

Summary by CodeRabbit

  • Chores

    • Normalized YAML formatting and indentation across evaluation provider and collection configurations for improved consistency.
    • Standardized numeric precision formatting in configuration thresholds.
  • Updates

    • Updated runtime test execution commands to reference actual test files instead of placeholder operations across multiple evaluation providers.


…n ConfigMaps

Switch sync script to embed raw upstream content instead of yaml.dump round-trip.
This preserves block scalar styles (|-), UTF-8 characters (em dashes, etc.), key
ordering, and all formatting exactly as authored upstream.

Co-Authored-By: Claude <noreply@anthropic.com>

openshift-ci Bot commented May 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai Bot commented May 13, 2026

📝 Walkthrough


The PR regenerates EvalHub provider and collection ConfigMaps with normalized YAML formatting and local test runtime wiring. The sync script is updated to embed upstream YAML directly instead of re-serializing, which causes all ConfigMaps to be regenerated with consistent formatting. All five provider ConfigMaps wire their local runtime to a test runner script and restructure K8s entrypoints; collection ConfigMaps receive YAML formatting normalization.

Changes

EvalHub ConfigMap YAML Formatting and Runtime

  • Sync script to embed upstream YAML directly (hack/sync-evalhub-providers.py): process_provider and process_collection now embed raw upstream YAML content strings directly instead of re-serializing parsed YAML via yaml.dump(); this enables the downstream ConfigMap regeneration.

  • Provider local and Kubernetes runtime configuration (config/configmaps/evalhub/provider-garak-kfp.yaml, provider-garak.yaml, provider-guidellm.yaml, provider-ibm-clear.yaml, provider-lighteval.yaml): All five provider ConfigMaps wire their local runtime command to python tests/features/test_data/runtime/main.py (replacing a no-op true placeholder) and restructure Kubernetes entrypoint lists with explicit YAML formatting; base image references and resource settings remain unchanged.

  • Provider benchmark YAML formatting and restructuring (config/configmaps/evalhub/provider-garak-kfp.yaml, provider-garak.yaml, provider-guidellm.yaml, provider-ibm-clear.yaml, provider-lighteval.yaml): Benchmark sections are reformatted with consistent YAML indentation, block-scalar multi-line descriptions, and inline clarifying comments; lighteval introduces an "Individual benchmarks" section with shared primary_score and pass_criteria defaults to replace per-category field repetition. All benchmark IDs, metrics, weights, and pass thresholds remain unchanged.

  • Collection ConfigMap YAML formatting normalization (config/configmaps/evalhub/collection-leaderboard-v2.yaml, collection-safety-and-fairness-v1.yaml, collection-toxicity-and-ethical-principles.yaml): Three collection ConfigMaps are reformatted with normalized YAML indentation, explicit numeric precision (e.g., 0.60 vs 0.6), folded block-scalar descriptions, and added inline comments; all benchmark IDs, provider references, weights, metrics, and thresholds remain semantically unchanged.
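The 0.60-vs-0.6 normalization is a natural consequence of a dump round-trip. Assuming PyYAML (which the script's yaml.dump() usage suggests), a float is re-serialized from its parsed value, not its authored spelling:

```python
# Illustration: PyYAML re-serializes floats from their parsed value, so an
# authored "0.60" comes back as "0.6" after a load/dump round-trip.
import yaml

authored = "threshold: 0.60\n"
round_tripped = yaml.dump(yaml.safe_load(authored))
print(round_tripped)  # threshold: 0.6
```

Embedding the raw upstream text sidesteps this entirely, since the dumper never runs.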

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

project/evalhub, lgtm

Suggested reviewers

  • tarilabs
  • ruivieira
  • julpayne

Poem

🐰 YAML flowed like rabbit burrows, neat and deep—
Raw upstream content now in ConfigMaps we keep,
Test runners hopping local, Kubernetes entrypoints bright,
Benchmarks all reformatted, benchmarks all just right! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: preserving upstream YAML verbatim by embedding raw content instead of re-serializing, which directly corresponds to the modifications in hack/sync-evalhub-providers.py and resulting ConfigMap updates.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config/configmaps/evalhub/provider-guidellm.yaml`:
- Around line 110-130: The YAML fragment for the "poisson" benchmark has a
misplaced comment splitting its fields; move the comment "# Pre-configured
benchmark suites" so that the poisson block's keys remain contiguous: ensure
primary_score and pass_criteria are indented and placed directly under the
poisson entry alongside id, name, description, category, metrics, and tags
(preserve the keys primary_score and pass_criteria under the poisson object) so
the YAML parses correctly and yamllint passes.

In `@config/configmaps/evalhub/provider-lighteval.yaml`:
- Around line 130-148: The YAML for the benchmark "language_understanding" is
broken because the inline comment "# Individual benchmarks" interrupts its
fields; move that comment so it appears after the complete
"language_understanding" block and ensure "primary_score" and "pass_criteria"
are indented at the same level as "id", "name", "description", "category",
"metrics", and "tags" (i.e., siblings under the "language_understanding"
mapping) so keys "primary_score" and "pass_criteria" correctly belong to the
"language_understanding" benchmark.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 305f3d81-6daf-4c1f-84c3-22cb7d537ad0

📥 Commits

Reviewing files that changed from the base of the PR and between d4d151a and 4d0a663.

📒 Files selected for processing (10)
  • config/configmaps/evalhub/collection-leaderboard-v2.yaml
  • config/configmaps/evalhub/collection-safety-and-fairness-v1.yaml
  • config/configmaps/evalhub/collection-toxicity-and-ethical-principles.yaml
  • config/configmaps/evalhub/provider-garak-kfp.yaml
  • config/configmaps/evalhub/provider-garak.yaml
  • config/configmaps/evalhub/provider-guidellm.yaml
  • config/configmaps/evalhub/provider-ibm-clear.yaml
  • config/configmaps/evalhub/provider-lighteval.yaml
  • config/configmaps/evalhub/provider-lm-evaluation-harness.yaml
  • hack/sync-evalhub-providers.py

Comment on lines +110 to +130
    - id: poisson
      name: Realistic traffic simulation
      description: Simulates real-world traffic patterns using Poisson-distributed request arrivals.
      category: performance
      metrics:
        - requests_per_second
        - prompt_tokens_per_second
        - output_tokens_per_second
        - mean_ttft_ms
        - mean_itl_ms
      tags:
        - performance
        - poisson
        - realistic
        - guidellm
      # Pre-configured benchmark suites
      primary_score:
        metric: output_tokens_per_second
        lower_is_better: false
      pass_criteria:
        threshold: 10.0

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical YAML structural error: comment breaks benchmark definition.

The comment # Pre-configured benchmark suites at line 125 is placed between the poisson benchmark's tags field (line 124) and its primary_score/pass_criteria fields (lines 126-130). This breaks the YAML structure and will cause either:

  1. YAML parsing errors
  2. The poisson benchmark to be missing its scoring criteria
  3. These criteria to be incorrectly orphaned or attached to the wrong benchmark

The primary_score and pass_criteria (lines 126-130) should be indented at the same level as other poisson benchmark fields and should appear before line 125.

🐛 Proposed fix to move comment after the complete benchmark definition
       tags:
         - performance
         - poisson
         - realistic
         - guidellm
-      # Pre-configured benchmark suites
       primary_score:
         metric: output_tokens_per_second
         lower_is_better: false
       pass_criteria:
         threshold: 10.0
+    # Pre-configured benchmark suites
     - id: quick_perf_test

As per coding guidelines, YAML configuration files in config/ directories must pass yamllint validation as part of CI/CD checks.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/configmaps/evalhub/provider-guidellm.yaml` around lines 110 - 130, The
YAML fragment for the "poisson" benchmark has a misplaced comment splitting its
fields; move the comment "# Pre-configured benchmark suites" so that the poisson
block's keys remain contiguous: ensure primary_score and pass_criteria are
indented and placed directly under the poisson entry alongside id, name,
description, category, metrics, and tags (preserve the keys primary_score and
pass_criteria under the poisson object) so the YAML parses correctly and
yamllint passes.

Comment on lines +130 to +148
    - id: language_understanding
      name: Language understanding suite
      description: "Core NLU tasks: grammaticality, sentiment, and paraphrase detection (GLUE)."
      category: language_understanding
      metrics:
        - acc
        - matthews_correlation
        - f1
      tags:
        - language_understanding
        - glue
        - lighteval
        - suite
    # Individual benchmarks
      primary_score:
        metric: acc
        lower_is_better: false
      pass_criteria:
        threshold: 0.25

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical YAML structural error: comment breaks benchmark definition.

The comment # Individual benchmarks at line 143 is placed between the language_understanding benchmark's tags field (line 142) and its primary_score/pass_criteria fields (lines 144-148). This breaks the YAML structure and will cause the language_understanding category benchmark to be missing its scoring criteria.

The primary_score and pass_criteria (lines 144-148) should be indented at the same level as other language_understanding benchmark fields and should appear before the comment at line 143.

🐛 Proposed fix to move comment after the complete benchmark definition
       tags:
         - language_understanding
         - glue
         - lighteval
         - suite
-    # Individual benchmarks
       primary_score:
         metric: acc
         lower_is_better: false
       pass_criteria:
         threshold: 0.25
+    # Individual benchmarks
     - id: hellaswag

As per coding guidelines, YAML configuration files in config/ directories must pass yamllint validation as part of CI/CD checks.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/configmaps/evalhub/provider-lighteval.yaml` around lines 130 - 148,
The YAML for the benchmark "language_understanding" is broken because the inline
comment "# Individual benchmarks" interrupts its fields; move that comment so it
appears after the complete "language_understanding" block and ensure
"primary_score" and "pass_criteria" are indented at the same level as "id",
"name", "description", "category", "metrics", and "tags" (i.e., siblings under
the "language_understanding" mapping) so keys "primary_score" and
"pass_criteria" correctly belong to the "language_understanding" benchmark.


openshift-ci Bot commented May 13, 2026

@gnaulak-redhat: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/trustyai-service-operator-e2e
Commit: 4d0a663 (link)
Required: true
Rerun command: /test trustyai-service-operator-e2e

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

