
chore(evalhub): preserve upstream YAML verbatim in provider/collectio…#733

Open
gnaulak-redhat wants to merge 1 commit into trustyai-explainability:main from gnaulak-redhat:chore-sync-raw

Conversation


@gnaulak-redhat gnaulak-redhat commented May 13, 2026

…n ConfigMaps

Switch sync script to embed raw upstream content instead of yaml.dump round-trip. This preserves block scalar styles (|-), UTF-8 characters (em dashes, etc.), key ordering, and all formatting exactly as authored upstream.
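A minimal sketch of the difference (illustrative only, not the actual hack/sync-evalhub-providers.py code): a load/dump round-trip re-serializes with the dumper's defaults, while embedding the raw upstream string keeps it byte-identical.

```python
# Hypothetical sketch: why a yaml.safe_load/yaml.dump round-trip cannot
# preserve upstream formatting, while embedding the raw string can.
import yaml

upstream = """\
description: |-
  Multi-line text with an em dash — preserved verbatim.
zeta: 1
alpha: 2
"""

# Round-trip: scalar style, key order, and unicode escaping are at the
# mercy of the dumper's defaults (sort_keys, allow_unicode, ...).
round_tripped = yaml.dump(yaml.safe_load(upstream), sort_keys=True)

# Raw embed: the text fetched from upstream is written out unchanged.
raw_embedded = upstream

print(round_tripped != upstream)  # the round-trip alters the text
print(raw_embedded == upstream)   # the raw embed is byte-identical
```

Note that the round-trip is still *semantically* equivalent (both parse to the same data), which is exactly why the formatting drift went unnoticed until the CI check compared text.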


Sync the eval-hub/eval-hub main config provider YAML files.
These files are auto-synced by the hack/sync-evalhub-providers.py script.

This also partly fixes the CI pipeline on the eval-hub/eval-hub repository side:

Run python scripts/check_configmap_sync.py
Fetching ConfigMap listing from trustyai-explainability/trustyai-service-operator...
Found 9 ConfigMap(s) to check.

OK collection-leaderboard-v2.yaml <-> config/collections/leaderboard-v2.yaml
OK collection-safety-and-fairness-v1.yaml <-> config/collections/safety-and-fairness-v1.yaml
OK collection-toxicity-and-ethical-principles.yaml <-> config/collections/toxicity-and-ethical-principles.yaml

Drift detected:

provider-garak-kfp.yaml vs config/providers/garak-kfp.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-garak.yaml vs config/providers/garak.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-guidellm.yaml vs config/providers/guidellm.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-ibm-clear.yaml vs config/providers/ibm-clear.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-lighteval.yaml vs config/providers/lighteval.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-lm-evaluation-harness.yaml vs config/providers/lm_evaluation_harness.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
Error: Process completed with exit code 1.
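The drift lines above follow a simple pattern. A hypothetical sketch of that kind of check (the real check_configmap_sync.py implementation may differ; names and output format here are illustrative) recursively compares two parsed YAML mappings and reports differing values by dotted key path:

```python
# Hypothetical sketch of a ConfigMap drift check: walk two parsed YAML
# mappings in parallel and report values that differ at the same dotted
# key path, in the spirit of the "~ runtime.local.command: ..." lines above.
def diff_paths(remote: dict, local: dict, prefix: str = "") -> list[str]:
    drift = []
    for key in sorted(set(remote) | set(local)):
        path = f"{prefix}{key}"
        r, l = remote.get(key), local.get(key)
        if isinstance(r, dict) and isinstance(l, dict):
            # Recurse into nested mappings, extending the dotted path.
            drift.extend(diff_paths(r, l, prefix=f"{path}."))
        elif r != l:
            drift.append(f"~ {path}: remote={r!r} local={l!r}")
    return drift

remote = {"runtime": {"local": {"command": "true"}}}
local = {"runtime": {"local": {"command": "python tests/features/test_data/runtime/main.py"}}}
for line in diff_paths(remote, local):
    print(line)
```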

Summary by CodeRabbit

  • Chores

    • Normalized YAML formatting and indentation across evaluation provider and collection configurations for improved consistency.
    • Standardized numeric precision formatting in configuration thresholds.
  • Updates

    • Updated runtime test execution commands to reference actual test files instead of placeholder operations across multiple evaluation providers.


…n ConfigMaps

Switch sync script to embed raw upstream content instead of yaml.dump round-trip.
This preserves block scalar styles (|-), UTF-8 characters (em dashes, etc.), key
ordering, and all formatting exactly as authored upstream.

Co-Authored-By: Claude <noreply@anthropic.com>

openshift-ci Bot commented May 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai Bot commented May 13, 2026

📝 Walkthrough


The PR regenerates EvalHub provider and collection ConfigMaps with normalized YAML formatting and local test runtime wiring. The sync script is updated to embed upstream YAML directly instead of re-serializing, which causes all ConfigMaps to be regenerated with consistent formatting. All five provider ConfigMaps wire their local runtime to a test runner script and restructure K8s entrypoints; collection ConfigMaps receive YAML formatting normalization.

Changes

EvalHub ConfigMap YAML Formatting and Runtime

  • Sync script to embed upstream YAML directly (hack/sync-evalhub-providers.py): process_provider and process_collection now embed raw upstream YAML content strings directly instead of re-serializing parsed YAML via yaml.dump(); this enables the downstream ConfigMap regeneration.

  • Provider local and Kubernetes runtime configuration (config/configmaps/evalhub/provider-garak-kfp.yaml, provider-garak.yaml, provider-guidellm.yaml, provider-ibm-clear.yaml, provider-lighteval.yaml): All five provider ConfigMaps wire their local runtime command to python tests/features/test_data/runtime/main.py (replacing a no-op true placeholder) and restructure Kubernetes entrypoint lists with explicit YAML formatting; base image references and resource settings remain unchanged.

  • Provider benchmark YAML formatting and restructuring (config/configmaps/evalhub/provider-garak-kfp.yaml, provider-garak.yaml, provider-guidellm.yaml, provider-ibm-clear.yaml, provider-lighteval.yaml): Benchmark sections are reformatted with consistent YAML indentation, block-scalar multi-line descriptions, and inline clarifying comments; lighteval introduces an "Individual benchmarks" section with shared primary_score and pass_criteria defaults to replace per-category field repetition. All benchmark IDs, metrics, weights, and pass thresholds remain unchanged.

  • Collection ConfigMap YAML formatting normalization (config/configmaps/evalhub/collection-leaderboard-v2.yaml, collection-safety-and-fairness-v1.yaml, collection-toxicity-and-ethical-principles.yaml): Three collection ConfigMaps are reformatted with normalized YAML indentation, explicit numeric precision (e.g., 0.60 vs 0.6), folded block-scalar descriptions, and added inline comments; all benchmark IDs, provider references, weights, metrics, and thresholds remain semantically unchanged.
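The 0.60-vs-0.6 normalization is a natural consequence of a dump round-trip. Assuming PyYAML (which the script's yaml.dump() usage suggests), a float is re-serialized from its parsed value, not its authored spelling:

```python
# Illustration: PyYAML re-serializes floats from their parsed value, so an
# authored "0.60" comes back as "0.6" after a load/dump round-trip.
import yaml

authored = "threshold: 0.60\n"
round_tripped = yaml.dump(yaml.safe_load(authored))
print(round_tripped)  # threshold: 0.6
```

Embedding the raw upstream text sidesteps this entirely, since the dumper never runs.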

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

project/evalhub, lgtm

Suggested reviewers

  • tarilabs
  • ruivieira
  • julpayne

Poem

🐰 YAML flowed like rabbit burrows, neat and deep—
Raw upstream content now in ConfigMaps we keep,
Test runners hopping local, Kubernetes entrypoints bright,
Benchmarks all reformatted, benchmarks all just right! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: preserving upstream YAML verbatim by embedding raw content instead of re-serializing, which directly corresponds to the modifications in hack/sync-evalhub-providers.py and resulting ConfigMap updates.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config/configmaps/evalhub/provider-guidellm.yaml`:
- Around line 110-130: The YAML fragment for the "poisson" benchmark has a
misplaced comment splitting its fields; move the comment "# Pre-configured
benchmark suites" so that the poisson block's keys remain contiguous: ensure
primary_score and pass_criteria are indented and placed directly under the
poisson entry alongside id, name, description, category, metrics, and tags
(preserve the keys primary_score and pass_criteria under the poisson object) so
the YAML parses correctly and yamllint passes.

In `@config/configmaps/evalhub/provider-lighteval.yaml`:
- Around line 130-148: The YAML for the benchmark "language_understanding" is
broken because the inline comment "# Individual benchmarks" interrupts its
fields; move that comment so it appears after the complete
"language_understanding" block and ensure "primary_score" and "pass_criteria"
are indented at the same level as "id", "name", "description", "category",
"metrics", and "tags" (i.e., siblings under the "language_understanding"
mapping) so keys "primary_score" and "pass_criteria" correctly belong to the
"language_understanding" benchmark.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 305f3d81-6daf-4c1f-84c3-22cb7d537ad0

📥 Commits

Reviewing files that changed from the base of the PR and between d4d151a and 4d0a663.

📒 Files selected for processing (10)
  • config/configmaps/evalhub/collection-leaderboard-v2.yaml
  • config/configmaps/evalhub/collection-safety-and-fairness-v1.yaml
  • config/configmaps/evalhub/collection-toxicity-and-ethical-principles.yaml
  • config/configmaps/evalhub/provider-garak-kfp.yaml
  • config/configmaps/evalhub/provider-garak.yaml
  • config/configmaps/evalhub/provider-guidellm.yaml
  • config/configmaps/evalhub/provider-ibm-clear.yaml
  • config/configmaps/evalhub/provider-lighteval.yaml
  • config/configmaps/evalhub/provider-lm-evaluation-harness.yaml
  • hack/sync-evalhub-providers.py

Comment on lines +110 to +130
    - id: poisson
      name: Realistic traffic simulation
      description: Simulates real-world traffic patterns using Poisson-distributed request arrivals.
      category: performance
      metrics:
        - requests_per_second
        - prompt_tokens_per_second
        - output_tokens_per_second
        - mean_ttft_ms
        - mean_itl_ms
      tags:
        - performance
        - poisson
        - realistic
        - guidellm
      # Pre-configured benchmark suites
      primary_score:
        metric: output_tokens_per_second
        lower_is_better: false
      pass_criteria:
        threshold: 10.0

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical YAML structural error: comment breaks benchmark definition.

The comment # Pre-configured benchmark suites at line 125 is placed between the poisson benchmark's tags field (line 124) and its primary_score/pass_criteria fields (lines 126-130). This breaks the YAML structure and will cause either:

  1. YAML parsing errors
  2. The poisson benchmark to be missing its scoring criteria
  3. These criteria to be incorrectly orphaned or attached to the wrong benchmark

The primary_score and pass_criteria (lines 126-130) should be indented at the same level as other poisson benchmark fields and should appear before line 125.

🐛 Proposed fix to move comment after the complete benchmark definition
       tags:
         - performance
         - poisson
         - realistic
         - guidellm
-      # Pre-configured benchmark suites
       primary_score:
         metric: output_tokens_per_second
         lower_is_better: false
       pass_criteria:
         threshold: 10.0
+    # Pre-configured benchmark suites
     - id: quick_perf_test

As per coding guidelines, YAML configuration files in config/ directories must pass yamllint validation as part of CI/CD checks.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/configmaps/evalhub/provider-guidellm.yaml` around lines 110 - 130, The
YAML fragment for the "poisson" benchmark has a misplaced comment splitting its
fields; move the comment "# Pre-configured benchmark suites" so that the poisson
block's keys remain contiguous: ensure primary_score and pass_criteria are
indented and placed directly under the poisson entry alongside id, name,
description, category, metrics, and tags (preserve the keys primary_score and
pass_criteria under the poisson object) so the YAML parses correctly and
yamllint passes.

Comment on lines +130 to +148
    - id: language_understanding
      name: Language understanding suite
      description: "Core NLU tasks: grammaticality, sentiment, and paraphrase detection (GLUE)."
      category: language_understanding
      metrics:
        - acc
        - matthews_correlation
        - f1
      tags:
        - language_understanding
        - glue
        - lighteval
        - suite
    # Individual benchmarks
      primary_score:
        metric: acc
        lower_is_better: false
      pass_criteria:
        threshold: 0.25

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical YAML structural error: comment breaks benchmark definition.

The comment # Individual benchmarks at line 143 is placed between the language_understanding benchmark's tags field (line 142) and its primary_score/pass_criteria fields (lines 144-148). This breaks the YAML structure and will cause the language_understanding category benchmark to be missing its scoring criteria.

The primary_score and pass_criteria (lines 144-148) should be indented at the same level as other language_understanding benchmark fields and should appear before the comment at line 143.

🐛 Proposed fix to move comment after the complete benchmark definition
       tags:
         - language_understanding
         - glue
         - lighteval
         - suite
-    # Individual benchmarks
       primary_score:
         metric: acc
         lower_is_better: false
       pass_criteria:
         threshold: 0.25
+    # Individual benchmarks
     - id: hellaswag

As per coding guidelines, YAML configuration files in config/ directories must pass yamllint validation as part of CI/CD checks.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/configmaps/evalhub/provider-lighteval.yaml` around lines 130 - 148,
The YAML for the benchmark "language_understanding" is broken because the inline
comment "# Individual benchmarks" interrupts its fields; move that comment so it
appears after the complete "language_understanding" block and ensure
"primary_score" and "pass_criteria" are indented at the same level as "id",
"name", "description", "category", "metrics", and "tags" (i.e., siblings under
the "language_understanding" mapping) so keys "primary_score" and
"pass_criteria" correctly belong to the "language_understanding" benchmark.


openshift-ci Bot commented May 13, 2026

@gnaulak-redhat: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/trustyai-service-operator-e2e
Commit: 4d0a663 (link)
Required: true
Rerun command: /test trustyai-service-operator-e2e

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

