chore(evalhub): preserve upstream YAML verbatim in provider/collection ConfigMaps #733
gnaulak-redhat wants to merge 1 commit into
Conversation
Switch sync script to embed raw upstream content instead of a yaml.dump round-trip. This preserves block scalar styles (|-), UTF-8 characters (em dashes, etc.), key ordering, and all formatting exactly as authored upstream.

Co-Authored-By: Claude <noreply@anthropic.com>
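The approach the commit describes can be sketched as follows. This is an illustrative mock-up, not the actual hack/sync-evalhub-providers.py code; the helper name embed_raw and the ConfigMap layout are assumptions:

```python
# Sketch: splice upstream YAML into a ConfigMap verbatim instead of
# re-serializing it. All names here are hypothetical.
import textwrap

# Upstream YAML as authored: block scalar (|-), em dash, deliberate key order.
upstream = (
    "description: |-\n"
    "  Simulates traffic \u2014 Poisson arrivals\n"
    "tags:\n"
    "- guidellm\n"
)

def embed_raw(name: str, body: str) -> str:
    """Embed upstream YAML into a ConfigMap byte for byte."""
    return (
        "apiVersion: v1\n"
        "kind: ConfigMap\n"
        "metadata:\n"
        f"  name: {name}\n"
        "data:\n"
        "  provider.yaml: |\n"
        + textwrap.indent(body, "    ")
    )

configmap = embed_raw("provider-guidellm", upstream)
# The block scalar marker and the em dash survive untouched.
assert "|-" in configmap
assert "\u2014" in configmap
```

A yaml.safe_load()/yaml.dump() round-trip, by contrast, typically rewrites block scalars as quoted strings, escapes non-ASCII characters, and sorts keys, which is exactly the churn this change avoids.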
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has not yet been approved by the required approvers. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing /approve in a comment.
📝 Walkthrough

The PR regenerates EvalHub provider and collection ConfigMaps with normalized YAML formatting and local test runtime wiring. The sync script is updated to embed upstream YAML directly instead of re-serializing, which causes all ConfigMaps to be regenerated with consistent formatting. All five provider ConfigMaps wire their local runtime to a test runner script and restructure K8s entrypoints; collection ConfigMaps receive YAML formatting normalization.

Changes: EvalHub ConfigMap YAML Formatting and Runtime
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@config/configmaps/evalhub/provider-guidellm.yaml`:
- Around line 110-130: The YAML fragment for the "poisson" benchmark has a
misplaced comment splitting its fields; move the comment "# Pre-configured
benchmark suites" so that the poisson block's keys remain contiguous: ensure
primary_score and pass_criteria are indented and placed directly under the
poisson entry alongside id, name, description, category, metrics, and tags
(preserve the keys primary_score and pass_criteria under the poisson object) so
the YAML parses correctly and yamllint passes.
In `@config/configmaps/evalhub/provider-lighteval.yaml`:
- Around line 130-148: The YAML for the benchmark "language_understanding" is
broken because the inline comment "# Individual benchmarks" interrupts its
fields; move that comment so it appears after the complete
"language_understanding" block and ensure "primary_score" and "pass_criteria"
are indented at the same level as "id", "name", "description", "category",
"metrics", and "tags" (i.e., siblings under the "language_understanding"
mapping) so keys "primary_score" and "pass_criteria" correctly belong to the
"language_understanding" benchmark.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 305f3d81-6daf-4c1f-84c3-22cb7d537ad0
📒 Files selected for processing (10)
config/configmaps/evalhub/collection-leaderboard-v2.yaml
config/configmaps/evalhub/collection-safety-and-fairness-v1.yaml
config/configmaps/evalhub/collection-toxicity-and-ethical-principles.yaml
config/configmaps/evalhub/provider-garak-kfp.yaml
config/configmaps/evalhub/provider-garak.yaml
config/configmaps/evalhub/provider-guidellm.yaml
config/configmaps/evalhub/provider-ibm-clear.yaml
config/configmaps/evalhub/provider-lighteval.yaml
config/configmaps/evalhub/provider-lm-evaluation-harness.yaml
hack/sync-evalhub-providers.py
- id: poisson
  name: Realistic traffic simulation
  description: Simulates real-world traffic patterns using Poisson-distributed request arrivals.
  category: performance
  metrics:
  - requests_per_second
  - prompt_tokens_per_second
  - output_tokens_per_second
  - mean_ttft_ms
  - mean_itl_ms
  tags:
  - performance
  - poisson
  - realistic
  - guidellm
  # Pre-configured benchmark suites
  primary_score:
    metric: output_tokens_per_second
    lower_is_better: false
  pass_criteria:
    threshold: 10.0
Critical YAML structural error: comment breaks benchmark definition.
The comment # Pre-configured benchmark suites at line 125 is placed between the poisson benchmark's tags field (line 124) and its primary_score/pass_criteria fields (lines 126-130). This breaks the YAML structure and will cause either:
- YAML parsing errors
- the poisson benchmark to be missing its scoring criteria
- these criteria to be incorrectly orphaned or attached to the wrong benchmark

The primary_score and pass_criteria fields (lines 126-130) should be indented at the same level as the other poisson benchmark fields and should appear before line 125.
🐛 Proposed fix to move comment after the complete benchmark definition

   tags:
   - performance
   - poisson
   - realistic
   - guidellm
-  # Pre-configured benchmark suites
   primary_score:
     metric: output_tokens_per_second
     lower_is_better: false
   pass_criteria:
     threshold: 10.0
+  # Pre-configured benchmark suites
 - id: quick_perf_test

As per coding guidelines, YAML configuration files in config/ directories must pass yamllint validation as part of CI/CD checks.
- id: language_understanding
  name: Language understanding suite
  description: "Core NLU tasks: grammaticality, sentiment, and paraphrase detection (GLUE)."
  category: language_understanding
  metrics:
  - acc
  - matthews_correlation
  - f1
  tags:
  - language_understanding
  - glue
  - lighteval
  - suite
  # Individual benchmarks
  primary_score:
    metric: acc
    lower_is_better: false
  pass_criteria:
    threshold: 0.25
Critical YAML structural error: comment breaks benchmark definition.
The comment # Individual benchmarks at line 143 is placed between the language_understanding benchmark's tags field (line 142) and its primary_score/pass_criteria fields (lines 144-148). This breaks the YAML structure and will cause the language_understanding category benchmark to be missing its scoring criteria.
The primary_score and pass_criteria (lines 144-148) should be indented at the same level as other language_understanding benchmark fields and should appear before the comment at line 143.
🐛 Proposed fix to move comment after the complete benchmark definition

   tags:
   - language_understanding
   - glue
   - lighteval
   - suite
-  # Individual benchmarks
   primary_score:
     metric: acc
     lower_is_better: false
   pass_criteria:
     threshold: 0.25
+  # Individual benchmarks
 - id: hellaswag

As per coding guidelines, YAML configuration files in config/ directories must pass yamllint validation as part of CI/CD checks.
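Misplaced comments like the two flagged above are the kind of thing yamllint's comments-indentation rule catches. A minimal .yamllint fragment that enables the relevant checks (illustrative only; this is not necessarily the repository's actual lint configuration):

```yaml
# Illustrative .yamllint fragment: flag comments whose indentation does not
# match the surrounding content, and enforce consistent 2-space indentation.
extends: default
rules:
  comments-indentation: enable
  indentation:
    spaces: 2
    indent-sequences: consistent
```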
@gnaulak-redhat: The following test failed.

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
chore(evalhub): preserve upstream YAML verbatim in provider/collection ConfigMaps

Switch sync script to embed raw upstream content instead of a yaml.dump round-trip. This preserves block scalar styles (|-), UTF-8 characters (em dashes, etc.), key ordering, and all formatting exactly as authored upstream.

Sync the eval-hub/eval-hub main config provider YAML files. These files are auto-synced with the hack/sync-evalhub-providers.py script. This is in part to fix the CI pipeline on the eval-hub/eval-hub repository side.
Run python scripts/check_configmap_sync.py
Fetching ConfigMap listing from trustyai-explainability/trustyai-service-operator...
Found 9 ConfigMap(s) to check.
OK collection-leaderboard-v2.yaml <-> config/collections/leaderboard-v2.yaml
OK collection-safety-and-fairness-v1.yaml <-> config/collections/safety-and-fairness-v1.yaml
OK collection-toxicity-and-ethical-principles.yaml <-> config/collections/toxicity-and-ethical-principles.yaml
Drift detected:
provider-garak-kfp.yaml vs config/providers/garak-kfp.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-garak.yaml vs config/providers/garak.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-guidellm.yaml vs config/providers/guidellm.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-ibm-clear.yaml vs config/providers/ibm-clear.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-lighteval.yaml vs config/providers/lighteval.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
provider-lm-evaluation-harness.yaml vs config/providers/lm_evaluation_harness.yaml:
~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
Error: Process completed with exit code 1.
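The drift report above can be produced by a comparison along these lines. This is a minimal sketch assuming the ConfigMaps are already parsed into nested dicts; it is not the real scripts/check_configmap_sync.py, only an illustration of its "~ path: remote=... local=..." output format:

```python
# Hypothetical sketch: walk two parsed YAML trees and report leaves that
# differ, in the same line format as the CI drift output above.
def diff_trees(remote, local, path=""):
    lines = []
    for key in sorted(set(remote) | set(local)):
        p = f"{path}.{key}" if path else key
        r, l = remote.get(key), local.get(key)
        if isinstance(r, dict) and isinstance(l, dict):
            lines.extend(diff_trees(r, l, p))  # recurse into nested mappings
        elif r != l:
            lines.append(f"~ {p}: remote={r!r} local={l!r}")
    return lines

remote = {"runtime": {"local": {"command": "true"}}}
local = {"runtime": {"local": {"command": "python tests/features/test_data/runtime/main.py"}}}
for line in diff_trees(remote, local):
    print(line)
# → ~ runtime.local.command: remote='true' local='python tests/features/test_data/runtime/main.py'
```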