
Adding test_group to lm-eval configs#2623

Open
debroy-rh wants to merge 7 commits into vllm-project:main from debroy-rh:lmeval_conggrp

Conversation

@debroy-rh

Adding test_group to the following lm-eval configs:

  • fp8_dynamic_per_token.yaml
  • w4a4_nvfp4.yaml
  • w4a16_actorder_none.yaml
  • vl_w4a16_actorder_weight.yaml

This is to test the rhaiis model-opt image for lm-eval accuracy.
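As a sanity check, the presence of the new field can be confirmed with a one-liner (a sketch; the file paths come from the list above, and the expected output is illustrative, assuming the field sits right under cadence: as in the diff hunks below):

    grep -n '^test_group:' \
      tests/lmeval/configs/fp8_dynamic_per_token.yaml \
      tests/lmeval/configs/w4a4_nvfp4.yaml \
      tests/lmeval/configs/w4a16_actorder_none.yaml \
      tests/lmeval/vl_configs/vl_w4a16_actorder_weight.yaml
    # Expected (illustrative): one line per file, e.g.
    #   tests/lmeval/configs/fp8_dynamic_per_token.yaml:2:test_group: "rhaiis"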

@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 16, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9124ff09-509b-4d07-aa1d-dddecde62d2c

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Added a new top-level configuration field test_group: "rhaiis" to four test configuration YAML files across the tests/lmeval/configs/ and tests/lmeval/vl_configs/ directories without modifying any other existing fields or logic.

Changes

Cohort / File(s): Test Configuration Files
tests/lmeval/configs/fp8_dynamic_per_token.yaml, tests/lmeval/configs/w4a16_actorder_none.yaml, tests/lmeval/configs/w4a4_nvfp4.yaml, tests/lmeval/vl_configs/vl_w4a16_actorder_weight.yaml
Summary: Added the test_group: "rhaiis" top-level configuration field to each YAML file.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested labels

enhancement, fp8, w4a16, nvfp4

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

  • Title check: ✅ Passed. The title clearly and accurately describes the main change: adding a test_group field to lm-eval configuration files.
  • Description check: ✅ Passed. The description is directly related to the changeset, listing the specific files modified and explaining the purpose of adding the test_group field.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


Contributor

@gemini-code-assist Bot left a comment


Code Review

This pull request adds the "test_group: rhaiis" field to several YAML configuration files in the tests/lmeval directory. Feedback highlights that the run_tests_in_rhaiis.sh script requires updates to correctly resolve the paths for these configuration files and to refine the model name extraction logic to handle nested keys in Vision-Language configurations.

@@ -1,4 +1,5 @@
cadence: "weekly"
test_group: "rhaiis"
Contributor


high

The run_tests_in_rhaiis.sh script (line 53) contains a hardcoded path prefix ${script_path}/configs/ which resolves to tests/e2e/vLLM/configs/. Since this configuration file is located in tests/lmeval/configs/, the script will fail to find it even if the correct directory is passed via the -c flag. The script should be updated to use the provided $CONFIG path instead of the hardcoded one: for MODEL_CONFIG in $(echo -e "$CONFIGS" | sed "s|^|${CONFIG}/|").
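A minimal sketch of the suggested fix, assuming $CONFIGS holds newline-separated config file names and $CONFIG is the directory passed via -c (variable names follow this comment, not verified against the actual script):

    #!/usr/bin/env bash
    # Hypothetical stand-ins for the script's variables.
    CONFIG="tests/lmeval/configs"                # directory passed via -c
    CONFIGS=$'fp8_dynamic_per_token.yaml\nw4a4_nvfp4.yaml'

    # Before: the hardcoded prefix resolves relative to the script's own
    # directory, i.e. tests/e2e/vLLM/configs/, so lmeval configs are never found.
    #   for MODEL_CONFIG in $(echo -e "$CONFIGS" | sed "s|^|${script_path}/configs/|")

    # After: prefix each config name with the caller-supplied directory instead.
    for MODEL_CONFIG in $(echo -e "$CONFIGS" | sed "s|^|${CONFIG}/|"); do
      echo "would run tests against ${MODEL_CONFIG}"  # placeholder for the real invocation
    done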

@@ -1,4 +1,5 @@
cadence: "weekly"
test_group: "rhaiis"
Contributor


high

The run_tests_in_rhaiis.sh script (line 59) uses grep 'model:' to extract the model name. For Vision-Language (VL) configurations like this one, there is typically a nested model key under the lmeval section (e.g., model: "hf-multimodal"). This causes the grep command to return multiple lines, which breaks the $model variable and the subsequent save_dir construction. The script should be updated to use a more specific pattern, such as grep '^model:', to only match the top-level key.
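The over-match is easy to reproduce; here is a sketch against a hypothetical VL config (the top-level model name is a placeholder; only the nested model: "hf-multimodal" key comes from this comment):

    #!/usr/bin/env bash
    # Hypothetical VL config with two lines containing "model:".
    cat > /tmp/vl_config.yaml <<'EOF'
    cadence: "weekly"
    test_group: "rhaiis"
    model: "some-org/some-vl-model"
    lmeval:
      model: "hf-multimodal"
    EOF

    grep 'model:' /tmp/vl_config.yaml    # matches both lines, so $model gets two values
    grep '^model:' /tmp/vl_config.yaml   # anchored to the line start: top-level key only

    # With the anchored pattern, extraction yields a single clean value.
    model=$(grep '^model:' /tmp/vl_config.yaml | awk -F'"' '{print $2}')
    echo "model=${model}"                # -> model=some-org/some-vl-model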

@coderabbitai Bot added the enhancement (New feature or request), fp8 (For any issue / PR related to FP8 support), nvfp4 (For any PR / issue related to NVFP4 support), and w4a16 labels Apr 16, 2026
Collaborator

@dsikka left a comment


I think RHAIIS testing should use vLLM lm-eval testing, not HF as is currently used. This is something we are going to update upstream soon, but it is definitely what we should use for the product release.
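For context, the two backends differ roughly like this in the lm-evaluation-harness CLI (a sketch; the model name and task are placeholders, not from this PR):

    # HF backend, as the lmeval tests currently use:
    lm_eval --model hf \
      --model_args pretrained=some-org/some-model \
      --tasks gsm8k

    # vLLM backend, which exercises the serving stack that RHAIIS actually ships:
    lm_eval --model vllm \
      --model_args pretrained=some-org/some-model \
      --tasks gsm8k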

Collaborator

@dhuangnm left a comment


LGTM in general, just one question about a model config.

I think this PR needs to land along with the corresponding changes in the llm-compressor-testing repo. @debroy-rh can you please create the PR for the llm-compressor-testing repo as well so we can test the two PRs together before landing? Thanks.

Outdated review thread on tests/lmeval/configs/w4a16_actorder_none.yaml (collapsed)
Signed-off-by: Debolina Roy <debroy@redhat.com> (all 7 commits)
@debroy-rh
Author

@dhuangnm, @dsikka - I was trying to run lm-eval for the vl_configs file vl_w4a16_actorder_weight.yaml, and it was failing for two reasons:

  1. Parsing error: the config file has two "model" tags, the second one under lmeval, so tests/e2e/vLLM/run_tests_in_rhaiis.sh needed to be updated.
  2. torchvision added to install_requires, so the same default install that already pulls in torch also pulls in torchvision, because the VLM / processor path needs it. Also added an lmeval conftest bootstrap so pytest loads torchvision before Transformers.

Please take a look.
Run - https://github.com/neuralmagic/llm-compressor-testing/actions/runs/24686238622/job/72196254120

@mergify Bot added the two-reviews (When a PR requires two reviews) label Apr 22, 2026
@mergify
Contributor

mergify Bot commented Apr 22, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviews

Waiting for:

  • #approved-reviews-by >= 2
  • #changes-requested-reviews-by = 0
This rule is failing.

PRs labelled "two-reviews" must have at least two approving reviews before merging.



Labels

  • enhancement: New feature or request
  • fp8: For any issue / PR related to FP8 support
  • nvfp4: For any PR / issue related to NVFP4 support
  • two-reviews: When a PR requires two reviews
  • w4a16
