[Refactor] Refactor splits to only use the "calibration" split (#2551) #2589

Open
arpitkh101 wants to merge 2 commits into vllm-project:main from arpitkh101:refactor-split

@arpitkh101

Summary

Closes #2551
Simplifies the splits interface in get_processed_dataset by removing
multi-split dict handling in favour of a plain string argument.

Examples & Tests

  • Updated all examples and tests to use splits="train[:N]" string format.
  • Deleted test_dataset_helpers.py (helpers no longer exist).
  • Added new unit tests to test_dataset_loading.py:
    • {"calibration": ...} dict backward compat
    • Deprecation warning is emitted for dict input
    • splits=None returns None (data-free flow)
    • Invalid type raises ValueError
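
The four cases listed above can be sketched against a small stand-in. This is only an illustration: `resolve_split` is a hypothetical name, not a function from llm-compressor, and it uses the standard-library `warnings` module where the real code may log the deprecation differently.

```python
import warnings


def resolve_split(splits):
    """Hypothetical stand-in for the split handling described above."""
    if splits is None:
        return None  # data-free flow: there is nothing to load
    if isinstance(splits, str):
        return splits  # recommended form, e.g. "train[:512]"
    if isinstance(splits, dict) and "calibration" in splits:
        # Deprecated form: warn, then unwrap the single calibration entry.
        warnings.warn(
            "dict-valued `splits` is deprecated; pass a string instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return splits["calibration"]
    raise ValueError(f"Invalid splits type: {type(splits)}. Expected string.")


# String input and None pass through unchanged.
assert resolve_split("train[:512]") == "train[:512]"
assert resolve_split(None) is None

# Dict input still works but emits a deprecation warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert resolve_split({"calibration": "train[:512]"}) == "train[:512]"
assert any(issubclass(w.category, DeprecationWarning) for w in caught)

# Any other type is rejected.
try:
    resolve_split(12345)
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError")
```

In the actual test suite these checks would presumably use `pytest.warns` and `pytest.raises` rather than plain assertions.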

Before / After

# Before (deprecated, still works with warning)
oneshot(model, dataset="ultrachat", splits={"calibration": "train_sft[:512]"})
# After (recommended)
oneshot(model, dataset="ultrachat", splits="train_sft[:512]")

Copilot AI review requested due to automatic review settings April 8, 2026 18:29
@coderabbitai
Contributor

coderabbitai Bot commented Apr 8, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cb2a5bf0-9558-486a-aeab-73a0137395ca

Walkthrough

Refactors dataset split handling from a multi-split/dict-based API (e.g., {"calibration": "..."}) to a single split string form (e.g., "train[:100]"), updating examples, tests, CLI help text, and core dataset utilities to accept and validate the new shape while preserving deprecated dict/list compatibility with warnings.

Changes

Cohort / File(s) Summary
Core dataset API & utils
src/llmcompressor/args/dataset_arguments.py, src/llmcompressor/datasets/__init__.py, src/llmcompressor/datasets/utils.py
Changed DatasetArguments.splits help text to recommend string selectors; removed re-export of make_dataset_splits; refactored get_processed_dataset() to return a single `Dataset
Tracing/CLI usage
src/llmcompressor/transformers/tracing/debug.py
Adjusted dataset split usage in trace flow to expect a single split string (changed from dataset_args.splits["calibration"] to dataset_args.splits) and reorganized imports/argparse formatting (no semantic CLI behavior changes).
Example callsites
examples/disk_offloading/kimi_k2_example.py, examples/disk_offloading/qwen3_example.py, examples/imatrix/llama3_imatrix_example.py, examples/multimodal_vision/llava_example.py, examples/multimodal_vision/mistral3_example.py, examples/multimodal_vision/mllama_example.py, examples/multimodal_vision/pixtral_example.py
Replaced oneshot(..., splits={"calibration": ...}) with oneshot(..., splits="...") across example scripts to pass a single split string.
Test callsites updated to single-split
tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py, tests/llmcompressor/modifiers/transform/smoothquant/test_base.py, tests/llmcompressor/transformers/compression/test_compress_tensor_utils.py, tests/llmcompressor/transformers/compression/test_quantization.py, tests/llmcompressor/transformers/compression/test_recipe_parsing.py, tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py, tests/llmcompressor/transformers/kv_cache/test_kv_cache.py, tests/llmcompressor/transformers/sparsegpt/test_oneshot_with_modifier.py, tests/llmcompressor/transformers/sparsegpt/test_sparsegpt_completion.py
Updated tests and fixtures to pass splits as a string instead of a dict keyed by "calibration".
Dataset tests & helpers
tests/llmcompressor/transformers/data/test_dataset_loading.py, tests/llmcompressor/transformers/data/test_dataset_helpers.py
Expanded test_dataset_loading.py to include string and deprecated dict/list split variants, added warnings/error case tests and adjusted expectations to datasets.Dataset return. Deleted test_dataset_helpers.py which validated removed make_dataset_splits behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 52.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately describes the main refactoring: simplifying the splits interface by removing multi-split dict handling in favor of a plain string argument, which is the primary change throughout the PR.
Description check | ✅ Passed | The description is directly related to the changeset, providing a clear summary of the refactoring, examples of before/after usage, and listing the specific changes made (updated examples, tests, backward compatibility).
Linked Issues check | ✅ Passed | The PR successfully addresses all objectives from issue #2551: removes multi-split logic, simplifies splits to accept strings, maintains backward compatibility for dict with deprecation warning, and updates all examples and tests.
Out of Scope Changes check | ✅ Passed | All changes are directly related to the splits refactoring objective. No unrelated modifications were introduced; import reorganization in debug.py is minimal and directly supports the splits interface changes.

@github-actions

github-actions Bot commented Apr 8, 2026

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite. Please add the label only once the PR is code complete and local testing has been performed.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request simplifies dataset split handling by deprecating dictionary-based split configurations in favor of string-based formats. It updates the get_processed_dataset function, removes the now-redundant make_dataset_splits helper, and updates numerous examples and tests to reflect these changes. I have included a suggestion to improve the error message for invalid split types to provide better guidance to users.

Comment thread src/llmcompressor/datasets/utils.py Outdated
    )
    split_str = splits[0] if len(splits) > 0 else None
else:
    raise ValueError(f"Invalid splits type: {type(splits)}. Expected string.")
Contributor

medium

The error message for invalid split types should be more descriptive to help users understand what types are supported, especially since dicts are now deprecated.

raise ValueError(f"Invalid splits type: {type(splits)}. Expected string (recommended) or dict (deprecated).")


Copilot AI left a comment

Pull request overview

Refactors dataset loading to simplify the splits interface for oneshot/calibration workflows by preferring a single split string (e.g., "train[:N]") and removing the multi-split dict output shape from get_processed_dataset.

Changes:

  • Refactored get_processed_dataset to return a single processed dataset (or None) and added deprecated dict-handling to extract a split string.
  • Updated tests and examples to pass splits as a string rather than {"calibration": ...}.
  • Removed now-obsolete dataset split helper test coverage (test_dataset_helpers.py) and added new split-focused unit tests.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.

File Description
src/llmcompressor/datasets/utils.py Refactors dataset processing to a single-split flow; updates calibration dataloader usage accordingly.
src/llmcompressor/datasets/__init__.py Drops make_dataset_splits from public exports after refactor.
src/llmcompressor/args/dataset_arguments.py Updates CLI/help text to recommend string splits and document backward compatibility.
tests/llmcompressor/transformers/data/test_dataset_loading.py Updates split-loading assertions for new return type and adds coverage for deprecated dict inputs and invalid types.
tests/llmcompressor/transformers/data/test_dataset_helpers.py Removes tests for helpers that no longer exist after refactor.
tests/llmcompressor/transformers/sparsegpt/test_sparsegpt_completion.py Updates oneshot invocation to pass splits as a string.
tests/llmcompressor/transformers/sparsegpt/test_oneshot_with_modifier.py Updates modifier-based oneshot test to pass splits as a string.
tests/llmcompressor/transformers/kv_cache/test_kv_cache.py Updates kv-cache oneshot fixture to use string splits.
tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py Updates GPTQ oneshot test to use string splits.
tests/llmcompressor/transformers/compression/test_recipe_parsing.py Updates recipe parsing config to use string splits.
tests/llmcompressor/transformers/compression/test_quantization.py Updates quantization test setup to use string splits.
tests/llmcompressor/transformers/compression/test_compress_tensor_utils.py Updates compression tensor utils test to use string splits.
tests/llmcompressor/modifiers/transform/smoothquant/test_base.py Updates SmoothQuant e2e test to use string splits.
tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py Updates iMatrix integration tests to use string splits.
examples/multimodal_vision/pixtral_example.py Updates example to use string splits.
examples/multimodal_vision/mllama_example.py Updates example to use string splits.
examples/multimodal_vision/mistral3_example.py Updates example to use string splits.
examples/multimodal_vision/llava_example.py Updates example to use string splits.
examples/imatrix/llama3_imatrix_example.py Updates example to use string splits.
examples/disk_offloading/qwen3_example.py Updates example to use string splits.
examples/disk_offloading/kimi_k2_example.py Updates example to use string splits.

Comment thread src/llmcompressor/datasets/utils.py Outdated
Comment thread src/llmcompressor/datasets/utils.py Outdated
Comment thread src/llmcompressor/datasets/utils.py
Comment thread src/llmcompressor/args/dataset_arguments.py
Comment thread tests/llmcompressor/transformers/data/test_dataset_loading.py
Comment thread src/llmcompressor/datasets/utils.py
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/llmcompressor/transformers/data/test_dataset_loading.py (1)

199-265: ⚠️ Potential issue | 🟠 Major

Tighten these tests to the new splits contract.

These cases currently bless {"train": ...} as a valid deprecated input and allow TypeError, but this PR only keeps {"calibration": ...} on the compatibility path and says unsupported splits types should raise ValueError. As written, the tests will lock in the permissive fallback from src/llmcompressor/datasets/utils.py instead of guarding the stricter API.

🧪 Suggested test updates
 @pytest.mark.parametrize(
     "split_def",
     [
         "train[95%:]",
-        {"train": "train[:5%]"},                  # old dict (non-calibration key)
         {"calibration": "train[:5%]"},            # old dict (calibration key - main old format)
     ],
 )
 @pytest.mark.unit
 @pytest.mark.parametrize(
     "split_def",
     [
         {"calibration": "train[:5%]"},
-        {"train": "train[:5%]"},
     ],
 )
 def test_split_dict_emits_deprecation_warning(split_def, tiny_llama_tokenizer):
-@pytest.mark.unit
-def test_split_invalid_type_raises_value_error():
+@pytest.mark.unit
+@pytest.mark.parametrize(
+    "split_def",
+    [
+        12345,
+        {"train": "train[:5%]"},
+        ["train[:5%]"],
+    ],
+)
+def test_split_invalid_type_raises_value_error(split_def):
     """An unsupported splits type should raise ValueError."""
-    dataset_args = DatasetArguments(dataset="open_platypus", splits=12345)
-    with pytest.raises((ValueError, TypeError)):
+    dataset_args = DatasetArguments(dataset="open_platypus", splits=split_def)
+    with pytest.raises(ValueError):
         get_processed_dataset(dataset_args=dataset_args, processor=None)

As per coding guidelines, tests/**/*.py: Ensure PyTest tests are clear, comprehensive, and cover edge cases for quantization scenarios. Verify proper mocking and test isolation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llmcompressor/transformers/data/test_dataset_loading.py` around lines
199 - 265, The tests currently accept {"train": ...} as a deprecated splits form
and allow TypeError, which conflicts with the tightened contract that only
{"calibration": ...} is supported on the compatibility path and unsupported
splits should raise ValueError; update test_split_loading to remove the
{"train": "train[:5%]"} case and only parametrize the new string form and the
{"calibration": "train[:5%]"} dict, change
test_split_dict_emits_deprecation_warning to only parametrize {"calibration":
"train[:5%]"} (remove {"train": ...}), and change
test_split_invalid_type_raises_value_error to assert only ValueError (remove
TypeError) when calling get_processed_dataset with an invalid splits type;
reference DatasetArguments and get_processed_dataset in these tests to locate
and modify the failing cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/llmcompressor/args/dataset_arguments.py`:
- Around line 143-145: Update the help text for the dataset split argument in
src/llmcompressor/args/dataset_arguments.py to replace the vague "dictionary or
a list" phrasing with the exact legacy compatibility shape; specifically state
the deprecated form as {"calibration": "<split-spec>"} (or a list of such dicts)
and mark it as legacy, and recommend using a string like 'train' or
'train[:50%]' instead—modify the metadata["help"] string where this argument is
defined to include that precise example and the deprecation note.

In `@src/llmcompressor/datasets/utils.py`:
- Around line 49-76: The current match on the variable splits accepts arbitrary
dicts and iterables and silently picks the first element; change it to only
accept None, str, or dicts that contain the "calibration" key and fail fast
otherwise: keep the None and str branches as-is, modify the dict() branch to
only extract splits["calibration"] and emit the deprecation logger.warning for
that case, and for any other dict or any non-str iterable (the previous case _
fallback) raise ValueError(f"Invalid splits shape: {type(splits)}. Expected
None, str, or dict with 'calibration' key.") instead of attempting to extract
the first element; update references to split_str and logger.warning accordingly
so only the allowed deprecation path is logged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 34b76517-75d6-4345-8a30-0cae0c3fa6e3

📥 Commits

Reviewing files that changed from the base of the PR and between c18e9fd and 32117f2.

📒 Files selected for processing (21)
  • examples/disk_offloading/kimi_k2_example.py
  • examples/disk_offloading/qwen3_example.py
  • examples/imatrix/llama3_imatrix_example.py
  • examples/multimodal_vision/llava_example.py
  • examples/multimodal_vision/mistral3_example.py
  • examples/multimodal_vision/mllama_example.py
  • examples/multimodal_vision/pixtral_example.py
  • src/llmcompressor/args/dataset_arguments.py
  • src/llmcompressor/datasets/__init__.py
  • src/llmcompressor/datasets/utils.py
  • tests/llmcompressor/modifiers/transform/imatrix/test_e2e_integration.py
  • tests/llmcompressor/modifiers/transform/smoothquant/test_base.py
  • tests/llmcompressor/transformers/compression/test_compress_tensor_utils.py
  • tests/llmcompressor/transformers/compression/test_quantization.py
  • tests/llmcompressor/transformers/compression/test_recipe_parsing.py
  • tests/llmcompressor/transformers/data/test_dataset_helpers.py
  • tests/llmcompressor/transformers/data/test_dataset_loading.py
  • tests/llmcompressor/transformers/gptq/test_gptq_oneshot.py
  • tests/llmcompressor/transformers/kv_cache/test_kv_cache.py
  • tests/llmcompressor/transformers/sparsegpt/test_oneshot_with_modifier.py
  • tests/llmcompressor/transformers/sparsegpt/test_sparsegpt_completion.py
💤 Files with no reviewable changes (2)
  • src/llmcompressor/datasets/__init__.py
  • tests/llmcompressor/transformers/data/test_dataset_helpers.py

Comment thread src/llmcompressor/args/dataset_arguments.py
Comment thread src/llmcompressor/datasets/utils.py Outdated
Comment thread src/llmcompressor/datasets/utils.py Outdated
Comment thread src/llmcompressor/datasets/utils.py
Collaborator

@HDCharles HDCharles left a comment

see comments and bot comments

@arpitkh101
Author

arpitkh101 commented Apr 8, 2026

Addressed the requested changes, thanks!

@HDCharles HDCharles added the ready (When a PR is ready for review) and Refactor (Code cleanup and/or improvements to existing features) labels Apr 9, 2026
Collaborator

@HDCharles HDCharles left a comment

looks good

@HDCharles
Collaborator

@coderabbitai re-review

@coderabbitai
Contributor

coderabbitai Bot commented Apr 9, 2026

@HDCharles Sure, I'll re-review the PR now!

✅ Actions performed

Full review triggered.

@HDCharles
Collaborator

Looks like the sparse test is failing? @arpitkh101

@mergify
Contributor

mergify Bot commented Apr 13, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @arpitkh101.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Apr 13, 2026
@arpitkh101 arpitkh101 force-pushed the refactor-split branch 2 times, most recently from 05aec86 to c8b64e5, on April 13, 2026 15:42
@mergify mergify Bot removed the needs-rebase label Apr 13, 2026
@mergify mergify Bot added the two-reviews (When a PR requires two reviews) label Apr 22, 2026
@mergify
Contributor

mergify Bot commented Apr 22, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviews

Waiting for:

  • #approved-reviews-by >= 2
This rule is failing.

PRs labelled "two-reviews" must have at least two approving reviews before merging.

  • #approved-reviews-by >= 2
  • #changes-requested-reviews-by = 0

@mergify
Contributor

mergify Bot commented Apr 22, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @arpitkh101.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Apr 22, 2026
@HDCharles
Collaborator

@arpitkh101 can you resolve the merge conflicts?

Arpit added 2 commits April 24, 2026 05:02
Signed-off-by: Arpit <arpit@example.com>
…i-split handling

Signed-off-by: Arpit <arpit@example.com>
@arpitkh101
Author

I've rebased on the latest main. Thanks!

@mergify mergify Bot removed the needs-rebase label Apr 24, 2026
Labels

ready (When a PR is ready for review), Refactor (Code cleanup and/or improvements to existing features), two-reviews (When a PR requires two reviews)

Development

Successfully merging this pull request may close these issues.

[Refactor] Refactor splits to only use the "calibration" split

3 participants