This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Commit d64d9fb

Author: Sara Adkins
[Cherry Pick] allow dataset size smaller than calibration samples (#2091) (#2179)
* allow dataset size smaller than calibration samples (#2091)
* merge issue
1 parent 2c3bdf7 commit d64d9fb

1 file changed: +10 -2 lines changed

src/sparseml/transformers/finetune/data/data_helpers.py

Lines changed: 10 additions & 2 deletions
@@ -49,9 +49,17 @@ def format_calibration_data(
     :param accelerator: optional accelerator for if preparing in FSDP mode
     :return: list of trimmed calibration data tensors
     """
-    num_calibration_samples = num_calibration_samples or len(tokenized_dataset)
+    safe_calibration_samples = len(tokenized_dataset)
+    if num_calibration_samples is not None:
+        safe_calibration_samples = min(len(tokenized_dataset), num_calibration_samples)
+        if safe_calibration_samples != num_calibration_samples:
+            LOGGER.warn(
+                f"Requested {num_calibration_samples} calibration samples but "
+                f"the provided dataset only has {safe_calibration_samples}. "
+            )
+
     shuffled_calibration = tokenized_dataset.shuffle()
-    shuffled_calibration = shuffled_calibration.select(range(num_calibration_samples))
+    shuffled_calibration = shuffled_calibration.select(range(safe_calibration_samples))
 
     dataloader_params = {
         "batch_size": 1,
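
The clamping introduced by this change can be tried in isolation. The sketch below is a minimal, standalone illustration and not the SparseML implementation: `clamp_calibration_samples` and `select_calibration` are hypothetical names, a plain Python list stands in for the tokenized dataset, and `LOGGER.warning` is used in place of the `LOGGER.warn` call in the diff.

```python
import logging
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")
LOGGER = logging.getLogger(__name__)


def clamp_calibration_samples(dataset_size: int, requested: Optional[int]) -> int:
    """Hypothetical helper mirroring the clamping logic in the diff above."""
    # No explicit request: use the whole dataset.
    if requested is None:
        return dataset_size
    # Never ask for more samples than the dataset holds.
    safe = min(dataset_size, requested)
    if safe != requested:
        LOGGER.warning(
            "Requested %d calibration samples but the provided dataset only has %d.",
            requested,
            safe,
        )
    return safe


def select_calibration(
    data: Sequence[T], requested: Optional[int], seed: int = 0
) -> List[T]:
    """Shuffle a copy of the data, then keep the clamped number of samples."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # assumption: fixed seed for repeatability
    return shuffled[: clamp_calibration_samples(len(shuffled), requested)]
```

With a 10-item dataset, requesting 512 samples yields all 10 items plus a warning, requesting 4 yields 4 items, and passing `None` yields all 10 without a warning.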