Describe the bug
I found that the parameters assigned to GeneratorStep are ignored during dry_run, and only batch_size remains. This behavior seems incorrect, as it should preserve the existing parameters and adjust only the batch_size.
To reproduce
- Configure a pipeline with a `GeneratorStep` that includes custom parameters.
- Run the pipeline in `dry_run` mode.
- Observe that the parameters assigned to the `GeneratorStep` are lost, and only `batch_size` is retained.
Expected behavior
During dry_run, the parameters assigned to GeneratorStep should be preserved, and only the batch_size should be adjusted.
Screenshots
No response
Environment
- Distilabel Version [1.5.3]:
- Python Version [3.11]:
Additional context
Actual Behavior
The parameters assigned to GeneratorStep are overwritten, leaving only the batch_size.
Code Snippet
The issue seems to originate from the following code snippet:
    for step_name in self.dag:
        step = self.dag.get_step(step_name)[constants.STEP_ATTR_NAME]
        if step.is_generator:
            if not parameters:
                parameters = {}
            parameters[step_name] = {"batch_size": batch_size}

This code is located at: https://github.com/argilla-io/distilabel/blob/main/src/distilabel/pipeline/base.py#L438
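To make the effect concrete, here is a minimal stand-alone sketch that mimics the loop above with plain dictionaries instead of distilabel objects. The function name `dry_run_params_current`, the step name `load_data`, and the `num_rows` parameter are hypothetical and only serve the illustration.

```python
def dry_run_params_current(parameters, generator_steps, batch_size):
    """Plain-dict stand-in for the quoted loop (hypothetical, not distilabel code)."""
    for step_name in generator_steps:
        if not parameters:
            parameters = {}
        # Assigning a fresh dict discards everything configured for this step.
        parameters[step_name] = {"batch_size": batch_size}
    return parameters


# Hypothetical runtime parameters for a generator step named "load_data".
user_params = {"load_data": {"num_rows": 100, "batch_size": 50}}
result = dry_run_params_current(user_params, ["load_data"], batch_size=1)
print(result)  # {'load_data': {'batch_size': 1}}
```

In this sketch the `num_rows` entry disappears, which matches the behavior described in this report.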
Proposed Solution
Modify the code to preserve existing parameters and only update the batch_size. For example:
    if not parameters:
        parameters = {}
    for step_name in self.dag:
        step = self.dag.get_step(step_name)[constants.STEP_ATTR_NAME]
        if step.is_generator:
            if step_name not in parameters:
                parameters[step_name] = {}
            parameters[step_name]["batch_size"] = batch_size

This change ensures that existing parameters are not overwritten and only the batch_size is updated.
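The merge behavior can be checked in isolation with a small plain-dict sketch. As before, `dry_run_params_merged`, `load_data`, `other_generator`, and `num_rows` are hypothetical names. Copying the caller's dictionaries is an extra choice made in this sketch and is not part of the proposal itself.

```python
def dry_run_params_merged(parameters, generator_steps, batch_size):
    """Plain-dict stand-in for the proposed fix (hypothetical, not distilabel code)."""
    # Copy one level deep so the caller's dicts are left untouched
    # (an assumption of this sketch, not something the proposal requires).
    merged = {name: dict(values) for name, values in (parameters or {}).items()}
    for step_name in generator_steps:
        # Keep existing entries for the step and override only batch_size.
        merged.setdefault(step_name, {})["batch_size"] = batch_size
    return merged


user_params = {"load_data": {"num_rows": 100, "batch_size": 50}}
result = dry_run_params_merged(
    user_params, ["load_data", "other_generator"], batch_size=1
)
print(result)
# {'load_data': {'num_rows': 100, 'batch_size': 1}, 'other_generator': {'batch_size': 1}}
```

Here `num_rows` survives, `batch_size` is replaced, and a generator step with no prior parameters still receives a `batch_size` entry.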