[BUG] Parameters of GeneratorStep are ignored during dry_run #1137

@sung1-kang

Describe the bug

The parameters assigned to a GeneratorStep are discarded during dry_run, and only batch_size remains. This seems incorrect: dry_run should preserve the existing parameters and override only batch_size.

To reproduce

  1. Configure a pipeline with a GeneratorStep that includes custom parameters.
  2. Run the pipeline in dry_run mode.
  3. Observe that the parameters assigned to the GeneratorStep are lost, and only batch_size is retained.

Expected behavior

During dry_run, the parameters assigned to GeneratorStep should be preserved, and only the batch_size should be adjusted.

Screenshots

No response

Environment

  • Distilabel version: 1.5.3
  • Python version: 3.11

Additional context

Actual Behavior

The parameters assigned to GeneratorStep are overwritten, leaving only the batch_size.

Code Snippet

The issue seems to originate from the following code snippet:

for step_name in self.dag:
    step = self.dag.get_step(step_name)[constants.STEP_ATTR_NAME]

    if step.is_generator:
        if not parameters:
            parameters = {}
        parameters[step_name] = {"batch_size": batch_size}

This code is located at: https://github.com/argilla-io/distilabel/blob/main/src/distilabel/pipeline/base.py#L438
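
The overwrite can be shown without distilabel by isolating the dictionary assignment. This is a minimal sketch, not the library's code: `dry_run_parameters_current`, the `generator_steps` list (standing in for the loop over `self.dag`), the step name `load_data`, and the parameter `num_rows` are all hypothetical.

```python
def dry_run_parameters_current(parameters, generator_steps, batch_size):
    """Mimic the assignment quoted above, with a plain list replacing the DAG."""
    for step_name in generator_steps:
        if not parameters:
            parameters = {}
        # Replaces the whole per-step dict, so user-supplied keys are dropped.
        parameters[step_name] = {"batch_size": batch_size}
    return parameters


# Hypothetical user parameters for a generator step named "load_data".
user_parameters = {"load_data": {"num_rows": 10, "batch_size": 50}}
result = dry_run_parameters_current(user_parameters, ["load_data"], batch_size=1)
print(result)  # {'load_data': {'batch_size': 1}}
```

The `num_rows` entry is gone because the assignment replaces the step's dictionary instead of updating it.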

Proposed Solution

Modify the code to preserve existing parameters and only update the batch_size. For example:

if not parameters:
    parameters = {}

for step_name in self.dag:
    step = self.dag.get_step(step_name)[constants.STEP_ATTR_NAME]

    if step.is_generator:
        if step_name not in parameters:
            parameters[step_name] = {}
        parameters[step_name]["batch_size"] = batch_size

This change ensures that existing parameters are not overwritten and only the batch_size is updated.
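
A standalone sketch of the same merge, for checking the behavior outside the pipeline. The names `merge_dry_run_parameters`, `generator_steps`, `load_data`, and `num_rows` are hypothetical and not part of distilabel. The sketch also copies the input so the caller's dictionary is left untouched, which goes one step beyond the proposal above.

```python
def merge_dry_run_parameters(parameters, generator_steps, batch_size):
    """Return a copy of `parameters` with batch_size set for each generator step."""
    # Copy each per-step dict so the caller's input is not mutated.
    merged = {name: dict(values) for name, values in (parameters or {}).items()}
    for step_name in generator_steps:
        # Keep existing keys; add or override only batch_size.
        merged.setdefault(step_name, {})["batch_size"] = batch_size
    return merged


# Hypothetical user parameters for a generator step named "load_data".
user_parameters = {"load_data": {"num_rows": 10, "batch_size": 50}}
merged = merge_dry_run_parameters(user_parameters, ["load_data"], batch_size=1)
print(merged)  # {'load_data': {'num_rows': 10, 'batch_size': 1}}
```

Passing `None` or an empty dictionary still yields an entry containing only `batch_size` for each generator step, which matches the current behavior when no parameters are given.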
