
Reproducing JustRL #4

@lee-junjie

Description


Thank you for your excellent work on JustRL. I really appreciate the simplicity of the approach and am very interested in reproducing the results using VERL.

However, I have encountered several issues while attempting to reproduce the reported results.

Following your recommended settings in this comment:
#3 (comment)

I tried both VERL 0.2.0 and 0.2.0.post2 and ran the provided training script. During execution, I encountered multiple configuration validation errors: more than ten configuration keys referenced in the script do not exist in VERL 0.2.0. One example error is shown below:

Could not override 'algorithm.use_kl_in_reward'.
To append to your config use +algorithm.use_kl_in_reward=False
Key 'use_kl_in_reward' is not in struct
    full_key: algorithm.use_kl_in_reward
    object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
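For reference, the full stack trace can be obtained exactly as the error message suggests; the script name below is a placeholder for whichever launch script is actually used:

```shell
# Placeholder script name; substitute the actual JustRL training script.
HYDRA_FULL_ERROR=1 bash run_justrl_train.sh
```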

The following configuration options do not appear to exist in VERL 0.2.0. To pass configuration validation, I had to remove them from the script; however, this may result in inconsistencies with the setup you used:

  • algorithm.use_kl_in_reward
  • data.filter_overlong_prompts
  • data.truncation
  • actor_rollout_ref.actor.optim.lr_warmup_steps
  • actor_rollout_ref.actor.optim.weight_decay
  • actor_rollout_ref.actor.use_kl_loss
  • actor_rollout_ref.actor.clip_ratio_low
  • actor_rollout_ref.actor.clip_ratio_high
  • actor_rollout_ref.actor.clip_ratio_c
  • actor_rollout_ref.rollout.val_kwargs.do_sample
  • actor_rollout_ref.rollout.val_kwargs.n
  • actor_rollout_ref.rollout.val_kwargs.temperature
  • actor_rollout_ref.rollout.val_kwargs.top_p
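For completeness, the alternative to deleting these keys is the one Hydra's own error message suggests: prefixing each unknown key with `+` so it is appended to the config rather than treated as an override. A sketch (the `verl.trainer.main_ppo` entry point and the values shown are illustrative, and whether VERL 0.2.0 actually consumes the appended keys is a separate question):

```shell
# Sketch only: '+' tells Hydra to append keys absent from the base config
# instead of failing struct-mode validation. Values here are placeholders.
python -m verl.trainer.main_ppo \
    +algorithm.use_kl_in_reward=False \
    +data.filter_overlong_prompts=True \
    +actor_rollout_ref.actor.use_kl_loss=False
```

This silences the validation error, but keys that the 0.2.0 trainer never reads would simply be ignored, which could mask real setup differences.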

Could you please clarify whether these configuration options were implemented as custom modifications on top of VERL 0.2.0, or whether a different VERL version was used for your experiments?

Additionally, in your replies on both GitHub and Zhihu, you mentioned that the JustRL code is largely the same as VERL and that the results should be reproducible across multiple VERL versions.

Have you tested JustRL with more recent versions of VERL? If so, could you please share which VERL and vLLM versions you would recommend for reproduction? I would be happy to try them on my side. If not, could you please share additional details about the original environment, such as the Docker image, exact configurations, and code modifications required to reproduce the results?
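When comparing environments across reproduction attempts, a small sketch like the following (plain `importlib.metadata`, no VERL-specific APIs) can report exactly which VERL and vLLM versions are installed:

```python
# Report the installed versions of the packages relevant to reproduction.
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions


print(installed_versions(["verl", "vllm"]))
```

Including this output alongside a training script would make it much easier for others to match environments exactly.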

Finally, it would be greatly appreciated if other teams who have successfully reproduced the results could share their VERL/vLLM versions and relevant training scripts.

Thank you very much for your time and help.
