--- a/examples/train/mini_swe_agent/README.md
+++ b/examples/train/mini_swe_agent/README.md
@@ -58,7 +58,7 @@ For issues with SkyRL or the Mini-SWE-Agent integration, please [open an Issue](
 
 ### Common Issues
 
-- **Context length errors**: If you see `ValueError: The decoder prompt (length xxxx) is longer than the maximum model length`, increase `max_input_length` and `max_generate_length` or reduce steps in `swebench.yaml`.
+- **Context length errors**: If you see `ValueError: The decoder prompt (length xxxx) is longer than the maximum model length`, increase the vLLM `engine_init_kwargs.max_model_len`, reduce `max_input_length`, or reduce steps in `swebench.yaml`. `max_generate_length` is the assistant-token budget for a trajectory and does not increase the model context window.
 
 - **All zero rewards**: If rewards are consistently zero, the task may be too difficult. Consider:
 - Filtering data for a better mix of easy/hard samples
--- a/skyrl/train/config/ppo_base_config.yaml
+++ b/skyrl/train/config/ppo_base_config.yaml
@@ -295,7 +295,9 @@ generator:
 n_samples_per_prompt: 5
 async_engine: true
 batched: false
-max_input_length: ${trainer.max_prompt_length} # max generator input length used for multi-turn conversations - for single turn set equal to max_prompt_length
+# Max input/context length checked before each generation turn. For single-turn, set equal to max_prompt_length.
+# This is distinct from sampling_params.max_generate_length, which budgets assistant-generated tokens.
+max_input_length: ${trainer.max_prompt_length}
 # VLLM_ENABLE_V1_MULTIPROCESSING=0 for reproducibility
 vllm_v1_disable_multiproc: true
 enable_prefix_caching: true
@@ -334,11 +336,14 @@ generator:
 
 # Inference engine arguments. Arguments are passed directly to the vLLM engine, so names must match
 # the engine's args. To specify an engine arg in the CLI override, use the format: +generator.engine_init_kwargs.arg_name=value
+# If max_model_len is set, each rollout request's max_tokens is capped so prompt+completion fits this window.
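
Reviewer note: the capping rule described in the added comment can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `cap_max_tokens` and its parameters are hypothetical, and raising when the prompt alone fills the window is an assumption about how an over-long prompt would be handled.

```python
from typing import Optional


def cap_max_tokens(
    prompt_len: int,
    requested_max_tokens: int,
    max_model_len: Optional[int],
) -> int:
    """Hypothetical helper: cap max_tokens so prompt + completion fits the window."""
    # No context window configured: leave the requested budget unchanged.
    if max_model_len is None:
        return requested_max_tokens

    # Tokens left in the window once the prompt is accounted for.
    remaining = max_model_len - prompt_len
    if remaining <= 0:
        # Assumption: a prompt that already fills the window is an error,
        # mirroring the "decoder prompt ... is longer than the maximum model
        # length" failure mentioned in the README.
        raise ValueError(
            f"prompt length {prompt_len} leaves no room within "
            f"max_model_len={max_model_len}"
        )

    # The completion budget is the smaller of the request and the remaining room.
    return min(requested_max_tokens, remaining)
```

For example, with a 1000-token prompt, a requested budget of 512, and `max_model_len` of 1200, the sketch returns 200. Following the override format documented above, the window itself would be set with something like `+generator.engine_init_kwargs.max_model_len=32768`, where the value is purely illustrative.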