Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .agents/skills
101 changes: 101 additions & 0 deletions .claude/skills/add-dynamic-filter/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
---
name: add-dynamic-filter
description: Guide for adding dynamic/filter hooks in slime rollout pipeline. Use when user wants sample-group selection during rollout, buffer filtering before training, or per-sample masking/processing hooks.
---

# Add Dynamic Filter

Add filtering hooks in rollout and buffer stages while preserving sample-group contracts.

## When to Use

Use this skill when:

- User asks to filter sample groups during dynamic sampling
- User asks to customize buffer extraction strategy
- User asks to mask/remove some rollout samples before training
- User asks to process all generated samples for logging/analysis

## Step-by-Step Guide

### Step 1: Pick the Correct Hook

- Dynamic sampling filter: `--dynamic-sampling-filter-path`
- Buffer filter: `--buffer-filter-path`
- Per-sample rollout filter: `--rollout-sample-filter-path`
- All-samples post process: `--rollout-all-samples-process-path`

### Step 2: Implement the Function Signature

Dynamic sampling filter (called in `slime/rollout/sglang_rollout.py`):

```python
def filter_function(args, samples, **kwargs):
# return DynamicFilterOutput or bool
```

Preferred return type:

```python
from slime.rollout.filter_hub.base_types import DynamicFilterOutput

return DynamicFilterOutput(keep=True, reason=None)
```

Buffer filter (called in `slime/rollout/data_source.py`):

```python
def buffer_filter(args, rollout_id, buffer, num_samples):
return selected_groups
```

Rollout sample filter:

```python
def rollout_sample_filter(args, samples):
# modify sample.remove_sample in-place where needed
```

All-samples process:

```python
def process_all_samples(args, all_samples, data_source):
...
```

### Step 3: Preserve Group Structure

- Keep `list[list[Sample]]` structure intact where required.
- Do not flatten sample groups unless downstream path expects flattened samples.
- For sample removal, prefer `sample.remove_sample=True` over deleting objects.

### Step 4: Wire and Validate

Example wiring:

```bash
--dynamic-sampling-filter-path slime.rollout.filter_hub.dynamic_sampling_filters.check_reward_nonzero_std
--buffer-filter-path <module>.buffer_filter
--rollout-sample-filter-path <module>.rollout_sample_filter
--rollout-all-samples-process-path <module>.process_all_samples
```

### Step 5: Measure Side Effects

- Check final sample count remains aligned with `rollout_batch_size` expectations.
- Verify drop reasons are surfaced in rollout metrics when dynamic filter is used.
- Validate training still receives valid loss masks/rewards after filtering.

## Common Mistakes

- Returning wrong container type for buffer filter
- Dropping samples by deletion instead of mask flagging
- Losing sample-group alignment in group-RM setup
- Adding expensive logic in hot filtering paths

## Reference Locations

- Dynamic filter types: `slime/rollout/filter_hub/base_types.py`
- Dynamic filter example: `slime/rollout/filter_hub/dynamic_sampling_filters.py`
- Rollout generation hook points: `slime/rollout/sglang_rollout.py`
- Buffer filter hook point: `slime/rollout/data_source.py`
91 changes: 91 additions & 0 deletions .claude/skills/add-eval-dataset-config/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
---
name: add-eval-dataset-config
description: Guide for adding and validating evaluation dataset configuration in slime. Use when user wants to configure eval datasets via --eval-config or --eval-prompt-data, add per-dataset overrides, or customize evaluation rollout behavior.
---

# Add Eval Dataset Config

Configure evaluation datasets in slime with explicit dataset-level overrides and predictable runtime behavior.

## When to Use

Use this skill when:

- User asks to add evaluation datasets for periodic eval
- User asks to migrate from `--eval-prompt-data` to structured `--eval-config`
- User asks for per-dataset eval overrides (sampling params, keys, rm_type, metadata)

## Step-by-Step Guide

### Step 1: Choose Config Entry Method

Supported inputs:

- Structured config file: `--eval-config <yaml>`
- Legacy CLI pairs: `--eval-prompt-data <name1> <path1> <name2> <path2> ...`

If `--eval-interval` is set, eval datasets must be configured.

### Step 2: Build YAML with Required Fields

Each dataset needs at least:

- `name`
- `path`

Example:

```yaml
eval:
defaults:
n_samples_per_eval_prompt: 1
temperature: 0.7
top_p: 1.0
datasets:
- name: aime
path: /path/to/aime.jsonl
rm_type: math
input_key: prompt
label_key: answer
metadata_overrides:
split: test
```
### Step 3: Understand Override Priority
`slime/utils/eval_config.py` resolves fields in this order:

1. Dataset-level values in `eval.datasets[*]`
2. `eval.defaults`
3. CLI args fallback (for example eval_* or rollout_* fields)

Common overridable fields include:

- Runtime: `n_samples_per_eval_prompt`, `temperature`, `top_p`, `top_k`, `max_response_len`
- Sample keys: `input_key`, `label_key`, `tool_key`, `metadata_key`
- Extra: `rm_type`, `custom_generate_function_path`, `metadata_overrides`

### Step 4: Wire Eval Function if Needed

By default, eval uses `--eval-function-path` (defaults to rollout function path).
Use a separate eval function when inference/eval behavior must differ from training rollout.

### Step 5: Validate Parsing and Runtime

- Start with config parsing sanity by running a short launch command.
- Confirm dataset entries are loaded into `args.eval_datasets`.
- Verify output keys match eval logging/metrics expectations.

## Common Mistakes

- Missing `name` in dataset entries
- Odd-length `--eval-prompt-data` pairs
- Setting `--eval-interval` without any eval dataset
- Mixing reward dict outputs without `eval_reward_key` configuration

## Reference Locations

- Eval config model: `slime/utils/eval_config.py`
- Eval config resolution: `slime/utils/arguments.py`
- Eval rollout path: `slime/rollout/sglang_rollout.py`
- Customization docs: `docs/en/get_started/customization.md`
90 changes: 90 additions & 0 deletions .claude/skills/add-reward-function/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
---
name: add-reward-function
description: Guide for adding a custom reward function in slime and wiring it through --custom-rm-path (and optional reward post-processing). Use when user wants new reward logic, remote/service reward integration, or task-specific reward shaping.
---

# Add Reward Function

Implement custom reward logic and connect it to slime rollout/training safely.

## When to Use

Use this skill when:

- User asks to add new reward computation logic
- User asks to integrate an external reward service
- User asks to customize reward normalization/post-processing

## Step-by-Step Guide

### Step 1: Choose Reward Mode

Pick one of these:

- Single-sample mode (`--group-rm` disabled): custom function gets one `Sample`
- Group/batch mode (`--group-rm` enabled): custom function gets `list[Sample]`

`slime.rollout.rm_hub.__init__.py` calls your function via `--custom-rm-path`.

### Step 2: Create Reward Module

Create `slime/rollout/rm_hub/<your_rm>.py`.

Supported signatures:

```python
async def custom_rm(args, sample):
return float_reward_or_reward_dict
```

```python
async def custom_rm(args, samples):
return list_of_rewards
```

If using group mode, return one reward per sample in input order.

### Step 3: Keep Reward Type Consistent

- Return scalar numeric rewards unless your pipeline explicitly uses keyed rewards.
- If using reward dicts, ensure downstream `reward_key` / `eval_reward_key` is configured.
- Keep exceptions explicit for invalid metadata instead of silently returning zeros.

### Step 4: Optional Reward Post-Processing

To customize normalization/shaping before advantage computation, add:

```python
def post_process_rewards(args, samples):
# return (raw_rewards, processed_rewards)
...
```

Wire with:

```bash
--custom-reward-post-process-path <module>.post_process_rewards
```

This hook is consumed in `slime/ray/rollout.py`.

### Step 5: Wire and Validate

Use:

```bash
--custom-rm-path slime.rollout.rm_hub.<your_rm>.custom_rm
```

## Common Mistakes

- Returning wrong output shape in group mode
- Mixing scalar rewards and reward dicts without `reward_key` config
- Doing blocking network calls without async handling
- Forgetting to validate reward behavior on truncated/failed samples

## Reference Locations

- Reward dispatch: `slime/rollout/rm_hub/__init__.py`
- Reward post-process hook: `slime/ray/rollout.py`
- Customization docs: `docs/en/get_started/customization.md`
108 changes: 108 additions & 0 deletions .claude/skills/add-rollout-function/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
---
name: add-rollout-function
description: Guide for adding a new rollout function in slime and wiring it through --rollout-function-path. Use when user wants to implement custom rollout data generation logic, custom train/eval rollout outputs, or migrate from the default sglang rollout path.
---

# Add Rollout Function

Implement a custom rollout function and integrate it safely with slime training/eval flow.

## When to Use

Use this skill when:

- User asks to add a new rollout task or rollout generation function
- User asks to replace default `slime.rollout.sglang_rollout.generate_rollout`
- User asks to customize train/eval data generation behavior

## Step-by-Step Guide

### Step 1: Choose the Right Starting Point

Start from one of these references:

- Async RL-style rollout: `slime/rollout/sglang_rollout.py`
- Simple SFT-style rollout: `slime/rollout/sft_rollout.py`

If the task needs engine-based async generation and rewards, use the sglang path as base.
If the task is file/buffer-driven and simple, use sft path as base.

### Step 2: Create the New Rollout Module

Create a new file, for example: `slime/rollout/<your_rollout>.py`

Required callable signature:

```python
def generate_rollout(args, rollout_id, data_source, evaluation=False) -> RolloutFnTrainOutput | RolloutFnEvalOutput:
...
```

Return types are defined in `slime/rollout/base_types.py`.

### Step 3: Implement Train and Eval Branches Explicitly

- For training (`evaluation=False`), return `RolloutFnTrainOutput(samples=..., metrics=...)`
- For evaluation (`evaluation=True`), return `RolloutFnEvalOutput(data=..., metrics=...)`

Minimal skeleton:

```python
from slime.rollout.base_types import RolloutFnTrainOutput, RolloutFnEvalOutput


def generate_rollout(args, rollout_id, data_source, evaluation=False):
if evaluation:
result = {
"custom_eval": {
"rewards": [],
"truncated": [],
"samples": [],
}
}
return RolloutFnEvalOutput(data=result)

groups = data_source.get_samples(args.rollout_batch_size)
# fill Sample fields needed by training: tokens/response_length/reward/status (+ loss_mask when needed)
return RolloutFnTrainOutput(samples=groups)
```

### Step 4: Keep Data Contract Compatible

For each generated sample, ensure required training fields are populated consistently with your objective:

- `tokens`
- `response_length`
- `reward` (or reward dict if your setup uses keyed rewards)
- `status`

If partial rollout or masking logic is involved, keep `loss_mask` semantics consistent with existing behavior.

### Step 5: Wire Through Arguments

Set your function path via CLI:

```bash
--rollout-function-path slime.rollout.<your_rollout>.generate_rollout
```

The default and signature expectation are documented in:

- `slime/utils/arguments.py`
- `docs/en/get_started/customization.md`

## Common Mistakes

- Returning raw Python lists/dicts with mismatched schema in custom path
- Implementing only training branch and forgetting evaluation branch
- Generating samples without required fields (`tokens`, `response_length`, `reward`, `status`)
- Using blocking-heavy logic in high-frequency rollout paths without batching/concurrency control

## Reference Locations

- Default rollout: `slime/rollout/sglang_rollout.py`
- Simple custom example: `slime/rollout/sft_rollout.py`
- Output dataclasses: `slime/rollout/base_types.py`
- Wiring/loading: `slime/ray/rollout.py`
- Argument definition: `slime/utils/arguments.py`
- Customization docs: `docs/en/get_started/customization.md`
Loading