Use cases, pain points, and background
Building a new environment in NeMo Gym today requires copying an existing benchmark, understanding which parts to keep and which to replace, manually wiring YAML configs, and hoping you picked the right example to copy from. There is no guided path. The result is that:
- New contributors copy the wrong template. Someone building a judge-based benchmark might copy `example_single_tool_call` (which has a trivial `verify()`) instead of `math_with_judge` (which has the judge wiring they need). Nothing tells them which one to start from.
- Boilerplate is re-implemented across benchmarks. LLM-as-a-judge logic (calling a judge model server, prompt-template formatting, position-bias-aware answer swapping, response parsing) is independently implemented in `math_with_judge`, `equivalence_llm_judge`, `terminus_judge`, and others. The same goes for the multi-turn correction loops in the `proof_refinement_agent` and `multi_turn_agent` environments.
- YAML wiring is error-prone. Each new environment needs a config that correctly references agent servers, resources servers, and model servers (and optionally judge model servers), with the right `type` and `name` cross-references. Getting this wrong produces runtime errors that are hard to debug.
Description
Add a CLI command that scaffolds a new environment from composable options. Directional examples:

```
gym init env style=nemo verifier=judge agent=multistep
gym init env style=nemo verifier=rm agent=multiturn
gym init env style=nemo verifier=custom agent=custom
gym init env style=gymnasium
```
`style=nemo` - NeMo Gym environment style

Two composable flags:
`agent=` (interaction pattern)
- `multistep` (default) — scaffolds a resources server wired to `simple_agent`. The agent runs the tool-call loop; the user just implements tool endpoints and `verify()`.
- `multiturn` — scaffolds a resources server wired to `multi_turn_agent`. The agent runs the outer generate-verify-feedback loop; the user implements `verify()` with correction feedback.
- `custom` — scaffolds both a custom agent server and a resources server. Full control over the agent loop.
`verifier=` (verification strategy)
- `custom` (default) — empty `verify()` for the user to implement with their own logic.
- `judge` — `verify()` pre-wired with LLM-as-a-judge: judge model server reference in config, prompt template scaffolding, position-bias-aware dual evaluation, response parsing. Based on patterns from `math_with_judge` / `equivalence_llm_judge`.
- `rm` — `verify()` pre-wired to call a reward model server and return its score as the reward.
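To make the `verifier=judge` option concrete, here is a minimal sketch of what the pre-wired position-bias-aware dual evaluation could look like. All names here (`verify`, `parse_vote`, the prompt template, the injected `judge` callable) are illustrative assumptions, not NeMo Gym's actual API; a real scaffold would call the configured judge model server instead of taking a callable.

```python
# Hypothetical sketch: the judge is asked twice with answer positions
# swapped, and the two votes are averaged to cancel position bias.

JUDGE_PROMPT = (
    "Which answer better matches the reference?\n"
    "Reference: {reference}\nAnswer A: {a}\nAnswer B: {b}\n"
    "Reply with 'A' or 'B'."
)

def parse_vote(text: str) -> str:
    """Take the first A/B token in the judge's reply; default to 'B'."""
    for ch in text.strip().upper():
        if ch in ("A", "B"):
            return ch
    return "B"

def verify(response: str, reference: str, judge) -> float:
    """Score `response` in [0, 1] using two judge calls with swapped slots."""
    # Round 1: candidate response in slot A, reference in slot B.
    first = judge(JUDGE_PROMPT.format(reference=reference, a=response, b=reference))
    # Round 2: positions swapped, so a slot-A-biased judge cannot win both.
    second = judge(JUDGE_PROMPT.format(reference=reference, a=reference, b=response))
    score = 0.0
    if parse_vote(first) == "A":   # response occupied slot A in round 1
        score += 0.5
    if parse_vote(second) == "B":  # response occupied slot B in round 2
        score += 0.5
    return score
```

A judge that blindly prefers slot A scores 0.5 under this scheme rather than 1.0, which is the point of the dual evaluation.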
The CLI generates the full environment: `resources_servers/<name>/` with `app.py`, a config YAML, a `data/example.jsonl` placeholder, tests, and `requirements.txt`. For `agent=custom`, it also generates `responses_api_agents/<name>/`.
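The generation step itself can be sketched as a template-stamping function. The directory names mirror the list above; the file bodies and the `init_env` signature are placeholder assumptions for illustration, not the real generator.

```python
# Hypothetical scaffolder sketch: stamp a fixed template tree under
# resources_servers/<name>/, plus responses_api_agents/<name>/ for
# agent=custom. Template contents are stand-ins.
from pathlib import Path

SCAFFOLD = {
    "app.py": "# TODO: implement tool endpoints and verify()\n",
    "config.yaml": "# TODO: wire agent/resources/model server references\n",
    "data/example.jsonl": "{}\n",
    "tests/test_app.py": "def test_placeholder():\n    assert True\n",
    "requirements.txt": "",
}

def init_env(root: Path, name: str, agent: str = "multistep") -> list[Path]:
    """Create the environment skeleton; return every file written."""
    created = []
    targets = [root / "resources_servers" / name]
    if agent == "custom":
        # Custom agents also get their own agent-server package.
        targets.append(root / "responses_api_agents" / name)
    for base in targets:
        for rel, body in SCAFFOLD.items():
            path = base / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(body)
            created.append(path)
    return created
```

In a real implementation the `verifier=` and `agent=` choices would select different `app.py` and config templates rather than a single fixed dict.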
`style=gymnasium` - Classic Gymnasium-compatible APIs

For users coming from the OpenAI Gym / Gymnasium ecosystem who expect `reset()` / `step()` / reward semantics. Scaffolds an environment class with the familiar APIs, with NeMo Gym handling the translation to its own architecture behind the scenes.
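As a rough shape of what the `style=gymnasium` scaffold could emit, here is a toy environment class exposing the Gymnasium-style `reset()`/`step()` tuples. The class name, the counter logic, and the `_observe` hook are invented for illustration; the real scaffold would route observations and rewards through NeMo Gym's servers.

```python
# Hedged sketch of a style=gymnasium scaffold output. The toy dynamics
# (count to 3) exist only so the API shape is runnable; a generated class
# would delegate to NeMo Gym behind these methods.

class ScaffoldedEnv:
    def reset(self, seed=None):
        self._state = 0
        return self._observe(), {}  # Gymnasium: (observation, info)

    def step(self, action):
        self._state += action
        terminated = self._state >= 3
        reward = 1.0 if terminated else 0.0
        # Gymnasium: (observation, reward, terminated, truncated, info)
        return self._observe(), reward, terminated, False, {}

    def _observe(self):
        # Placeholder: the real scaffold would pull observations from the
        # resources server here.
        return self._state
```

Note the 5-tuple `step()` return (with separate `terminated`/`truncated` flags) follows the modern Gymnasium API rather than the legacy 4-tuple Gym API.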
Design
What files should be touched? What logic should be written?
Out of scope
What are some items that this issue could be mistaken to cover that this issue should explicitly NOT cover?
Acceptance Criteria