Support NVIDIA-NeMo/ProRL-Agent-Server (Polar) as an RL rollout backend #219

@reacher-z

Request: support NVIDIA-NeMo/ProRL-Agent-Server (Polar) as an RL rollout backend

Add support for ProRL-Agent-Server / Polar so ClawBench can be used as an RL environment for agentic training.

What it is: Polar is an RL rollout framework for agent systems, described as "Agentic RL on Any Harness at Scale". It converts agent harnesses into RL environments without code changes, pools inference at scale, and offers rollout-as-a-service for asynchronous RL. The architecture is a distributed orchestrator plus gateway nodes behind an HTTP API, with inference backends including vLLM and SGLang. It requires Python 3.13+ and is licensed under Apache-2.0.

CLI:

  • polar serve_rollout -c topology.yaml (orchestrator, port 8080)
  • polar serve_gateway -c topology.yaml --node-id <node> (gateway node, ports 8100 and up)
  • polar submit <task.json|yaml> -c topology.yaml
  • polar status
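
For anyone scripting these commands from a ClawBench driver, here is a minimal sketch that builds the argument lists. The subcommands and flags are copied from the CLI summary above and have not been checked against Polar itself; `polar_command` is a hypothetical helper name, not part of Polar.

```python
from typing import List, Optional


def polar_command(
    subcommand: str,
    topology: str = "topology.yaml",
    node_id: Optional[str] = None,
    task_file: Optional[str] = None,
) -> List[str]:
    """Build an argv list for one of the polar subcommands quoted in this issue.

    Hypothetical helper: it only mirrors the flags listed in the issue text.
    """
    if subcommand == "status":
        # The issue shows `polar status` without a topology flag.
        return ["polar", "status"]

    argv = ["polar", subcommand]
    if subcommand == "submit":
        if task_file is None:
            raise ValueError("submit needs a task file (JSON or YAML)")
        argv.append(task_file)

    argv += ["-c", topology]

    if subcommand == "serve_gateway":
        if node_id is None:
            raise ValueError("serve_gateway needs a node id")
        argv += ["--node-id", node_id]
    return argv
```

The returned list could be handed to `subprocess.run(argv, check=True)`, which avoids shell quoting issues with task file paths.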

Ask: provide an integration so a ClawBench harness can run under Polar's environment contract, turning ClawBench tasks and interception scoring into RL reward signals so that ClawBench trajectories can drive agentic RL training.

Integration considerations:

  • Polar sits as a proxy between agent execution and inference servers; a ClawBench harness would need to expose its task lifecycle (reset/step/reward) to Polar.
  • The reward can be derived from ClawBench's two-stage scoring (HTTP interception + LLM judge).
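
To make the two considerations above concrete, here is a rough sketch of what a reset/step/reward wrapper might look like. Everything in it is an assumption for discussion: `ClawBenchEnv`, `combine_reward`, the `"submit"` terminal action, the scoring callbacks, and the 50/50 weighting are hypothetical and reflect neither Polar's actual environment contract nor ClawBench's real scoring code. It also assumes the interception check gates the judge score, which is only one way to combine the two stages.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


def combine_reward(intercept_pass: bool, judge_score: float,
                   judge_weight: float = 0.5) -> float:
    """Blend the two scoring stages into one scalar reward in [0, 1].

    Assumption: a failed interception check yields zero reward, so the
    LLM judge only refines the reward of runs that passed stage one.
    """
    if not 0.0 <= judge_score <= 1.0:
        raise ValueError("judge_score must be in [0, 1]")
    if not intercept_pass:
        return 0.0
    return (1.0 - judge_weight) + judge_weight * judge_score


@dataclass
class ClawBenchEnv:
    """Hypothetical wrapper exposing one ClawBench task as reset/step/reward."""
    task_id: str
    check_interception: Callable[[List[str]], bool]  # stage 1: HTTP interception
    run_judge: Callable[[List[str]], float]          # stage 2: LLM judge in [0, 1]
    max_steps: int = 20
    actions: List[str] = field(default_factory=list)

    def reset(self) -> str:
        """Clear the recorded trajectory and return an initial observation."""
        self.actions = []
        return f"task {self.task_id}: begin"

    def step(self, action: str) -> Tuple[str, float, bool]:
        """Record an action; score the trajectory only when the episode ends."""
        self.actions.append(action)
        done = action == "submit" or len(self.actions) >= self.max_steps
        if not done:
            return f"step {len(self.actions)} ok", 0.0, False
        passed = self.check_interception(self.actions)
        # Skip the judge call when stage one already failed.
        score = self.run_judge(self.actions) if passed else 0.0
        return "episode finished", combine_reward(passed, score), True
```

Emitting the reward only at episode end keeps the judge call to one per trajectory; a denser per-step reward would need scoring hooks that this issue does not describe.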
