Request: support NVIDIA-NeMo/ProRL-Agent-Server (Polar) as an RL rollout backend
Add support for ProRL-Agent-Server / Polar so ClawBench can be used as an RL environment for agentic training.
What it is: Polar, an RL rollout framework for agent systems ("Agentic RL on Any Harness at Scale"). It converts agent harnesses into RL environments without code changes, pools inference at scale, and offers rollout-as-a-service for asynchronous RL. It consists of distributed orchestrator and gateway nodes, exposes an HTTP API, supports inference backends including vLLM and SGLang, requires Python 3.13+, and is licensed under Apache-2.0.
CLI:
- `polar serve_rollout -c topology.yaml` starts the orchestrator (port 8080).
- `polar serve_gateway -c topology.yaml --node-id <node>` starts a gateway node (ports 8100+).
- `polar submit <task.json|yaml> -c topology.yaml` submits a task.
- `polar status`
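To script rollouts from a ClawBench harness, a thin Python wrapper around the Polar CLI might look like the sketch below. It uses only the subcommands and flags quoted in this issue; the function names are hypothetical, and the schema of the task file passed to `polar submit` is Polar's and is not assumed here.

```python
"""Hypothetical thin wrapper around the Polar CLI commands quoted in this issue.

Only the subcommands and flags listed in the issue are used; nothing else about
the CLI (extra flags, output format) is assumed.
"""
import shlex
import subprocess
from pathlib import Path


def serve_rollout_cmd(topology: Path) -> list[str]:
    # Orchestrator; the issue lists it on port 8080.
    return ["polar", "serve_rollout", "-c", str(topology)]


def serve_gateway_cmd(topology: Path, node_id: str) -> list[str]:
    # One gateway per node; the issue lists gateways on ports 8100+.
    return ["polar", "serve_gateway", "-c", str(topology), "--node-id", node_id]


def submit_cmd(task_file: Path, topology: Path) -> list[str]:
    # The issue says the task file is JSON or YAML; its contents are not assumed here.
    if task_file.suffix not in {".json", ".yaml"}:
        raise ValueError(f"expected a .json or .yaml task file, got {task_file}")
    return ["polar", "submit", str(task_file), "-c", str(topology)]


def status_cmd() -> list[str]:
    return ["polar", "status"]


def run(cmd: list[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command (dry run) or execute it and return stdout."""
    if dry_run:
        return shlex.join(cmd)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

With `dry_run=True` (the default), `run(submit_cmd(Path("clawbench_task.json"), Path("topology.yaml")))` returns `polar submit clawbench_task.json -c topology.yaml` without executing anything; `clawbench_task.json` is a placeholder file name.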
Ask: provide an integration that lets a ClawBench harness run under Polar's environment contract, turning ClawBench tasks and interception scoring into RL reward signals so that ClawBench trajectories can drive agentic RL training.
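One way to hand scored ClawBench episodes to an RL trainer is a flat, reward-labeled trajectory record serialized as JSON Lines. This is a sketch under assumptions: `ClawBenchTrajectory`, `Turn`, and every field name are hypothetical and are not taken from ClawBench or Polar.

```python
"""Hypothetical record for handing one scored ClawBench episode to an RL trainer.

Field names are illustrative only; neither ClawBench nor Polar is assumed to
use this schema.
"""
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Turn:
    role: str     # e.g. "system", "user", "assistant", or "tool"
    content: str


@dataclass
class ClawBenchTrajectory:
    task_id: str
    turns: list[Turn] = field(default_factory=list)
    reward: float = 0.0
    info: dict = field(default_factory=dict)  # e.g. raw judge verdict, intercepted URL

    def to_jsonl(self) -> str:
        # One JSON object per line streams easily into an async trainer.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_jsonl(cls, line: str) -> "ClawBenchTrajectory":
        data = json.loads(line)
        # asdict() flattened the nested Turn objects into dicts; rebuild them.
        data["turns"] = [Turn(**t) for t in data["turns"]]
        return cls(**data)
```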
Integration considerations:
- Polar sits as a proxy between agent execution and inference servers; a ClawBench harness would need to expose its task lifecycle (reset/step/reward) to Polar.
- Reward can derive from ClawBench's two-stage scoring (HTTP interception + LLM judge).
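A minimal sketch of the lifecycle and reward mapping described above, assuming a harness where every outbound HTTP request is intercepted and a request to the task's target endpoint counts as the final submission. `InterceptedRequest`, `TaskSpec`, `ClawBenchEnv`, and the judge callable are hypothetical names, not ClawBench's or Polar's API. The gated reward (zero unless the intercepted request passes a rule-based check, otherwise the LLM judge's score clipped to [0, 1]) is one possible way to combine the two stages, not ClawBench's documented scoring.

```python
"""Hypothetical ClawBench environment adapter with a reset/step/reward lifecycle.

InterceptedRequest, TaskSpec, ClawBenchEnv and the judge callable are
illustrative names, not ClawBench's or Polar's actual API.
"""
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class InterceptedRequest:
    """An HTTP request the agent tried to send, captured for scoring."""
    method: str
    url: str
    payload: dict


@dataclass(frozen=True)
class TaskSpec:
    task_id: str
    instruction: str
    expected_method: str
    expected_url_prefix: str   # requests to this prefix count as the final submission
    required_fields: frozenset # payload keys the submission must contain


# Assumed judge signature: (instruction, submission) -> score, ideally in [0, 1].
Judge = Callable[[str, InterceptedRequest], float]


@dataclass
class ClawBenchEnv:
    judge: Judge
    task: Optional[TaskSpec] = None
    submission: Optional[InterceptedRequest] = None
    done: bool = False

    def reset(self, task: TaskSpec) -> str:
        """Start a new episode; the initial observation is the task instruction."""
        self.task, self.submission, self.done = task, None, False
        return task.instruction

    def step(self, request: InterceptedRequest) -> tuple[str, bool]:
        """Feed one intercepted request; returns (observation, done)."""
        if self.task is None or self.done:
            raise RuntimeError("call reset() before step()")
        if request.url.startswith(self.task.expected_url_prefix):
            # The agent hit the target endpoint: capture it and end the episode.
            self.submission, self.done = request, True
            return "submission intercepted", True
        # Other requests (navigation, page loads) are assumed to pass through.
        return f"{request.method} {request.url} observed", False

    def _passes_interception(self, req: InterceptedRequest) -> bool:
        # Stage 1: deterministic check on the captured request.
        return (
            req.method.upper() == self.task.expected_method.upper()
            and self.task.required_fields.issubset(req.payload.keys())
        )

    def reward(self) -> float:
        """Gated two-stage reward: 0 unless stage 1 passes, else the clipped judge score."""
        if self.task is None:
            raise RuntimeError("call reset() before reward()")
        if self.submission is None or not self._passes_interception(self.submission):
            return 0.0
        # Stage 2: LLM judge, called only for submissions that pass stage 1.
        score = self.judge(self.task.instruction, self.submission)
        return min(max(score, 0.0), 1.0)
```

Gating the judge behind the interception check limits judge calls (and their cost) to submissions that already reached the right endpoint with the right fields. A real integration would map these hooks onto whatever environment contract Polar actually defines.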