# strands-env

Standardizing environment infrastructure with Strands Agents — step, observe, reward.

This package treats each `env.step()` as a full agent loop (prompt → (tool_call, tool_response+)* → response), not a single model call.
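The loop shape can be sketched in plain Python. This is a hypothetical illustration of what "full agent loop" means (invented names; the real loop lives inside the library's `Environment.step`):

```python
# Sketch of a step() agent loop (illustrative, not strands-env's code):
# the model may request tools any number of times before its final response.
def agent_loop(model, tools, prompt, max_turns=8):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = model(messages)                      # one model call per turn
        if reply.get("tool_call") is None:
            return reply["content"]                  # final response ends the loop
        tool = tools[reply["tool_call"]["name"]]
        result = tool(**reply["tool_call"]["args"])  # execute the requested tool
        messages.append({"role": "tool", "content": str(result)})
    return None  # turn budget exhausted
```

One `step()` therefore spans several model calls, which is why observations and rewards are attached to the whole trajectory rather than a single completion.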
## Features

- Define Environments — Subclass `Environment`, add `@tool` functions, plug in a `RewardFunction`
- RL Training — Token-level observations for on-policy training with strands-sglang
- Benchmarking — CLI and `Evaluator` with checkpointing, resume, and custom metrics
## Installation

```bash
pip install strands-env
```

For development:

```bash
git clone https://github.com/horizon-rl/strands-env.git && cd strands-env
pip install -e ".[dev]"
```

## Quick Start

Subclass `Environment` and add tools as `@tool`-decorated functions:

```python
from strands import tool
from strands_env.core import Environment

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))

class MathEnv(Environment):
    def get_tools(self):
        return [calculator]
```

Then step the environment with an `Action`:

```python
env = MathEnv(model_factory=factory, reward_fn=reward_fn)
result = await env.step(Action(message="What is 2^10?", task_context=TaskContext(ground_truth="1024")))
result.observation.final_response  # "The answer is 1024"
result.reward.reward               # 1.0
result.termination_reason          # TerminationReason.TASK_COMPLETE
```

See examples/calculator_demo.py for a complete example.
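The `reward_fn` passed to `MathEnv` is left undefined above. A minimal exact-match reward could compare the final response against the task's ground truth; this is a plain-function sketch under that assumption, not strands-env's `RewardFunction` API:

```python
import re

# Hypothetical exact-match reward: 1.0 if the ground truth appears as the
# last number in the final response, else 0.0. A stand-in sketch only.
def exact_match_reward(final_response: str, ground_truth: str) -> float:
    numbers = re.findall(r"-?\d+(?:\.\d+)?", final_response)
    return 1.0 if numbers and numbers[-1] == ground_truth else 0.0
```

Matching on the last number rather than the whole string keeps the reward robust to phrasing like "The answer is 1024".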
## Evaluation

Run a benchmark from the CLI:

```bash
strands-env eval aime-2024 \
    --env examples/envs/calculator_env.py \
    --backend sglang \
    --base-url http://localhost:30000 \
    --n-samples-per-prompt 8 \
    --max-concurrency 30
```

## Documentation

- Evaluation Guide — CLI reference, hook files, custom evaluators
- RL Training Integration — slime integration, token observations
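With `--n-samples-per-prompt 8`, each task is attempted several times, which supports pass@k-style metrics. The standard unbiased estimator (a sketch of the usual combinatorial formula, not necessarily what the CLI reports internally) is:

```python
from math import comb

# Unbiased pass@k: probability that at least one of k draws from n samples
# (of which c passed) is correct. Standard estimator; illustrative only.
def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```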
## Development

```bash
# Lint
ruff check src/ && ruff format --check src/

# Unit tests
pytest tests/unit/ -v

# Integration tests (requires a running SGLang server)
pytest tests/integration/ -v --sglang-base-url=http://localhost:30000
```

## License

Apache License 2.0 — see LICENSE.