Releases · horizon-rl/strands-env
v0.1.2: CLI and Benchmark Registry
What's New
CLI for Benchmark Evaluation
- `strands-env list` - List registered benchmarks
- `strands-env eval <benchmark> --env <hook_file>` - Run evaluations with SGLang or Bedrock backends
Evaluator Hooks
- Custom evaluator support via the `--evaluator` flag for implementing benchmarks
- Environment hooks for flexible environment configuration (environments are not necessarily tied to benchmarks); a hypothetical hook-file sketch follows this list
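The hook-file contract isn't spelled out in these notes, so the following is purely illustrative: it assumes a hook module (such as the `examples/envs/calculator_env.py` referenced below) exposes a factory that returns a configured `Environment`. The `build_env` name, its signature, and the `calculator` tool are all hypothetical stand-ins, not the documented API.

```python
# examples/envs/calculator_env.py -- hypothetical sketch, not the documented hook API.
from strands_env.core.environment import Environment

def calculator(expression: str) -> str:
    # Stand-in tool; a real hook would register a proper calculator tool.
    return str(eval(expression))

class MyCalculatorEnv(Environment):
    def get_tools(self):
        return [calculator]

def build_env(model_factory, reward_fn):
    # Assumed entry point: the CLI would call something like this to build the env.
    return MyCalculatorEnv(model_factory=model_factory, reward_fn=reward_fn)
```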
Reproducibility
- `config.json` saved to the output directory with the full configuration
- Auto-backfill of `model_id`, `tokenizer_path`, and `system_prompt` (see the loading snippet after this list)
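Because the full configuration is persisted, a finished run can be inspected or re-created from its output directory alone. A minimal sketch, assuming an illustrative output path; the three keys are exactly the backfilled fields named above:

```python
import json

# Path is illustrative; point this at the output directory of a real run.
with open("outputs/aime-2024/config.json") as f:
    config = json.load(f)

# Backfilled fields keep the run reproducible even when omitted on the CLI.
print(config["model_id"], config["tokenizer_path"], config["system_prompt"])
```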
Built-in Benchmarks
- `aime-2024` - AIME 2024 math competition
- `aime-2025` - AIME 2025 math competition
Example
```bash
strands-env eval aime-2024 \
  --env examples/envs/calculator_env.py \
  --backend sglang \
  --n-samples-per-prompt 8 \
  --max-concurrency 30
```

Full Changelog: v0.1.1...v0.1.2
v0.1.1: Add New Environments, Evaluation Infra, and Client Caching
This release adds two new environments, evaluation infrastructure, and client caching utilities on top of the core abstractions.
Highlights
- Environments: `CalculatorEnv` for math problems, `CodeSandboxEnv` for sandboxed Python/shell execution via AWS Bedrock AgentCore
- Evaluation: `Evaluator` with concurrent rollouts, checkpointing, and pass@k metrics; `AIMEEvaluator` for AIME benchmarks
- Rewards: `MathRewardFunction` using math-verify for symbolic equivalence checking (see the sketch after this list)
- Utilities: Cached SGLang clients and AWS boto3 sessions with auto-refreshing credentials
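`MathRewardFunction` builds on the math-verify library for its equivalence checks. These notes don't show its own signature, so the sketch below calls math-verify directly to illustrate the kind of comparison involved:

```python
from math_verify import parse, verify

# "2^{10}" and "1024" differ as strings but are symbolically equal --
# exactly the case that naive string matching gets wrong on math benchmarks.
gold = parse("1024")
answer = parse("2^{10}")
print(verify(gold, answer))  # True
```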
Example
```bash
# Run AIME evaluation with pure reasoning
python examples/aime_eval.py --backend sglang --env chat

# Run with the Python code sandbox (requires AWS AgentCore credentials)
python examples/aime_eval.py --backend sglang --env code
```
Full Changelog: v0.1.0...v0.1.1
v0.1.0: Core RL Environment Abstractions
Initial release with core abstractions. Concrete environments will be added in future releases.
What's included
- `Environment` base class with `step()`, `reset()`, `cleanup()`, `get_tools()`, `get_hooks()`
- `Action` / `TaskContext` - user message + ground truth, conversation history, arbitrary metadata
- `Observation` - step messages, metrics, and optional `TokenObservation` for TITO training
- `StepResult` - bundles observation, reward, and termination reason
- `TerminationReason` - maps agent exceptions to enum values via cause-chain walking
- `RewardFunction` / `RewardResult` - abstract reward interface (a sketch follows this list)
- `ModelFactory` - factory functions for SGLang, Bedrock, and OpenAI backends
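To make the reward interface concrete, here is a minimal sketch of a custom reward. Only the class names `RewardFunction` and `RewardResult` appear in these notes; the import path, method name, and field name below are assumptions, not the confirmed API:

```python
# Hypothetical import path; only the class names are confirmed above.
from strands_env.core.reward import RewardFunction, RewardResult

class ExactMatchReward(RewardFunction):
    """Toy reward: 1.0 when the answer string equals the ground truth."""

    def __call__(self, answer: str, ground_truth: str) -> RewardResult:
        matched = answer.strip() == ground_truth.strip()
        return RewardResult(reward=1.0 if matched else 0.0)
```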
Getting started
```bash
pip install strands-env
```
```python
import asyncio
from strands_env.core.environment import Environment
# Action, TaskContext, factory, reward_fn, and calculator come from your setup.

class MathEnv(Environment):
    def get_tools(self):
        return [calculator]  # expose a calculator tool to the agent

async def main():  # env.step() is async, so run it inside an event loop
    env = MathEnv(model_factory=factory, reward_fn=reward_fn)
    result = await env.step(Action(message="What is 2^10?",
                                   task_context=TaskContext(ground_truth="1024")))

asyncio.run(main())
```
See `examples/math_env.py` for a complete, runnable example with SGLang and Bedrock backends.