Releases: horizon-rl/strands-env

v0.1.2: CLI and Benchmark Registry

07 Feb 07:20

What's New

CLI for Benchmark Evaluation

  • strands-env list - List registered benchmarks
  • strands-env eval <benchmark> --env <hook_file> - Run evaluations with SGLang or Bedrock backends

Evaluator Hooks

  • Custom evaluators via the --evaluator flag, for implementing new benchmarks
  • Environment hooks for flexible environment configuration (environments are not necessarily tied to benchmarks)

Reproducibility

  • config.json saved to output directory with full configuration
  • Auto-backfill of model_id, tokenizer_path, and system_prompt

Built-in Benchmarks

  • aime-2024 - AIME 2024 math competition
  • aime-2025 - AIME 2025 math competition

Example

strands-env eval aime-2024 \
  --env examples/envs/calculator_env.py \
  --backend sglang \
  --n-samples-per-prompt 8 \
  --max-concurrency 30

Full Changelog: v0.1.1...v0.1.2

v0.1.1: Add New Environments, Evaluation Infra, and Client Caching

06 Feb 10:13

This release adds two new environments, evaluation infrastructure, and client caching utilities on top of the core abstractions.

Highlights

  • Environments: CalculatorEnv for math problems, CodeSandboxEnv for sandboxed Python/shell execution via AWS Bedrock AgentCore
  • Evaluation: Evaluator with concurrent rollouts, checkpointing, and pass@k metrics; AIMEEvaluator for AIME benchmarks
  • Rewards: MathRewardFunction using math-verify for symbolic equivalence checking
  • Utilities: Cached SGLang clients and AWS boto3 sessions with auto-refreshing credentials
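
For readers unfamiliar with the pass@k metric mentioned above, here is a minimal sketch of the commonly used unbiased estimator: given n sampled rollouts per prompt of which c are correct, it estimates the probability that at least one of k samples is correct. The function name `pass_at_k` is hypothetical, and this is not necessarily how Evaluator computes the metric.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c are correct.

    Standard combinatorial estimator: 1 - C(n - c, k) / C(n, k).
    Assumption: illustrative only; the library's own implementation may differ.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 8 samples per prompt, 2 of them correct
    for k in (1, 4, 8):
        print(f"pass@{k} = {pass_at_k(8, 2, k):.4f}")
```

Averaging this value over all prompts in a benchmark gives the benchmark-level pass@k.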

Example

# Run AIME evaluation with pure reasoning
python examples/aime_eval.py --backend sglang --env chat
# Run with the Python code sandbox (requires AWS AgentCore credentials)
python examples/aime_eval.py --backend sglang --env code

Full Changelog: v0.1.0...v0.1.1

v0.1.0: Core RL Environment Abstractions

03 Feb 10:02

Initial release with core abstractions. Concrete environments will be added in future releases.

What's included

  • Environment base class with step(), reset(), cleanup(), get_tools(), get_hooks()
  • Action / TaskContext — user message + ground truth, conversation history, arbitrary metadata
  • Observation — step messages, metrics, and optional TokenObservation for TITO training
  • StepResult — bundles observation, reward, and termination reason
  • TerminationReason — maps agent exceptions to enum values via cause-chain walking
  • RewardFunction / RewardResult — abstract reward interface
  • ModelFactory — factory functions for SGLang, Bedrock, and OpenAI backends
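
The cause-chain walking mentioned for TerminationReason can be sketched as follows. This is only an illustration of the general technique using Python's standard exception chaining attributes; the `Reason` enum, its members, the `MAPPING` table, and the `classify` function are hypothetical names and do not reflect the library's actual enum values or API.

```python
from enum import Enum


class Reason(Enum):
    # Hypothetical values, for illustration only.
    TIMEOUT = "timeout"
    BACKEND_ERROR = "backend_error"
    UNKNOWN_ERROR = "unknown_error"


# Hypothetical mapping from exception types to reasons, checked in order.
MAPPING = [
    (TimeoutError, Reason.TIMEOUT),
    (ConnectionError, Reason.BACKEND_ERROR),
]


def iter_cause_chain(exc):
    """Yield exc, then each exception it was raised from, guarding against cycles."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        yield exc
        # Prefer the explicit cause (`raise ... from ...`), else the implicit context.
        if exc.__cause__ is not None:
            exc = exc.__cause__
        elif not exc.__suppress_context__:
            exc = exc.__context__
        else:
            exc = None


def classify(exc):
    """Return the reason for the first exception in the chain that matches MAPPING."""
    for link in iter_cause_chain(exc):
        for exc_type, reason in MAPPING:
            if isinstance(link, exc_type):
                return reason
    return Reason.UNKNOWN_ERROR
```

Walking the chain matters because agent frameworks often wrap the original error (for example, a timeout) in a generic exception, so inspecting only the outermost exception would lose the root cause.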

Getting started

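pip install strands-env

# Python. `calculator`, `factory`, and `reward_fn` are assumed to be defined elsewhere;
# Action and TaskContext must also be imported (their module path is not shown in these notes).
from strands_env.core.environment import Environment

class MathEnv(Environment):
    def get_tools(self):
        return [calculator]

env = MathEnv(model_factory=factory, reward_fn=reward_fn)
# step() is awaited, so this line must run inside an async function
result = await env.step(Action(message="What is 2^10?", task_context=TaskContext(ground_truth="1024")))
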
See examples/math_env.py for a complete runnable example with SGLang and Bedrock backends.