Releases · horizon-rl/strands-env
v0.1.2: CLI and Benchmark Registry
What's New
CLI for Benchmark Evaluation
- `strands-env list` - List registered benchmarks
- `strands-env eval <benchmark> --env <hook_file>` - Run evaluations with SGLang or Bedrock backends
Evaluator Hooks
- Custom evaluator support via the `--evaluator` flag for implementing benchmarks
- Environment hooks for flexible environment configuration (environments are not necessarily tied to benchmarks); a hypothetical hook-file sketch follows this list
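The hook-file contract isn't spelled out in these notes, so the following is purely illustrative: it assumes a hook module (such as the `examples/envs/calculator_env.py` referenced below) exposes a factory that returns a configured `Environment`. The `build_env` name, its signature, and the `calculator` tool are all hypothetical stand-ins, not the documented API.

```python
# examples/envs/calculator_env.py -- hypothetical sketch, not the documented hook API.
from strands_env.core.environment import Environment

def calculator(expression: str) -> str:
    # Stand-in tool; a real hook would register a proper calculator tool.
    return str(eval(expression))

class MyCalculatorEnv(Environment):
    def get_tools(self):
        return [calculator]

def build_env(model_factory, reward_fn):
    # Assumed entry point: the CLI would call something like this to build the env.
    return MyCalculatorEnv(model_factory=model_factory, reward_fn=reward_fn)
```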
Reproducibility
- `config.json` saved to the output directory with the full configuration
- Auto-backfill of `model_id`, `tokenizer_path`, and `system_prompt` (see the loading snippet after this list)
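Because the full configuration is persisted, a finished run can be inspected or re-created from its output directory alone. A minimal sketch, assuming an illustrative output path; the three keys are exactly the backfilled fields named above:

```python
import json

# Path is illustrative; point this at the output directory of a real run.
with open("outputs/aime-2024/config.json") as f:
    config = json.load(f)

# Backfilled fields keep the run reproducible even when omitted on the CLI.
print(config["model_id"], config["tokenizer_path"], config["system_prompt"])
```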
Built-in Benchmarks
- `aime-2024` - AIME 2024 math competition
- `aime-2025` - AIME 2025 math competition
Example
```bash
strands-env eval aime-2024 \
  --env examples/envs/calculator_env.py \
  --backend sglang \
  --n-samples-per-prompt 8 \
  --max-concurrency 30
```

Full Changelog: v0.1.1...v0.1.2
v0.1.1: Add New Environments, Evaluation Infra, and Client Caching
This release adds two new environments, evaluation infrastructure, and client caching utilities on top of the core abstractions.
Highlights
- Environments: `CalculatorEnv` for math problems, `CodeSandboxEnv` for sandboxed Python/shell execution via AWS Bedrock AgentCore
- Evaluation: `Evaluator` with concurrent rollouts, checkpointing, and pass@k metrics; `AIMEEvaluator` for AIME benchmarks
- Rewards: `MathRewardFunction` using math-verify for symbolic equivalence checking (see the sketch after this list)
- Utilities: Cached SGLang clients and AWS boto3 sessions with auto-refreshing credentials
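`MathRewardFunction` builds on the math-verify library for its equivalence checks. These notes don't show its own signature, so the sketch below calls math-verify directly to illustrate the kind of comparison involved:

```python
from math_verify import parse, verify

# "2^{10}" and "1024" differ as strings but are symbolically equal --
# exactly the case that naive string matching gets wrong on math benchmarks.
gold = parse("1024")
answer = parse("2^{10}")
print(verify(gold, answer))  # True
```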
Example
```bash
# Run AIME evaluation with pure reasoning
python examples/aime_eval.py --backend sglang --env chat

# Run with the Python code sandbox (requires AWS AgentCore credentials)
python examples/aime_eval.py --backend sglang --env code
```
Full Changelog: v0.1.0...v0.1.1
v0.1.0: Core RL Environment Abstractions
Initial release with core abstractions. Concrete environments will be added in future releases.
What's included
- `Environment` base class with `step()`, `reset()`, `cleanup()`, `get_tools()`, `get_hooks()`
- `Action` / `TaskContext` - user message + ground truth, conversation history, arbitrary metadata
- `Observation` - step messages, metrics, and optional `TokenObservation` for TITO training
- `StepResult` - bundles observation, reward, and termination reason
- `TerminationReason` - maps agent exceptions to enum values via cause-chain walking
- `RewardFunction` / `RewardResult` - abstract reward interface (a sketch follows this list)
- `ModelFactory` - factory functions for SGLang, Bedrock, and OpenAI backends
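To make the reward interface concrete, here is a minimal sketch of a custom reward. Only the class names `RewardFunction` and `RewardResult` appear in these notes; the import path, method name, and field name below are assumptions, not the confirmed API:

```python
# Hypothetical import path; only the class names are confirmed above.
from strands_env.core.reward import RewardFunction, RewardResult

class ExactMatchReward(RewardFunction):
    """Toy reward: 1.0 when the answer string equals the ground truth."""

    def __call__(self, answer: str, ground_truth: str) -> RewardResult:
        matched = answer.strip() == ground_truth.strip()
        return RewardResult(reward=1.0 if matched else 0.0)
```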
Getting started
```bash
pip install strands-env
```
```python
import asyncio
from strands_env.core.environment import Environment
# Action, TaskContext, factory, reward_fn, and calculator come from your setup.

class MathEnv(Environment):
    def get_tools(self):
        return [calculator]  # expose a calculator tool to the agent

async def main():  # env.step() is async, so run it inside an event loop
    env = MathEnv(model_factory=factory, reward_fn=reward_fn)
    result = await env.step(Action(message="What is 2^10?",
                                   task_context=TaskContext(ground_truth="1024")))

asyncio.run(main())
```
See `examples/math_env.py` for a complete, runnable example with SGLang and Bedrock backends.