@@ -7,47 +7,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
-### Added
-
-- **`Evaluator`**: Concurrent evaluation orchestrator with checkpointing, resume, and pass@k metrics.
-  - tqdm progress bar with `logging_redirect_tqdm` for clean output
-  - `n_samples_per_prompt` for pass@k evaluation
-  - JSONL checkpointing with automatic resume
-- **`AIMEEvaluator`**: AIME benchmark evaluator subclass.
-- **`MathRewardFunction`**: Math reward using `math-verify` for symbolic equivalence checking.
-- **`utils/sglang.py`**: SGLang client caching utilities.
-  - `get_cached_client(base_url, max_connections)` with `lru_cache`
-  - `get_cached_client_from_slime_args(args)` for slime RL training integration
-- **`utils/aws.py`**: AWS boto3 session caching utilities.
-  - `get_boto3_session(region, profile_name)` with `lru_cache`
-  - `get_assumed_role_session(role_arn, region)` with `RefreshableCredentials` for auto-refresh
-- **`tools/code_interpreter.py`**: `CodeInterpreterToolkit` for AWS Bedrock AgentCore Code Interpreter.
-  - `execute_code` tool for running Python code
-  - `execute_command` tool for running shell commands
-- **`environments/code_sandbox/`**: `CodeSandboxEnv` using AWS Bedrock AgentCore Code Interpreter.
-  - `CodeMode` enum for configurable tool availability (CODE, TERMINAL, CODE_AND_TERMINAL)
-  - Async `cleanup()` for session cleanup
-- **`environments/calculator/`**: `CalculatorEnv` renamed from SimpleMathEnv for clarity.
-- Added `boto3`, `datasets`, `tqdm` to main dependencies.
-
-## [0.0.2] - 2026-02-03
-
-### Fixed
-
-- Replace git dependency (`strands-sglang @ git+...`) with PyPI package (`strands-sglang>=0.1.2`) to fix PyPI upload rejection.
-
-## [0.0.1] - 2026-02-03 [yanked]
-
-Initial release — core abstractions only. Environments will be added in future releases.
+## [0.1.1] - 2026-02-06
 
 ### Added
 
-- **`Environment`** base class: `step()`, `reset()`, `cleanup()`, `get_tools()`, `get_hooks()`, `compute_metrics()`.
-- **`Action` / `TaskContext`**: User message + ground truth, conversation history, and arbitrary metadata (`extra="allow"`).
+- **Environments**
+  - `CalculatorEnv`: Simple calculator tool for math problems.
+  - `CodeSandboxEnv`: AWS Bedrock AgentCore Code Interpreter with `CodeMode` enum.
+- **Evaluation**
+  - `Evaluator`: Concurrent evaluation with checkpointing, resume, and pass@k metrics.
+  - `AIMEEvaluator`: AIME benchmark evaluator.
+  - `MathRewardFunction`: Math reward using `math-verify` for symbolic equivalence.
+- **Utilities**
+  - `utils/sglang.py`: SGLang client caching with `lru_cache`.
+  - `utils/aws.py`: AWS boto3 session caching with `RefreshableCredentials` for auto-refresh.
+- **Tools**
+  - `CodeInterpreterToolkit`: `execute_code` and `execute_command` for sandboxed execution.
+- **Examples**
+  - `aime_eval.py`: Support `--env chat` and `--env code` modes with `--role-arn` option.
+  - `common.py`: Use cached SGLang client with connection pooling.
+
+## [0.1.0] - 2026-02-03
+
+Initial release with core abstractions.
+
+- **`Environment`** base class: `step()`, `reset()`, `cleanup()`, `get_tools()`, `get_hooks()`.
+- **`Action` / `TaskContext`**: User message + ground truth, conversation history, and arbitrary metadata.
 - **`Observation`**: Step messages, metrics, and optional `TokenObservation` for TITO training.
 - **`StepResult`**: Bundles observation, reward, and termination reason.
-- **`TerminationReason`**: Maps agent exceptions (`MaxToolIterationsReachedError`, `MaxTokensReachedException`, timeouts) to enum values via cause-chain walking.
+- **`TerminationReason`**: Maps agent exceptions to enum values via cause-chain walking.
 - **`RewardFunction` / `RewardResult`**: Abstract reward interface with scalar reward + diagnostics.
-- **`ModelFactory`** type and factory functions for SGLang, Bedrock, and OpenAI backends.
-- **`examples/math_env.py`**: Calculator tool example with exact-match reward, supporting SGLang and Bedrock.
-- CI/CD: GitHub Actions for testing (lint + unit tests on Python 3.10–3.12) and PyPI publishing.
+- **`ModelFactory`**: Factory functions for SGLang, Bedrock, and OpenAI backends.