Releases: horizon-rl/strands-sglang
v0.2.2: Fix logprob regression for RL training
What's Changed
Bug Fixes
- fix(sglang): regression on logprob computation - Fixed a regression introduced in v0.2.0 where
logprob_start_len=0 was accidentally removed from the SGLang /generate request.
This caused input_token_logprobs to not be returned for tool result tokens,
resulting in None values in the token trajectory.
Impact
This bug affected all token-in/token-out training workflows with tool-calling agents. Users would see:
TypeError: must be real number, not NoneType
when training frameworks (e.g., slime) attempted to create tensors from the logprobs.
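The failure mode can be reproduced without a training framework. The sketch below is illustrative only: `find_missing_logprobs` is a hypothetical helper, not part of strands-sglang, and the trajectory is made-up data. It shows how None entries could be located before they reach tensor construction.

```python
from typing import Optional, Sequence


def find_missing_logprobs(logprobs: Sequence[Optional[float]]) -> list[int]:
    """Return the positions whose logprob is None (hypothetical helper)."""
    return [i for i, lp in enumerate(logprobs) if lp is None]


# Made-up trajectory: the middle entries stand in for tool result tokens
# whose logprobs were not returned by the server.
trajectory_logprobs = [-0.12, None, None, -0.87]

missing = find_missing_logprobs(trajectory_logprobs)
if missing:
    print(f"logprobs missing at positions {missing}")

try:
    # Converting None to a float fails with a TypeError, much like
    # building a tensor from the same list would.
    values = [float(lp) for lp in trajectory_logprobs]
except TypeError as exc:
    print(f"conversion failed: {exc}")
```

A check like this, run on each trajectory before it is handed to the trainer, turns a late tensor error into an early and more descriptive one.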
Upgrade
pip install --upgrade strands-sglang
v0.2.1 will be yanked
Full Changelog: v0.2.1...v0.2.2
v0.2.1: Smoother SGLang API and Tool Registry
Compared to v0.1.0, the v0.2.1 release mainly brings:
- Supporting tool parser registry and three kinds of tool parsers
- Significant code refactoring to align with SGLang's native API while removing redundant code
Full Changelog: v0.1.0...v0.2.1
v0.1.3 - QwenXMLToolParser & Renamed Tool Parser API
Added
QwenXMLToolParser: New parser for Qwen3-Coder XML tool call format
(<function=name><parameter=key>value</parameter></function>)
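As a rough illustration of the format quoted above, the following sketch extracts calls with two regular expressions. It is not the library's QwenXMLToolParser; the function name and the returned dict shape are assumptions made for this example, and argument values are kept as raw strings.

```python
import re

# Patterns for the format quoted above; a sketch, not the library's parser.
FUNCTION_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)
PARAMETER_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def parse_qwen_xml_tool_calls(text: str) -> list[dict]:
    """Extract tool calls as {"name": ..., "arguments": {...}} dicts."""
    calls = []
    for name, body in FUNCTION_RE.findall(text):
        # Each parameter becomes one key/value pair; values stay strings.
        arguments = {key: value.strip() for key, value in PARAMETER_RE.findall(body)}
        calls.append({"name": name, "arguments": arguments})
    return calls


sample = "<function=get_weather><parameter=city>Paris</parameter></function>"
print(parse_qwen_xml_tool_calls(sample))
```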
Changed
- [Breaking] Renamed tool parser classes for consistency:
  - ToolCallParser → ToolParser
  - ToolCallParseResult → ToolParseResult
  - HermesToolCallParser → HermesToolParser
  - tool_call_parser parameter → tool_parser
- Decorator-based parser registry: New parsers self-register via the
@register_tool_parser("name") decorator
- Reorganized tool_parsers module: Extracted into a
tool_parsers/ module with base.py, hermes.py, qwen_xml.py
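The decorator-based registration pattern can be sketched as follows. Only the decorator name register_tool_parser comes from the notes above; the registry dict, the get_tool_parser lookup, and EchoToolParser are hypothetical, and the real signatures may differ.

```python
from typing import Callable

# Hypothetical registry; the real module layout and signatures may differ.
_TOOL_PARSERS: dict[str, type] = {}


def register_tool_parser(name: str) -> Callable[[type], type]:
    """Class decorator that records a parser class under `name`."""
    def decorator(cls: type) -> type:
        if name in _TOOL_PARSERS:
            raise ValueError(f"tool parser {name!r} is already registered")
        _TOOL_PARSERS[name] = cls
        return cls
    return decorator


def get_tool_parser(name: str) -> type:
    """Look up a registered parser class (hypothetical helper)."""
    return _TOOL_PARSERS[name]


@register_tool_parser("echo")
class EchoToolParser:
    """Toy parser used only to show self-registration."""

    def parse(self, text: str) -> list[str]:
        return [text]


parser = get_tool_parser("echo")()
print(parser.parse("hello"))
```

With this pattern, adding a parser means defining a decorated class; no central list has to be edited.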
Removed
- CHANGELOG.md (tracking releases via GitHub releases instead)
Full Changelog: v0.1.2...v0.1.3
v0.1.2: Align SGLangModel with other Strands-Agents Model Providers
This release contains breaking changes that align SGLangModel with the other model providers in Strands-Agents and remove redundancy such as the ephemeral client.
What's Changed
Breaking Changes
- SGLangModel requires client parameter: SGLangModel now takes a
required client: SGLangClient instead of base_url/timeout config.
Ephemeral client creation is removed. All parameters are keyword-only.
- SGLangConfig no longer contains base_url or timeout: These
belong to SGLangClient. SGLangConfig now only has model_id,
params, return_logprobs, and enable_thinking.
- SGLangClient base_url is keyword-only:
SGLangClient(base_url="http://...") instead of positional.
Migration
# Before (0.1.1)
model = SGLangModel(tokenizer=tokenizer, base_url="http://localhost:30000")
# After (0.1.2)
client = SGLangClient(base_url="http://localhost:30000")
model = SGLangModel(tokenizer=tokenizer, client=client)
Added
- TokenManager.initial_prompt: Property to access the first segment
(initial prompt tokens) directly, instead of indexing segments[0].
Changed
- SGLangModel.client is a public attribute: Stored as self.client
(previously self._client).
- TokenManager.add_response() guard: Raises RuntimeError if called before
any add_prompt(), enforcing that the first segment is always a prompt.
- TokenManager.segments: No longer returns defensive copies; returns
internal segment lists directly.
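The guard and the initial_prompt property described above can be pictured with a toy class. ToyTokenManager is a hypothetical stand-in written for this example: it stores plain lists of token IDs and models only the behaviors named in these notes, not the real TokenManager internals.

```python
class ToyTokenManager:
    """Toy stand-in for TokenManager; only the guard behavior is modeled."""

    def __init__(self) -> None:
        # Exposed directly, without defensive copies.
        self.segments: list[list[int]] = []

    def add_prompt(self, token_ids: list[int]) -> None:
        self.segments.append(list(token_ids))

    def add_response(self, token_ids: list[int]) -> None:
        # Enforce that the first segment is always a prompt.
        if not self.segments:
            raise RuntimeError("add_response() called before any add_prompt()")
        self.segments.append(list(token_ids))

    @property
    def initial_prompt(self) -> list[int]:
        """First segment, i.e. the initial prompt tokens."""
        return self.segments[0]


manager = ToyTokenManager()
try:
    manager.add_response([1, 2])  # rejected: no prompt yet
except RuntimeError as exc:
    print(exc)

manager.add_prompt([10, 11])
manager.add_response([12])
print(manager.initial_prompt)
```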
Removed
- SGLangModel._get_client(): Removed along with ephemeral client
creation/cleanup logic.
Full Changelog: v0.1.1...v0.1.2
v0.1.1: Enhance Tool Parsing Logic
Added
- Model Info Getter: Native get_model_info() for SGLang client.
- Think Block Exclusion: HermesToolCallParser excludes tool calls inside <think> blocks by default, preventing parsing of draft tool calls from reasoning models (Qwen3, DeepSeek-R1). Configurable via think_start_token/think_end_token.
- Tool Result Ordering: Tool results are now sorted by sequential IDs (call_0000, call_0001, ...) before tokenization. Fixes ordering issues when Strands executes tools concurrently and returns results in completion order.
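Two of the behaviors above are easy to sketch in isolation. The helpers below are illustrative, not the library's code: the function names and the "id" field are assumptions, and the think tokens are hard-coded to <think> and </think> rather than configurable.

```python
import re

THINK_BLOCK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_blocks(text: str) -> str:
    """Drop <think>...</think> spans so draft tool calls inside are ignored."""
    return THINK_BLOCK_RE.sub("", text)


def sort_tool_results(results: list[dict]) -> list[dict]:
    """Order results by zero-padded sequential ID such as call_0000."""
    # Zero padding makes lexicographic order match numeric order.
    return sorted(results, key=lambda result: result["id"])


output = "<think><tool_call>draft</tool_call></think><tool_call>final</tool_call>"
print(strip_think_blocks(output))

# Results arriving in completion order rather than call order.
completed = [
    {"id": "call_0001", "content": "second"},
    {"id": "call_0000", "content": "first"},
]
print([result["id"] for result in sort_tool_results(completed)])
```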
Full Changelog: v0.1.0...v0.1.1
v0.1.0: Strands-SGLang Beta Release
After testing on various agentic RL tasks, including search and coding agents, we announce the beta release of strands-sglang.
Full Changelog: v0.0.3...v0.1.0
v0.0.3 - Effectively Support Qwen3 Series Agentic RL Training
Experiments
strands-sglang has been shown to work with Slime on Qwen3 series models for agentic RL, providing customizable agent loops and more stable training than OpenAIModel.
New Features
enable_thinking Config Option: Control Qwen3 hybrid models' internal reasoning mode
# Enable thinking for reasoning tasks
model = SGLangModel(tokenizer=tokenizer, enable_thinking=True)
# Disable thinking for faster non-reasoning tasks
model = SGLangModel(tokenizer=tokenizer, enable_thinking=False)
Bug Fixes
- SLIME-Aligned Retry for Local Servers: 400 errors are now retried, since they can be transient during weight reloading or under memory pressure. The only non-retryable errors are 401, 403, 404, and 400 responses that match context length patterns.
- Improved Context Length Detection: Expanded patterns ("exceed", "too long", "max model len", etc.) to properly raise ContextWindowOverflowException
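The v0.0.3 retry rules above can be read as a small classification function. This is a sketch under assumptions: the function names are hypothetical, the pattern tuple contains only the three patterns quoted in these notes (the full list is elided there), and the functions expect an error status code plus the response body text.

```python
NON_RETRYABLE_STATUS = {401, 403, 404}
# Only the patterns quoted in the notes; the complete list is not shown there.
CONTEXT_LENGTH_PATTERNS = ("exceed", "too long", "max model len")


def is_context_overflow(status: int, body: str) -> bool:
    """True for a 400 whose body matches a context length pattern."""
    lowered = body.lower()
    return status == 400 and any(p in lowered for p in CONTEXT_LENGTH_PATTERNS)


def is_retryable(status: int, body: str = "") -> bool:
    """Classify an error response; assumes `status` is an error status."""
    if status in NON_RETRYABLE_STATUS:
        return False
    if is_context_overflow(status, body):
        return False
    return True


print(is_retryable(400, "weights are reloading"))    # transient 400
print(is_retryable(400, "Input is too long"))        # context overflow
print(is_retryable(403))                             # permanent client error
```

In a client, a response for which is_context_overflow returns True would be the case that raises ContextWindowOverflowException instead of retrying.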
Changes
- Simplified TokenManager API: Removed unused tokenizer parameter and decode() method
- Message Formatting Methods: Converted to @classmethod for clearer intent
Installation
pip install strands-sglang==0.0.3
Full Changelog: v0.0.2...v0.0.3
v0.0.2 - Align SGLang client with Slime for RL Training
🚀 Highlights
This release introduces a non-streaming architecture aligned with Slime's http_utils.py, providing ~20x better parallelism for RL training at scale.
⚠️ Breaking Changes
- SGLangClient.generate() now returns dict[str, Any] directly instead of AsyncGenerator
- Removed stream config option from SGLangModel
✨ What's New
SGLangClient
- High-level async HTTP client with connection pooling (1000 max connections)
- Aggressive retry: 60 attempts with 1s delay (aligned with Slime)
- Infinite timeout by default for long generations
- Non-streaming POST for optimal parallelism
Factory Method
client = SGLangClient.from_slime_args(args)
model = SGLangModel(tokenizer=tokenizer, client=client)
Improved Retry Logic
- Retries all 5xx server errors
- Retries 408 (Request Timeout) and 429 (Rate Limit)
- Retries connection errors (ConnectError, PoolTimeout, ReadTimeout)
- Only non-retryable: permanent client errors (400, 401, 403, 404, 422)
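A fixed-delay retry loop of the kind described above (60 attempts, 1s delay) might look like the sketch below. It is not the client's implementation: call_with_retry and RetryableError are hypothetical names, and deciding which failures count as retryable is left to the caller, who raises RetryableError for them.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (5xx, 408, 429, connection errors)."""


async def call_with_retry(
    request: Callable[[], Awaitable[T]],
    max_attempts: int = 60,
    delay_seconds: float = 1.0,
) -> T:
    """Await `request` until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await request()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(delay_seconds)
    raise ValueError("max_attempts must be at least 1")


calls = {"count": 0}


async def flaky_request() -> str:
    """Fails twice, then succeeds; stands in for a busy server."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RetryableError("server busy")
    return "ok"


result = asyncio.run(call_with_retry(flaky_request, delay_seconds=0))
print(result, calls["count"])
```

Any exception other than RetryableError propagates immediately, which corresponds to the permanent client errors listed above.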
Developer Experience
- Added conventional commits enforcement via pre-commit hook
- Default port changed to 30000 (SGLang default)
📦 Installation
pip install strands-sglang==0.0.2
📖 Full Changelog
See CHANGELOG.md for complete details.
Full Changelog: v0.0.1...v0.0.2
v0.0.1
strands-sglang Initial Release
strands-sglang is an SGLang model provider for the Strands Agents SDK with Token-In/Token-Out (TITO) support for agentic RL training.
Features
- SGLang Native API: Uses SGLang's native /generate endpoint for efficient token-level generation
- TITO Support: Tracks complete token trajectories with logprobs for RL training - no retokenization drift (see examples/retokenization_drift/)
- Tool Call Parsing: Customizable tool parsing aligned with model chat templates (Hermes/Qwen format)
- Iteration Limiting: Built-in hook to limit tool iterations with clean trajectory truncation