Releases: horizon-rl/strands-sglang

v0.2.2: Fix logprob regression for RL training

11 Feb 01:33

What's Changed

Bug Fixes

  • fix(sglang): regression on logprob computation - Fixed a regression introduced in v0.2.0 where logprob_start_len=0 was
    accidentally removed from the SGLang /generate request. This caused input_token_logprobs to not be returned for tool result
    tokens, resulting in None values in the token trajectory.
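As a rough illustration of the fix, the request body below asks SGLang to compute logprobs from the first input position. This is a minimal sketch, not the library's actual request code: `build_generate_payload` is a hypothetical helper, and the field names other than `logprob_start_len` are assumed from SGLang's native /generate API.

```python
from typing import Any


def build_generate_payload(input_ids: list[int], max_new_tokens: int = 256) -> dict[str, Any]:
    """Build a /generate request body that requests logprobs for all input tokens.

    Hypothetical helper for illustration; strands-sglang may assemble its
    request differently.
    """
    return {
        "input_ids": input_ids,
        "sampling_params": {"max_new_tokens": max_new_tokens},
        "return_logprob": True,
        # Start logprob computation at position 0 so that prompt and
        # tool-result tokens are covered, not only newly generated tokens.
        "logprob_start_len": 0,
    }


payload = build_generate_payload([101, 102, 103])
```

Without the `logprob_start_len` entry, the tool result tokens appended between generations would have no logprob attached, which matches the None values described above.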

Impact

This bug affected all token-in/token-out training workflows with tool-calling agents. Users would see:
TypeError: must be real number, not NoneType
when training frameworks (e.g., slime) attempted to create tensors from the logprobs.
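If you need to check whether a collected trajectory was affected, a guard like the one below can surface the problem before tensor creation. This is a sketch under assumptions: `find_missing_logprobs` and `validate_logprobs` are hypothetical helpers, and the trajectory is assumed to be a flat list of per-token logprobs.

```python
from typing import Optional


def find_missing_logprobs(logprobs: list[Optional[float]]) -> list[int]:
    """Return the positions whose logprob is None."""
    return [index for index, value in enumerate(logprobs) if value is None]


def validate_logprobs(logprobs: list[Optional[float]]) -> list[float]:
    """Raise a descriptive error instead of failing later during tensor creation."""
    missing = find_missing_logprobs(logprobs)
    if missing:
        raise ValueError(f"logprobs missing at token positions {missing}")
    return [float(value) for value in logprobs]


# Example trajectory where two tool-result tokens have no logprob.
trajectory_logprobs = [-0.12, -1.5, None, None, -0.03]
missing_positions = find_missing_logprobs(trajectory_logprobs)
```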

Upgrade

pip install --upgrade strands-sglang

v0.2.1 will be yanked

Full Changelog: v0.2.1...v0.2.2

v0.2.1: Smoother SGLang API and Tool Registry

10 Feb 07:15

Compared to v0.1.0, the v0.2.1 release mainly brings:

  • Support for a tool parser registry and three kinds of tool parsers
  • Significant code refactoring to align with SGLang's native API while removing redundant code

Full Changelog: v0.1.0...v0.2.1

v0.1.3 - QwenXMLToolParser & Renamed Tool Parser API

09 Feb 06:03

Added

  • QwenXMLToolParser: New parser for Qwen3-Coder XML tool call format
    (<function=name><parameter=key>value</parameter></function>)
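To make the XML format concrete, here is a minimal regex-based sketch of extracting such tool calls. It is not the QwenXMLToolParser implementation: `parse_qwen_xml_tool_calls` and the returned dictionary shape are assumptions for illustration, and all parameter values are kept as strings.

```python
import re

# Non-greedy matches so that several calls or parameters in one text stay separate.
FUNCTION_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)
PARAMETER_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def parse_qwen_xml_tool_calls(text: str) -> list[dict]:
    """Extract tool calls written as <function=name><parameter=key>value</parameter></function>."""
    calls = []
    for name, body in FUNCTION_RE.findall(text):
        arguments = {
            key.strip(): value.strip() for key, value in PARAMETER_RE.findall(body)
        }
        calls.append({"name": name.strip(), "arguments": arguments})
    return calls


sample = (
    "<function=get_weather>"
    "<parameter=city>Paris</parameter>"
    "<parameter=unit>celsius</parameter>"
    "</function>"
)
calls = parse_qwen_xml_tool_calls(sample)
```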

Changed

  • [Breaking] Renamed tool parser classes for consistency:
    • ToolCallParser → ToolParser
    • ToolCallParseResult → ToolParseResult
    • HermesToolCallParser → HermesToolParser
    • tool_call_parser parameter → tool_parser
  • Decorator-based parser registry: New parsers self-register via @register_tool_parser("name") decorator
  • Reorganized tool_parsers module: Extracted into tool_parsers/ module with base.py, hermes.py, qwen_xml.py
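The decorator-based registry pattern can be sketched as follows. Only the decorator name `register_tool_parser` comes from the notes above; the registry dictionary, the `get_tool_parser` lookup, and the `EchoToolParser` class are hypothetical, and the library's real implementation may differ.

```python
from typing import Callable

# Hypothetical module-level registry mapping parser names to classes.
TOOL_PARSER_REGISTRY: dict[str, type] = {}


def register_tool_parser(name: str) -> Callable[[type], type]:
    """Return a class decorator that records the class under the given name."""

    def decorator(cls: type) -> type:
        if name in TOOL_PARSER_REGISTRY:
            raise ValueError(f"tool parser {name!r} is already registered")
        TOOL_PARSER_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged

    return decorator


def get_tool_parser(name: str):
    """Instantiate a registered parser by name (hypothetical lookup helper)."""
    return TOOL_PARSER_REGISTRY[name]()


@register_tool_parser("echo")
class EchoToolParser:
    """Toy parser used only to demonstrate self-registration."""

    def parse(self, text: str) -> list[str]:
        return [text]


parser = get_tool_parser("echo")
```

The appeal of this design is that adding a parser only requires defining the class; no central list needs to be edited.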

Removed

  • CHANGELOG.md (tracking releases via GitHub releases instead)

Full Changelog: v0.1.2...v0.1.3

v0.1.2: Align SGLangModel with other Strands-Agents Model Providers

03 Feb 06:33

This release contains breaking changes that align SGLangModel with the other model providers in Strands-Agents and remove redundant code such as the ephemeral client.

What's Changed

Breaking Changes

  • SGLangModel requires client parameter: SGLangModel now takes a
    required client: SGLangClient instead of base_url/timeout config.
    Ephemeral client creation is removed. All parameters are keyword-only.
  • SGLangConfig no longer contains base_url or timeout: These
    belong to SGLangClient. SGLangConfig now only has model_id,
    params, return_logprobs, and enable_thinking.
  • SGLangClient base_url is keyword-only:
    SGLangClient(base_url="http://...") instead of positional.

Migration

# Before (0.1.1)
model = SGLangModel(tokenizer=tokenizer, base_url="http://localhost:30000")

# After (0.1.2)
client = SGLangClient(base_url="http://localhost:30000")
model = SGLangModel(tokenizer=tokenizer, client=client)

Added

  • TokenManager.initial_prompt: Property to access the first segment
    (initial prompt tokens) directly, instead of indexing segments[0].

Changed

  • SGLangModel.client is a public attribute: Stored as self.client
    (previously self._client).
  • TokenManager.add_response() guard: Raises RuntimeError if called before
    any add_prompt(), enforcing that the first segment is always a prompt.
  • TokenManager.segments: No longer returns defensive copies; returns
    internal segment lists directly.
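The TokenManager behavior described in the Added and Changed entries can be pictured with a small stand-in class. This is a sketch, not the library's code: `SketchTokenManager` is hypothetical and only mirrors the method and property names mentioned in these notes.

```python
class SketchTokenManager:
    """Hypothetical stand-in that mirrors the segment rules described above."""

    def __init__(self) -> None:
        self._segments: list[list[int]] = []

    def add_prompt(self, token_ids: list[int]) -> None:
        self._segments.append(list(token_ids))

    def add_response(self, token_ids: list[int]) -> None:
        # Guard: the first segment must always be a prompt.
        if not self._segments:
            raise RuntimeError("add_response() called before any add_prompt()")
        self._segments.append(list(token_ids))

    @property
    def segments(self) -> list[list[int]]:
        # Returns the internal list directly (no defensive copy),
        # so callers must not mutate it.
        return self._segments

    @property
    def initial_prompt(self) -> list[int]:
        # Behavior on an empty manager is not specified in the notes;
        # this sketch simply raises IndexError in that case.
        return self._segments[0]


manager = SketchTokenManager()
manager.add_prompt([1, 2, 3])
manager.add_response([4, 5])
```

Because segments are no longer copied, code that previously mutated the returned lists as scratch space would now alter the tracked trajectory.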

Removed

  • SGLangModel._get_client(): Removed along with ephemeral client
    creation/cleanup logic.

Full Changelog: v0.1.1...v0.1.2

v0.1.1: Enhance Tool Parsing Logic

27 Jan 02:38

Added

  • Model Info Getter: Native get_model_info() for SGLang client.

  • Think Block Exclusion: HermesToolCallParser excludes tool calls inside <think> blocks by default, preventing parsing of draft tool calls from reasoning models (Qwen3, DeepSeek-R1). Configurable via think_start_token/think_end_token.

  • Tool Result Ordering: Tool results are now sorted by sequential IDs (call_0000, call_0001, ...) before tokenization. Fixes ordering issues when Strands executes tools concurrently and returns results in completion order.
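The think-block exclusion and the result ordering can each be sketched in a few lines. These helpers are illustrative assumptions, not the library's implementation: `strip_think_blocks`, `sort_tool_results`, and the `id` key are hypothetical names, while the default tokens and the call_0000 ID pattern come from the notes above.

```python
import re


def strip_think_blocks(
    text: str,
    think_start_token: str = "<think>",
    think_end_token: str = "</think>",
) -> str:
    """Drop reasoning blocks so draft tool calls inside them are never parsed."""
    pattern = re.escape(think_start_token) + r".*?" + re.escape(think_end_token)
    return re.sub(pattern, "", text, flags=re.DOTALL)


def sort_tool_results(results: list[dict]) -> list[dict]:
    """Order results by sequential ID; zero-padded IDs sort correctly as strings."""
    return sorted(results, key=lambda result: result["id"])


raw = "<think>Maybe <tool_call>draft</tool_call> first.</think><tool_call>final</tool_call>"
visible = strip_think_blocks(raw)

# Results arriving in completion order rather than call order.
completed = [
    {"id": "call_0002", "content": "c"},
    {"id": "call_0000", "content": "a"},
    {"id": "call_0001", "content": "b"},
]
ordered_ids = [result["id"] for result in sort_tool_results(completed)]
```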

Full Changelog: v0.1.0...v0.1.1

v0.1.0: Strands-SGLang Beta Release

21 Jan 00:05

After testing on various agentic RL tasks, including search and coding agents, we are announcing the beta release of strands-sglang.

Full Changelog: v0.0.3...v0.1.0

v0.0.3 - Effectively Support Qwen3 Series Agentic RL Training

09 Jan 02:55

Experiments

strands-sglang has been shown to work with Slime on Qwen3 series models for agentic RL, providing customizable agent loops and more stable training than OpenAIModel.

New Features

enable_thinking Config Option: Control Qwen3 hybrid models' internal reasoning mode

# Enable thinking for reasoning tasks
model = SGLangModel(tokenizer=tokenizer, enable_thinking=True)
# Disable thinking for faster non-reasoning tasks
model = SGLangModel(tokenizer=tokenizer, enable_thinking=False)

Bug Fixes

  • SLIME-Aligned Retry for Local Servers: 400 errors are now retried, since on local servers they can be transient (weight reloading, memory pressure). The only non-retryable errors are 401, 403, 404, and 400 responses that match context length patterns.
  • Improved Context Length Detection: Expanded patterns ("exceed", "too long", "max model len", etc.) to properly raise ContextWindowOverflowException
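The retry rules in these two fixes can be summarized as a small classification function. This is a sketch under assumptions: `is_retryable` and `is_context_length_error` are hypothetical helpers, and the pattern tuple contains only the three patterns named above, not the full list the library uses.

```python
# Only the patterns quoted in the notes; the real list is longer ("etc.").
CONTEXT_LENGTH_PATTERNS = ("exceed", "too long", "max model len")
NON_RETRYABLE_STATUSES = {401, 403, 404}


def is_context_length_error(status: int, body: str) -> bool:
    """A 400 whose message matches a context length pattern is a permanent failure."""
    lowered = body.lower()
    return status == 400 and any(pattern in lowered for pattern in CONTEXT_LENGTH_PATTERNS)


def is_retryable(status: int, body: str = "") -> bool:
    """Classify an HTTP error response following the rules described above."""
    if status in NON_RETRYABLE_STATUSES:
        return False
    if is_context_length_error(status, body):
        return False
    # Every other error status, including a plain 400, is treated as transient.
    return status >= 400


transient = is_retryable(400, "server is reloading weights")
overflow = is_retryable(400, "Input is too long for this model")
```

In the library, the overflow case is surfaced as ContextWindowOverflowException rather than being retried.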

Changes

  • Simplified TokenManager API: Removed unused tokenizer parameter and decode() method
  • Message Formatting Methods: Converted to @classmethod for clearer intent

Installation

pip install strands-sglang==0.0.3

Full Changelog: v0.0.2...v0.0.3

v0.0.2 - Align SGLang client with Slime for RL Training

07 Jan 05:31

🚀 Highlights

This release introduces a non-streaming architecture aligned with Slime's http_utils.py, providing ~20x better parallelism for RL training at scale.

⚠️ Breaking Changes

  • SGLangClient.generate() now returns dict[str, Any] directly instead of AsyncGenerator
  • Removed stream config option from SGLangModel

✨ What's New

SGLangClient

  • High-level async HTTP client with connection pooling (1000 max connections)
  • Aggressive retry: 60 attempts with 1s delay (aligned with Slime)
  • Infinite timeout by default for long generations
  • Non-streaming POST for optimal parallelism

Factory Method

client = SGLangClient.from_slime_args(args)

model = SGLangModel(tokenizer=tokenizer, client=client)

Improved Retry Logic

  • Retries all 5xx server errors
  • Retries 408 (Request Timeout) and 429 (Rate Limit)
  • Retries connection errors (ConnectError, PoolTimeout, ReadTimeout)
  • Only non-retryable: permanent client errors (400, 401, 403, 404, 422)
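A bounded retry loop following these rules might look like the sketch below. It is not the SGLangClient implementation: `HTTPStatusError`, `should_retry`, and `call_with_retry` are hypothetical, and built-in exception types stand in for the HTTP library's ConnectError, PoolTimeout, and ReadTimeout. The defaults of 60 attempts and a 1s delay are taken from the notes above.

```python
import asyncio

RETRYABLE_CLIENT_STATUSES = {408, 429}


class HTTPStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def should_retry(exc: Exception) -> bool:
    """Retry 5xx, 408, 429, and connection-level failures; nothing else."""
    if isinstance(exc, HTTPStatusError):
        return exc.status >= 500 or exc.status in RETRYABLE_CLIENT_STATUSES
    return isinstance(exc, (ConnectionError, TimeoutError))


async def call_with_retry(send, max_attempts: int = 60, delay: float = 1.0):
    """Await send() until it succeeds, a permanent error occurs, or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await send()
        except Exception as exc:
            if attempt == max_attempts or not should_retry(exc):
                raise
            await asyncio.sleep(delay)


calls = {"count": 0}


async def flaky_request() -> str:
    # Fails twice with a 503, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise HTTPStatusError(503)
    return "ok"


result = asyncio.run(call_with_retry(flaky_request, delay=0))
```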

Developer Experience

  • Added conventional commits enforcement via pre-commit hook
  • Default port changed to 30000 (SGLang default)

📦 Installation

pip install strands-sglang==0.0.2

📖 Full Changelog

See CHANGELOG.md for complete details.

Full Changelog: v0.0.1...v0.0.2

v0.0.1

03 Jan 11:01

strands-sglang Initial Release

strands-sglang is an SGLang model provider for the Strands Agents SDK with Token-In/Token-Out (TITO) support for agentic RL training.

Features

  • SGLang Native API: Uses SGLang's native /generate endpoint for efficient token-level generation
  • TITO Support: Tracks complete token trajectories with logprobs for RL training - no retokenization drift (see examples/retokenization_drift/)
  • Tool Call Parsing: Customizable tool parsing aligned with model chat templates (Hermes/Qwen format)
  • Iteration Limiting: Built-in hook to limit tool iterations with clean trajectory truncation