SGLang model provider for the Strands Agents SDK, with token-in/token-out (TITO) rollouts for on-policy agentic RL training (no retokenization drift). [Blog]
✅ Featured in Strands Agents Docs: Community Model Provider: SGLang
This package makes the serving-oriented agent scaffold of the Strands Agents SDK training-ready by exposing end-to-end, token-level rollouts from SGLang while reusing Strands' customizable agent loop.
- Token-in/token-out rollouts (token IDs + logprobs/masks): no retokenization drift
- Strict, on-policy tool-call parsing: no heuristic repair or post-processing; tool calls are parsed exactly as the model generated them
- Native SGLang `/generate`: high-throughput, non-streaming rollouts
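To see why token-in/token-out matters, here is a self-contained toy sketch of retokenization drift (the tokenizer here is hypothetical, not the real HuggingFace one): decoding sampled token IDs back to text and re-encoding can yield different IDs, which silently misaligns per-token logprobs and loss masks.

```python
# Toy greedy longest-match tokenizer (illustrative only; real BPE
# tokenizers show the same effect at merge boundaries).
VOCAB = {"a": 0, "b": 1, "ab": 2}
INV = {v: k for k, v in VOCAB.items()}

def decode(ids):
    return "".join(INV[i] for i in ids)

def encode(text):
    # Greedy longest-match, as many tokenizers effectively behave.
    ids, i = [], 0
    while i < len(text):
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, i):
                ids.append(VOCAB[piece])
                i += len(piece)
                break
    return ids

# Suppose the model actually sampled "a" then "b" as two tokens:
generated = [0, 1]
roundtrip = encode(decode(generated))
print(generated, roundtrip)  # [0, 1] vs [2]: retokenization drift
```

Any logprobs or loss mask aligned to `[0, 1]` no longer line up with the re-encoded `[2]`; TITO rollouts avoid this by carrying the original token IDs end to end.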
Requirements:

- Python 3.10+
- Strands Agents SDK
- SGLang server running with your model
- HuggingFace tokenizer for the model
Install from PyPI:

```bash
pip install strands-sglang strands-agents-tools
```

Or install from source with development dependencies:

```bash
git clone https://github.com/horizon-rl/strands-sglang.git
cd strands-sglang
pip install -e ".[dev]"
```

Launch an SGLang server with your model:

```bash
python -m sglang.launch_server \
    --model-path Qwen/Qwen3-4B-Instruct-2507 \
    --port 30000 \
    --host 0.0.0.0
```

Then run an agent against it:

```python
import asyncio

from transformers import AutoTokenizer
from strands import Agent
from strands_tools import calculator

from strands_sglang import SGLangClient, SGLangModel


async def main():
    client = SGLangClient(base_url="http://localhost:30000")
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
    model = SGLangModel(client=client, tokenizer=tokenizer)

    agent = Agent(model=model, tools=[calculator])
    result = await agent.invoke_async("What is 25 * 17?")
    print(result)

    # Access token data for RL training
    print(f"Tokens: {model.token_manager.token_ids}")
    print(f"Loss mask: {model.token_manager.loss_mask}")
    print(f"Logprobs: {model.token_manager.logprobs}")


asyncio.run(main())
```

For RL training with slime, SGLangModel eliminates the retokenization step; see a concrete example at slime/examples/strands_sglang:
```python
from strands import Agent, tool
from strands_sglang import SGLangClient, SGLangModel, ToolIterationLimiter
from slime.utils.types import Sample

# GenerateState, get_client, and logger are provided by the surrounding slime example.

SYSTEM_PROMPT = "..."
MAX_TOOL_ITERATIONS = ...  # e.g., 5


@tool
def execute_python_code(code: str):
    """Execute Python code and return the output."""
    ...


async def generate(args, sample: Sample, sampling_params) -> Sample:
    """Customize slime's rollout function using `SGLangModel`."""
    assert not args.partial_rollout, "Partial rollout not supported."
    state = GenerateState(args)

    # Set up Agent with SGLangModel and ToolIterationLimiter hook
    model = SGLangModel(
        client=get_client(args),
        tokenizer=state.tokenizer,
        sampling_params={k: sampling_params[k] for k in ["max_new_tokens", "temperature", "top_p"]},
    )
    limiter = ToolIterationLimiter(max_iterations=MAX_TOOL_ITERATIONS)
    agent = Agent(
        model=model,
        tools=[execute_python_code],
        hooks=[limiter],
        callback_handler=None,
        system_prompt=SYSTEM_PROMPT,
    )

    # Run the agent loop
    prompt = sample.prompt if isinstance(sample.prompt, str) else sample.prompt[0]["content"]
    try:
        await agent.invoke_async(prompt)
        sample.status = Sample.Status.COMPLETED
    except Exception as e:
        # Always use TRUNCATED instead of ABORTED because slime doesn't properly
        # handle ABORTED samples in reward processing. See: https://github.com/THUDM/slime/issues/200
        sample.status = Sample.Status.TRUNCATED
        logger.warning(f"TRUNCATED: {type(e).__name__}: {e}")

    # Extract the token trajectory from token_manager
    tm = model.token_manager
    prompt_len = len(tm.segments[0])  # system + user prompt form the first segment
    sample.tokens = tm.token_ids
    sample.loss_mask = tm.loss_mask[prompt_len:]
    sample.rollout_log_probs = tm.logprobs[prompt_len:]
    sample.response_length = len(sample.tokens) - prompt_len
    sample.response = model.tokenizer.decode(sample.tokens[prompt_len:], skip_special_tokens=False)

    # Cleanup and return
    model.reset()
    agent.cleanup()
    return sample
```

Run the tests:

```bash
# Unit tests
pytest tests/unit/ -v

# Integration tests (requires a running SGLang server)
pytest tests/integration/ -v --sglang-base-url=http://localhost:30000
```

Contributions welcome! Install the pre-commit hooks for code style and commit-message validation:
```bash
pip install -e ".[dev]"
pre-commit install -t pre-commit -t commit-msg
```

This project uses Conventional Commits. Commit messages must follow the format:
```
<type>(<scope>): <description>

# Examples:
feat(client): add retry backoff configuration
fix(sglang): handle empty response from server
docs: update usage examples
```

Allowed types: `feat`, `fix`, `docs`, `style`, `refactor`, `perf`, `test`, `build`, `ci`, `chore`, `revert`.
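As a quick illustration of the format above, a message can be checked with a short script (illustrative only, not part of the package; the pre-commit hook is the authoritative validator, and the scope character set here is an assumption):

```python
import re

# Allowed Conventional Commit types from the contributing guide.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf",
         "test", "build", "ci", "chore", "revert")

# <type>(<scope>)?: <description>; scope charset is an assumption.
PATTERN = re.compile(rf"^({'|'.join(TYPES)})(\([a-z0-9_-]+\))?: .+")

def is_conventional(message: str) -> bool:
    """Return True if the first line matches the expected shape."""
    return bool(PATTERN.match(message))

print(is_conventional("feat(client): add retry backoff configuration"))  # True
print(is_conventional("update stuff"))                                   # False
```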
- strands-vllm - Community vLLM provider for the Strands Agents SDK
Apache License 2.0 - see LICENSE.