
Conversation


@TensorTemplar TensorTemplar commented Jan 1, 2026

Motivation

Implements #8149

Running MiniMax M2.1 from Claude Code directly against a native Anthropic API is, to my knowledge, not possible anywhere.
This PR adds a claude-code feature-complete Anthropic API with useful debug logging (system prompts, reminders, etc.). I tested with MiniMax M2.1 on RunPod with 4x RTX PRO 6000 Blackwell GPUs (PCIe) and have been using it for a few days now with no issues so far.

Plan mode, sub-agents, token counters, tool use, thinking-time estimates, and the fluff status messages all seem to work.

Here are some screenshots from the debug logs and Claude Code:

[Screenshot: tps_minimax_m21_4x6000_blackwell]

[Screenshot: minimax_claude_code_tool_use]

There are two debatable design decisions it would be great to hear feedback on, apart from code style:

  1. Interleaved thinkers do not have a model-config field to identify them with, so this PR has to hardcode a list of reasoning parsers that we interpret as signifying that a model is an "interleaved thinker". I added GLM-4.5, MiniMax M2, and Kimi K2. This is brittle and not forward-compatible; maintainer feedback on how we should handle it would be appreciated.

  2. Fixed an ordering issue in pyproject.toml that put the test and dev dependencies under tool.uv.extra-build-dependencies instead of optional dependencies, which prevented `uv sync --extra dev` from picking up the dev dependencies. This now works but affects all dev workflows; let me know if we should keep/split/revert, since it was only needed to simplify running tests. The dev deps also include ruff, which could replace most of the logic in pre-commit with a single dependency, but that is clearly out of scope for this PR and is planned to be removed once I get feedback.
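For point 2, the intended layout looks roughly like this (a sketch with placeholder package names, not the repository's exact pyproject.toml):

```toml
# Dev/test deps belong under [project.optional-dependencies], not under
# [tool.uv] extra-build-dependencies, so `uv sync --extra dev` can resolve them.
[project.optional-dependencies]
dev = ["pytest", "ruff"]

[tool.uv]
# extra-build-dependencies is meant for build-time dependencies of specific
# packages, not for the project's own dev extras.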

Differences from the upstream Anthropic API

  1. No cryptographic signatures for thinking blocks
  2. No strict server-side tool schema validation
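To make point 1 concrete: upstream Anthropic responses attach an opaque `signature` field to each `thinking` content block, which this implementation omits. A hypothetical comparison of the two shapes (values are placeholders):

```python
# Illustrative content-block shapes; not SGLang's or Anthropic's actual payloads.
upstream_block = {
    "type": "thinking",
    "thinking": "Let me reason about this step by step.",
    "signature": "EqQBCgIYAh",  # placeholder for the opaque server-issued signature
}

# This server returns thinking blocks without the signature field.
local_block = {k: v for k, v in upstream_block.items() if k != "signature"}
```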

Modifications

This PR adds a number of entirely new Anthropic API files and tests; please see the diffs.

Accuracy Tests

n/a

Benchmarking and Profiling

Didn't run either; getting 45-60 tok/s on the setup described above for single requests.

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions bot added labels documentation (Improvements or additions to documentation) and dependencies (Pull requests that update a dependency file) on Jan 1, 2026
@gemini-code-assist (Contributor)

Summary of Changes

Hello @TensorTemplar, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances SGLang's interoperability by introducing a robust and feature-rich compatibility layer for Anthropic's Messages API. The primary goal is to allow existing Anthropic client code to seamlessly interact with SGLang-served models, expanding the ecosystem of supported clients. A key aspect of this integration is the advanced handling of 'interleaved thinking' for reasoning-capable models, which extracts and presents internal thought processes as distinct content blocks. This change not only broadens SGLang's API surface but also improves the developer experience by streamlining dependency management and providing comprehensive testing for the new features.

Highlights

  • Anthropic Messages API Integration: Introduced a feature-complete Anthropic Messages API (/v1/messages) endpoint, enabling SGLang to serve models compatible with Anthropic's protocol, including support for single/multi-turn conversations, system prompts, streaming, complex content blocks (text, image), and proper error handling.
  • Token Counting API: Added a dedicated Token Counting API (/v1/messages/count_tokens) that allows users to count tokens for messages, system prompts, and tools without requiring model inference, useful for cost estimation and context management.
  • Extended Thinking and Reasoning Support: Implemented support for extended thinking/reasoning blocks for compatible models (e.g., DeepSeek-R1, Qwen3, MiniMax M2, Kimi K2). This feature automatically detects and extracts thinking content, returning it as separate thinking content blocks, with both streaming and non-streaming support.
  • Tool Use and Function Calling: Enabled comprehensive tool use functionality, converting Anthropic's tool structure to/from OpenAI's function calling format, and supporting various tool_choice options (auto, any, specific tool).
  • Authentication Flexibility: The API now supports both Anthropic-style (x-api-key) and OpenAI-style (Authorization: Bearer) API key authentication across all endpoints for maximum compatibility.
  • Development Dependency Fix: Resolved an ordering issue in pyproject.toml that incorrectly placed test and development dependencies, preventing uv sync --all-extras from picking them up. This change improves the developer workflow for running tests.



Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This is an excellent pull request that adds comprehensive support for the Anthropic Messages API. The implementation is thorough, covering streaming, non-streaming, tool use, and token counting. The addition of extensive documentation and a full suite of integration tests is particularly commendable and greatly increases confidence in the new feature. The code is well-structured, though complex due to the need to translate between Anthropic and internal/OpenAI formats.

I've noted a couple of minor documentation typos. Regarding the design decision for INTERLEAVED_THINKING_MODELS, your assessment that it's brittle is correct. A more robust solution would involve capability flags in the model configuration, but that's a larger architectural change. The current approach is a reasonable pragmatic solution for now. The fix in pyproject.toml also looks correct and improves the development workflow.

> **⚠️ Important for MiniMax M2:**
> Use `--reasoning-parser minimax` (NOT `minimax-append-think`) for Anthropic API compatibility.
> - `minimax`: Extracts thinking blocks from model output (use this for `/v1/messages`)
> - `minimax-append-think`: Adds `<think>` prefix to prompts but doesn't extract thinking, most likely for use with unofficial openai chatmessages extensions
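As an illustration of what "extracts thinking blocks" means here, a simplified stand-in (not the actual `minimax` parser):

```python
import re


def split_thinking(text: str) -> tuple[str, str]:
    """Separate <think>...</think> content from the visible answer.

    Roughly what an extracting reasoning parser does; an append-think variant
    would instead leave the tags for the model and not extract anything.
    """
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", text.strip()
```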

medium

There's a small typo here. "chatmessages" should probably be two words: "chat messages".

Suggested change
> - `minimax-append-think`: Adds `<think>` prefix to prompts but doesn't extract thinking, most likely for use with unofficial openai chatmessages extensions
> - `minimax-append-think`: Adds `<think>` prefix to prompts but doesn't extract thinking, most likely for use with unofficial openai chat messages extensions

@TensorTemplar TensorTemplar changed the title Add feature complete ant api working with claude code for interleaved thinkers Add claude-code feature complete ant api working with for interleaved thinkers Jan 1, 2026
@TensorTemplar TensorTemplar changed the title Add claude-code feature complete ant api working with for interleaved thinkers Add claude-code feature complete ant api working with interleaved thinkers Jan 1, 2026