
Conversation


@TensorTemplar TensorTemplar commented Jan 1, 2026

Motivation

Implements #8149

Running MiniMax M2.1 from Claude Code directly against a native Anthropic API is, to my knowledge, not possible anywhere.
This PR adds a claude-code feature-complete Anthropic API with useful debug logging (system prompts, reminders, etc.). I tested with MiniMax M2.1 on RunPod with 4x RTX PRO 6000 Blackwell GPUs (PCIe) and have been using it for a few days now with no issues so far.

Plan mode, sub-agents, token counters, tool use, thinking-time estimates, and the fluff status messages all seem to work.

Here are some screenshots from the debug logs and Claude Code:

[Screenshot: tps_minimax_m21_4x6000_blackwell]

[Screenshot: minimax_claude_code_tool_use]

There are two debatable design decisions it would be great to hear feedback on, apart from code style:

  1. Interleaved thinkers do not have a model-config field to identify them with, so this PR has to hardcode a list of reasoning parsers that we interpret as signifying that a model is an "interleaved thinker". I added GLM-4.5, MiniMax M2, and Kimi K2. This is brittle and not forward-compatible; maintainer feedback on how we should handle it would be appreciated.

  2. Fixed an ordering issue in pyproject.toml that put the test and dev dependencies under tool.uv.extra-build-dependencies instead of optional dependencies, which prevented `uv sync --extra dev` from picking up the dev dependencies. This now works but affects all dev workflows; let me know if we should keep/split/revert, since it was only needed to simplify running tests. The dev deps also include ruff, which could replace most of the logic in pre-commit with a single dependency, but that is clearly out of scope for this PR and is planned to be removed once I get feedback.
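For point 2, the intended layout looks roughly like this (a sketch with placeholder package names, not the repository's exact pyproject.toml):

```toml
# Dev/test deps belong under [project.optional-dependencies], not under
# [tool.uv] extra-build-dependencies, so `uv sync --extra dev` can resolve them.
[project.optional-dependencies]
dev = ["pytest", "ruff"]

[tool.uv]
# extra-build-dependencies is meant for build-time dependencies of specific
# packages, not for the project's own dev extras.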

Differences from the upstream Anthropic API

  1. No cryptographic signatures for thinking blocks
  2. No strict server-side tool schema validation
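To make point 1 concrete: upstream Anthropic responses attach an opaque `signature` field to each `thinking` content block, which this implementation omits. A hypothetical comparison of the two shapes (values are placeholders):

```python
# Illustrative content-block shapes; not SGLang's or Anthropic's actual payloads.
upstream_block = {
    "type": "thinking",
    "thinking": "Let me reason about this step by step.",
    "signature": "EqQBCgIYAh",  # placeholder for the opaque server-issued signature
}

# This server returns thinking blocks without the signature field.
local_block = {k: v for k, v in upstream_block.items() if k != "signature"}
```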

Modifications

This PR adds a number of entirely new Anthropic API files and tests; please see the diffs.

Accuracy Tests

n/a

Benchmarking and Profiling

Didn't run either; getting 45-60 tok/s on the setup described above for single requests.

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions bot added labels documentation (Improvements or additions to documentation) and dependencies (Pull requests that update a dependency file) on Jan 1, 2026
@gemini-code-assist (Contributor)

Summary of Changes

Hello @TensorTemplar, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances SGLang's interoperability by introducing a robust and feature-rich compatibility layer for Anthropic's Messages API. The primary goal is to allow existing Anthropic client code to seamlessly interact with SGLang-served models, expanding the ecosystem of supported clients. A key aspect of this integration is the advanced handling of 'interleaved thinking' for reasoning-capable models, which extracts and presents internal thought processes as distinct content blocks. This change not only broadens SGLang's API surface but also improves the developer experience by streamlining dependency management and providing comprehensive testing for the new features.

Highlights

  • Anthropic Messages API Integration: Introduced a feature-complete Anthropic Messages API (/v1/messages) endpoint, enabling SGLang to serve models compatible with Anthropic's protocol, including support for single/multi-turn conversations, system prompts, streaming, complex content blocks (text, image), and proper error handling.
  • Token Counting API: Added a dedicated Token Counting API (/v1/messages/count_tokens) that allows users to count tokens for messages, system prompts, and tools without requiring model inference, useful for cost estimation and context management.
  • Extended Thinking and Reasoning Support: Implemented support for extended thinking/reasoning blocks for compatible models (e.g., DeepSeek-R1, Qwen3, MiniMax M2, Kimi K2). This feature automatically detects and extracts thinking content, returning it as separate thinking content blocks, with both streaming and non-streaming support.
  • Tool Use and Function Calling: Enabled comprehensive tool use functionality, converting Anthropic's tool structure to/from OpenAI's function calling format, and supporting various tool_choice options (auto, any, specific tool).
  • Authentication Flexibility: The API now supports both Anthropic-style (x-api-key) and OpenAI-style (Authorization: Bearer) API key authentication across all endpoints for maximum compatibility.
  • Development Dependency Fix: Resolved an ordering issue in pyproject.toml that incorrectly placed test and development dependencies, preventing uv sync --all-extras from picking them up. This change improves the developer workflow for running tests.



Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This is an excellent pull request that adds comprehensive support for the Anthropic Messages API. The implementation is thorough, covering streaming, non-streaming, tool use, and token counting. The addition of extensive documentation and a full suite of integration tests is particularly commendable and greatly increases confidence in the new feature. The code is well-structured, though complex due to the need to translate between Anthropic and internal/OpenAI formats.

I've noted a couple of minor documentation typos. Regarding the design decision for INTERLEAVED_THINKING_MODELS, your assessment that it's brittle is correct. A more robust solution would involve capability flags in the model configuration, but that's a larger architectural change. The current approach is a reasonable pragmatic solution for now. The fix in pyproject.toml also looks correct and improves the development workflow.

> **⚠️ Important for MiniMax M2:**
> Use `--reasoning-parser minimax` (NOT `minimax-append-think`) for Anthropic API compatibility.
> - `minimax`: Extracts thinking blocks from model output (use this for `/v1/messages`)
> - `minimax-append-think`: Adds `<think>` prefix to prompts but doesn't extract thinking, most likely for use with unofficial openai chatmessages extensions
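As an illustration of what "extracts thinking blocks" means here, a simplified stand-in (not the actual `minimax` parser):

```python
import re


def split_thinking(text: str) -> tuple[str, str]:
    """Separate <think>...</think> content from the visible answer.

    Roughly what an extracting reasoning parser does; an append-think variant
    would instead leave the tags for the model and not extract anything.
    """
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", text.strip()
```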

medium

There's a small typo here. "chatmessages" should probably be two words: "chat messages".

Suggested change
> - `minimax-append-think`: Adds `<think>` prefix to prompts but doesn't extract thinking, most likely for use with unofficial openai chatmessages extensions
> - `minimax-append-think`: Adds `<think>` prefix to prompts but doesn't extract thinking, most likely for use with unofficial openai chat messages extensions

@TensorTemplar TensorTemplar changed the title Add feature complete ant api working with claude code for interleaved thinkers Add claude-code feature complete ant api working with for interleaved thinkers Jan 1, 2026
@TensorTemplar TensorTemplar changed the title Add claude-code feature complete ant api working with for interleaved thinkers Add claude-code feature complete ant api working with interleaved thinkers Jan 1, 2026