Add claude-code feature complete ant api working with interleaved thinkers #16266
Conversation
Summary of Changes

Hello @TensorTemplar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request significantly enhances SGLang's interoperability by introducing a robust, feature-rich compatibility layer for Anthropic's Messages API. The primary goal is to let existing Anthropic client code seamlessly interact with SGLang-served models, expanding the ecosystem of supported clients. A key aspect of this integration is the advanced handling of "interleaved thinking" for reasoning-capable models, which extracts internal thought processes and presents them as distinct content blocks. The change not only broadens SGLang's API surface but also improves the developer experience by streamlining dependency management and providing comprehensive testing for the new features.
Code Review
This is an excellent pull request that adds comprehensive support for the Anthropic Messages API. The implementation is thorough, covering streaming, non-streaming, tool use, and token counting. The addition of extensive documentation and a full suite of integration tests is particularly commendable and greatly increases confidence in the new feature. The code is well-structured, though complex due to the need to translate between Anthropic and internal/OpenAI formats.
I've noted a couple of minor documentation typos. Regarding the design decision for INTERLEAVED_THINKING_MODELS, your assessment that it's brittle is correct. A more robust solution would involve capability flags in the model configuration, but that's a larger architectural change. The current approach is a reasonable pragmatic solution for now. The fix in pyproject.toml also looks correct and improves the development workflow.
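For readers unfamiliar with the translation layer the review refers to, here is a hedged, simplified sketch of converting an Anthropic-style message with thinking blocks into an OpenAI-style chat message. The field names follow the public Anthropic and OpenAI API shapes, but the function and the `reasoning_content` key are illustrative, not the PR's actual code:

```python
# Illustrative sketch only: NOT the PR's actual implementation.
# Content-block shapes follow the public Anthropic Messages API.

def anthropic_to_openai_message(msg: dict) -> dict:
    """Flatten Anthropic content blocks into one OpenAI-style chat message."""
    text_parts = []
    reasoning_parts = []
    for block in msg.get("content", []):
        if block["type"] == "text":
            text_parts.append(block["text"])
        elif block["type"] == "thinking":
            # Interleaved-thinking models emit these blocks; OpenAI chat
            # messages have no native slot, so collect them separately.
            reasoning_parts.append(block["thinking"])
    out = {"role": msg["role"], "content": "".join(text_parts)}
    if reasoning_parts:
        out["reasoning_content"] = "\n".join(reasoning_parts)
    return out

example = {
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "The user wants a greeting."},
        {"type": "text", "text": "Hello!"},
    ],
}
print(anthropic_to_openai_message(example))
```

The reverse direction (splitting parsed reasoning back out into Anthropic `thinking` blocks) is where the reasoning-parser choice discussed below matters.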
> **⚠️ Important for MiniMax M2:**
> Use `--reasoning-parser minimax` (NOT `minimax-append-think`) for Anthropic API compatibility.
> - `minimax`: Extracts thinking blocks from model output (use this for `/v1/messages`)
> - `minimax-append-think`: Adds `<think>` prefix to prompts but doesn't extract thinking, most likely for use with unofficial openai chatmessages extensions
There's a small typo here. "chatmessages" should probably be two words: "chat messages".
Suggested change:
- Before: `minimax-append-think`: Adds `<think>` prefix to prompts but doesn't extract thinking, most likely for use with unofficial openai chatmessages extensions
- After: `minimax-append-think`: Adds `<think>` prefix to prompts but doesn't extract thinking, most likely for use with unofficial openai chat messages extensions
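To make the distinction concrete, here is a hedged sketch of an Anthropic-style `/v1/messages` request body one might send to an SGLang server launched with `--reasoning-parser minimax`. The model name and thinking budget are placeholders, not values taken from the PR:

```python
import json

# Hypothetical request payload following the Anthropic Messages API shape.
# Model name and budget_tokens are placeholders for illustration.
payload = {
    "model": "MiniMaxAI/MiniMax-M2",
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "budget_tokens": 512},
    "messages": [
        {"role": "user", "content": "Summarize the PR in one sentence."}
    ],
}
print(json.dumps(payload, indent=2))
```

With the `minimax` parser, the server can return `thinking` content blocks extracted from the model output; with `minimax-append-think`, no such extraction happens.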
Motivation
Implements #8149
Running MiniMax M2.1 from Claude Code directly against a native Anthropic API is not possible anywhere, to my knowledge.
This PR adds a claude-code feature-complete Anthropic API with useful debug logging (system prompts, reminders, etc.). I tested with MiniMax M2.1 on RunPod with 4x RTX PRO 6000 Blackwell GPUs (PCIe) and have been using it for a few days now with no issues so far.
Plan mode, sub-agents, token counters, tool use, thinking-time estimates, and fluff status messages all seem to work.
Here are some screenshots from the debug logs and claude code:

There are two debatable design decisions on which feedback would be great, apart from code style:
Interleaved thinkers do not have a model config to identify them with, so this PR has to hardcode a list of reasoning parsers that we interpret as signifying that a model is an "interleaved thinker". I added GLM-4.5, MiniMax M2, and Kimi K2. This is brittle and not forward compatible; maintainer feedback would be appreciated on how we should handle it.
Fixed an ordering issue in pyproject.toml which put the test and dev dependencies into `tool.uv.extra-build-dependencies` instead of optional dependencies, preventing `uv sync --extra dev` from picking up the dev dependencies. This now works, but it affects all dev workflows - let me know if we should keep/split/revert, since it was only needed to simplify running tests. The dev deps also include ruff, which could replace most of the logic in pre-commit with a single dependency, but that is clearly out of scope for this PR and is planned to be removed once I get feedback.

Diff with upstream Anthropic API
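As a hedged sketch of the pyproject.toml fix mentioned under Motivation (the dependency lists here are illustrative, not copied from the repository), the change moves dev/test dependencies out of uv's build table into standard optional dependencies:

```toml
# Before (illustrative): dev deps nested under uv's build table,
# so `uv sync --extra dev` could not see them.
# [tool.uv.extra-build-dependencies]
# dev = ["pytest", "ruff"]

# After (illustrative): declared as standard PEP 621 optional dependencies.
[project.optional-dependencies]
dev = ["pytest", "ruff"]
```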
Modifications
This PR adds many entirely new Anthropic API files and tests; please see the diffs.
Accuracy Tests
n/a
Benchmarking and Profiling
Didn't run either; getting 45-60 tokens/s on the setup described above for single requests.
Checklist
Review Process
Use CI slash commands (`/tag-run-ci-label`, `/rerun-failed-ci`, `/tag-and-rerun-ci`) or contact authorized users to do so.