feat(api_server): Add OpenAI-compatible API server for MaxText models #2313
Conversation
RissyRan
left a comment
Great work! I will take a second round of review for the files maxtext_generator, maxtext_server, and server_models.
RissyRan
left a comment
Thanks! LGTM in general. Could you leverage Gemini to build some unit tests, especially for maxtext_generator? More unit tests are very welcome!
Great job! LGTM to unblock.
+1 to Ran's comment; it would be great to have some unit tests guarding your functionality.
RissyRan
left a comment
I am fine with merging this for now (it lives in separate files and does not break the existing codebase), but could @hengtaoguo or @bvandermoon help test and verify end-to-end? There are currently no tests for these scripts and their functionality.
RissyRan
left a comment
After discussing with @bvandermoon and @hengtaoguo, we will follow up if any issues arise.
This commit introduces a fully-featured, OpenAI-compatible RESTful API server for serving MaxText models. The server is built with FastAPI, supports multi-host inference on TPUs, and is designed for both interactive use and large-scale benchmarking.
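For reference, an OpenAI-compatible `/v1/completions` response carries a fixed set of fields. The helper below is an illustrative, stdlib-only sketch of that shape (it is not code from this PR, and the whitespace-based token counts are placeholders for real tokenizer counts):

```python
import time
import uuid

def build_completion_response(model, prompt, text, finish_reason="stop"):
    """Assemble a minimal OpenAI-style /v1/completions response body.

    Field names follow the OpenAI completions schema. Token counts here
    are naive whitespace splits; a real server would count tokenizer tokens.
    """
    prompt_tokens = len(prompt.split())
    completion_tokens = len(text.split())
    return {
        "id": f"cmpl-{uuid.uuid4().hex}",
        "object": "text_completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "text": text,
                "logprobs": None,
                "finish_reason": finish_reason,
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

In the server itself, this schema is expressed as Pydantic models in `server_models.py`, so responses are validated rather than hand-assembled.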
Key features and additions:
1. **Core Server Implementation:**
- Adds `maxtext_server.py`, a FastAPI application that serves `/v1/completions` and `/v1/chat/completions` endpoints.
- Implements dynamic request batching to efficiently utilize underlying hardware.
- Uses `maxtext_generator.py` to encapsulate the MaxText inference engine, handling model loading, tokenization, and the generation loop.
- Includes Pydantic models in `server_models.py` for robust, OpenAI-compliant request and response validation.
2. **Deployment and Utilities:**
- Provides `start_server.sh` to simplify launching the server from the project root.
- Adds `port_forward_xpk.sh`, a utility script to automatically find and connect to a server running on a GKE cluster via `xpk`, supporting custom namespaces.
- Isolates server-specific dependencies in `benchmarks/api_server/requirements.txt` (`uvicorn`, `fastapi`, `openai-harmony`).
3. **Comprehensive Documentation:**
- A new `README.md` in the `api_server` directory offers a complete guide covering:
- Installation and environment setup.
- Launching the server in both single-pod and multi-pod GKE environments.
- Detailed examples for interacting with the API using `curl` and the `openai` Python client.
- Step-by-step instructions for running benchmarks with `lm-evaluation-harness` and `evalchemy` for both log-likelihood and generative tasks.
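The dynamic request batching mentioned above reduces to a small, framework-independent idea: block until the first pending request arrives, then keep draining the queue until the batch is full or a short deadline passes. A stdlib sketch of that loop (function and parameter names are illustrative, not taken from the PR):

```python
import queue
import time

def collect_batch(requests, max_batch_size=8, max_wait_s=0.05):
    """Collect up to max_batch_size items from a queue.Queue.

    Blocks for the first item, then waits at most max_wait_s (total)
    for more, so a lone request is never delayed longer than the window.
    """
    batch = [requests.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

In a server like this one, a worker thread would call such a function in a loop and hand each batch to the generation loop, amortizing the per-step cost of the model across concurrent requests.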