Commit 5523589

Authored by: v-shobhit, mlcommons-bot, github-actions[bot], anandhu-eng, arjunsuresh
[GPT-OSS-120B] Reference implementation (mlcommons#2395)
Squashed commit history (dozens of automated "[Automated Commit] Format Codebase" entries omitted): initial implementation with JSON, tokenizer, and padding fixes plus concurrent request handling with a server-readiness wait; refactoring and file organization; inference scripts and `harmonize-tokens.py` (script later removed); README with commands; `output_ids` and detokenization support; plotting scripts and input histograms; reasoning-effort, top-p, and speculative-decode options; HealthBench prompt creation and evaluation (later removed); dataset fetch scripts (`fetch_lcb.py`, `fetch_all.py`) with LiveCodeBench restricted to v5, then v4; pass@k scoring and a `repeats_per_sample` loadgen option (loadgen changes later reverted); OpenAI and Anthropic clients (OpenAI client later removed); TRT-LLM infer script; round-robin for multi-DP and timeout fixes; preprocessing module; MLPerf artifacts, `run_mlperf.py`, and accuracy/performance eval scripts; mode → scenario refactor; Parquet support; gpt-oss → gpt-oss-120b rename; `user.conf` and `generation_config.json` updates; better parsing and checks for harmony tokens; `exact_match` log for submission_checker; `build_wheels.yml` updates.

Co-authored-by: mlcommons-bot <mlcommons-bot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: ANANDHU S <71482562+anandhu-eng@users.noreply.github.com>
Co-authored-by: Arjun Suresh <arjun@gateoverflow.com>
Co-authored-by: Miro <mirhodak@amd.com>
1 parent 9374959 commit 5523589

64 files changed: +5990 −310 lines

language/gpt-oss-120b/.gitignore

Lines changed: 3 additions & 0 deletions

```
*venv*
*.pkl
*.csv
```

language/gpt-oss-120b/README.md

Lines changed: 154 additions & 0 deletions

# MLPerf Inference reference implementation for GPT-OSS-120B

This is the reference implementation for GPT-OSS-120B. It is a proposal and a work in progress.
## Model and Dataset download

#### TODO: Replace this with mlc download link when available

* Model: `openai/gpt-oss-120b`, commit id: [`b5c939d`](https://huggingface.co/openai/gpt-oss-120b/tree/b5c939de8f754692c1647ca79fbf85e8c1e70f8a)
* Dataset: Please request access at [this link](https://drive.google.com/drive/folders/1DCfEXHqe69okrqKbSyV-8VUw413JqpPY?usp=drive_link) - **this is a tentative dataset**

Datasets are now provided in **Parquet format** (recommended) for better performance and smaller file size (50% smaller than pickle). Pickle format is still supported for backward compatibility.
## Environment setup

Work on the reference implementation is done using the sglang containers at [https://hub.docker.com/r/lmsysorg/sglang/tags](https://hub.docker.com/r/lmsysorg/sglang/tags). For enroot setup, a script is provided under [`setup_enroot.sh`](./setup_enroot.sh). For all sections below, we shall assume this environment is instantiated.

Once in the environment, install additional requirements using [`setup.sh`](./setup.sh):
```bash
./setup.sh
```
## Running the reference implementation: SGLang

Use [`./sglang/run_server.sh`](./sglang/run_server.sh) to launch an SGLang server hosting `gpt-oss-120b`.

### Run the server
```bash
./run_server.sh \
    --model_path path/to/gpt-oss-120b/model \
    --dp N \
    --stream_interval 100 \
    --eagle_path optional/path/to/eagle/head
```
The script uses `python3 -m sglang.launch_server` to instantiate the model, with `tp=pp=ep=1`, and `dp` as specified.
You may also use docker:
```bash
docker run --runtime nvidia --gpus all --net host \
    -v ${HF_HOME}:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
    --ipc=host lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path ${MODEL_NAME} \
    --host 0.0.0.0 --port 3000 --data-parallel-size=1 --max-running-requests 512 \
    --mem-fraction-static 0.85 --chunked-prefill-size 16384 --ep-size=1 \
    --enable-metrics --stream-interval 500
```
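Once the server is up, clients talk to it over HTTP. As a rough sketch of what a single pre-tokenized request looks like, the snippet below targets SGLang's native `/generate` endpoint with a `sampling_params` dict; the exact field names are an assumption based on SGLang's documented API, so verify them against the container version you run (the port matches the Docker command above).

```python
# Hypothetical client sketch, not part of this repo: builds and sends one
# pre-tokenized request to a running SGLang server's /generate endpoint.
import json
import urllib.request


def build_generate_payload(token_ids, max_new_tokens=100, temperature=0.001,
                           top_k=1, top_p=1.0):
    """Build the JSON body for a single pre-tokenized generation request."""
    return {
        "input_ids": token_ids,            # prompt as token IDs, not text
        "sampling_params": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_k": top_k,
            "top_p": top_p,
        },
    }


def send_generate(server_url, payload, timeout=600):
    """POST the payload to the server and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{server_url}/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


# Requires a running server, e.g.:
# result = send_generate("http://localhost:3000", build_generate_payload([1, 2, 3]))
```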
Then, run a benchmark script that uses the client to send/recv requests.

### Run the inference

**Note:** All scripts now support both Parquet (`.parquet`) and Pickle (`.pkl`) formats for dataset files. Parquet is recommended as it offers:
- 50% smaller file size
- Faster loading times
- Cross-language compatibility
- Type-safe schema preservation
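Supporting both formats typically comes down to dispatching on the file extension. A minimal sketch, assuming the dataset is a pandas DataFrame serialized either way (the helper name is illustrative, not the repo's actual code):

```python
# Format-agnostic dataset loading: pick the pandas reader by file suffix.
from pathlib import Path

import pandas as pd


def load_dataset(path: str) -> pd.DataFrame:
    """Load a dataset DataFrame from either Parquet or Pickle."""
    suffix = Path(path).suffix.lower()
    if suffix == ".parquet":
        return pd.read_parquet(path)   # requires pyarrow or fastparquet
    if suffix == ".pkl":
        return pd.read_pickle(path)    # only unpickle files you trust
    raise ValueError(f"Unsupported dataset format: {suffix}")
```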
Example usage:
```bash
# first, install loadgen
pip install $(git rev-parse --show-toplevel)/loadgen

# Using Parquet format (recommended)
python3 run_mlperf.py \
    --scenario offline \
    --input-file /path/to/dataset.parquet \
    --accuracy

# Using Pickle format (backward compatible)
python3 run_mlperf.py \
    --scenario offline \
    --input-file /path/to/dataset.pkl \
    --accuracy
```
Full command-line options:
```bash
python3 run_mlperf.py --help
usage: run_mlperf.py [-h] [--scenario {offline,server}] --input-file INPUT_FILE [--max-samples MAX_SAMPLES] [--mlperf-conf MLPERF_CONF]
                     [--user-conf USER_CONF] [--accuracy] [--output-dir OUTPUT_DIR] [--backend {sglang}] [--server-url SERVER_URL]
                     [--generation-config GENERATION_CONFIG] [--max-new-tokens MAX_NEW_TOKENS] [--num-workers NUM_WORKERS]
                     [--max-concurrency MAX_CONCURRENCY]

Run MLPerf inference benchmarks for gpt-oss

options:
  -h, --help            show this help message and exit
  --scenario {offline,server}
                        MLPerf scenario mode
  --input-file INPUT_FILE
                        Path to tokenized dataset (parquet or pickle file)
  --max-samples MAX_SAMPLES
                        Maximum number of samples to use (None for all)
  --mlperf-conf MLPERF_CONF
                        Path to MLPerf configuration file
  --user-conf USER_CONF
                        Path to user configuration file
  --accuracy            Run accuracy mode instead of performance
  --output-dir OUTPUT_DIR
                        Directory for MLPerf output logs
  --backend {sglang}    Backend to use for inference
  --server-url SERVER_URL
                        Server URL for backend (SGLang)
  --generation-config GENERATION_CONFIG
                        Path to generation configuration JSON file
  --max-new-tokens MAX_NEW_TOKENS
                        Override max_new_tokens from generation config (default: use value from config)
  --num-workers NUM_WORKERS
                        Number of worker threads (for server scenario)
  --max-concurrency MAX_CONCURRENCY
                        Maximum concurrent requests to backend (SGLang handles batching internally)
```
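The interplay of `--num-workers` and `--max-concurrency` can be pictured as a thread pool fanning requests out while a semaphore caps how many are in flight at the backend at once. The sketch below is an illustration of that pattern, not the repo's actual harness code; all names in it are hypothetical:

```python
# Bounded-concurrency fan-out: workers pull requests, a semaphore gates
# how many may be waiting on the backend simultaneously.
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_concurrency_cap(requests, send_fn, max_concurrency=8, num_workers=16):
    """Send every request via send_fn, never exceeding max_concurrency in flight."""
    gate = threading.Semaphore(max_concurrency)

    def bounded_send(req):
        with gate:                # blocks once max_concurrency sends are active
            return send_fn(req)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(bounded_send, requests))
```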
### Evaluate the accuracy
Run `run_mlperf.py` with `--accuracy`, and then use the generated `mlperf_log_accuracy.json` to evaluate the accuracy of the run.

Example usage:
```bash
# Using Parquet format (recommended)
python3 eval_mlperf_accuracy.py \
    --mlperf-log mlperf_results/offline/accuracy/mlperf_log_accuracy.json \
    --reference-data /path/to/acc_eval_inputs.parquet \
    --tokenizer openai/gpt-oss-120b

# Using Pickle format (backward compatible)
python3 eval_mlperf_accuracy.py \
    --mlperf-log mlperf_results/offline/accuracy/mlperf_log_accuracy.json \
    --reference-data /path/to/acc_eval_inputs.pkl \
    --tokenizer openai/gpt-oss-120b
```
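For context on what the eval script consumes: LoadGen's accuracy log is a JSON array of entries whose `data` field is a hex dump of the raw response bytes. A hedged sketch of decoding it follows; the int32 little-endian interpretation of the bytes as token IDs is an assumption that must match the dtype the SUT used when completing queries:

```python
# Decode mlperf_log_accuracy.json into {qsl_idx: [token_ids]}.
# Assumes responses were written as little-endian int32 token IDs.
import json
import struct


def load_accuracy_log(path):
    """Return a dict mapping qsl_idx to the decoded token-ID list."""
    with open(path) as f:
        entries = json.load(f)
    results = {}
    for entry in entries:
        raw = bytes.fromhex(entry["data"])   # hex dump of raw response bytes
        n = len(raw) // 4                    # 4 bytes per int32 token
        results[entry["qsl_idx"]] = list(struct.unpack(f"<{n}i", raw))
    return results
```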
Full command-line options:
```bash
python3 eval_mlperf_accuracy.py --help
usage: eval_mlperf_accuracy.py [-h] --mlperf-log MLPERF_LOG --reference-data REFERENCE_DATA [--tokenizer TOKENIZER] [--output-file OUTPUT_FILE]
                               [--save-outputs SAVE_OUTPUTS] [--num-lcb-workers NUM_LCB_WORKERS] [--verbose]

Evaluate MLPerf accuracy logs for gpt-oss-120b

options:
  -h, --help            show this help message and exit
  --mlperf-log MLPERF_LOG
                        Path to mlperf_log_accuracy.json
  --reference-data REFERENCE_DATA
                        Path to reference parquet or pickle file (DataFrame with dataset, ground_truth, etc.)
  --tokenizer TOKENIZER
                        HuggingFace tokenizer name or path
  --output-file OUTPUT_FILE
                        Output JSON file for results (optional)
  --save-outputs SAVE_OUTPUTS
                        Save detokenized outputs to pickle file (ordered by qsl_idx) for debugging
  --num-lcb-workers NUM_LCB_WORKERS
                        Number of parallel workers for LiveCodeBench evaluation (default: 64)
  --verbose             Verbose logging
```
Lines changed: 10 additions & 0 deletions

```python
#!/usr/bin/env python3
"""Backend implementations for gpt-oss inference."""

from .base_backend import BaseBackend
from .sglang_backend import SGLangBackend

__all__ = [
    "BaseBackend",
    "SGLangBackend",
]
```
Lines changed: 77 additions & 0 deletions

```python
#!/usr/bin/env python3
"""Base backend class for gpt-oss inference."""

import abc
import logging
from typing import List, Dict, Any, Optional

logger = logging.getLogger(__name__)


class BaseBackend(abc.ABC):
    """Abstract base class for inference backends.

    All backends must implement this interface to work with the MLPerf SUT.
    """

    def __init__(self, config: Optional[Dict[str, Any]] = None):
        """Initialize the backend.

        Args:
            config: Optional configuration dictionary
        """
        self.config = config or {}
        self.initialized = False
        logger.info(f"Initializing {self.__class__.__name__}")

    @abc.abstractmethod
    def initialize(self) -> None:
        """Initialize the backend (load model, connect to server, etc.)."""
        raise NotImplementedError("Subclasses must implement initialize()")

    @abc.abstractmethod
    def generate(
        self,
        prompts: List[List[int]],
        max_tokens: int = 100,
        temperature: float = 0.001,
        top_k: int = 1,
        top_p: float = 1.0,
        **kwargs
    ) -> List[Dict[str, Any]]:
        """Generate responses for a batch of prompts.

        Args:
            prompts: List of token ID sequences
            max_tokens: Maximum tokens to generate per prompt
            temperature: Sampling temperature
            top_k: Top-k sampling parameter
            top_p: Top-p (nucleus) sampling parameter
            **kwargs: Additional backend-specific parameters

        Returns:
            List of response dictionaries with keys:
                - output_ids: List of generated token IDs
                - output_text: Generated text (optional)
                - metadata: Additional metadata (latencies, etc.)
        """
        raise NotImplementedError("Subclasses must implement generate()")

    @abc.abstractmethod
    def cleanup(self) -> None:
        """Clean up backend resources."""
        raise NotImplementedError("Subclasses must implement cleanup()")

    def __enter__(self):
        """Context manager entry."""
        self.initialize()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Context manager exit."""
        self.cleanup()

    @property
    def is_initialized(self) -> bool:
        """Check if backend is initialized."""
        return self.initialized
```
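To show how the interface is meant to be satisfied, here is a minimal hypothetical backend (not part of the diff) that "generates" by echoing prompt tokens back, which is enough to exercise the context-manager plumbing without a model. The base class is repeated in condensed form so the snippet runs standalone:

```python
import abc
from typing import Any, Dict, List, Optional


class BaseBackend(abc.ABC):
    # Condensed copy of the class above so this snippet is self-contained.
    def __init__(self, config: Optional[Dict[str, Any]] = None):
        self.config = config or {}
        self.initialized = False

    @abc.abstractmethod
    def initialize(self) -> None: ...

    @abc.abstractmethod
    def generate(self, prompts, **kwargs): ...

    @abc.abstractmethod
    def cleanup(self) -> None: ...

    def __enter__(self):
        self.initialize()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.cleanup()


class EchoBackend(BaseBackend):
    """Toy backend: echoes prompt tokens back as the 'generated' output."""

    def initialize(self) -> None:
        self.initialized = True

    def generate(self, prompts: List[List[int]], max_tokens: int = 100,
                 **kwargs) -> List[Dict[str, Any]]:
        # One response dict per prompt, matching the documented keys.
        return [
            {"output_ids": p[:max_tokens], "output_text": None, "metadata": {}}
            for p in prompts
        ]

    def cleanup(self) -> None:
        self.initialized = False
```

Used as a context manager, `initialize()` and `cleanup()` are invoked automatically around the `with` block.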
