# Ministral-3

## 1. Model Introduction

Ministral 3 14B is the largest model in the Ministral 3 family: a powerful and efficient language model with vision capabilities, offering frontier capabilities and performance comparable to its larger counterpart, Mistral Small 3.2 24B.

The Ministral 3 14B Instruct model offers the following capabilities:

- **Vision**: Analyzes images and provides insights based on visual content, in addition to text.
- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
- **System Prompt**: Maintains strong adherence and support for system prompts.
- **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON output.
- **Edge-Optimized**: Delivers best-in-class performance at a small scale and is deployable anywhere.
- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- **Large Context Window**: Supports a 256k context window.

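The agentic capabilities are exposed through the OpenAI-compatible chat API. As a sketch (the `get_weather` tool, its schema, and the request shape here are illustrative assumptions following the OpenAI function-calling convention, not taken from the official Ministral docs), a function-calling request payload might look like:

```python
import json

# Hypothetical tool definition; the schema follows the OpenAI
# function-calling convention accepted by OpenAI-compatible servers.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool name
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "mistralai/Ministral-3-14B-Instruct-2512",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",
}

# The payload is plain JSON; POST it to the server's /v1/chat/completions.
print(json.dumps(payload)[:40])
```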

For further details, please refer to the [official documentation](https://github.com/mistralai).

## 2. SGLang Installation

Please refer to the [official SGLang installation guide](https://docs.sglang.ai/get_started/install.html) for installation instructions.

## 3. Model Deployment

This section provides deployment configurations optimized for different hardware platforms and use cases.

### 3.1 Basic Configuration

**Interactive Command Generator**: Use the configuration selector below to automatically generate the appropriate deployment command for your hardware platform, model variant, deployment strategy, and thinking capabilities.

import Ministral3ConfigGenerator from '@site/src/components/autoregressive/Ministral3ConfigGenerator';

<Ministral3ConfigGenerator />
### 3.2 Configuration Tips

**Context length vs. memory**: Ministral-3 advertises a long context window; if you are memory-constrained, start by lowering `--context-length` (for example, 32768) and increase it once things are stable.
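As a sketch (the flag value is illustrative; adjust to your hardware), a memory-constrained launch might cap the context window like this:

```shell
# Reduced context window to limit KV-cache memory; raise once stable.
sglang serve \
    --model-path mistralai/Ministral-3-14B-Instruct-2512 \
    --tp 1 \
    --trust-remote-code \
    --context-length 32768
```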
**Pre-installation steps**: Run the following commands after launching the Docker container:
```shell
pip install --upgrade mistral-common
pip install transformers==5.0.0.rc0
```

## 4. Model Invocation

### 4.1 Basic Usage

For basic API usage and request examples, please refer to:

- [SGLang Basic Usage Guide](https://docs.sglang.ai/basic_usage/send_request.html)
- [SGLang OpenAI Vision API Guide](https://docs.sglang.ai/basic_usage/openai_api_vision.html)
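Building on the vision guide above, here is a minimal sketch of a vision request body in the OpenAI multimodal message format (the placeholder bytes stand in for a real image file; this only constructs the payload and does not contact a server):

```python
import base64
import json

# Placeholder bytes standing in for a real image file; in practice,
# read the image from disk and base64-encode it the same way.
image_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()

# OpenAI-style multimodal message: a text part plus an image_url part.
payload = {
    "model": "mistralai/Ministral-3-14B-Instruct-2512",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }],
    "max_tokens": 128,
}

# POST this JSON to the server's /v1/chat/completions endpoint.
print(len(json.dumps(payload)) > 0)
```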

### 4.2 Advanced Usage
#### 4.2.1 Launch the Docker container
```shell
docker pull lmsysorg/sglang:v0.5.9-rocm720-mi30x
```

```shell
docker run -d -it --ipc=host --network=host --privileged \
    --cap-add=CAP_SYS_ADMIN \
    --device=/dev/kfd --device=/dev/dri --device=/dev/mem \
    --group-add video --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v /:/work \
    -e SHELL=/bin/bash \
    --name Ministral \
    lmsysorg/sglang:v0.5.9-rocm720-mi30x \
    /bin/bash
```

#### 4.2.2 Launch the server
```shell
sglang serve \
    --model-path mistralai/Ministral-3-14B-Instruct-2512 \
    --tp 1 \
    --trust-remote-code
```
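Once the server is up, a quick smoke test against the OpenAI-compatible endpoint might look like this (assuming SGLang's default port 30000; adjust if you passed a different `--port`):

```shell
# Minimal chat-completion request against the server launched above.
curl -s http://localhost:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "mistralai/Ministral-3-14B-Instruct-2512",
          "messages": [{"role": "user", "content": "Say hello in French."}],
          "max_tokens": 32
        }'
```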

## 5. Benchmark

This section uses **industry-standard configurations** for comparable benchmark results.

### 5.1 Speed Benchmark

**Test Environment:**

- Hardware: MI300X GPU (8x)
- Model: mistralai/Ministral-3-14B-Instruct-2512
- Tensor Parallelism: 1
- SGLang Version: 0.5.7
- Model Deployment Command:

```bash
sglang serve \
    --model-path mistralai/Ministral-3-14B-Instruct-2512 \
    --tp 1 \
    --trust-remote-code
```
##### Low Concurrency
- Benchmark Command:
```bash
python3 -m sglang.bench_serving \
    --backend sglang \
    --model mistralai/Ministral-3-14B-Instruct-2512 \
    --dataset-name random \
    --random-input-len 1000 \
    --random-output-len 1000 \
    --num-prompts 10 \
    --max-concurrency 1 \
    --request-rate inf
```
- Test Results:
```
============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max request concurrency: 1
Successful requests: 10
Benchmark duration (s): 65.08
Total input tokens: 6101
Total input text tokens: 6101
Total input vision tokens: 0
Total generated tokens: 4220
Total generated tokens (retokenized): 4218
Request throughput (req/s): 0.15
Input token throughput (tok/s): 93.75
Output token throughput (tok/s): 64.84
Peak output token throughput (tok/s): 151.00
Peak concurrent requests: 2
Total token throughput (tok/s): 158.59
Concurrency: 1.00
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 6505.51
Median E2E Latency (ms): 3037.37
---------------Time to First Token----------------
Mean TTFT (ms): 3709.33
Median TTFT (ms): 53.72
P99 TTFT (ms): 33320.77
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 6.63
Median TPOT (ms): 6.64
P99 TPOT (ms): 6.66
---------------Inter-Token Latency----------------
Mean ITL (ms): 6.64
Median ITL (ms): 6.65
P95 ITL (ms): 6.75
P99 ITL (ms): 6.82
Max ITL (ms): 8.45
==================================================
```
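The headline throughput numbers follow directly from the reported totals; a quick sanity check using the figures from the low-concurrency run above:

```python
# Figures taken from the low-concurrency result table above.
duration_s = 65.08
input_tokens = 6101
output_tokens = 4220
requests = 10

request_throughput = requests / duration_s                       # reported: 0.15 req/s
output_throughput = output_tokens / duration_s                   # reported: 64.84 tok/s
total_throughput = (input_tokens + output_tokens) / duration_s   # reported: 158.59 tok/s

print(round(request_throughput, 2), round(output_throughput, 2), round(total_throughput, 2))
```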

##### Medium Concurrency
- Benchmark Command:
```bash
python3 -m sglang.bench_serving \
    --backend sglang \
    --model mistralai/Ministral-3-14B-Instruct-2512 \
    --dataset-name random \
    --random-input-len 1000 \
    --random-output-len 1000 \
    --num-prompts 80 \
    --max-concurrency 16 \
    --request-rate inf
```
- Test Results:
```
============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max request concurrency: 16
Successful requests: 80
Benchmark duration (s): 31.20
Total input tokens: 39668
Total input text tokens: 39668
Total input vision tokens: 0
Total generated tokens: 40805
Total generated tokens (retokenized): 40783
Request throughput (req/s): 2.56
Input token throughput (tok/s): 1271.38
Output token throughput (tok/s): 1307.82
Peak output token throughput (tok/s): 1760.00
Peak concurrent requests: 22
Total token throughput (tok/s): 2579.20
Concurrency: 13.72
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 5351.07
Median E2E Latency (ms): 5626.45
---------------Time to First Token----------------
Mean TTFT (ms): 280.87
Median TTFT (ms): 68.16
P99 TTFT (ms): 1194.79
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 10.47
Median TPOT (ms): 10.10
P99 TPOT (ms): 20.00
---------------Inter-Token Latency----------------
Mean ITL (ms): 9.96
Median ITL (ms): 9.10
P95 ITL (ms): 9.87
P99 ITL (ms): 51.39
Max ITL (ms): 888.63
==================================================
```

##### High Concurrency
- Benchmark Command:
```bash
python3 -m sglang.bench_serving \
    --backend sglang \
    --model mistralai/Ministral-3-14B-Instruct-2512 \
    --dataset-name random \
    --random-input-len 1000 \
    --random-output-len 1000 \
    --num-prompts 500 \
    --max-concurrency 100 \
    --request-rate inf
```

- Test Results:
```
============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max request concurrency: 100
Successful requests: 500
Benchmark duration (s): 88.75
Total input tokens: 249831
Total input text tokens: 249831
Total input vision tokens: 0
Total generated tokens: 252662
Total generated tokens (retokenized): 252547
Request throughput (req/s): 5.63
Input token throughput (tok/s): 2815.01
Output token throughput (tok/s): 2846.91
Peak output token throughput (tok/s): 4271.00
Peak concurrent requests: 110
Total token throughput (tok/s): 5661.93
Concurrency: 93.04
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 16514.45
Median E2E Latency (ms): 15834.45
---------------Time to First Token----------------
Mean TTFT (ms): 148.57
Median TTFT (ms): 99.15
P99 TTFT (ms): 455.86
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 32.93
Median TPOT (ms): 34.73
P99 TPOT (ms): 38.05
---------------Inter-Token Latency----------------
Mean ITL (ms): 32.45
Median ITL (ms): 27.30
P95 ITL (ms): 71.73
P99 ITL (ms): 73.45
Max ITL (ms): 328.10
==================================================
```
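Comparing the three runs shows how total token throughput scales with concurrency (numbers taken from the result tables above):

```python
# Total token throughput (tok/s) at concurrency 1, 16, and 100,
# from the three benchmark runs above.
throughput = {1: 158.59, 16: 2579.20, 100: 5661.93}

# Speedup relative to the single-request baseline.
speedup_16 = throughput[16] / throughput[1]    # ~16.3x at 16-way concurrency
speedup_100 = throughput[100] / throughput[1]  # ~35.7x at 100-way concurrency

print(round(speedup_16, 1), round(speedup_100, 1))
```

Throughput scales roughly linearly up to 16-way concurrency and sub-linearly beyond that, as per-token latency rises (mean TPOT grows from ~6.6 ms to ~33 ms between the first and third runs).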

### 5.2 Accuracy Benchmark

Model accuracy on standard benchmarks:

#### 5.2.1 GSM8K Benchmark

- Benchmark Command:

```bash
python3 benchmark/gsm8k/bench_sglang.py \
    --num-shots 8 \
    --num-questions 1316 \
    --parallel 1316
```

**Test Results:**

```
Accuracy: 0.959
Invalid: 0.000
Latency: 29.185 s
Output throughput: 4854.672 token/s
```