Commit 5a8713a

Update and rename Mistral-3.md to Ministral-3.md for AMD (sgl-project#205)

* Update and rename Mistral-3.md to Ministral-3.md
* Create index.js
* Update index.js

1 parent 0e9efe4

3 files changed: +381 −40 lines

Lines changed: 292 additions & 0 deletions
@@ -0,0 +1,292 @@
# Ministral-3

## 1. Model Introduction

The largest model in the Ministral 3 family, Ministral 3 14B is a powerful and efficient language model with vision capabilities that offers frontier performance comparable to its larger Mistral Small 3.2 24B counterpart.

The Ministral 3 14B Instruct model offers the following capabilities:

- **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
- **System Prompt**: Maintains strong adherence and support for system prompts.
- **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON output.
- **Edge-Optimized**: Delivers best-in-class performance at a small scale, deployable anywhere.
- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- **Large Context Window**: Supports a 256k context window.

For further details, please refer to the [official documentation](https://github.com/mistralai).

## 2. SGLang Installation

Please refer to the [official SGLang installation guide](https://docs.sglang.ai/get_started/install.html) for installation instructions.

## 3. Model Deployment

This section provides deployment configurations optimized for different hardware platforms and use cases.

### 3.1 Basic Configuration

**Interactive Command Generator**: Use the configuration selector below to automatically generate the appropriate deployment command for your hardware platform, model variant, deployment strategy, and thinking capabilities.

import Ministral3ConfigGenerator from '@site/src/components/autoregressive/Ministral3ConfigGenerator';

<Ministral3ConfigGenerator />

### 3.2 Configuration Tips

**Context length vs. memory**: Ministral-3 advertises a long context window; if you are memory-constrained, start by lowering `--context-length` (for example, 32768) and increase it once things are stable.
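
For example, a memory-constrained launch might look like the following sketch, which combines the serve command used later in this guide with a reduced context length (the value 32768 is a starting point to tune for your GPU, not a recommendation):

```shell
# Launch with a reduced context window to lower KV-cache memory pressure.
sglang serve \
    --model-path mistralai/Ministral-3-14B-Instruct-2512 \
    --tp 1 \
    --trust-remote-code \
    --context-length 32768
```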

**Pre-installation steps**: Run the following commands after launching the Docker container:

```shell
pip install mistral-common --upgrade
pip install transformers==5.0.0.rc0
```

## 4. Model Invocation

### 4.1 Basic Usage

For basic API usage and request examples, please refer to:

- [SGLang Basic Usage Guide](https://docs.sglang.ai/basic_usage/send_request.html)
- [SGLang OpenAI Vision API Guide](https://docs.sglang.ai/basic_usage/openai_api_vision.html)
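
As a minimal sketch of a text request, assuming the server is listening on SGLang's default `http://localhost:30000` with its OpenAI-compatible API (adjust the URL and port to your deployment):

```python
import json
import urllib.request

# Assumed server address; SGLang exposes an OpenAI-compatible API under /v1.
BASE_URL = "http://localhost:30000/v1"

# An OpenAI-style chat completion payload.
payload = {
    "model": "mistralai/Ministral-3-14B-Instruct-2512",
    "messages": [{"role": "user", "content": "List three ROCm-capable AMD GPUs."}],
    "max_tokens": 128,
    "temperature": 0.2,
}

def send_chat(payload: dict) -> dict:
    """POST the payload to the chat completions endpoint and return the parsed reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    reply = send_chat(payload)
    print(reply["choices"][0]["message"]["content"])
```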

### 4.2 Advanced Usage

#### 4.2.1 Launch the Docker container

```shell
docker pull lmsysorg/sglang:v0.5.9-rocm720-mi30x
```

```shell
docker run -d -it --ipc=host --network=host --privileged \
    --cap-add=CAP_SYS_ADMIN \
    --device=/dev/kfd --device=/dev/dri --device=/dev/mem \
    --group-add video --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v /:/work \
    -e SHELL=/bin/bash \
    --name Ministral \
    lmsysorg/sglang:v0.5.9-rocm720-mi30x \
    /bin/bash
```
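
Because the container is started detached (`-d`), open a shell inside it (using the container name chosen above) before running the pre-installation and serve commands:

```shell
docker exec -it Ministral /bin/bash
```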

#### 4.2.2 Launch the server

```shell
sglang serve \
    --model-path mistralai/Ministral-3-14B-Instruct-2512 \
    --tp 1 \
    --trust-remote-code
```
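
Since Ministral 3 is vision-capable, image inputs can be sent using OpenAI-style content parts, as covered in the Vision API guide linked above. A sketch of building such a request (the server URL, image path, and helper names are assumptions for illustration):

```python
import base64
import json
import urllib.request

# Assumed server address for the deployment above.
URL = "http://localhost:30000/v1/chat/completions"

def image_part(path: str) -> dict:
    """Encode a local image as a base64 data-URL content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}

def vision_payload(question: str, image_path: str) -> dict:
    """Build an OpenAI-style multimodal chat payload mixing text and an image."""
    return {
        "model": "mistralai/Ministral-3-14B-Instruct-2512",
        "messages": [
            {
                "role": "user",
                "content": [{"type": "text", "text": question}, image_part(image_path)],
            }
        ],
        "max_tokens": 256,
    }

if __name__ == "__main__":
    req = urllib.request.Request(
        URL,
        data=json.dumps(vision_payload("Describe this image.", "example.png")).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```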

## 5. Benchmark

This section uses **industry-standard configurations** for comparable benchmark results.

### 5.1 Speed Benchmark

**Test Environment:**

- Hardware: MI300X GPU (8x)
- Model: mistralai/Ministral-3-14B-Instruct-2512
- Tensor Parallelism: 1
- SGLang Version: 0.5.7
- Model Deployment Command:

```bash
sglang serve \
    --model-path mistralai/Ministral-3-14B-Instruct-2512 \
    --tp 1 \
    --trust-remote-code
```

##### Low Concurrency

- Benchmark Command:

```bash
python3 -m sglang.bench_serving \
    --backend sglang \
    --model mistralai/Ministral-3-14B-Instruct-2512 \
    --dataset-name random \
    --random-input-len 1000 \
    --random-output-len 1000 \
    --num-prompts 10 \
    --max-concurrency 1 \
    --request-rate inf
```

- Test Results:

```
============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max request concurrency: 1
Successful requests: 10
Benchmark duration (s): 65.08
Total input tokens: 6101
Total input text tokens: 6101
Total input vision tokens: 0
Total generated tokens: 4220
Total generated tokens (retokenized): 4218
Request throughput (req/s): 0.15
Input token throughput (tok/s): 93.75
Output token throughput (tok/s): 64.84
Peak output token throughput (tok/s): 151.00
Peak concurrent requests: 2
Total token throughput (tok/s): 158.59
Concurrency: 1.00
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 6505.51
Median E2E Latency (ms): 3037.37
---------------Time to First Token----------------
Mean TTFT (ms): 3709.33
Median TTFT (ms): 53.72
P99 TTFT (ms): 33320.77
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 6.63
Median TPOT (ms): 6.64
P99 TPOT (ms): 6.66
---------------Inter-Token Latency----------------
Mean ITL (ms): 6.64
Median ITL (ms): 6.65
P95 ITL (ms): 6.75
P99 ITL (ms): 6.82
Max ITL (ms): 8.45
==================================================
```
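
The headline rates reported by `bench_serving` are simply the raw counts divided by the benchmark duration; a quick sanity check using the low-concurrency figures:

```python
# Raw counts from the low-concurrency run.
successful_requests = 10
duration_s = 65.08
input_tokens = 6101
output_tokens = 4220

# Throughput figures are counts divided by wall-clock duration.
request_throughput = successful_requests / duration_s              # req/s
input_tok_throughput = input_tokens / duration_s                   # tok/s
output_tok_throughput = output_tokens / duration_s                 # tok/s
total_tok_throughput = (input_tokens + output_tokens) / duration_s # tok/s

print(round(request_throughput, 2))     # 0.15
print(round(input_tok_throughput, 2))   # 93.75
print(round(output_tok_throughput, 2))  # 64.84
print(round(total_tok_throughput, 2))   # 158.59
```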

##### Medium Concurrency

- Benchmark Command:

```bash
python3 -m sglang.bench_serving \
    --backend sglang \
    --model mistralai/Ministral-3-14B-Instruct-2512 \
    --dataset-name random \
    --random-input-len 1000 \
    --random-output-len 1000 \
    --num-prompts 80 \
    --max-concurrency 16 \
    --request-rate inf
```

- Test Results:

```
============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max request concurrency: 16
Successful requests: 80
Benchmark duration (s): 31.20
Total input tokens: 39668
Total input text tokens: 39668
Total input vision tokens: 0
Total generated tokens: 40805
Total generated tokens (retokenized): 40783
Request throughput (req/s): 2.56
Input token throughput (tok/s): 1271.38
Output token throughput (tok/s): 1307.82
Peak output token throughput (tok/s): 1760.00
Peak concurrent requests: 22
Total token throughput (tok/s): 2579.20
Concurrency: 13.72
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 5351.07
Median E2E Latency (ms): 5626.45
---------------Time to First Token----------------
Mean TTFT (ms): 280.87
Median TTFT (ms): 68.16
P99 TTFT (ms): 1194.79
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 10.47
Median TPOT (ms): 10.10
P99 TPOT (ms): 20.00
---------------Inter-Token Latency----------------
Mean ITL (ms): 9.96
Median ITL (ms): 9.10
P95 ITL (ms): 9.87
P99 ITL (ms): 51.39
Max ITL (ms): 888.63
==================================================
```

##### High Concurrency

- Benchmark Command:

```bash
python3 -m sglang.bench_serving \
    --backend sglang \
    --model mistralai/Ministral-3-14B-Instruct-2512 \
    --dataset-name random \
    --random-input-len 1000 \
    --random-output-len 1000 \
    --num-prompts 500 \
    --max-concurrency 100 \
    --request-rate inf
```

- Test Results:

```
============ Serving Benchmark Result ============
Backend: sglang
Traffic request rate: inf
Max request concurrency: 100
Successful requests: 500
Benchmark duration (s): 88.75
Total input tokens: 249831
Total input text tokens: 249831
Total input vision tokens: 0
Total generated tokens: 252662
Total generated tokens (retokenized): 252547
Request throughput (req/s): 5.63
Input token throughput (tok/s): 2815.01
Output token throughput (tok/s): 2846.91
Peak output token throughput (tok/s): 4271.00
Peak concurrent requests: 110
Total token throughput (tok/s): 5661.93
Concurrency: 93.04
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 16514.45
Median E2E Latency (ms): 15834.45
---------------Time to First Token----------------
Mean TTFT (ms): 148.57
Median TTFT (ms): 99.15
P99 TTFT (ms): 455.86
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 32.93
Median TPOT (ms): 34.73
P99 TPOT (ms): 38.05
---------------Inter-Token Latency----------------
Mean ITL (ms): 32.45
Median ITL (ms): 27.30
P95 ITL (ms): 71.73
P99 ITL (ms): 73.45
Max ITL (ms): 328.10
==================================================
```

### 5.2 Accuracy Benchmark

Document model accuracy on standard benchmarks:

#### 5.2.1 GSM8K Benchmark

- Benchmark Command

```bash
python3 benchmark/gsm8k/bench_sglang.py \
    --num-shots 8 \
    --num-questions 1316 \
    --parallel 1316
```

**Test Results:**

```
Accuracy: 0.959
Invalid: 0.000
Latency: 29.185 s
Output throughput: 4854.672 token/s
```

docs/autoregressive/Mistral/Mistral-3.md

Lines changed: 0 additions & 40 deletions
This file was deleted.
