Commit 3ae0374

[Doc] Fix documentation typos and resolve merge conflicts (#11023)
### What this PR does / why we need it?

Low-error modification backport #11020

- vLLM version: v0.22.1
- vLLM main: vllm-project/vllm@967c5c3

---------

Signed-off-by: herizhen <1270637059@qq.com>
1 parent 5842eca commit 3ae0374

33 files changed

Lines changed: 90 additions & 90 deletions

docs/source/community/governance.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ vLLM Ascend is an open-source project under the vLLM community, where the author

 - Contributor:

-**Responsibility:** Help new contributors onboarding, handle and respond to community questions, review RFCs and code.
+**Responsibility:** Help new contributors with onboarding, handle and respond to community questions, review RFCs and code.

 **Requirements:** Complete at least 1 contribution. A contributor is someone who consistently and actively participates in a project, including but not limited to issue/review/commits/community involvement.

docs/source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ The workflow of obtaining inputs:

 3. Get `Token IDs`: using token indices to retrieve the Token IDs from **token id table**.

-At last, these `Token IDs` are required to be fed into a model, and `positions` should also be sent into the model to create `Rope` (Rotary positional embedding). Both of them are the inputs of the model.
+At last, these `Token IDs` are required to be fed into a model, and `positions` should also be sent into the model to create `RoPE` (Rotary positional embedding). Both of them are the inputs of the model.

 **Note**: The `Token IDs` are the inputs of a model, so we also call them `Input IDs`.
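
To make step 3 of the hunk above concrete, here is a minimal sketch of gathering `Token IDs` from a token id table. The table layout (one padded row per request), the flattened index `row * MAX_MODEL_LEN + position`, and all names and values are assumptions made for illustration; they are not taken from the vLLM Ascend source.

```python
# Hypothetical sizes and values, chosen only for illustration.
MAX_MODEL_LEN = 8

# Assumed "token id table": one row per request, padded with 0.
token_id_table = [
    [11, 12, 13, 0, 0, 0, 0, 0],    # request 0
    [21, 22, 23, 24, 25, 0, 0, 0],  # request 1
]


def gather_token_ids(table, scheduled):
    """Return (input_ids, positions) for the scheduled tokens.

    `scheduled` is a list of (request_row, start_position, num_tokens).
    """
    # Flatten the table so a single integer index addresses any token.
    flat = [tok for row in table for tok in row]
    positions = []
    token_indices = []
    for row, start, num_tokens in scheduled:
        for pos in range(start, start + num_tokens):
            positions.append(pos)
            # Assumed indexing scheme: row offset plus position in the row.
            token_indices.append(row * MAX_MODEL_LEN + pos)
    # Use the token indices to look up the Token IDs.
    input_ids = [flat[i] for i in token_indices]
    return input_ids, positions


# Request 0 schedules its first 3 tokens; request 1 schedules 2 tokens
# starting at position 3.
input_ids, positions = gather_token_ids(token_id_table, [(0, 0, 3), (1, 3, 2)])
```

Both returned lists have one entry per scheduled token, which matches the doc's statement that `Token IDs` and `positions` are the two model inputs.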

docs/source/developer_guide/Design_Documents/disaggregated_prefill.md

Lines changed: 1 addition & 1 deletion
@@ -100,6 +100,6 @@ Under non-symmetric PD scenarios, validate the P-to-D tp ratio against expected

 ## Limitations

-- Heterogeneous P and D nodes are not supportedfor example, running P nodes on A2 and D nodes on A3.
+- Heterogeneous P and D nodes are not supported, for example, running P nodes on A2 and D nodes on A3.

 - In non-symmetric TP configurations, only cases where the P nodes have a higher TP degree than the D nodes and the P TP count is an integer multiple of the D TP count are supported (i.e., P_tp > D_tp and P_tp % D_tp = 0).
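
The TP constraint in the last limitation can be written as a small check. This is a sketch only: `validate_pd_tp` is a hypothetical helper, not a function from vLLM Ascend, and treating equal TP degrees as the symmetric (always accepted) case is an assumption.

```python
def validate_pd_tp(p_tp: int, d_tp: int) -> int:
    """Return the P-to-D TP ratio, or raise ValueError if unsupported.

    Hypothetical helper illustrating the documented rule:
    P_tp > D_tp and P_tp % D_tp == 0 for non-symmetric setups.
    """
    if p_tp <= 0 or d_tp <= 0:
        raise ValueError("TP degrees must be positive integers.")
    if p_tp == d_tp:
        # Symmetric configuration (assumed to be always acceptable).
        return 1
    if p_tp < d_tp:
        raise ValueError(
            f"P_tp ({p_tp}) must be greater than D_tp ({d_tp}).")
    if p_tp % d_tp != 0:
        raise ValueError(
            f"P_tp ({p_tp}) must be an integer multiple of D_tp ({d_tp}).")
    return p_tp // d_tp
```

For example, `validate_pd_tp(8, 4)` returns a ratio of 2, while `validate_pd_tp(4, 8)` and `validate_pd_tp(6, 4)` are rejected.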

docs/source/developer_guide/Design_Documents/eplb_swift_balancer.md

Lines changed: 3 additions & 3 deletions
@@ -184,10 +184,10 @@ All integer input parameters must explicitly specify their maximum and minimum v
         raise TypeError(f"The {iterations} is not int.")
     if iterations <= 0:
         raise ValueError(
-            f"The {iterations} can not less than or equal to 0.")
+            f"The {iterations} can not be less than or equal to 0.")
     if iterations > sys.maxsize:
         raise ValueError(
-            f"The {iterations} can not large than {sys.maxsize}")
+            f"The {iterations} can not be larger than {sys.maxsize}")
 ```

 #### File Path
@@ -207,7 +207,7 @@ The file path for EPLB must be checked for legality, such as whether the file pa
     if ext.lower() != ".json":
         raise TypeError("The expert_map is not json.")
     if not os.path.exists(expert_map):
-        raise ValueError("The expert_map is not exist.")
+        raise ValueError("The expert_map does not exist.")
     try:
         with open(expert_map, "w", encoding='utf-8') as f:
             f.read()
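
For readers who want to try the two checks touched by these hunks, the following self-contained sketch combines them. The function names `check_iterations` and `check_expert_map_path` are hypothetical, the rejection of `bool` is an added assumption, and the file is opened in read mode here because the goal is only to confirm that it is readable.

```python
import os
import sys


def check_iterations(iterations):
    """Validate an integer parameter against explicit bounds."""
    # bool is a subclass of int in Python; rejecting it is an assumption.
    if not isinstance(iterations, int) or isinstance(iterations, bool):
        raise TypeError(f"The {iterations} is not int.")
    if iterations <= 0:
        raise ValueError(
            f"The {iterations} can not be less than or equal to 0.")
    if iterations > sys.maxsize:
        raise ValueError(
            f"The {iterations} can not be larger than {sys.maxsize}")
    return iterations


def check_expert_map_path(expert_map):
    """Validate that expert_map is an existing, readable .json file."""
    if not isinstance(expert_map, str):
        raise TypeError("The expert_map is not str.")
    _, ext = os.path.splitext(expert_map)
    if ext.lower() != ".json":
        raise TypeError("The expert_map is not json.")
    if not os.path.exists(expert_map):
        raise ValueError("The expert_map does not exist.")
    try:
        # Read mode: only confirm the file can be opened and decoded.
        with open(expert_map, "r", encoding="utf-8") as f:
            f.read()
    except (OSError, UnicodeDecodeError) as exc:
        raise ValueError("The expert_map is not readable.") from exc
    return expert_map
```

Both helpers return the validated value so they can be used inline, for example `n = check_iterations(user_value)`.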

docs/source/developer_guide/contribution/testing.md

Lines changed: 1 addition & 1 deletion
@@ -210,7 +210,7 @@ pytest -sv tests/ut/test_ascend_config.py
 ### E2E test

 Although vllm-ascend CI provides E2E tests on Ascend CI (for example,
-[schedule_nightly_test_a2.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/schedule_nightly_test_a2.yaml), [schedule_nightly_test_a3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/schedule_nightly_test_a3.yaml), [pr_test_full.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/pr_test_full.yaml)), you can run them locally.
+[schedule_nightly_test_a2.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/schedule_nightly_test_a2.yaml), [schedule_nightly_test_a3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/schedule_nightly_test_a3.yaml), [pr_test.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/pr_test.yaml)), you can run them locally.

 #### PR-triggered E2E test

docs/source/developer_guide/evaluation/using_lm_eval.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ lm_eval \
 After 30 minutes, the output is as shown below:

 ```shell
-The markdown format results is as below:
+The results in Markdown format are as follows:

 |Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
 |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|

docs/source/developer_guide/performance_and_debug/performance_benchmark.md

Lines changed: 2 additions & 2 deletions
@@ -165,7 +165,7 @@ Total num prompt tokens: 1280
 Total num output tokens: 1280
 ```

-#### 3.2.4 Multi-Modal Benchmark
+#### 3.2.3 Multi-Modal Benchmark

 ```shell
 export VLLM_USE_MODELSCOPE=True
@@ -214,7 +214,7 @@ P99 ITL (ms): 182.28
 ==================================================
 ```

-#### 3.2.5 Embedding Benchmark
+#### 3.2.4 Embedding Benchmark

 ```shell
 vllm serve Qwen/Qwen3-Embedding-8B --trust-remote-code

docs/source/installation.md

Lines changed: 1 addition & 1 deletion
@@ -367,7 +367,7 @@ Prompt: 'The capital of France is', Generated text: ' a city. What is the capita
 Prompt: 'The future of AI is', Generated text: ' a topic that is being discussed in various contexts. In the business world, AI'
 ```

-This section shows process exits after offline inference, and is does not affect actual inference:
+This section shows process exits after offline inference, and does not affect actual inference:

 ```bash
 (EngineCore pid=970) INFO 05-12 11:36:00 [core.py:1201] Shutdown initiated (timeout=0)

docs/source/quick_start.md

Lines changed: 1 addition & 1 deletion
@@ -180,7 +180,7 @@ Prompt: 'The capital of France is', Generated text: ' a city. What is the capita
 Prompt: 'The future of AI is', Generated text: ' a topic that is being discussed in various contexts. In the business world, AI'
 ```

-This section shows process exits after offline inference, and is does not affect actual inference:
+This section shows process exits after offline inference, and does not affect actual inference:

 ```bash
 (EngineCore pid=970) INFO 05-12 11:36:00 [core.py:1201] Shutdown initiated (timeout=0)

docs/source/tutorials/features/long_sequence_context_parallel_single_node.md

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ The parameters are explained as follows:
 - (2) Decode requests are prioritized for scheduling, and prefill requests are scheduled only if there is available capacity.
 - Generally, if `--max-num-batched-tokens` is set to a larger value, the overall latency will be lower, but the pressure on GPU memory (activation value usage) will be greater.
 - `--gpu-memory-utilization` represents the proportion of HBM that vLLM will use for actual inference. Its essential function is to calculate the available kv_cache size. During the warm-up phase (referred to as profile run in vLLM), vLLM records the peak GPU memory usage during an inference process with an input size of `--max-num-batched-tokens`. The available kv_cache size is then calculated as: `--gpu-memory-utilization` * HBM size - peak GPU memory usage. Therefore, the larger the value of `--gpu-memory-utilization`, the more kv_cache can be used. However, since the GPU memory usage during the warm-up phase may differ from that during actual inference (e.g., due to uneven EP load), setting `--gpu-memory-utilization` too high may lead to OOM (Out of Memory) issues during actual inference. The default value is `0.9`.
-- `--enable-expert-parallel` indicates that EP is enabled. Note that vLLM does not support a mixed approach of ETP and EP; that is, MoE can either use pure EP or pure TP.
+- `--enable-expert-parallel` indicates that EP is enabled. Note that vLLM does not support a mixed approach of EP and TP; that is, MoE can either use pure EP or pure TP.
 - `--no-enable-prefix-caching` indicates that prefix caching is disabled. To enable it, remove this option.
 - `--quantization` "ascend" indicates that quantization is used. To disable quantization, remove this option.
 - `--compilation-config` contains configurations related to the aclgraph graph mode. The most significant configurations are "cudagraph_mode" and "cudagraph_capture_sizes", which have the following meanings:
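
The kv_cache formula quoted in the `--gpu-memory-utilization` bullet can be worked through numerically. The sketch below is plain arithmetic, not a vLLM API; the function name and the sample numbers (64 GiB of HBM, 40 GiB peak usage during the profile run) are hypothetical.

```python
def available_kv_cache_gib(gpu_memory_utilization: float,
                           hbm_size_gib: float,
                           peak_usage_gib: float) -> float:
    """kv_cache budget = gpu_memory_utilization * HBM size - peak usage."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1].")
    return gpu_memory_utilization * hbm_size_gib - peak_usage_gib


# Hypothetical device: 64 GiB HBM, 40 GiB peak usage in the profile run.
default_budget = available_kv_cache_gib(0.9, 64.0, 40.0)
higher_budget = available_kv_cache_gib(0.95, 64.0, 40.0)
```

With these sample numbers the default setting of `0.9` leaves roughly 17.6 GiB for kv_cache, and raising it to `0.95` adds about 3.2 GiB, at the cost of less headroom if real inference uses more memory than the profile run did.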
