Commit 5b1e1e4

docs: fix typos in day 6 inference system overview
- Fix grammar: 'is hide' -> 'is hidden', 'executed' -> 'are executed'
- Fix image caption: .png -> .jpg for H800 Node Count figure

Made-with: Cursor
1 parent 56d8685 commit 5b1e1e4

1 file changed

Lines changed: 2 additions & 2 deletions

202502OpenSourceWeek/day_6_one_more_thing_deepseekV3R1_inference_system_overview.md

@@ -24,7 +24,7 @@ As we have adopted prefill-decode disaggregation architecture, we employ differe
 
 ### Computation-Communication Overlapping
 Large-scale cross-node EP introduces significant communication overhead. To mitigate this, we employ a dual-batch overlap strategy to hide communication costs and improve overall throughput by splitting a batch of requests into two microbatches.
-During the prefilling phase, these two microbatches executed alternately and the communication cost of one microbatch is hide behind the computation of the other.
+During the prefilling phase, these two microbatches are executed alternately and the communication cost of one microbatch is hidden behind the computation of the other.
 
 ![Communication-Computation Overlapping during Prefilling Phase.png](figures/Communication-Computation%20Overlapping%20during%20Prefilling%20Phase.png)
 *Communication-Computation Overlapping during Prefilling Phase*
@@ -68,7 +68,7 @@ Over the past 24 hours (UTC+8 02/27/2025 12:00 PM to 02/28/2025 12:00 PM), the c
 Assuming the leasing cost of one H800 GPU is $2 per hour, the total daily cost amounts to $87,072.
 
 ![H800 Node Count For Inference Service.jpg](figures/H800%20Node%20Count%20For%20Inference%20Service.jpg)
-*H800 Node Count For Inference Service.png*
+*H800 Node Count For Inference Service.jpg*
 
 Within the 24-hour statistical period (UTC+8 02/27/2025 12:00 PM to 02/28/2025 12:00 PM), V3 and R1:
 - Total input tokens: 608B, of which 342B tokens (56.3%) hit the on-disk KV cache.
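
The figures quoted in the context lines of the second hunk can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the $2/hour rate, the $87,072 daily total, and the 608B/342B token counts come from the diff context, while `GPUS_PER_NODE = 8` and the helper names are assumptions that this diff does not state.

```python
# Values taken from the diff context lines.
GPU_HOURLY_COST_USD = 2
DAILY_COST_USD = 87_072
TOTAL_INPUT_TOKENS = 608e9
CACHE_HIT_TOKENS = 342e9

# Assumption: 8 GPUs per H800 node (not stated in this diff).
GPUS_PER_NODE = 8


def average_gpus(daily_cost_usd, hourly_cost_usd, hours=24):
    """Average number of GPUs implied by a daily bill at a flat hourly rate."""
    return daily_cost_usd / (hourly_cost_usd * hours)


def cache_hit_ratio(hit_tokens, total_tokens):
    """Fraction of input tokens served from the KV cache."""
    return hit_tokens / total_tokens


if __name__ == "__main__":
    gpus = average_gpus(DAILY_COST_USD, GPU_HOURLY_COST_USD)
    print(f"average GPUs:  {gpus:.0f}")
    print(f"average nodes: {gpus / GPUS_PER_NODE:.2f} (assuming {GPUS_PER_NODE} GPUs per node)")
    ratio = cache_hit_ratio(CACHE_HIT_TOKENS, TOTAL_INPUT_TOKENS)
    print(f"cache hit ratio: {ratio:.4f}")
```

Because the token counts are rounded to the nearest billion, the computed ratio only approximately matches the 56.3% quoted in the text.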
