forked from NVIDIA/Megatron-LM
Your question
--no-pre-communication-optimization
By default, the zb runtime dispatches a tiny communication before the real communication to optimize computation. Why is this necessary?
Code:
zero-bubble-pipeline-parallelism/megatron/core/pipeline_parallel/zerobubble/runtime.py
Line 756 in 7e03eac
```python
# Cannot fuse "pre_send" with other send kernels, or they will get stuck,
# possibly as there will be 2 send-recv with the same source and target.
with nvtx_range_ctx("pre_send"):
    pre_send, _ = multi_pipeline_ops(
        pre_sp_tensors, [],
        pre_sn_tensors, [],
        batch_p2p,
    )
with nvtx_range_ctx(send_fused_name):
    send_reqs, _ = multi_pipeline_ops(
        sp_tensors, [],
        sn_tensors, [],
        batch_p2p,
    )
```
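For context on what is being asked: the dispatch order in the snippet (a tiny `pre_send` issued on its own, then the real fused send) can be modeled with a purely illustrative, process-local sketch. Threads and queues stand in for ranks and communication channels here; `rank`, the queue wiring, and the 1-byte handshake are hypothetical names for illustration only, not the repository's actual API or NCCL behavior:

```python
import threading
import queue

def rank(send_q, recv_q, payload, received, pre_communication=True):
    """Hypothetical stand-in for one pipeline rank's send/recv step."""
    if pre_communication:
        # tiny "pre_send" dispatched on its own, never fused with the
        # real send below (mirrors the comment in runtime.py)
        send_q.put(b"\x00")
        recv_q.get()
    send_q.put(payload)              # the real communication
    received.append(recv_q.get())

# two "ranks" connected by a pair of queues
q01, q10 = queue.Queue(), queue.Queue()
got0, got1 = [], []
t0 = threading.Thread(target=rank, args=(q01, q10, b"fwd-act", got0))
t1 = threading.Thread(target=rank, args=(q10, q01, b"bwd-grad", got1))
t0.start(); t1.start(); t0.join(); t1.join()
print(got0, got1)  # → [b'bwd-grad'] [b'fwd-act']
```

Passing `pre_communication=False` would skip the handshake, which is roughly what `--no-pre-communication-optimization` appears to control in the runtime.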