
Conversation

weiguihua2 (Collaborator) commented Dec 15, 2025

What this PR does / why we need it?

PD disaggregation now supports cross-machine deployment.
We send the primary and secondary node information of node P to node D. When node D pulls the KV data, it retrieves the corresponding primary or secondary node information from the mapping.
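The lookup described above can be sketched as follows. This is an illustrative-only sketch, not the actual vllm-ascend code: the mapping layout, the key format (port as string), and the helper name are all assumptions.

```python
# Hypothetical sketch of the idea above: the prefill (P) instance sends a
# mapping of its nodes' server ports to host info, and the decode (D) side
# resolves the right primary/secondary node before pulling KV data.

# Example mapping as it might travel inside kv_transfer_params (illustrative):
remote_multi_nodes_meta_mapping = {
    "7300": {"host": "10.0.0.1", "role": "primary"},    # primary node of P
    "7301": {"host": "10.0.0.2", "role": "secondary"},  # secondary node of P
}

def get_remote_host_by_port(port: int, mapping: dict) -> str:
    """Resolve which P-side host owns the given port (hypothetical helper)."""
    entry = mapping.get(str(port))
    if entry is None:
        raise KeyError(f"no node info registered for port {port}")
    return entry["host"]

print(get_remote_host_by_port(7301, remote_multi_nodes_meta_mapping))  # 10.0.0.2
```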

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message and fill out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces multi-node KV cache transfer by implementing a handshake mechanism. Key changes:

  • KVConnectorHandshakeMetadata and local_ip were added to MooncakeAgentMetadata, and remote_multi_nodes_meta_mapping to ReqMeta, to facilitate cross-node metadata exchange.
  • New methods get_handshake_metadata and set_xfer_handshake_metadata were added to MooncakeConnector and MooncakeConnectorScheduler respectively, along with a multi_nodes_meta_mapping attribute in the scheduler.
  • The MooncakeConnectorWorker now generates and stores its own handshake metadata, including its local IP, and uses a new helper method _get_remote_host_info_by_port to resolve remote host information from the exchanged metadata.
  • The WorkerV1 class was updated to expose this KV connector handshake metadata.

A review comment identified an issue where the default value for remote_multi_nodes_meta_mapping was incorrectly set to the integer 1 instead of an empty dictionary {}, which would lead to an AttributeError.
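For orientation, the shape of the metadata structures the review describes might look roughly like this. Field names and types here are assumptions reconstructed from the review summary, not the real vllm-ascend definitions.

```python
# Illustrative-only sketch of the handshake metadata structures mentioned
# in the review; the actual classes in vllm-ascend may differ.
from dataclasses import dataclass, field

@dataclass
class MooncakeAgentMetadata:
    engine_id: str
    local_ip: str  # newly added so remote peers can reach this node
    kv_caches_base_addr: list = field(default_factory=list)

@dataclass
class ReqMeta:
    remote_host: str
    remote_port: int
    # port -> node info for every node of a (possibly multi-node) P instance;
    # an empty dict is the safe default, per the review comment below
    remote_multi_nodes_meta_mapping: dict = field(default_factory=dict)
```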

remote_port=kv_transfer_params["remote_port"],
remote_pcp_size=kv_transfer_params.get("remote_pcp_size", 1),
remote_dcp_size=kv_transfer_params.get("remote_dcp_size", 1),
remote_multi_nodes_meta_mapping=kv_transfer_params.get("remote_multi_nodes_meta_mapping", 1),

critical

The default value for remote_multi_nodes_meta_mapping is set to 1, which is incorrect for a parameter that is expected to be a dictionary. If the remote_multi_nodes_meta_mapping key is not present in kv_transfer_params, this will cause an AttributeError: 'int' object has no attribute 'get' in _get_remote_host_info_by_port when it tries to access the mapping. The default value should be an empty dictionary {}.

Suggested change
remote_multi_nodes_meta_mapping=kv_transfer_params.get("remote_multi_nodes_meta_mapping", 1),
remote_multi_nodes_meta_mapping=kv_transfer_params.get("remote_multi_nodes_meta_mapping", {}),
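A minimal reproduction of why the integer default breaks: when the key is absent, any downstream code that treats the value as a dict (as _get_remote_host_info_by_port does) raises AttributeError.

```python
# Reproduces the bug flagged above: kv_transfer_params without the key.
kv_transfer_params = {}

# With the buggy default of 1, the value is an int, not a dict.
bad = kv_transfer_params.get("remote_multi_nodes_meta_mapping", 1)
try:
    bad.get("7300")  # what the host-lookup helper would effectively do
except AttributeError as e:
    print(type(e).__name__)  # AttributeError: 'int' object has no attribute 'get'

# With the suggested default of {}, missing entries are handled gracefully.
good = kv_transfer_params.get("remote_multi_nodes_meta_mapping", {})
print(good.get("7300"))  # None
```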

lidenghui1110 (Contributor) commented Dec 15, 2025

Could you please explain in more detail why you need this remote_multi_nodes_meta_mapping?

As I understand it, when a prefill node spans multiple machines, each MooncakeConnectorScheduler will add its own remote_host and remote_engine_id to kv_transfer_params, no matter whether it is a master or slave node, which is set here.

weiguihua2 (Collaborator, Author) commented

> Could you please explain in more detail why you need this remote_multi_nodes_meta_mapping?
>
> As I understand it, when a prefill node spans multiple machines, each MooncakeConnectorScheduler will add its own remote_host and remote_engine_id to kv_transfer_params, no matter whether it is a master or slave node, which is set here.

In DP cross-machine scenarios there are multiple instances, but in TP cross-machine scenarios (such as Ray cross-machine or MP cross-machine) there is only one instance. In that case, the information about the master and slave nodes needs to be sent to the D node.

lidenghui1110 (Contributor) commented

> In DP cross-machine scenarios there are multiple instances, but in TP cross-machine scenarios (such as Ray cross-machine or MP cross-machine) there is only one instance. In that case, the information about the master and slave nodes needs to be sent to the D node.

I got it. But does the problem only exist in the Ray cross-machine scenario with a single DP rank, i.e. pure TP? When using MP cross-machine, each node will have a DPEnginecore with a MooncakeConnectorScheduler; it follows kv_transfer_params, so each node can set its own remote_host.

Could you please confirm whether I am right or wrong?

weiguihua2 (Collaborator, Author) commented

> I got it. But does the problem only exist in the Ray cross-machine scenario with a single DP rank, i.e. pure TP? When using MP cross-machine, each node will have a DPEnginecore with a MooncakeConnectorScheduler; it follows kv_transfer_params, so each node can set its own remote_host.
>
> Could you please confirm whether I am right or wrong?
The community merged a new feature a few weeks ago: cross-machine TP that is not based on Ray. In that mode there is only one DPEnginecore with a MooncakeConnectorScheduler across multiple nodes.
vllm-project/vllm#23691

weiguihua2 (Collaborator, Author) commented

> The community merged a new feature a few weeks ago: cross-machine TP that is not based on Ray. In that mode there is only one DPEnginecore with a MooncakeConnectorScheduler across multiple nodes.
> vllm-project/vllm#23691

For DP cross-machine, each node will have a DPEnginecore with a MooncakeConnectorScheduler that follows kv_transfer_params, so each node can set its own remote_host. The current code is compatible with this scenario.

lidenghui1110 (Contributor) commented

> The community merged a new feature a few weeks ago: cross-machine TP that is not based on Ray. In that mode there is only one DPEnginecore with a MooncakeConnectorScheduler across multiple nodes.
> vllm-project/vllm#23691
>
> For DP cross-machine, each node will have a DPEnginecore with a MooncakeConnectorScheduler that follows kv_transfer_params, so each node can set its own remote_host. The current code is compatible with this scenario.

Got it. Thanks for your explanation.

weiguihua2 added the pd-test (enable pd test for PR), ready-for-test (start test by label for PR), and ready (read for review) labels, and removed the pd-test label, on Dec 15, 2025