[BugFix] Implement scatter-done event handling for RDMA writes #189

Open
matthewygf wants to merge 3 commits into LMCache:main from matthewygf:main

Conversation

@matthewygf
Collaborator

This pull request adds synchronization mechanisms to prevent race conditions when using RDMA writes and scatter operations on the same memory pool in lmcache_ascend/v1/npu_connector.py. The changes ensure that RDMA operations do not overwrite buffers that are still being read by a scatter operation, improving data integrity and correctness.

Synchronization improvements for RDMA and scatter operations:

  • Introduced per-pool scatter-done events (pool_scatter_events) and flags (pool_scatter_recorded) to ensure that an RDMA write does not race with an ongoing scatter read from the same pool. The code now waits for the previous scatter to complete before allowing RDMA writes to proceed.
  • Implemented recording of scatter-done events on the load_stream after each scatter operation, and updated the logic to track which pool's event was recorded, ensuring proper synchronization for subsequent RDMA writes.
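The per-pool event pattern described above can be illustrated with a minimal, self-contained sketch. This is an analogy only: `threading.Event` stands in for the `torch.npu.Event` objects recorded and waited on via NPU streams in `npu_connector.py`, and the class and method names (`PoolScatterSync`, `before_rdma_write`, `after_scatter`) are hypothetical, not taken from the actual code.

```python
import threading

NUM_POOLS = 2  # illustrative pool count

class PoolScatterSync:
    """Sketch of the per-pool scatter-done synchronization described above."""

    def __init__(self, num_pools=NUM_POOLS):
        # One scatter-done event per pool, plus a flag recording whether the
        # event has been recorded since the last RDMA wait on that pool.
        self.pool_scatter_events = [threading.Event() for _ in range(num_pools)]
        self.pool_scatter_recorded = [False] * num_pools

    def before_rdma_write(self, pool_idx):
        # An RDMA write must not overwrite a buffer that a scatter is still
        # reading: wait for the previously recorded scatter-done event first.
        if self.pool_scatter_recorded[pool_idx]:
            self.pool_scatter_events[pool_idx].wait()
            self.pool_scatter_events[pool_idx].clear()
            self.pool_scatter_recorded[pool_idx] = False

    def after_scatter(self, pool_idx):
        # Record the scatter-done event for this pool so the next RDMA write
        # into the same pool knows what to wait on.
        self.pool_scatter_events[pool_idx].set()
        self.pool_scatter_recorded[pool_idx] = True
```

In the real connector the wait side runs on the transport stream (`transport_stream.wait_event`) and the record side on `self.load_stream`; this sketch only mirrors the control flow.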

Added handling for scatter-done events to prevent RDMA writes from racing with ongoing reads in the same pool. Introduced event recording for synchronization in multi-batch processing.

Signed-off-by: Matthew Yeung <yyygggfff@hotmail.com>
@matthewygf matthewygf marked this pull request as ready for review March 11, 2026 22:36
Copilot AI review requested due to automatic review settings March 11, 2026 22:36
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly!

This pull request enhances the npu_connector by integrating critical synchronization logic to manage concurrent RDMA write and scatter read operations. The changes are designed to eliminate potential race conditions, thereby safeguarding data consistency and improving the reliability of memory access patterns in high-performance computing environments.

Highlights

  • Synchronization for RDMA and Scatter Operations: Introduced robust synchronization mechanisms to prevent race conditions between RDMA writes and scatter operations that access the same memory pool. This ensures data integrity by preventing RDMA from overwriting buffers still being read by a scatter operation.
  • Per-Pool Scatter-Done Events: Implemented pool_scatter_events and pool_scatter_recorded flags to track the completion of scatter operations for each memory pool. RDMA writes now wait for the corresponding scatter event to complete before proceeding.
  • Event Recording on Load Stream: Updated the logic to record scatter-done events on the load_stream after each scatter operation. This ensures proper synchronization for subsequent RDMA writes by marking which pool's event has been recorded.


Changelog
  • lmcache_ascend/v1/npu_connector.py
    • Initialized pool_scatter_events (torch.npu.Event objects) and pool_scatter_recorded (boolean flags) to manage per-pool scatter synchronization.
    • Added a wait mechanism using transport_stream.wait_event to ensure that an RDMA write for a specific pool does not proceed until the previous scatter operation on that pool has completed.
    • Recorded scatter-done events on the self.load_stream for the appropriate pool after each scatter operation.
    • Updated the pool_scatter_recorded flag to True for the pool whose scatter-done event was just recorded.
    • Removed a TODO comment related to investigating scatter-done event recording on the load stream.
Activity
  • No human activity has been recorded on this pull request yet.


Copilot AI left a comment


Pull request overview

This PR implements scatter-done event handling for RDMA writes in the NPU connector's ping-pong pipeline, resolving a TODO about potential race conditions between RDMA writes and scatter reads on the same memory pool.

Changes:

  • Added per-pool scatter-done events and tracking flags to synchronize RDMA writes with ongoing scatter operations
  • Replaced a TODO comment with actual event recording and synchronization logic




@gemini-code-assist bot left a comment


Code Review

This pull request introduces synchronization for RDMA writes and scatter operations to prevent race conditions, which is a solid improvement for data integrity. The use of per-pool events and flags is well-implemented. I have one suggestion to improve maintainability by avoiding a hardcoded value.

Signed-off-by: Matthew Yeung <yyygggfff@hotmail.com>
Ensure compute stream waits for load_stream's KV scatter completion before attention reads.

Signed-off-by: Matthew Yeung <yyygggfff@hotmail.com>
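The second commit's ordering constraint — compute must wait for the load path's KV scatter before attention reads the cache — can be sketched with the same analogy. Here two Python threads stand in for the compute stream and `load_stream`, and `threading.Event` stands in for the NPU event that the compute stream would wait on (`wait_event`); `run_pipeline`, `kv_cache`, and `new_blocks` are illustrative names, not from the repository.

```python
import threading

def run_pipeline(kv_cache, new_blocks):
    """Sketch: compute waits for the load path's scatter-done event
    before reading the KV cache, mirroring the stream dependency above."""
    scatter_done = threading.Event()
    result = {}

    def load_path():
        # Emulates the KV scatter on load_stream, then records the event.
        kv_cache.extend(new_blocks)
        scatter_done.set()

    def compute_path():
        # Emulates the compute stream: wait on the event before the
        # attention read touches the KV cache.
        scatter_done.wait()
        result["seen"] = list(kv_cache)

    t_compute = threading.Thread(target=compute_path)
    t_load = threading.Thread(target=load_path)
    t_compute.start()
    t_load.start()
    t_load.join()
    t_compute.join()
    return result["seen"]
```

Without the `scatter_done.wait()`, the compute path could observe a partially populated cache; with it, the read is guaranteed to see the scattered blocks.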
@matthewygf matthewygf changed the title Implement scatter-done event handling for RDMA writes [BugFix] Implement scatter-done event handling for RDMA writes Mar 12, 2026