@jeremylt
Member

This fixes the intermittent t354 failures in CI

Basically, we only need __syncthreads when we're changing the contents of shared memory buffers, so I moved the calls to make this clear (and dropped a couple)

@jeremylt jeremylt requested a review from zatkins-dev June 17, 2025 17:05
@jeremylt jeremylt self-assigned this Jun 17, 2025
Member

@jedbrown jedbrown left a comment

I know this is tedious to check for correctness. Would it be helpful to document on each function which arguments are shared memory and what the pre/post-conditions are regarding synchronization?

@jeremylt
Member Author

The rule of thumb is actually reasonably straightforward: any time we change the contents of data.slice, we should sync threads immediately before and after
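
A minimal sketch of that rule of thumb, not the code from this PR. The names ReverseThroughSlice, ReverseTwice, RunReverseTwiceCheck, and BLOCK are hypothetical, and a plain `slice` pointer stands in for data.slice. The helper syncs right before it overwrites the shared buffer (so no thread is still reading the old contents) and right after (so every thread sees the new contents before reading).

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define BLOCK 64  // hypothetical block size for this sketch

// Hypothetical helper: every thread publishes `val` into the shared buffer,
// then reads the value written by its mirrored partner thread.
__device__ float ReverseThroughSlice(float *slice, float val) {
  const int t = threadIdx.x;

  __syncthreads();  // before the write: all reads of the old contents are done
  slice[t] = val;
  __syncthreads();  // after the write: the new contents are visible to all threads
  return slice[blockDim.x - 1 - t];
}

// Reuses the same shared buffer twice. The leading sync inside the helper is
// what keeps the second write from racing with reads left over from the first.
__global__ void ReverseTwice(const float *in, float *out) {
  __shared__ float slice[BLOCK];
  const int t = threadIdx.x;

  float r = ReverseThroughSlice(slice, in[t]);  // r = in[BLOCK - 1 - t]
  r = ReverseThroughSlice(slice, r + 1.0f);     // r = in[t] + 1
  out[t] = r;
}

// Host-side check: launches one block and verifies out[i] == in[i] + 1.
bool RunReverseTwiceCheck() {
  float h_in[BLOCK], h_out[BLOCK];
  for (int i = 0; i < BLOCK; i++) h_in[i] = (float)i;

  float *d_in = nullptr, *d_out = nullptr;
  if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }

  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  ReverseTwice<<<1, BLOCK>>>(d_in, d_out);
  // This copy waits for the kernel and reports any launch or execution error.
  cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (int i = 0; i < BLOCK; i++) {
    if (h_out[i] != h_in[i] + 1.0f) return false;
  }
  return true;
}
```

Dropping either sync in the helper would leave a window where one thread overwrites or reads a slot that another thread has not finished with, which is the kind of race that tends to show up only intermittently.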

@jeremylt jeremylt merged commit d6c19ee into main Jun 17, 2025
29 checks passed
@jeremylt jeremylt deleted the jeremy/cuda-tol branch June 17, 2025 18:16