
use threadgroup pointers instead of references in metal #9380

Open
39ali wants to merge 3 commits into gfx-rs:trunk from 39ali:metal-threadgroup

Conversation

@39ali
Contributor

@39ali 39ali commented Apr 6, 2026

Connections
fixes #4500

Description
threadgroups are broken in Metal when referenced instead of using pointers (I suspect because of compiler reordering);
this seems to fix the issue
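The shape of the change can be sketched with a tiny hypothetical emitter; `binding_sigil` and `emit_param` are illustrative names for this sketch, not functions from writer.rs:

```rust
// Hypothetical simplification of the fix: when emitting a workgroup
// (threadgroup) global in an entry-point signature, use a pointer
// sigil instead of a reference sigil.
fn binding_sigil(is_workgroup: bool) -> &'static str {
    if is_workgroup { "*" } else { "&" }
}

fn emit_param(ty: &str, name: &str, is_workgroup: bool) -> String {
    let space = if is_workgroup { "threadgroup " } else { "" };
    format!("{space}{ty}{} {name}", binding_sigil(is_workgroup))
}
```

With the workgroup flag set, this yields `threadgroup float* workgroupData` rather than the reference form `threadgroup float& workgroupData`.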

Testing
ran tests, and local test that uses threadgroups

Squash or Rebase?
either

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

@39ali 39ali changed the title "use threadgroup pointers instead of references" to "use threadgroup pointers instead of references in metal" Apr 6, 2026
@ErichDonGubler
Member

@39ali: Before we have somebody do review, this needs to be rebased, and CI errors need to be fixed.

@inner-daemons inner-daemons self-requested a review April 8, 2026 15:28
@inner-daemons inner-daemons self-assigned this Apr 8, 2026
Collaborator

@inner-daemons inner-daemons left a comment


Some initial thoughts, nothing major but will need to be addressed

Comment thread naga/src/back/msl/writer.rs Outdated
};
(coherent, space, access, "&")

let (suffix, reference) = if let crate::AddressSpace::WorkGroup = var.space {
Collaborator


If let is probably less clear than simply var.space == .... Also, do you know whether this should affect AddressSpace::TaskPayload, or any ray payload space? If not, that warrants a comment explaining why.

Contributor Author


Just saw the mesh shader changes, I'll look into it.

Comment thread naga/src/back/msl/writer.rs Outdated
// but for it to work with the rest of the code we reference it in a temp var in the function body:
// threadgroup type& temp = *temp_ptr;
for (handle, var) in module.global_variables.iter() {
if var.space == crate::AddressSpace::WorkGroup && !fun_info[handle].is_empty() {
Collaborator


Again, TaskPayload?

Comment thread naga/src/back/msl/writer.rs Outdated
Comment on lines +7606 to +7608
// for threadgroup, we use pointer and not a reference to disable compiler reordering
// but for it to work with the rest of the code we reference it in a temp var in the function body:
// threadgroup type& temp = *temp_ptr;
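As a hedged illustration of the pointer-plus-local-reference idiom in this quoted comment, here is a plain Rust analogue (hypothetical code, not naga's generated output):

```rust
// Take the storage as a raw pointer at the function boundary, then rebind
// it to a reference in the body so the rest of the code is unchanged; this
// mirrors the MSL `threadgroup type& temp = *temp_ptr;` idiom quoted above.
fn add_one(data_ptr: *mut f32) {
    // SAFETY: the caller must pass a valid, exclusively accessible pointer.
    let data: &mut f32 = unsafe { &mut *data_ptr };
    *data += 1.0;
}
```

Only the parameter's declared type changes; every access through the rebound reference behaves as before.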
Collaborator


This should probably link to an issue. "Compiler reordering" is not perfectly clear on what the problem actually is. And it's not obvious why we can redeclare as a reference later without issue.

Comment thread naga/src/back/msl/writer.rs Outdated
Ok(write!(
out,
"{}{}{}{}{}{}{} {}",
"{}{}{}{}{}{}{} {}{}",
Collaborator


This does not adequately handle cases where there might be another variable with this name. You have to use the writer's namer.

Comment thread naga/src/back/msl/writer.rs Outdated

writeln!(
self.out,
" threadgroup {}& {} = *{}_ptr;",
Collaborator


This _ptr will have to change according to the comment above.

@39ali
Contributor Author

39ali commented Apr 12, 2026

@inner-daemons I didn't like that local reference hack, so I changed it to actually use the pointer.

@teoxoy
Member

teoxoy commented Apr 20, 2026

@39ali could you resolve the CI failures? I will take a look at the PR after.

@39ali 39ali force-pushed the metal-threadgroup branch from f615b3a to df6ba31 Compare April 20, 2026 20:16
@39ali 39ali force-pushed the metal-threadgroup branch from df6ba31 to a834739 Compare April 20, 2026 20:16
@39ali
Contributor Author

39ali commented Apr 20, 2026

@teoxoy done

@inner-daemons
Collaborator

Gonna block this on my review again since it touches mesh shader stuff

@inner-daemons inner-daemons self-requested a review April 20, 2026 20:38
Member

@teoxoy teoxoy left a comment


Looks good overall, but I think we can remove 2 sections.

Comment thread naga/src/back/msl/writer.rs Outdated
Comment thread naga/src/back/msl/writer.rs Outdated
@39ali
Contributor Author

39ali commented Apr 21, 2026

@teoxoy done 👍

@teoxoy
Member

teoxoy commented Apr 22, 2026

Thanks!

Comment on lines 77 to +78
) {
uint3 nagaGridSize = _ts_main(__local_invocation_index, taskPayload, workgroupData);
threadgroup float workgroupData;
Collaborator


Why did you make this change?

Collaborator

@inner-daemons inner-daemons Apr 23, 2026


I'll give you some more context. There is a difference between workgroup variables declared as function parameters and those declared as variables in the function, because you have to manually allocate those declared in the function parameters (and then bind them). But there are benefits so we usually use those.

Mesh shaders don't support using this kind of workgroup variables because the grids can be dispatched by another task shader (which obviously can't allocate and bind a workgroup variable for each dispatched mesh shader grid). So mesh shaders intentionally declare them as variables in the wrapper function, but the inner function doesn't notice since it just takes a reference anyway. Task shaders don't have this limitation so they are treated like compute shaders.
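The rule described above can be condensed into a sketch; `ShaderStage` and `workgroup_declared_in_body` are hypothetical stand-ins for illustration, not naga's actual types:

```rust
// Mesh shaders cannot receive workgroup storage as bound entry-point
// parameters, because a task shader may dispatch many mesh grids and
// cannot allocate and bind storage for each one; mesh shaders therefore
// declare it inside the wrapper body. Compute and task shaders take it
// as a bound parameter.
enum ShaderStage { Compute, Task, Mesh }

fn workgroup_declared_in_body(stage: &ShaderStage) -> bool {
    matches!(stage, ShaderStage::Mesh)
}
```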

Member


Good catch! I wonder why none of the tests failed.

Collaborator


@teoxoy There was an edge case here to protect mesh shaders. It was expanded to task shaders in this change, not removed entirely. So it shouldn't have broken anything but it might've worsened performance or reduced the maximum size of the payload unexpectedly, I'm not entirely sure.

uint __local_invocation_index
, object_data TaskPayload& taskPayload
, threadgroup float& workgroupData
, threadgroup float* workgroupData
Collaborator


Is this really necessary? Keep in mind this right here isn't actually declaring such a workgroup variable, it's merely taking a reference to an existing variable.

@@ -442,7 +442,7 @@ impl TypedGlobalVariable<'_> {
};
let (coherent, space, access, reference) = match (var.space.to_msl_name(), var.space) {
(Some(space), crate::AddressSpace::WorkGroup) => {
Collaborator


Should this also match for TaskPayload address space?

Member


I don't think it should, it doesn't map to the threadgroup address space.

Self::TaskPayload => Some("object_data"),

Collaborator


Task payload storage class is defined to be basically identical to threadgroup storage class, except that in mesh shaders it is immutable. My point here is that this fix should therefore also apply to object_data variables so that they can benefit.

Collaborator

@inner-daemons inner-daemons left a comment


I left some comments above, didn't realize there would be so many so I didn't put it in a review. Here, I mostly want you to avoid making changes to mesh/task shaders that aren't necessary.

Another thing that came up a ton is that TaskPayload address space functions very similarly to workgroup address space (though it is only mutable in task shaders). So it would probably be best to apply all of your fixes to that too.
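One way to keep the two address spaces in sync, as the reviewer suggests, would be a single predicate shared by every call site; this is a hypothetical sketch, not code from the PR:

```rust
// One place to decide whether a global in this address space should be
// emitted through a pointer. Per the review comments, task payload storage
// behaves essentially like threadgroup storage, so the two are kept together.
enum AddressSpace { WorkGroup, TaskPayload, Storage }

fn uses_pointer(space: &AddressSpace) -> bool {
    matches!(space, AddressSpace::WorkGroup | AddressSpace::TaskPayload)
}
```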

match context.function.expressions[chain] {
crate::Expression::GlobalVariable(handle) => {
let var = &context.module.global_variables[handle];
var.space == crate::AddressSpace::WorkGroup
Collaborator


Same here, should this match for task payload variables?

_ => {
if var.space == crate::AddressSpace::WorkGroup
&& ep.stage == crate::ShaderStage::Mesh
&& (ep.stage == crate::ShaderStage::Mesh
Collaborator


This is the kind of stuff that I highlighted in the mesh shader output, but I don't know why it needed to change.

match *self {
Access::GlobalVariable(handle) => {
let var = &module.global_variables[handle];
var.space == crate::AddressSpace::WorkGroup
Collaborator


Again here, should this match a task payload pointer?

fn root_is_workgroup_pointer(&self, module: &crate::Module) -> bool {
if let Some(&Access::GlobalVariable(handle)) = self.stack.first() {
let var = &module.global_variables[handle];
return var.space == crate::AddressSpace::WorkGroup;
Collaborator


You know the deal



Development

Successfully merging this pull request may close these issues.

[msl-out] Incorrect behavior over workgroup memory due to possibly miscompiled barrier in Metal

4 participants