Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

threads: add shared heap types #1600

Merged
merged 27 commits into from
Jun 12, 2024

Conversation

abrown
Copy link
Collaborator

@abrown abrown commented Jun 10, 2024

This changes heap types everywhere to allow them to be shared as described in the shared-everything-threads proposal. The key refactoring is to push down all the abstract heap types (e.g., func, any, etc.) into an AbstractHeapType, leaving the HeapType enum with two variants: Abstract and Concrete. This allows the Abstract side to be marked shared; the Concrete side's shared-ness must be checked by looking at the pointed-to type, something which isn't yet addressed by this PR.

There are several parts of the PR that bump up against the boundary of what can be marked shared currently (e.g., FuncType and related types can't yet be marked shared). I've commented these locations with TODO: handle shared to be dealt with in future PRs that continue expanding shared everywhere.

@abrown abrown marked this pull request as draft June 10, 2024 17:39
abrown added 3 commits June 10, 2024 10:55
The shared-everything-thread [proposal] specifies that all abstract heap
types can be marked `shared` by prepending them with the 0x65 prefix
byte. While we wait for clarification on what this precisely means, I
chose to implement the "extra byte" version, which will result in larger
binaries that use `shared` types.

The key change here is that `wasmparser::HeapType` now groups all of the
abstract heap types (e.g., `any`) into a
`wasmparser::HeapType::Abstract` variant, which allows us to mark those
as `shared` in a single place. This results in a cascade of changes
elsewhere, most of which are pretty harmless (an extra, internal `match`
on the `ty` of the `Abstract` variant). The real business left
unfinished by this commit is _what_ to do with the `shared` field in all
of these locations; for now I've just noted those as TODOs to get a
review on this approach so far before forging ahead.

[proposal]: https://github.com/WebAssembly/shared-everything-threads
[bytecodealliance#64]: WebAssembly/shared-everything-threads#64
This continues the work of the previous commit by propagating the
`shared` field of abstract heap types through to `wasm_encoder`, with
uses in various other crates. As before, the boundary at which we cannot
yet handle sharedness is marked with TODO.
This finally extends `shared` into the heap types defined in `wast`. As
before, a distinction is drawn between abstract and concrete heap
types.
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High-level question, how complete do you want to get this before landing? For example do you want to have all the // TODO resolved?

Comment on lines +916 to +924
let name = ty.as_str(nullable);
match (nullable, shared) {
// Print the shortened form if nullable; i.e., `*ref`.
(true, true) => write!(f, "(shared {}ref)", name),
(true, false) => write!(f, "{}ref", name),
// Print the long form otherwise; i.e., `(ref *)`.
(false, true) => write!(f, "(ref (shared {}))", name),
(false, false) => write!(f, "(ref {})", name),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused on the handling of nullable here where it's passed into as_str but also handled in the match here? That seems like it could produce some confusing results in some situations. Should as_str return an enum of some kind to indicate whether it handled nullable or not?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, what I'm trying to do here is factor out the common pattern, mainly for human comprehensibility. I would just use RefType::wat but (a) it's behind the validate feature and (b) because of its &'static str return it can't print out the concrete index like we were accustomed to here. If we dropped (a) and made RefType::wat return a String then we can just have one version of all this...

Perhaps better yet, why don't we just use Debug everywhere RefType::wat was being used (i.e., ty_to_str)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the &str is somewhat important for performance in the operand validator because that's used in a few places for error messages and not dealing with Debug or String helps codegen/perf there.

This may just be me not really being familiar enough with the gc proposal. I basically just keep forgetting all the various shorthands like nullexnref works but (ref nullexn) doesn't or something like that. Anyway just ignore me here I think.

alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jun 10, 2024
Reading over bytecodealliance#1600 I remembered that I have a difficult time
understanding how value types are encoded in wasm. There's a number of
overlapping concerns and a bit of duplication within `wasmparser`
itself. I've tried to leave an explanatory comment for myself in the
future which can also hopefully help serve others as well.

Along the way I've also managed to remove `ValType::is_valtype_byte`
with some simpler logic to avoid duplication of all the value type bytes
that are supported.
github-merge-queue bot pushed a commit that referenced this pull request Jun 10, 2024
Reading over #1600 I remembered that I have a difficult time
understanding how value types are encoded in wasm. There's a number of
overlapping concerns and a bit of duplication within `wasmparser`
itself. I've tried to leave an explanatory comment for myself in the
future which can also hopefully help serve others as well.

Along the way I've also managed to remove `ValType::is_valtype_byte`
with some simpler logic to avoid duplication of all the value type bytes
that are supported.
@abrown abrown marked this pull request as ready for review June 10, 2024 23:16
@abrown abrown requested a review from alexcrichton June 10, 2024 23:16
@abrown
Copy link
Collaborator Author

abrown commented Jun 10, 2024

High-level question, how complete do you want to get this before landing? For example do you want to have all the // TODO resolved?

No, I think I want to keep the boundary-level TODO: handle shared around so I can make sure to check all these places when I expand shared types even more. The real question for me about completeness is testing: as I describe above, I can't write spec tests without shared-checking, but I kind of wanted to defer that for a future PR. I'm open to fixing it all here, though, but it's all getting a bit unwieldy...

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main worry with this is that there's no tests currently for this PR and personally at least once I land a // TODO: ... it's highly likely that I just forget about it and don't actually handle it until years later when someone hits a bug. I do think that some tests should be part of this, at the very least parsing/printing/dumping/etc or something along those lines. The tests don't necesarily need to be exhaustive, but I think there should be at least some simple things working at this point, even if they should be invalid modules in the future or something like that.

As for things like is_shared, if you're ok taking on the responsibility to audit the // TODO: ... after this lands I think that's ok.

Comment on lines +3994 to +3998
let heap_type = HeapType::Abstract {
shared: false, // TODO: handle shared--see https://github.com/WebAssembly/shared-everything-threads/issues/65.
ty: AbstractHeapType::Any,
};
let any_ref = RefType::new(is_nullable, heap_type).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll note that given the current thinking on WebAssembly/shared-everything-threads#65 we're probably going to need a relatively exhaustive audit of all gc instructions. Much of the validator here is framed as pop_operand(Some(some_top_type)) for gc-related things and the pop_operand handles that all internally if there's a subtype on the stack. That only worked though since there's conveniently a single top type to pop, but now if every instruction is polymorphic over shared then there's two top types that need to be handled. That means that we'll have to restructure most GC-related instruction validation to not use the same style and instead have to pop and then see-what-you-got within each instruction.

Either that or some sort of helper like pop_gc(AbstractHeapType) which handles this sort of logic (and all instructions would need to be updated to that)

Comment on lines +916 to +924
let name = ty.as_str(nullable);
match (nullable, shared) {
// Print the shortened form if nullable; i.e., `*ref`.
(true, true) => write!(f, "(shared {}ref)", name),
(true, false) => write!(f, "{}ref", name),
// Print the long form otherwise; i.e., `(ref *)`.
(false, true) => write!(f, "(ref (shared {}))", name),
(false, false) => write!(f, "(ref {})", name),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the &str is somewhat important for performance in the operand validator because that's used in a few places for error messages and not dealing with Debug or String helps codegen/perf there.

This may just be me not really being familiar enough with the gc proposal. I basically just keep forgetting all the various shorthands like nullexnref works but (ref nullexn) doesn't or something like that. Anyway just ignore me here I think.

abrown added 3 commits June 12, 2024 10:05
In previous commits, we were unable to determine if a reference type was
shared because concrete heap types point to a type elsewhere in the
module that we weren't able to retrieve at the time the check was made.
This change fixes that by plumbing a `TypeList` through to the right
place and implementing `TypeList::is_shared`. This will still fail with
a `todo!()` at the boundary of where shared-ness is implemented; once
composite types gain shared flags that `todo!()` should go away.
When parsing a shared global in the WebAssembly text format, one might
run across the following text: `(global (shared anyref))`. In the
current paradigm, this should parse to a `shared` global that contains
an invalid `anyref` (i.e., `(ref null any)` in long-form, which is
unshared); this should be parseable but result in a validation error.
The correct form is `(global (shared (shared anyref)))`.

This change fixes several issues related to this as well as refactoring
some of the "short-hand"/"long-hand" parsing code.
This change adds generated tests for all of the abstract heap types when
they are `shared` and placed in `shared` globals. Since globals are the
only thing we can mark shared now that can touch these types, they are a
good vehicle for testing the parsing, encoding, and validation of all of
this. Eventually the same must be done for concrete heap types once
composite types (`func`, `struct`, `array`) can be marked shared.
@abrown abrown requested a review from alexcrichton June 12, 2024 17:10
@alexcrichton alexcrichton added this pull request to the merge queue Jun 12, 2024
Merged via the queue into bytecodealliance:main with commit 2108dfc Jun 12, 2024
27 checks passed
@abrown abrown deleted the set-heap-types branch June 12, 2024 17:28
abrown added a commit to abrown/wasm-tools that referenced this pull request Jun 14, 2024
This continues the work of bytecodealliance#1600 to expand the surface area of a
WebAssembly module that can be marked `shared`. This is for support of
the shared-everything-threads [proposal]. The key change is to convert
`CompositeType` in each of the crates from an `enum` to a `struct` in
order to add a `shared` boolean field. The original variants (`Func`,
`Array`, `Struct`) are moved to a separate `enum` and fill an `inner`
field. Propagating this throughout the code base is the bulk of this PR,
with occasional, small refactors to avoid larger match patterns.

[proposal]: https://github.com/WebAssembly/shared-everything-threads
abrown added a commit to abrown/wasm-tools that referenced this pull request Jun 27, 2024
This continues the work of bytecodealliance#1600 to expand the surface area of a
WebAssembly module that can be marked `shared`. This is for support of
the shared-everything-threads [proposal]. The key change is to convert
`CompositeType` in each of the crates from an `enum` to a `struct` in
order to add a `shared` boolean field. The original variants (`Func`,
`Array`, `Struct`) are moved to a separate `enum` and fill an `inner`
field. Propagating this throughout the code base is the bulk of this PR,
with occasional, small refactors to avoid larger match patterns.

[proposal]: https://github.com/WebAssembly/shared-everything-threads
abrown added a commit to abrown/wasm-tools that referenced this pull request Jul 3, 2024
This continues the work of bytecodealliance#1600 to expand the surface area of a
WebAssembly module that can be marked `shared`. This is for support of
the shared-everything-threads [proposal]. The key change is to convert
`CompositeType` in each of the crates from an `enum` to a `struct` in
order to add a `shared` boolean field. The original variants (`Func`,
`Array`, `Struct`) are moved to a separate `enum` and fill an `inner`
field. Propagating this throughout the code base is the bulk of this PR,
with occasional, small refactors to avoid larger match patterns.

[proposal]: https://github.com/WebAssembly/shared-everything-threads
github-merge-queue bot pushed a commit that referenced this pull request Jul 3, 2024
* threads: add `shared` composite types

This continues the work of #1600 to expand the surface area of a
WebAssembly module that can be marked `shared`. This is for support of
the shared-everything-threads [proposal]. The key change is to convert
`CompositeType` in each of the crates from an `enum` to a `struct` in
order to add a `shared` boolean field. The original variants (`Func`,
`Array`, `Struct`) are moved to a separate `enum` and fill an `inner`
field. Propagating this throughout the code base is the bulk of this PR,
with occasional, small refactors to avoid larger match patterns.

[proposal]: https://github.com/WebAssembly/shared-everything-threads

* Add `wast` parsing of shared composite types

This finishes the work of the previous commit, allowing us to express
shared composite types in the text format: `(type $t (shared (...)))`.
This also finishes up the heap type testing by allowing the tests to
express concrete heap types (e.g., `ref $t`).

* Fix some un-formatted files

* Add shared check to rec groups

For now, this explicitly checks that both subtypes are shared. But it
isn't yet clear whether an shared type might be able to match an
unshared one. Once that is resolved, this might need to be updated.

* Add @tlively `struct` and `array` tests

These proposed spec tests are slightly modified from @tlively's
originals:
- they use this crate's choice of error message
- they avoid the shorthand form of a single-field struct, ~(struct
  <type>)` because this crate does not yet know how to parse that
- they avoid the shorthand form for `(elem)` for the same reason

All of these minor issues can be resolved later.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants