-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
threads: add shared
heap types
#1600
Conversation
The shared-everything-thread [proposal] specifies that all abstract heap types can be marked `shared` by prepending them with the 0x65 prefix byte. While we wait for clarification on what this precisely means, I chose to implement the "extra byte" version, which will result in larger binaries that use `shared` types. The key change here is that `wasmparser::HeapType` now groups all of the abstract heap types (e.g., `any`) into a `wasmparser::HeapType::Abstract` variant, which allows us to mark those as `shared` in a single place. This results in a cascade of changes elsewhere, most of which are pretty harmless (an extra, internal `match` on the `ty` of the `Abstract` variant). The real business left unfinished by this commit is _what_ to do with the `shared` field in all of these locations; for now I've just noted those as TODOs to get a review on this approach so far before forging ahead. [proposal]: https://github.com/WebAssembly/shared-everything-threads [bytecodealliance#64]: WebAssembly/shared-everything-threads#64
This continues the work of the previous commit by propagating the `shared` field of abstract heap types through to `wasm_encoder`, with uses in various other crates. As before, the boundary at which we cannot yet handle sharedness is marked with TODO.
This finally extends `shared` into the heap types defined in `wast`. As before, a distinction is drawn between abstract and concrete heap types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High-level question, how complete do you want to get this before landing? For example do you want to have all the // TODO
resolved?
let name = ty.as_str(nullable); | ||
match (nullable, shared) { | ||
// Print the shortened form if nullable; i.e., `*ref`. | ||
(true, true) => write!(f, "(shared {}ref)", name), | ||
(true, false) => write!(f, "{}ref", name), | ||
// Print the long form otherwise; i.e., `(ref *)`. | ||
(false, true) => write!(f, "(ref (shared {}))", name), | ||
(false, false) => write!(f, "(ref {})", name), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused on the handling of nullable
here where it's passed into as_str
but also handled in the match
here? That seems like it could produce some confusing results in some situations. Should as_str
return an enum
of some kind to indicate whether it handled nullable
or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, what I'm trying to do here is factor out the common pattern, mainly for human comprehensibility. I would just use RefType::wat
but (a) it's behind the validate
feature and (b) because of its &'static str
return it can't print out the concrete index like we were accustomed to here. If we dropped (a) and made RefType::wat
return a String
then we can just have one version of all this...
Perhaps better yet, why don't we just use Debug
everywhere RefType::wat
was being used (i.e., ty_to_str
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping the &str
is somewhat important for performance in the operand validator because that's used in a few places for error messages and not dealing with Debug
or String
helps codegen/perf there.
This may just be me not really being familiar enough with the gc proposal. I basically just keep forgetting all the various shorthands like nullexnref
works but (ref nullexn)
doesn't or something like that. Anyway just ignore me here I think.
Reading over bytecodealliance#1600 I remembered that I have a difficult time understanding how value types are encoded in wasm. There's a number of overlapping concerns and a bit of duplication within `wasmparser` itself. I've tried to leave an explanatory comment for myself in the future which can also hopefully help serve others as well. Along the way I've also managed to remove `ValType::is_valtype_byte` with some simpler logic to avoid duplication of all the value type bytes that are supported.
Reading over #1600 I remembered that I have a difficult time understanding how value types are encoded in wasm. There's a number of overlapping concerns and a bit of duplication within `wasmparser` itself. I've tried to leave an explanatory comment for myself in the future which can also hopefully help serve others as well. Along the way I've also managed to remove `ValType::is_valtype_byte` with some simpler logic to avoid duplication of all the value type bytes that are supported.
No, I think I want to keep the boundary-level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main worry with this is that there's no tests currently for this PR and personally at least once I land a // TODO: ...
it's highly likely that I just forget about it and don't actually handle it until years later when someone hits a bug. I do think that some tests should be part of this, at the very least parsing/printing/dumping/etc or something along those lines. The tests don't necesarily need to be exhaustive, but I think there should be at least some simple things working at this point, even if they should be invalid modules in the future or something like that.
As for things like is_shared
, if you're ok taking on the responsibility to audit the // TODO: ...
after this lands I think that's ok.
let heap_type = HeapType::Abstract { | ||
shared: false, // TODO: handle shared--see https://github.com/WebAssembly/shared-everything-threads/issues/65. | ||
ty: AbstractHeapType::Any, | ||
}; | ||
let any_ref = RefType::new(is_nullable, heap_type).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll note that given the current thinking on WebAssembly/shared-everything-threads#65 we're probably going to need a relatively exhaustive audit of all gc instructions. Much of the validator here is framed as pop_operand(Some(some_top_type))
for gc-related things and the pop_operand handles that all internally if there's a subtype on the stack. That only worked though since there's conveniently a single top type to pop, but now if every instruction is polymorphic over shared
then there's two top types that need to be handled. That means that we'll have to restructure most GC-related instruction validation to not use the same style and instead have to pop and then see-what-you-got within each instruction.
Either that or some sort of helper like pop_gc(AbstractHeapType)
which handles this sort of logic (and all instructions would need to be updated to that)
let name = ty.as_str(nullable); | ||
match (nullable, shared) { | ||
// Print the shortened form if nullable; i.e., `*ref`. | ||
(true, true) => write!(f, "(shared {}ref)", name), | ||
(true, false) => write!(f, "{}ref", name), | ||
// Print the long form otherwise; i.e., `(ref *)`. | ||
(false, true) => write!(f, "(ref (shared {}))", name), | ||
(false, false) => write!(f, "(ref {})", name), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping the &str
is somewhat important for performance in the operand validator because that's used in a few places for error messages and not dealing with Debug
or String
helps codegen/perf there.
This may just be me not really being familiar enough with the gc proposal. I basically just keep forgetting all the various shorthands like nullexnref
works but (ref nullexn)
doesn't or something like that. Anyway just ignore me here I think.
In previous commits, we were unable to determine if a reference type was shared because concrete heap types point to a type elsewhere in the module that we weren't able to retrieve at the time the check was made. This change fixes that by plumbing a `TypeList` through to the right place and implementing `TypeList::is_shared`. This will still fail with a `todo!()` at the boundary of where shared-ness is implemented; once composite types gain shared flags that `todo!()` should go away.
When parsing a shared global in the WebAssembly text format, one might run across the following text: `(global (shared anyref))`. In the current paradigm, this should parse to a `shared` global that contains an invalid `anyref` (i.e., `(ref null any)` in long-form, which is unshared); this should be parseable but result in a validation error. The correct form is `(global (shared (shared anyref)))`. This change fixes several issues related to this as well as refactoring some of the "short-hand"/"long-hand" parsing code.
This change adds generated tests for all of the abstract heap types when they are `shared` and placed in `shared` globals. Since globals are the only thing we can mark shared now that can touch these types, they are a good vehicle for testing the parsing, encoding, and validation of all of this. Eventually the same must be done for concrete heap types once composite types (`func`, `struct`, `array`) can be marked shared.
This continues the work of bytecodealliance#1600 to expand the surface area of a WebAssembly module that can be marked `shared`. This is for support of the shared-everything-threads [proposal]. The key change is to convert `CompositeType` in each of the crates from an `enum` to a `struct` in order to add a `shared` boolean field. The original variants (`Func`, `Array`, `Struct`) are moved to a separate `enum` and fill an `inner` field. Propagating this throughout the code base is the bulk of this PR, with occasional, small refactors to avoid larger match patterns. [proposal]: https://github.com/WebAssembly/shared-everything-threads
This continues the work of bytecodealliance#1600 to expand the surface area of a WebAssembly module that can be marked `shared`. This is for support of the shared-everything-threads [proposal]. The key change is to convert `CompositeType` in each of the crates from an `enum` to a `struct` in order to add a `shared` boolean field. The original variants (`Func`, `Array`, `Struct`) are moved to a separate `enum` and fill an `inner` field. Propagating this throughout the code base is the bulk of this PR, with occasional, small refactors to avoid larger match patterns. [proposal]: https://github.com/WebAssembly/shared-everything-threads
This continues the work of bytecodealliance#1600 to expand the surface area of a WebAssembly module that can be marked `shared`. This is for support of the shared-everything-threads [proposal]. The key change is to convert `CompositeType` in each of the crates from an `enum` to a `struct` in order to add a `shared` boolean field. The original variants (`Func`, `Array`, `Struct`) are moved to a separate `enum` and fill an `inner` field. Propagating this throughout the code base is the bulk of this PR, with occasional, small refactors to avoid larger match patterns. [proposal]: https://github.com/WebAssembly/shared-everything-threads
* threads: add `shared` composite types This continues the work of #1600 to expand the surface area of a WebAssembly module that can be marked `shared`. This is for support of the shared-everything-threads [proposal]. The key change is to convert `CompositeType` in each of the crates from an `enum` to a `struct` in order to add a `shared` boolean field. The original variants (`Func`, `Array`, `Struct`) are moved to a separate `enum` and fill an `inner` field. Propagating this throughout the code base is the bulk of this PR, with occasional, small refactors to avoid larger match patterns. [proposal]: https://github.com/WebAssembly/shared-everything-threads * Add `wast` parsing of shared composite types This finishes the work of the previous commit, allowing us to express shared composite types in the text format: `(type $t (shared (...)))`. This also finishes up the heap type testing by allowing the tests to express concrete heap types (e.g., `ref $t`). * Fix some un-formatted files * Add shared check to rec groups For now, this explicitly checks that both subtypes are shared. But it isn't yet clear whether an shared type might be able to match an unshared one. Once that is resolved, this might need to be updated. * Add @tlively `struct` and `array` tests These proposed spec tests are slightly modified from @tlively's originals: - they use this crate's choice of error message - they avoid the shorthand form of a single-field struct, ~(struct <type>)` because this crate does not yet know how to parse that - they avoid the shorthand form for `(elem)` for the same reason All of these minor issues can be resolved later.
This changes heap types everywhere to allow them to be
shared
as described in the shared-everything-threads proposal. The key refactoring is to push down all the abstract heap types (e.g.,func
,any
, etc.) into anAbstractHeapType
, leaving theHeapType
enum with two variants:Abstract
andConcrete
. This allows theAbstract
side to be markedshared
; theConcrete
side'sshared
-ness must be checked by looking at the pointed-to type, something which isn't yet addressed by this PR.There are several parts of the PR that bump up against the boundary of what can be marked
shared
currently (e.g.,FuncType
and related types can't yet be markedshared
). I've commented these locations withTODO: handle shared
to be dealt with in future PRs that continue expandingshared
everywhere.