asm: introduce a new x64 assembler #10110
// These values are transcribed from is happening in
// `SyntheticAmode::finalize`. This, plus the `Into` logic converting a
// `SyntheticAmode` to its external counterpart, are
let frame = state.frame_layout();
known_offsets[external::offsets::KEY_INCOMING_ARG] =
    i32::try_from(frame.tail_args_size + frame.setup_area_size).unwrap();
known_offsets[external::offsets::KEY_SLOT_OFFSET] =
    i32::try_from(frame.outgoing_args_size).unwrap();
@cfallin, I'm not a big fan of this `KnownOffsetTable` approach: it seems like we may want to just add appropriate `CodeSink`/`MachBuffer` methods to propagate this kind of thing.
Perhaps! Were you thinking we'd have specific trait methods for e.g. `incoming_arg` and `slot_offset`? Or that we'd still be generic over the set of known offsets but have a query method?
I'm totally fine with the latter, in fact it's probably even cleaner; for the former I'd be concerned about baking details of Cranelift's ABI impl into the lower-level library and would still want to follow the analogy of a real assembler letting one define symbolic constants and use them with specific values plugged in. Happy to look at proposed diff if you have one.
I agree, we should avoid Cranelift-izing this assembler as much as possible. I guess the things I don't like about the known offset table approach are that (a) we rebuild this table for every emission and (b) building it the way I'm doing it here feels quite fragile. How long before things get out of sync with the keys used during amode conversion?!
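For illustration, the "generic over the set of known offsets but have a query method" option discussed above could look roughly like the sketch below. Every name here (`OffsetKey`, `KnownOffsets`, `known_offset`, and this simplified `FrameLayout`) is hypothetical and not taken from the PR; the only parts borrowed from the diff are the two keys and the arithmetic used to fill them in.

```rust
/// Hypothetical symbolic key, analogous to a named constant in a real assembler.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OffsetKey(pub usize);

pub const KEY_INCOMING_ARG: OffsetKey = OffsetKey(0);
pub const KEY_SLOT_OFFSET: OffsetKey = OffsetKey(1);

/// Hypothetical query trait: the assembler asks for a value by key at
/// emission time instead of receiving a table rebuilt for every emission.
pub trait KnownOffsets {
    fn known_offset(&self, key: OffsetKey) -> i32;
}

/// Simplified stand-in for the frame layout; only the fields used in the
/// diff above are modeled.
pub struct FrameLayout {
    pub tail_args_size: u32,
    pub setup_area_size: u32,
    pub outgoing_args_size: u32,
}

impl KnownOffsets for FrameLayout {
    fn known_offset(&self, key: OffsetKey) -> i32 {
        // Keys and their values are resolved in one place, so they are
        // harder to let drift apart.
        let value = match key {
            KEY_INCOMING_ARG => self.tail_args_size + self.setup_area_size,
            KEY_SLOT_OFFSET => self.outgoing_args_size,
            _ => panic!("unknown offset key {:?}", key),
        };
        i32::try_from(value).unwrap()
    }
}
```

With this shape the lower-level crate only knows about opaque keys, while the embedder decides what each key means, which keeps Cranelift's ABI details out of the assembler.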
@@ -1648,8 +1646,6 @@ block0(v0: i8x16, v1: i32):
; addb %al, (%rax)
; addb %al, (%rax)
; addb %al, (%rax)
; addb %al, (%rax)
; addb %bh, %bh
Looking at these CLIF diffs, I see some potential problems: removed instructions, different immediates, potential `add` -> `or` miscompilations... This bears another look.
(Pretty sure those `add` -> `or` "miscompilations" were just bits changing in the constant pool... not sure why they're changing, though).
This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41]. This adds two crates:
- the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta`
- the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler`

The assembler itself is straight-forward enough (modulo the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering.

This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler.

The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level.

[bytecodealliance#41]: bytecodealliance/rfcs#41
Using the new assembler's pretty-printing results in slightly different disassembly of compiled CLIF. This is because the assembler matches a certain configuration of `capstone`, causing the following obvious differences:
- instructions with only two operands only print two operands; the original `MInst` instructions separate out the read-write operand into two separate operands (SSA-like)
- the original instructions have some space padding after the instruction mnemonic, those from the new assembler do not

This change uses the slightly new style as-is, but this is open for debate; we can change the configuration of `capstone` that we fuzz against. My only preferences would be to (1) retain some way to visually distinguish the new assembler instructions in the disassembly (temporarily, for debugging) and (2) eventually transition to pretty-printing instructions in Intel-style (`rw, r`) instead of the current (`r, rw`).
Though it is likely that `rustfmt` is present in a Rust environment, some CI tasks do not have this tool installed. To handle this case (plus the chance that other Wasmtime builds are similar), this change skips formatting with a `stderr` warning when `rustfmt` fails.
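As a rough sketch of the "skip formatting with a `stderr` warning" behavior described in this commit message, the helper below runs a formatter and reports failure instead of aborting. The function name `format_or_warn` and its signature are assumptions made for illustration; the PR's actual code may be structured differently.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: try to run `formatter` (e.g. "rustfmt") on `path`.
/// Returns `true` if the file was formatted; otherwise prints a warning to
/// stderr and returns `false` so the caller can keep the unformatted file.
pub fn format_or_warn(formatter: &str, path: &Path) -> bool {
    match Command::new(formatter).arg(path).status() {
        // The formatter ran and reported success.
        Ok(status) if status.success() => true,
        // The formatter ran but exited with a failure status.
        Ok(status) => {
            eprintln!(
                "warning: {} exited with {}; leaving {} unformatted",
                formatter,
                status,
                path.display()
            );
            false
        }
        // The formatter could not be started at all (e.g. not installed).
        Err(err) => {
            eprintln!(
                "warning: could not run {}: {}; leaving {} unformatted",
                formatter,
                err,
                path.display()
            );
            false
        }
    }
}
```

Passing the formatter name as a parameter is only a convenience for exercising the missing-tool path; a build script would presumably always pass `"rustfmt"`.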
In order to satisfy `ci/publish.rs`, it would appear that we need to use a version that matches the rest of the Cranelift crates.
This looks really really good -- thanks for spawning this idea and working to build it out!
Most of my feedback you already got offline as this was being built and the general shape is therefore what I'd expect and am overall quite happy with. A bunch of nits below but nothing major -- let's get this merged and we can iterate and start the migration process.
Thanks again!
/// This helper function prints the hexadecimal representation of the immediate
/// value, but only if the value is greater than or equal to 10. This is
/// necessary to match how Capstone pretty-prints immediate values.
Egads -- is there a way to configure Capstone to always print in decimal or hex? (Necessary evil otherwise, not your fault, just ugly)
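To make the rule in that doc comment concrete, here is a minimal sketch of such a helper. The name `format_imm`, the use of `u32`, and the `0x` prefix are assumptions for illustration; the PR's helper and its treatment of signed values may differ.

```rust
/// Hypothetical helper: print small immediates in decimal and everything
/// from 10 upward in hexadecimal, as the doc comment describes.
/// The `0x` prefix is an assumption about the desired output style.
pub fn format_imm(value: u32) -> String {
    if value < 10 {
        // Below 10 the decimal and hexadecimal digits coincide.
        format!("{}", value)
    } else {
        format!("{:#x}", value)
    }
}
```

For example, `format_imm(9)` yields `"9"` while `format_imm(10)` yields `"0xa"`.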
/// This is due to avoid special cases of REX encodings, see Intel SDM Vol. 2A,
/// table 2-5.
#[derive(Clone, Copy, Debug)]
pub struct MinusRsp<R: AsReg>(R);
`MinusRsp` evokes some sort of subtraction operation in my mind; maybe `NonRspGpr` or something like that?
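A sketch of what the suggested rename could look like as a checked newtype follows. The `AsReg` trait shown here is a simplified stand-in (the real trait in the PR is not reproduced in this thread), and the `new` constructor, the `enc` methods, and the `Gpr` type are hypothetical. The one hard fact relied on is that `%rsp` has hardware encoding 4.

```rust
/// Simplified, hypothetical stand-in for the assembler's register trait.
pub trait AsReg: Copy {
    /// Hardware encoding of the register (0..=15 for x64 GPRs).
    fn enc(&self) -> u8;
}

/// Hardware encoding of `%rsp`.
pub const ENC_RSP: u8 = 4;

/// A general-purpose register that is guaranteed not to be `%rsp`.
#[derive(Clone, Copy, Debug)]
pub struct NonRspGpr<R: AsReg>(R);

impl<R: AsReg> NonRspGpr<R> {
    /// Wrap `reg`, rejecting `%rsp`.
    pub fn new(reg: R) -> Option<Self> {
        if reg.enc() == ENC_RSP {
            None
        } else {
            Some(Self(reg))
        }
    }

    /// Hardware encoding of the wrapped register.
    pub fn enc(&self) -> u8 {
        self.0.enc()
    }
}

/// Hypothetical concrete register type used only to exercise the wrapper.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Gpr(pub u8);

impl AsReg for Gpr {
    fn enc(&self) -> u8 {
        self.0
    }
}
```

Returning `Option` from the constructor is one possible design; a panicking constructor would keep call sites shorter at the cost of a runtime failure mode.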
let mut known_offsets = [0, 0];
// These values are transcribed from is happening in
// `SyntheticAmode::finalize`. This, plus the `Into` logic converting a
// `SyntheticAmode` to its external counterpart, are
"from is" and ending with "are..." -- incomplete sentence?
@@ -980,6 +980,8 @@ pub(crate) fn check(
Inst::Unwind { .. } | Inst::DummyUse { .. } => Ok(()),

Inst::StackSwitchBasic { .. } => Err(PccError::UnimplementedInst),

Inst::External { .. } => Ok(()), // TODO: unsure what to do about this!
Totally fine to do `Err(PccError::UnimplementedInst)` for now -- I'll need to go and update this once we've moved instructions over that PCC cares about, but PCC isn't fuzzed at the moment anyway until I can make an update/fix pass...
This is a first step to providing an external assembler for `cranelift-codegen` as described in #41. Each commit has further details, but the summary is that this direction would eventually move all assembler logic out into crates designed for easier checking (e.g., fuzzing).