initial implementation of the Sail-generated RISCV disassembler module #2498
base: next
Overall this is very good progress. I would suggest, though, splitting arch/RISCV/riscv_ast2str.gen.inc
into multiple files since it is too big, maybe by RV32 and RV64.
@thestr4ng3r take a look too, please, when you have time.
For the prototype/first implementation the decode is fine like this. But before we can merge it into next we need to optimize two things:
Size
uint64_t rd = (binary_stream & 0x0000000000000F80) >> 7;
uint64_t rs1 = (binary_stream & 0x00000000000F8000) >> 15;
uint64_t rs2 = (binary_stream & 0x0000000001F00000) >> 20;
tree->ast_node_type = RISCV_RTYPE;
tree->ast_node.rtype.rs2 = rs2;
tree->ast_node.rtype.rs1 = rs1;
tree->ast_node.rtype.rd = rd;
These specific lines are repeated 10 times in the decoder.
I assume there are other decoding patterns happening just as often. In the final version we should not have any duplicated code in here.
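One way to get rid of this duplication is to move the field extraction for each recurring format into a single helper that every branch calls. A minimal sketch for the R-type case; `rv_bits`, `rv_decode_rtype` and the `rv_ast`/`rv_rtype` types are hypothetical stand-ins that only mirror the `tree->ast_node.rtype` fields quoted above:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the generated AST types. */
enum rv_ast_type { RV_AST_RTYPE = 1 };

struct rv_rtype {
	uint8_t rd, rs1, rs2;
};

struct rv_ast {
	enum rv_ast_type ast_node_type;
	union {
		struct rv_rtype rtype;
	} ast_node;
};

/* Extract insn[hi:lo] as an unsigned value (field width must be < 32). */
static inline uint32_t rv_bits(uint32_t insn, unsigned hi, unsigned lo)
{
	return (insn >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Fill the common R-type fields once, instead of repeating the masks
 * in every decoder branch. */
static void rv_decode_rtype(uint32_t insn, struct rv_ast *tree)
{
	tree->ast_node_type = RV_AST_RTYPE;
	tree->ast_node.rtype.rd = (uint8_t)rv_bits(insn, 11, 7);
	tree->ast_node.rtype.rs1 = (uint8_t)rv_bits(insn, 19, 15);
	tree->ast_node.rtype.rs2 = (uint8_t)rv_bits(insn, 24, 20);
}
```

The bit ranges correspond to the masks above (0xF80 = bits [11:7], 0xF8000 = bits [19:15], 0x1F00000 = bits [24:20]); the same approach would apply to the I/S/B/U/J formats.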
Runtime complexity
I grepped for `^ if` and found 505 `if` cases in the decode function. This means that for an illegal instruction it does at least 505 comparisons (assuming the compiler doesn't optimize something out), which is more than ~O(n * 10) (n = number of bits).
But before we merge it into next, we should reach O(n * 1) in the worst case and O(log(n)) on average.
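For illustration, one common way to avoid a linear chain of `if`s is to dispatch on the 7-bit major opcode (bits [6:0]) first and only then on `funct3`/`funct7`, so most illegal encodings are rejected after a couple of checks. A minimal sketch covering only a handful of RV32I cases; `rv_dispatch` and `enum rv_id` are hypothetical and not part of the generated decoder:

```c
#include <stdint.h>

/* Hypothetical instruction IDs for this sketch only. */
enum rv_id { RV_INVALID = 0, RV_ADD, RV_SUB, RV_ADDI, RV_XORI };

static enum rv_id rv_dispatch(uint32_t insn)
{
	uint32_t opcode = insn & 0x7Fu;
	uint32_t funct3 = (insn >> 12) & 0x7u;
	uint32_t funct7 = insn >> 25;

	/* Compilers commonly lower a switch over dense case values to a jump
	 * table, so each level costs roughly one lookup instead of one
	 * comparison per known encoding. */
	switch (opcode) {
	case 0x33: /* OP */
		if (funct3 == 0x0 && funct7 == 0x00)
			return RV_ADD;
		if (funct3 == 0x0 && funct7 == 0x20)
			return RV_SUB;
		return RV_INVALID;
	case 0x13: /* OP-IMM */
		switch (funct3) {
		case 0x0:
			return RV_ADDI;
		case 0x4:
			return RV_XORI;
		default:
			return RV_INVALID;
		}
	default:
		return RV_INVALID;
	}
}
```

Whether the generator can emit such nested switches from the Sail mappings is a separate question; the sketch only shows the shape of the lookup.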
The current structure is fine for now, especially since you also have the RzIL task. So no worries.
What is important, though, is that the decoded details (operand details) are stable, no matter how the decoder is architected. Once you have finished RzIL, we would not want to refactor the whole RzIL work just because we optimized the Capstone decoder :)
That said, good job! Looks like a lot of work! Well done!
arch/RISCV/RISCVDetails.h
@@ -0,0 +1,3 @@
#include "capstone.h"
Suggested change:
- #include "capstone.h"
+ #include <capstone/capstone.h>
arch/RISCV/riscv_ast.gen.inc
RISCV_AMOMAXU
} op;

uint8_t aq /* bits : 1 */;
What do these `/* bits : 1 */` comments mean? Either `aq` encodes bit 1 of the instruction, or `aq` is one bit wide.
Please make this clearer. E.g. for the first meaning you could replace it with `insn_bits[1:1]`, and for the second: `bit_width : 1`.
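To make the two readings concrete: in the standard A-extension (AMO) encoding, `aq` is a single bit located at instruction bit 26 (with `rl` at bit 25), so one comment can state both the width and the position. A minimal sketch; `rv_amo_flags` and `rv_amo_flags_from` are hypothetical names:

```c
#include <stdint.h>

/* Each comment gives the field width and where it sits in the instruction. */
struct rv_amo_flags {
	uint8_t aq; /* bit_width: 1, insn_bits[26:26] */
	uint8_t rl; /* bit_width: 1, insn_bits[25:25] */
};

static struct rv_amo_flags rv_amo_flags_from(uint32_t insn)
{
	struct rv_amo_flags f;
	f.aq = (uint8_t)((insn >> 26) & 1u);
	f.rl = (uint8_t)((insn >> 25) & 1u);
	return f;
}
```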
I guess it means the latter, though being more descriptive doesn't hurt. This one is low priority, though.
@moste00 please update the PR with the latest state of your generated code.
include/capstone/riscv.h
RISCV_INS_ENDING,
} riscv_insn;
#include "riscv_insn.gen.inc"
This is not OK, but we can write a script to update it.
Very much like what I see! Awesome job!
Especially the changes to riscv_decode.gen.inc :)
Please focus on just making it work. You can ignore my comments for now; they are just there so we don't forget about them.
Because I could only take a shallow look, I'll check again in the next few days.
arch/RISCV/RISCVModule.c
@@ -2,26 +2,38 @@
/* RISC-V Backend By Rodrigo Cortes Porto <[email protected]> &
Set your copyright. Please use the SPDX header style:
# Copyright © 2022 Rot127 <[email protected]>
# SPDX-License-Identifier: BSD-3-Clause
Updated from 08d68d1 to be61c4d.
include/capstone/riscv_insn.gen.inc
#include <string.h>

enum riscv_insn {
//--------------------- RISCV_REV8---------------------
Maybe consider converting these comments to Doxygen, if that's possible?
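As a rough illustration: Doxygen documents a declaration from a preceding `/** ... */` block, attaches a `/**< ... */` comment to the member just before it, and can group members with `@name` plus `/**@{*/ ... /**@}*/`. A minimal sketch of how the `//--- RISCV_REV8 ---` section markers could look; the enum and enumerator names are hypothetical:

```c
/**
 * @brief RISC-V instruction IDs (hypothetical excerpt).
 */
enum riscv_insn_example {
	/** @name RISCV_REV8 */
	/**@{*/
	RISCV_INS_EXAMPLE_REV8, /**< Reverse the byte order of rs1 (Zbb). */
	/**@}*/
	RISCV_INS_EXAMPLE_ENDING /**< Marks the end of the enum. */
};
```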
arch/RISCV/riscv_helpers_ast2str.h
*ps = " , "; \
*plen = 3

static inline void hex_bits(uint64_t bitvec, char **s, size_t *len,
Add Doxygen comments for these helper functions. Also, maybe move them to the Capstone utils instead? cc @Rot127
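A minimal sketch of the Doxygen layout that could be used for these helpers; `hex_digits_for_bits` is a hypothetical helper chosen only to show the `@brief`/`@param`/`@return` structure:

```c
#include <stddef.h>

/**
 * @brief Number of hex digits needed to print a field of @p bits bits.
 *
 * Rounds up to whole nibbles, e.g. a 5-bit field needs 2 digits.
 *
 * @param bits Width of the field in bits.
 * @return Number of hexadecimal digits (0 for a 0-bit field).
 */
static inline size_t hex_digits_for_bits(size_t bits)
{
	return (bits + 3) / 4;
}
```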
Yes, please move it to `utils.c`.
Actually, you can just use `sprintf()`. We depend on libc anyway.
Sorry, ignore it. See my comment in the final review message.
- For all the string operations: it is better to use `SStream` everywhere and not to operate directly on `char *`. This is what it was implemented for, after all; it is better tested and convenient to use.
Edit: Sorry, pressed the "review" button by accident. Will add some more comments.
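The idea behind a stream type is that the buffer bounds and the write position live in one object instead of being threaded through every helper as `char **`/`size_t *`. The sketch below uses a hypothetical `MiniStream` stand-in, not Capstone's real `SStream` (its actual fields and functions should be taken from `SStream.h`), to show the pattern:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for SStream: a fixed buffer plus a write index. */
typedef struct {
	char buf[64];
	size_t len;
} MiniStream;

static void mini_init(MiniStream *ss)
{
	ss->buf[0] = '\0';
	ss->len = 0;
}

/* printf-style append; output is truncated instead of overflowing. */
static void mini_concat(MiniStream *ss, const char *fmt, ...)
{
	size_t room = sizeof(ss->buf) - ss->len;
	va_list ap;
	int n;

	if (room <= 1)
		return; /* buffer already full */
	va_start(ap, fmt);
	n = vsnprintf(ss->buf + ss->len, room, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	/* On truncation vsnprintf wrote room - 1 characters plus the NUL. */
	ss->len += ((size_t)n < room) ? (size_t)n : room - 1;
}
```

Callers then just append pieces, e.g. `mini_concat(&ss, "%s", "add"); mini_concat(&ss, " x%u, x%u", 3u, 1u);`, and never touch raw pointers or lengths.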
Well done! This is a really nice basis to develop further.
I just pushed a new experimental branch (based on the newest next). Please rebase your PR on top of it.
arch/RISCV/riscv_helpers_rvconf.h
typedef struct riscv_conf {
	Void2Bool sys_enable_fdext;
	Void2Bool sys_enable_zfinx;
Doxygen please. Also for the `Void2Bool` callback.
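A minimal sketch of Doxygen comments for the struct and the callback type, keeping the `uint8_t` return type from the PR. The described meaning of each field is an assumption based on the names (`fdext` presumably refers to the F/D extensions), and `rv_conf_yes`/`rv_conf_no` are hypothetical example callbacks:

```c
#include <stdint.h>

/**
 * @brief Callback that reports whether an optional feature is enabled.
 * @return Non-zero if the feature is enabled, 0 otherwise.
 */
typedef uint8_t (*Void2Bool)(void);

/**
 * @brief Run-time RISC-V configuration queried while decoding/printing.
 */
typedef struct riscv_conf {
	/** Returns non-zero if the F/D extensions are enabled (assumed meaning). */
	Void2Bool sys_enable_fdext;
	/** Returns non-zero if Zfinx is enabled (assumed meaning). */
	Void2Bool sys_enable_zfinx;
} riscv_conf;

/* Hypothetical example callbacks. */
static uint8_t rv_conf_yes(void) { return 1; }
static uint8_t rv_conf_no(void) { return 0; }
```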
…tringfication (printer_t)
…s in riscv.h, changed name to reflect other archs conventio
…p and and rebased on experimental
Updated from eff6b64 to 3e533b8.
}
str_len += 2; // for the '0x' in the beginning

CS_ASSERT(str_len > 0);
Suggested change: remove `CS_ASSERT(str_len > 0);`.
Always true at this point, right after `str_len += 2`.
#define DEF_HEX_BITS(n) \
	static inline void hex_bits_##n(uint64_t bitvec, SStream *ss, \
					RVContext *ctx) { \
		hex_bits(bitvec, n, ss, ctx); \
	}

DEF_HEX_BITS(1)
DEF_HEX_BITS(2)
DEF_HEX_BITS(3)
DEF_HEX_BITS(4)
DEF_HEX_BITS(5)
DEF_HEX_BITS(6)
DEF_HEX_BITS(7)
DEF_HEX_BITS(8)
DEF_HEX_BITS(9)
DEF_HEX_BITS(10)
DEF_HEX_BITS(11)
DEF_HEX_BITS(12)
DEF_HEX_BITS(13)
DEF_HEX_BITS(14)
DEF_HEX_BITS(15)
DEF_HEX_BITS(16)
DEF_HEX_BITS(17)
DEF_HEX_BITS(18)
DEF_HEX_BITS(19)
DEF_HEX_BITS(20)
DEF_HEX_BITS(21)
DEF_HEX_BITS(22)
DEF_HEX_BITS(23)
DEF_HEX_BITS(24)
DEF_HEX_BITS(25)
DEF_HEX_BITS(26)
DEF_HEX_BITS(27)
DEF_HEX_BITS(28)
DEF_HEX_BITS(29)
DEF_HEX_BITS(30)
DEF_HEX_BITS(31)
DEF_HEX_BITS(32)
This generates too many functions :D
If you have `hex_bits_X()` in the generated code, it is better to pass the number of bits as an argument.
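A single entry point that takes the width as a parameter could replace the 32 `hex_bits_N` wrappers, if the generator (or a post-processing step) emitted the width as an argument. This sketch writes into a caller-provided buffer with `snprintf` rather than an `SStream` so it stays self-contained; the name `rv_hex_bits` and the output format (`0x` prefix, zero-padded to one digit per started nibble) are assumptions, not what the Sail model necessarily prints:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format the low `bits` bits of `bitvec` as 0x-prefixed hex, zero-padded
 * to the number of nibbles the field occupies. Returns snprintf's result. */
static int rv_hex_bits(char *out, size_t out_size, uint64_t bitvec,
		       unsigned bits)
{
	unsigned digits = (bits + 3) / 4;

	if (bits < 64)
		bitvec &= (UINT64_C(1) << bits) - 1; /* drop bits outside the field */
	return snprintf(out, out_size, "0x%0*" PRIx64, (int)digits, bitvec);
}
```

For example, a 12-bit immediate of 5 would print as `0x005`, and the call site only changes from `hex_bits_12(imm, ...)` to passing `12` explicitly.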
Sorry for sleeping on this review; I was finishing up the generator.
Yeah, there are a lot of those `hex_bits_*` functions, but their implementations are all simple one-liners that delegate to their parent function, which does take a bit-width parameter. I guess I'm doing this in direct imitation of the Sail implementation, which does something very similar using mappings.
I can't control how it's called from the generated code because the generator doesn't understand anything about `hex_bits_X`: it just sees an opaque function that it couldn't derive automatically (only very simple mappings are parsed and understood, basically as string->string tables), so it emits the call as-is with all the arguments plus an `SStream` buffer and a context argument. It's up to a human in the loop to implement those opaque functions so that the calls make sense. I implemented them as thin wrappers over a general function which does take the bit-width parameter.
char digit = (bitvec & 0xF) + 48;
if (digit > '9') {
	digit += ('a' - ':');
}
Can be done with something like `"0123456789abcdef"[bitvec & 0xf]`.
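A minimal sketch of the lookup-table approach applied to a whole value; `rv_nibbles_to_hex` is a hypothetical helper that emits a fixed number of lowercase digits, most significant first:

```c
#include <stddef.h>
#include <stdint.h>

/* Write the `digits` low nibbles of `bitvec` as lowercase hex, most
 * significant first, indexing a digit string instead of doing ASCII
 * arithmetic. `out` must hold digits + 1 bytes. */
static void rv_nibbles_to_hex(uint64_t bitvec, size_t digits, char *out)
{
	size_t i;

	for (i = digits; i > 0; i--) {
		out[i - 1] = "0123456789abcdef"[bitvec & 0xF];
		bitvec >>= 4;
	}
	out[digits] = '\0';
}
```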
Very nice, thanks
if (ma) {
	SStream_concat(ss, "ma");
	return;
}
Suggested change:
- if (ma) {
- 	SStream_concat(ss, "ma");
- 	return;
- }
+ if (!ma) {
+ 	return;
+ }
+ SStream_concat(ss, "ma");
As a general rule, returning early (e.g. on errors) makes the code cleaner.
#include <stdint.h>

typedef uint8_t (*Void2Bool)(void);
typedef uint8_t RVBool;
It seems to be used only as a `bool`. Better to replace `RVBool` with `bool`; otherwise the compiler can't apply possible optimizations.
Is there a standard `bool` in the C version we're using, though? I thought pre-C99 C doesn't have bools, and even C99 fakes it with a typedef; only C11 or later introduces a true bool.
Is it okay to use C99/C11, knowing that Capstone is compiled for so many architectures and by so many compilers? I remember having to change the generator code that used binary number literals some time ago because some compilers don't support them.
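For reference: C99 already added `_Bool` as a genuine built-in type (converting any scalar to it yields exactly 0 or 1), together with `<stdbool.h>`, which defines `bool`, `true` and `false` as macros; C23 turns them into keywords. Capstone's own public headers appear to use `bool` in some detail structs already, so it may be a safe baseline, but that is worth confirming for the target compilers. A minimal sketch of the replacement; `rv_zfinx_disabled` and `rv_flag_from_mask` are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Callback type from the PR, returning bool instead of uint8_t. */
typedef bool (*Void2Bool)(void);

/* Hypothetical extension check used only for this example. */
static bool rv_zfinx_disabled(void)
{
	return false;
}

/* Unlike a uint8_t typedef, conversion to bool normalizes to 0/1:
 * (uint8_t)256 == 0, but (bool)256 == true. */
static bool rv_flag_from_mask(uint32_t mask)
{
	return mask; /* implicit conversion to _Bool */
}
```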
// VERY HACKY: use op_str as a temporary buffer to serialize the instruction struct
// so that the printer callback can later de-serialize it in order to stringify it
// alternatives: dynamic memory, global/static variables,
Re "global/static variables": I highly advise never getting started with them :D
Yeah :'D, I just wanted to throw all the alternatives out there; I wasn't seriously considering them.
I guess what I'm doing is kinda okay and isn't that brittle, because I'm not serializing that binary buffer anywhere or at any other time: I'm just binary-dumping a struct and then reviving it from the binary dump in another function of the same running program. It's still very hacky and could make a lot of people go "wtf" when reading it, but I legitimately thought about it for days and couldn't find anything better than that or `malloc`, and I really hate `malloc`.
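If the `op_str` round-trip stays, copying with `memcpy` plus a compile-time size check keeps it well-defined; casting the `char` buffer to the struct type instead risks alignment and strict-aliasing problems. A minimal sketch; `rv_ast_blob` and the 160-byte capacity are assumptions standing in for the real AST struct and the `op_str` buffer:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the decoded AST node. */
struct rv_ast_blob {
	uint32_t type;
	uint8_t rd, rs1, rs2;
};

#define RV_OPSTR_SIZE 160 /* assumed capacity of the scratch buffer */

/* Fails to compile (negative array size) if the struct outgrows the buffer. */
typedef char rv_blob_fits[(sizeof(struct rv_ast_blob) <= RV_OPSTR_SIZE) ? 1 : -1];

/* Dump the struct's bytes into the scratch buffer. */
static void rv_stash(char *buf, const struct rv_ast_blob *ast)
{
	memcpy(buf, ast, sizeof(*ast));
}

/* Revive the struct from the same buffer later in the same process. */
static void rv_unstash(const char *buf, struct rv_ast_blob *ast)
{
	memcpy(ast, buf, sizeof(*ast));
}
```

This is only safe for the same build and process, which matches the use described above.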
Your checklist for this pull request
Detailed description
This PR aims to replace the LLVM-derived RISCV module with a Sail-derived RISCV module. The generator tool is being developed here, and the Sail model of RISCV is here.
Sail is an architecture description language being developed here. It is an imperative language inspired in syntax and semantics by OCaml, with some syntactic sugar and innovative features designed specifically for describing computer architectures. See here for a detailed tour and explanation of its major features.
The RISC-V Foundation has adopted the Sail model of RISCV as the "official" definition of the architecture, and therefore it's desirable to generate a C implementation of any RISCV-related logic from the sail-riscv model, as it will be up-to-date and compliant by construction.
Test plan
The module doesn't compile in its current state; this will be updated as work on the module continues. The initial goal is to be able to invoke cstool and obtain useful results (e.g. the instruction in string form, as a start). Hopefully this goal is not too far off.
Closing issues
...