Description
Is your feature request related to a problem? Please describe.
While reverse engineering a PowerPC program, I regularly come across the rlwinm
instruction. This instruction (documented on page 501 of 6xx_pem.pdf) can perform a bit-rotate followed by a binary AND
. This instruction is quite heavily used, for example for bit-rotating an integer, extracting bits from a bitfield and bit-shifting an integer. Additionally, the instruction frequently features in compiler optimisations to avoid branches.
Unfortunately, the current pcode implementation of this instruction uses a combination of INT_OR
, INT_SUB
, INT_LEFT
and INT_RIGHT
. While this is functionally correct, when the instruction is used as part of a larger structure that can be optimised, the decompiler often fails to recognise the rotate and thus the opportunity for optimisation.
I noticed bit-rotation is quite a common operation across the architectures currently supported by Ghidra. For instance, PowerPC has rlwinm
, ARM has ror
, x86 has ROR
, MIPS32 has ROTR
, z80 has RL
and 68k has ROR
. While many architectures include separate instructions for left-rotate and right-rotate, only one rotate instruction is necessary, since the other instruction can be modeled by negating the shift amount. Most of these architectures have several not just one, but several instructions that perform a bit rotate. As such, this new PCode instruction would be widely applicable.
Additionally, having a separate instruction for bit rotation allows multiple decompiler rules to simplify patterns involving bit rotations without each having to attempt to recognise the relatively complex structure that arises from the sequence. Then, there could be a single rule to convert the sequence of shifts to the new rotate instruction, which avoids several different rules having to detect bit-rotations. This would avoid code duplication, and it would also be more efficient.
Finally, INT_ROTATE(x, y)
is arguably much clearer to human analysts than the equivalent (x << y) | (x >> (32 - y))
, especially if x
and y
are more complicated expressions.
Describe the solution you'd like
I would like a new PCode op to be added, that takes 2 varnodes as parameters. I propose the name INT_ROTATE( V, n )
. The first parameter is the value being rotated, and the second varnode specifies the rotation amount. This would be equivalent to (V << n) | (V >> (8 * |V| - n))
, where |V|
denotes the size of the varnode V
in bytes.
Describe alternatives you've considered
An alternative is to not introduce this new PCode operation, and let the decompiler rules that need to deal with bit rotations just detect those cases themselves. However, I think that will lead to code duplication and less efficient decompilation. The decompiled output will also be harder to understand for analysts whenever bit rotations are not simplified away.
Additional context
An example of some PowerPC instructions that Ghidra currently struggles to simplify:
7c 00 00 34 cntlzw r0, r0
38 60 00 01 li r3, 0x1
5c 63 07 fe rlwnm r3, r3, r0, 0x1f, 0x1f
4e 80 00 20 blr
This corresponds to return r0 < 0
(edit: oops, it's actually return r0 <= 0
- just goes to show how tricky these expressions are for humans to understand), but ghidra currently decompiles this to:
return (bool)(((byte)(1 << (LZCOUNT(r0) & 0x1fU))
| (byte)(1 >> 0x20 - (LZCOUNT(r0) &
0x1fU))) & 1);
While the decompiled code is functionally correct, it is nearly impossible to see that the code is testing for the sign bit of r0
. The decompiler misses a few simplification steps:
- The rotate of
1
byLZCOUNT(r0) & 0x1fU
, followed by the& 1
, which is then cast to a bool, can be simplified toLZCOUNT(r0) & 0x1fU == 0
, since only a rotate amount of0
can ensure a1
bit ends up in the ones-position of the result. LZCOUNT(r0) & 0x1fU
can be simplified to justLZCOUNT(r0) == 0 || LZCOUNT(r0) == 32
, since the varnode representingr0
is 4 bytes big, soLZCOUNT
can never return a value that has any bits masked off.LZCOUNT(r0) == 0
implies thatr0 s>> 31 != 0
, sincer0
is 32 bits wide.LZCOUNT(r0) == 32
impliesr0 == 0
, sincer0
is 32 bits wide.
The decompiler can simplify r0 s>> 31 != 0
to r0 < 0
by RuleTestSign
. The expression r0 < 0 || r0 == 0
is then simplified to r0 <= 0
by RuleLessEqual
.
Note that step 1 would be much simpler to recognise and implement a rule for if there was a dedicated INT_ROTATE
PCode instruction. There would be no need for step 2 if the semantics of INT_ROTATE
were that it automatically masks the rotation amount by the number of bits in the value to be rotated, since the & 0x1f
is introduced by the Pcode translation of rlwnm
only because this masking is required for the rlwnm
instruction. And if there is no INT_AND
in the raw Pcode, there is also no need for the decompiler to simplify it away.