Skip to content

Fix more SyntaxWarnings in Python 3.13#212

Merged
SweetVishnya merged 5 commits intoJonathanSalwan:masterfrom
Ordoviz:syntaxwarnings
Jun 2, 2025
Merged

Fix more SyntaxWarnings in Python 3.13#212
SweetVishnya merged 5 commits intoJonathanSalwan:masterfrom
Ordoviz:syntaxwarnings

Conversation

@Ordoviz
Copy link
Contributor

@Ordoviz Ordoviz commented May 23, 2025

Following #209, this fixes two runtime warnings:

  • SyntaxWarning: invalid escape sequence '\?', and
  • SyntaxWarning: invalid escape sequence '\['

You can see that escaping ? was indeed intentional by considering that the arm instruction blr reg is 32-bit wide and follows this format. I replaced \? with \x3f because chr(0x3f) == '?'.

@nurmukhametov
Copy link
Contributor

Unfortunately, this PR changes behaviour, as CI shows that fewer gadgets are found ending with blr reg:

>>> opcode = b"\x20\x00\x3f\xd6"
>>> [m.start() for m in re.finditer(b"[\x00\x20\x40\x60\x80\xa0\xc0\xe0]{1}[\x00-\x03]{1}\?\xd6", opcode)]
<stdin>:1: SyntaxWarning: invalid escape sequence '\?'
[0]
>>> [m.start() for m in re.finditer(b"[\x00\x20\x40\x60\x80\xa0\xc0\xe0]{1}[\x00-\x03]{1}\x3f\xd6", opcode)]
[]

What about using rb prefix with the unchanged value? It seems to work as expected and doesn't produce any warnings:

>>> [m.start() for m in re.finditer(rb"[\x00\x20\x40\x60\x80\xa0\xc0\xe0]{1}[\x00-\x03]{1}\?\xd6", opcode)]
[0]

@Ordoviz
Copy link
Contributor Author

Ordoviz commented May 23, 2025

What about using rb prefix with the unchanged value?

Good idea. I added the rb prefix to all strings. This should avoid surprises like "[\x0d\x09\x05\x01\x1d\x19\x15\x11\x2d\x29\x25\x21\x3d\x39\x35\x31\x4d\x49\x45\x41\x5d\x59\x55\x51]" == "[\r\t\x05\x01\x1d\x19\x15\x11-)%!=951MIEA]YUQ]" where \x11-) is a regex range expression.

@Ordoviz
Copy link
Contributor Author

Ordoviz commented May 23, 2025

The tests still fail, but now we're finding more gadgets!

@SweetVishnya
Copy link
Collaborator

SweetVishnya commented May 25, 2025

The tests still fail, but now we're finding more gadgets!

Some newly found 'gadgets' are actually not real gadgets, e.g., andi sp, sp, -0x10, as they do not end with a jump/ret instruction. Can you take a look at test results and try to fix the regexes?

@Ordoviz
Copy link
Contributor Author

Ordoviz commented May 25, 2025

I'm trying to figure out why ROPgadget.py --binary test-suite-binaries/elf-Linux-RISCV_32 --dump reports the following gadgets:

0x00010580 : andi sp, sp, -0x10 // 137101ff
0x00010582 : c.bnez a4, -0xe8 // 01ff

The first "gadget" does not end with a jump/ret instruction, so it should not be printed. The second one is fine.
Both are matched by the regex in line 341:

[rb"[\x0d\x09\x05\x01\x1d\x19\x15\x11\x2d\x29\x25\x21\x3d\x39\x35\x31\x4d\x49\x45\x41\x5d\x59\x55\x51][\xa0-\xff]{1}", 2, 1], # c.j | c.beqz | c.bnez

The problem is that this regex has gad_align == 1 so when i == 2, this will find the bad gadget at 0x00010580:

for i in range(self.__options.depth):
    start = ref - (i * gad_align)
    if (sec_vaddr + start) % gad_align == 0:  # always True for gad_align == 1

I don't know how to fix this: We would need to set gad_align = 4 to avoid the "bad" gadget but we cannot set it to a number bigger than 2 or else the second "good" gadget is not printed. Perhaps we need an equivalent of __passCleanX86 for riscv to filter out bad gadgets.

@nurmukhametov
Copy link
Contributor

The first "gadget" does not end with a jump/ret instruction, so it should not be printed. The second one is fine. Both are matched by the regex in line 341:

[rb"[\x0d\x09\x05\x01\x1d\x19\x15\x11\x2d\x29\x25\x21\x3d\x39\x35\x31\x4d\x49\x45\x41\x5d\x59\x55\x51][\xa0-\xff]{1}", 2, 1], # c.j | c.beqz | c.bnez

The problem is that this regex has gad_align == 1 so when i == 2, this will find the bad gadget at 0x00010580:

It seems strange to me that gadgets are specified as [regexp, 2, 1] (with alignment 1) for 16-bit compressed instruction. I would expect that 16-bit compressed instructions must be aligned to 2-byte boundaries. See Section 1.5 in link for details.

@SweetVishnya
Copy link
Collaborator

@Ordoviz, can you try aligning with 2-byte boundaries as @nurmukhametov suggested?

@Ordoviz
Copy link
Contributor Author

Ordoviz commented May 29, 2025

@Ordoviz, can you try aligning with 2-byte boundaries as @nurmukhametov suggested?

This does not help. The "bad gadget" is 2 bytes before the good gadget, so it will be found when i == 1.

@SweetVishnya
Copy link
Collaborator

@Ordoviz, can you try aligning with 2-byte boundaries as @nurmukhametov suggested?

This does not help. The "bad gadget" is 2 bytes before the good gadget, so it will be found when i == 1.

Then can you try adding pass clean as you proposed earllier?

@Ordoviz
Copy link
Contributor Author

Ordoviz commented May 30, 2025

I finally found a good way to filter out the bad RISC-V gadgets. See the commit messages for details. This is ready to merge now.

@Ordoviz
Copy link
Contributor Author

Ordoviz commented May 30, 2025

The CI fails because it uses Python 2 where rb"foobar" is a SyntaxError. Are we still interested in supporting Python 2?

@SweetVishnya
Copy link
Collaborator

Yeah, we're still willing to support python2 as far as possible

Ordoviz added 5 commits May 30, 2025 21:34
Defer parsing of backslash escapes to the regex engine. Previously,
some hex-escaped bytes were parsed as regex metacharacters,
e.g. "[\x41\x2d\x48]" is the range expression "[A-H]".
This filters out gadgets like "andi sp, sp, -0x10", which do not
perform a jump, but whose last two bytes ("01ff" in this case) are
a compressed jump instruction ("c.bnez a4, -0xe8" in this case).

We perform this filtering only on RISC-V because it leads to missed
gadgets in x86, where prefixes can be added to an instruction that change
its size but not its behavior. MIPS is especially problematic because we
set gad_size = 8 in order to display the next instruction in the branch
delay slot. On other architectures this filter makes no difference on
the test suite.
You cannot jump to misaligned instructions. Try it yourself:

$ pwn asm --debug --context riscv64 'lui ra, 0x11; addi ra, ra, 0xf1; ret; .byte 0; nop; nop;'

──────────────────────[ DISASM / rv64 / set emulate on ]──────────────────────
   0x110e8    c.lui  ra, 0x11         RA => 0x11000
   0x110ea    addi   ra, ra, 0xf1     RA => 0x110f1 (0x11000 + 0xf1)
 ► 0x110ee    c.jr   ra                          <0x110f1>
    ↓
   0x110f1    c.nop
   0x110f3    c.nop

After a single step in pwndbg the jump target was rounded down to 0x110f0
and the disassembly changed:

──────────────────────[ DISASM / rv64 / set emulate on ]──────────────────────
 ► 0x110f0    c.addi4spn s0, sp, 0x80
   0x110f2    c.addi4spn s0, sp, 0x80
@Ordoviz
Copy link
Contributor Author

Ordoviz commented May 30, 2025

It turns out that Python2 supports br"foobar" string literals, so fixing the CI was simply a matter of swapping b and r.

@SweetVishnya
Copy link
Collaborator

Thank you! Give me a week to review the changes.

Copy link
Collaborator

@SweetVishnya SweetVishnya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thank you very much for the fix, I'll merge it

@SweetVishnya SweetVishnya merged commit 1d72f15 into JonathanSalwan:master Jun 2, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants