Fix more SyntaxWarnings in Python 3.13 by Ordoviz · Pull Request #212 · JonathanSalwan/ROPgadget

Ordoviz · 2025-05-23T15:42:03Z

Following #209, this fixes two runtime warnings:

SyntaxWarning: invalid escape sequence '\?', and
SyntaxWarning: invalid escape sequence '\['

You can see that escaping ? was indeed intentional by considering that the ARM instruction blr reg is 32 bits wide and follows this format. I replaced \? with \x3f because chr(0x3f) == '?'.

nurmukhametov · 2025-05-23T17:28:11Z

Unfortunately, this PR changes behaviour, as CI shows that fewer gadgets are found ending with blr reg:

>>> import re
>>> opcode = b"\x20\x00\x3f\xd6"
>>> [m.start() for m in re.finditer(b"[\x00\x20\x40\x60\x80\xa0\xc0\xe0]{1}[\x00-\x03]{1}\?\xd6", opcode)]
<stdin>:1: SyntaxWarning: invalid escape sequence '\?'
[0]
>>> [m.start() for m in re.finditer(b"[\x00\x20\x40\x60\x80\xa0\xc0\xe0]{1}[\x00-\x03]{1}\x3f\xd6", opcode)]
[]

What about using the rb prefix with the unchanged value? It seems to work as expected and doesn't produce any warnings:

>>> [m.start() for m in re.finditer(rb"[\x00\x20\x40\x60\x80\xa0\xc0\xe0]{1}[\x00-\x03]{1}\?\xd6", opcode)]
[0]
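The three spellings can be compared side by side. This is a minimal sketch that reuses the pattern and the four opcode bytes from the session above; the helper name match_starts is introduced here only for illustration. It shows that the non-raw \x3f reaches the regex engine as a bare ?, which turns the preceding {1} into a lazy quantifier instead of matching the byte 0x3f.

```python
import re

opcode = b"\x20\x00\x3f\xd6"  # the four bytes used in the session above

# Non-raw literal: Python turns \x3f into the byte "?" before the regex
# engine sees it, so "{1}?" is parsed as a lazy quantifier and the pattern
# no longer requires a literal 0x3f byte before 0xd6.
non_raw = b"[\x00\x20\x40\x60\x80\xa0\xc0\xe0]{1}[\x00-\x03]{1}\x3f\xd6"

# Raw literals: the backslash sequences reach the regex engine unchanged,
# and the engine treats both \? and \x3f as the literal byte 0x3f.
raw_escaped = br"[\x00\x20\x40\x60\x80\xa0\xc0\xe0]{1}[\x00-\x03]{1}\?\xd6"
raw_hex = br"[\x00\x20\x40\x60\x80\xa0\xc0\xe0]{1}[\x00-\x03]{1}\x3f\xd6"


def match_starts(pattern, data):
    """Return the start offset of every match (illustrative helper)."""
    return [m.start() for m in re.finditer(pattern, data)]


print(match_starts(non_raw, opcode))      # []
print(match_starts(raw_escaped, opcode))  # [0]
print(match_starts(raw_hex, opcode))      # [0]
```

With a raw literal, either \? or \x3f would work, because the regex engine decodes the escape itself.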

Ordoviz · 2025-05-23T18:39:35Z

What about using rb prefix with the unchanged value?

Good idea. I added the rb prefix to all strings. This should avoid surprises like "[\x0d\x09\x05\x01\x1d\x19\x15\x11\x2d\x29\x25\x21\x3d\x39\x35\x31\x4d\x49\x45\x41\x5d\x59\x55\x51]" == "[\r\t\x05\x01\x1d\x19\x15\x11-)%!=951MIEA]YUQ]" where \x11-) is a regex range expression.
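The same effect can be shown on a much smaller character class. This is a sketch with made-up three-byte classes, not the patterns from the repository: \x2d is "-" and \x5d is "]", and both become regex syntax when Python decodes them before the regex engine does.

```python
import re

# Non-raw: Python decodes the escapes first, so the engine receives "[A-H]".
decoded_by_python = re.compile(b"[\x41\x2d\x48]")
# Raw: the engine decodes the escapes itself and gets three literal bytes.
decoded_by_regex = re.compile(br"[\x41\x2d\x48]")

print(bool(decoded_by_python.fullmatch(b"C")))  # True: "C" lies in the range A-H
print(bool(decoded_by_regex.fullmatch(b"C")))   # False: only "A", "-" and "H" match
print(bool(decoded_by_regex.fullmatch(b"-")))   # True

# \x5d is "]", which closes a non-raw character class early.
print(bool(re.fullmatch(b"[\x41\x5d\x42]", b"AB]")))  # True: parsed as "[A]" then "B]"
print(bool(re.fullmatch(br"[\x41\x5d\x42]", b"]")))   # True: "]" is a class member
```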

Ordoviz · 2025-05-23T18:46:09Z

The tests still fail, but now we're finding more gadgets!

SweetVishnya · 2025-05-25T07:06:16Z

The tests still fail, but now we're finding more gadgets!

Some newly found 'gadgets' are actually not real gadgets, e.g., andi sp, sp, -0x10, as they do not end with a jump/ret instruction. Can you take a look at test results and try to fix the regexes?

Ordoviz · 2025-05-25T11:06:15Z

I'm trying to figure out why ROPgadget.py --binary test-suite-binaries/elf-Linux-RISCV_32 --dump reports the following gadgets:

0x00010580 : andi sp, sp, -0x10 // 137101ff
0x00010582 : c.bnez a4, -0xe8 // 01ff

The first "gadget" does not end with a jump/ret instruction, so it should not be printed. The second one is fine.
Both are matched by the regex in line 341:

[rb"[\x0d\x09\x05\x01\x1d\x19\x15\x11\x2d\x29\x25\x21\x3d\x39\x35\x31\x4d\x49\x45\x41\x5d\x59\x55\x51][\xa0-\xff]{1}", 2, 1], # c.j | c.beqz | c.bnez

The problem is that this regex has gad_align == 1 so when i == 2, this will find the bad gadget at 0x00010580:

for i in range(self.__options.depth):
    start = ref - (i * gad_align)
    if (sec_vaddr + start) % gad_align == 0:  # always True for gad_align == 1

I don't know how to fix this: we would need to set gad_align = 4 to avoid the "bad" gadget, but we cannot set it to a number bigger than 2, or else the second "good" gadget is not printed. Perhaps we need an equivalent of __passCleanX86 for RISC-V to filter out bad gadgets.

nurmukhametov · 2025-05-26T16:52:54Z

The first "gadget" does not end with a jump/ret instruction, so it should not be printed. The second one is fine. Both are matched by the regex in line 341:
[rb"[\x0d\x09\x05\x01\x1d\x19\x15\x11\x2d\x29\x25\x21\x3d\x39\x35\x31\x4d\x49\x45\x41\x5d\x59\x55\x51][\xa0-\xff]{1}", 2, 1], # c.j | c.beqz | c.bnez
The problem is that this regex has gad_align == 1 so when i == 2, this will find the bad gadget at 0x00010580:

It seems strange to me that the gadgets are specified as [regexp, 2, 1] (with alignment 1) for 16-bit compressed instructions. I would expect 16-bit compressed instructions to be aligned to 2-byte boundaries. See Section 1.5 in the link for details.

SweetVishnya · 2025-05-28T15:33:19Z

@Ordoviz, can you try aligning with 2-byte boundaries as @nurmukhametov suggested?

Ordoviz · 2025-05-29T06:31:34Z

@Ordoviz, can you try aligning with 2-byte boundaries as @nurmukhametov suggested?

This does not help. The "bad gadget" is 2 bytes before the good gadget, so it will be found when i == 1.
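To make the arithmetic concrete, here is a rough sketch of which start addresses the backward scan tries. The function candidate_starts is a hypothetical helper that mirrors the quoted loop but works on absolute addresses rather than section offsets, and the depth of 3 is chosen only to keep the output short.

```python
def candidate_starts(branch_addr, gad_align, depth):
    """Addresses a backward scan from the matched branch would try.

    Hypothetical helper: it mirrors the quoted loop but works on absolute
    addresses instead of section offsets.
    """
    starts = []
    for i in range(depth):
        start = branch_addr - i * gad_align
        if start % gad_align == 0:
            starts.append(start)
    return starts


branch = 0x10582  # address of the matched c.bnez
print([hex(a) for a in candidate_starts(branch, 1, 3)])  # ['0x10582', '0x10581', '0x10580']
print([hex(a) for a in candidate_starts(branch, 2, 3)])  # ['0x10582', '0x10580', '0x1057e']
```

Both alignments reach 0x10580, so alignment alone cannot exclude the andi candidate.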

SweetVishnya · 2025-05-29T06:42:46Z

@Ordoviz, can you try aligning with 2-byte boundaries as @nurmukhametov suggested?

This does not help. The "bad gadget" is 2 bytes before the good gadget, so it will be found when i == 1.

Then can you try adding a cleaning pass, as you proposed earlier?

Ordoviz · 2025-05-30T18:28:04Z

I finally found a good way to filter out the bad RISC-V gadgets. See the commit messages for details. This is ready to merge now.

Ordoviz · 2025-05-30T19:08:09Z

The CI fails because it uses Python 2 where rb"foobar" is a SyntaxError. Are we still interested in supporting Python 2?

SweetVishnya · 2025-05-30T19:16:56Z

Yeah, we're still willing to support Python 2 as far as possible.

Defer parsing of backslash escapes to the regex engine. Previously, some hex-escaped bytes were parsed as regex metacharacters, e.g. "[\x41\x2d\x48]" is the range expression "[A-H]".

This filters out gadgets like "andi sp, sp, -0x10", which do not perform a jump, but whose last two bytes ("01ff" in this case) are a compressed jump instruction ("c.bnez a4, -0xe8" in this case). We perform this filtering only on RISC-V because it leads to missed gadgets in x86, where prefixes can be added to an instruction that change its size but not its behavior. MIPS is especially problematic because we set gad_size = 8 in order to display the next instruction in the branch delay slot. On other architectures this filter makes no difference on the test suite.
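One way to picture such a filter is sketched below. This is not the code from the commit: the function name and the (address, size, text) tuples are assumptions made for illustration, standing in for whatever the disassembler returns for a candidate. The idea is that a candidate is kept only if its last decoded instruction begins exactly where the branch regex matched.

```python
def ends_with_matched_branch(decoded, branch_addr):
    """Return True if the last decoded instruction starts at branch_addr.

    decoded: list of (address, size, text) tuples, as a disassembler might
    return them for the candidate bytes (hypothetical representation).
    """
    if not decoded:
        return False
    last_addr, _size, _text = decoded[-1]
    return last_addr == branch_addr


branch_addr = 0x10582  # where the c.bnez regex matched

# Candidate starting at 0x10580: the four bytes decode as a single andi,
# so no instruction begins at the matched address.
bad = [(0x10580, 4, "andi sp, sp, -0x10")]
# Candidate starting at 0x10582: the match itself is the only instruction.
good = [(0x10582, 2, "c.bnez a4, -0xe8")]

print(ends_with_matched_branch(bad, branch_addr))   # False
print(ends_with_matched_branch(good, branch_addr))  # True
```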

You cannot jump to misaligned instructions. Try it yourself:

$ pwn asm --debug --context riscv64 'lui ra, 0x11; addi ra, ra, 0xf1; ret; .byte 0; nop; nop;'

──────────────────────[ DISASM / rv64 / set emulate on ]──────────────────────
   0x110e8    c.lui  ra, 0x11          RA => 0x11000
   0x110ea    addi   ra, ra, 0xf1      RA => 0x110f1 (0x11000 + 0xf1)
 ► 0x110ee    c.jr   ra                <0x110f1>
    ↓
   0x110f1    c.nop
   0x110f3    c.nop

After a single step in pwndbg the jump target was rounded down to 0x110f0 and the disassembly changed:

──────────────────────[ DISASM / rv64 / set emulate on ]──────────────────────
 ► 0x110f0    c.addi4spn s0, sp, 0x80
   0x110f2    c.addi4spn s0, sp, 0x80
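The rounding seen in the debugger matches how RISC-V register-indirect jumps compute their target: the lowest bit of the address is cleared. A tiny sketch, with jalr_target as an illustrative name rather than anything from ROPgadget or pwndbg:

```python
def jalr_target(base, offset=0):
    """Target of a RISC-V JALR-style jump: the sum with its lowest bit cleared."""
    return (base + offset) & ~1


print(hex(jalr_target(0x110f1)))  # 0x110f0
```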

Ordoviz · 2025-05-30T19:37:14Z

It turns out that Python 2 supports br"foobar" string literals, so fixing the CI was simply a matter of swapping b and r.
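For reference, a short sketch of the portable spelling. The pattern is a shortened, made-up fragment rather than one of the repository's regexes; the point is only the prefix order.

```python
import re

# "br" is accepted by both Python 2.7 and Python 3; "rb" needs Python 3.3+.
pattern = br"[\x00-\x03]\?\xd6"

# The class matches the byte at offset 1, followed by a literal 0x3f and 0xd6.
print(re.search(pattern, b"\x20\x00\x3f\xd6").start())  # 1
```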

SweetVishnya · 2025-05-30T20:06:30Z

Thank you! Give me a week to review the changes.

SweetVishnya

Yeah, thank you very much for the fix, I'll merge it

Ordoviz force-pushed the syntaxwarnings branch from 7ec7760 to d2939f0 on May 23, 2025 18:34

Ordoviz force-pushed the syntaxwarnings branch from d2939f0 to 04f93b0 on May 30, 2025 18:27

Ordoviz force-pushed the syntaxwarnings branch from 04f93b0 to bda91b8 on May 30, 2025 18:33

Ordoviz added 5 commits May 30, 2025 21:34

Fix more SyntaxWarnings in Python 3.13

c2169b2

Use raw strings for regexes

6d0b45e

Defer parsing of backslash escapes to the regex engine. Previously, some hex-escaped bytes were parsed as regex metacharacters, e.g. "[\x41\x2d\x48]" is the range expression "[A-H]".

Update expected test output

4a1184a

Ordoviz force-pushed the syntaxwarnings branch from bda91b8 to 4a1184a on May 30, 2025 19:35

SweetVishnya approved these changes Jun 2, 2025

SweetVishnya merged commit 1d72f15 into JonathanSalwan:master Jun 2, 2025
1 check passed
