Skip to content

Conversation

hainest
Copy link
Contributor

@hainest hainest commented Sep 23, 2025

Your checklist for this pull request

  • I've documented or updated the documentation of every API function and struct this PR changes.
  • I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description
Update the read/written registers for the string instructions stos, lods, cmps, movs, ins, and outs. Include complete testing of all instruction permutations.

...

Test plan

Tests are included


This should be the last updates I need to do my work on Dyninst. There might be more, but I hope not. While adding tests is a positive, I don't want to consume much more of your time on the diminishing returns of manually updating these instruction semantics since they will get overwritten if/when a new x86 decoder is adopted.

I'm marking this as a draft so I can rebase it onto the other PRs as they get merged before review begins here.

stosq only reads/writes rcx when there is a rep prefix.

With regard to reading 'es' in 64-bit mode, the SDM says for stosd

  For legacy mode, store EAX at address ES:(E)DI; For 64-bit mode store
  EAX at address RDI or EDI.
With regard to reading 'ds' in 64-bit mode, the SDM says for lodsd

  For legacy mode, Load dword at address DS:(E)SI into EAX. For 64-bit
  mode load dword at address (R)SI into EAX.
X86_REG_ES was removed from the explicit set of read registers because
it was causing duplicate entries in 'detail->regs_read'.

With regard to reading 'es' in 64-bit mode, the SDM says for scasd

  Compare EAX with doubleword at ES:(E)DI or RDI then set status flags.
The 'es' register is implicitly handled correctly. The 'ds' register
should only be read from in 16- and 32-bit mode. From the SDM for cmpsd:

  For legacy mode, compare dword at address DS:(E)SI with dword at
  address ES:(E)DI; For 64-bit mode compare dword at address (R|E)SI
  with dword at address (R|E)DI. The status flags are set accordingly.
The 'es' register is implicitly handled correctly. The 'ds' register
should only be read from in 16- and 32-bit mode. From the SDM for movsd:

  For legacy mode, move dword from address DS:(E)SI to ES:(E)DI. For
  64-bit mode move dword from address (R|E)SI to (R|E)DI.
X86_REG_EDI was removed from the explicit set of read registers because
it was causing duplicate entries in 'detail->regs_read'.

The 'ds' register should only be read from in 16- and 32-bit mode. From
the SDM for insd:

  Input doubleword from I/O port specified in DX into memory location
  specified in ES:(E)DI or RDI.
The 'ds' register should only be read from in 16- and 32-bit mode. From
the SDM for outsd:

  Output word from memory location specified in DS:(E)SI or RSI to I/O
  port specified in DX.
@hainest hainest force-pushed the thaines/x86_string_comp branch from 71488f5 to 2bea2dd Compare October 5, 2025 03:27
@hainest hainest marked this pull request as ready for review October 5, 2025 04:51
@Rot127
Copy link
Collaborator

Rot127 commented Oct 9, 2025

Do you have the script somewhere btw, with which you generate the differences to Zydis? Or are you comparing them hand by hand?

@hainest
Copy link
Contributor Author

hainest commented Oct 9, 2025

Do you have the script somewhere btw, with which you generate the differences to Zydis? Or are you comparing them hand by hand?

It's all manual checking right now. I have a script that can parse most of the output from Zydis, but it's not complete enough to create a formal comparison test yet.

I have started a conversation with the Zydis developers about expanding their testing to include fixed-input tests like these into their workflows. They seem hesitant, but I've only heard from one person. In the interim, it might be more productive to work on setting up differential "fuzzing" with fixed inputs to get at least some degree of comparison. Of course, proper fuzzing with something like mishegos might be a worthwhile long-term goal, as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants