-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Update read/written registers for x86 string instructions #2790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: next
Are you sure you want to change the base?
Conversation
3ef0ff2
to
71488f5
Compare
stosq only reads/writes rcx when there is a rep prefix. With regard to reading 'es' in 64-bit mode, the SDM says for stosd For legacy mode, store EAX at address ES:(E)DI; For 64-bit mode store EAX at address RDI or EDI.
With regard to reading 'ds' in 64-bit mode, the SDM says for lodsd For legacy mode, Load dword at address DS:(E)SI into EAX. For 64-bit mode load dword at address (R)SI into EAX.
X86_REG_ES was removed from the explicit set of read registers because it was causing duplicate entries in 'detail->regs_read'. With regard to reading 'es' in 64-bit mode, the SDM says for scasd Compare EAX with doubleword at ES:(E)DI or RDI then set status flags.
The 'es' register is implicitly handled correctly. The 'ds' register should only be read from in 16- and 32-bit mode. From the SDM for cmpsd: For legacy mode, compare dword at address DS:(E)SI with dword at address ES:(E)DI; For 64-bit mode compare dword at address (R|E)SI with dword at address (R|E)DI. The status flags are set accordingly.
The 'es' register is implicitly handled correctly. The 'ds' register should only be read from in 16- and 32-bit mode. From the SDM for movsd: For legacy mode, move dword from address DS:(E)SI to ES:(E)DI. For 64-bit mode move dword from address (R|E)SI to (R|E)DI.
X86_REG_EDI was removed from the explicit set of read registers because it was causing duplicate entries in 'detail->regs_read'. The 'ds' register should only be read from in 16- and 32-bit mode. From the SDM for insd: Input doubleword from I/O port specified in DX into memory location specified in ES:(E)DI or RDI.
The 'ds' register should only be read from in 16- and 32-bit mode. From the SDM for outsd: Output word from memory location specified in DS:(E)SI or RSI to I/O port specified in DX.
71488f5
to
2bea2dd
Compare
Do you have the script somewhere btw, with which you generate the differences to Zydis? Or are you comparing them hand by hand? |
It's all manual checking right now. I have a script that can parse most of the output from Zydis, but it's not complete enough to create a formal comparison test yet. I have started a conversation with the Zydis developers about expanding their testing to include fixed-input tests like these into their workflows. They seem hesitant, but I've only heard from one person. In the interim, it might be more productive to work on setting up differential "fuzzing" with fixed inputs to get at least some degree of comparison. Of course, proper fuzzing with something like mishegos might be a worthwhile long-term goal, as well. |
Your checklist for this pull request
Detailed description
Update the read/written registers for the string instructions stos, lods, cmps, movs, ins, and outs. Include complete testing of all instruction permutations.
...
Test plan
Tests are included
This should be the last updates I need to do my work on Dyninst. There might be more, but I hope not. While adding tests is a positive, I don't want to consume much more of your time on the diminishing returns of manually updating these instruction semantics since they will get overwritten if/when a new x86 decoder is adopted.
I'm marking this as a draft so I can rebase it onto the other PRs as they get merged before review begins here.