The matching system explained
If you look inside a JSON file produced by Carbonara-CLI, you will see two fields for every analyzed procedure: vexhash and flowhash.
Both are MinHash objects used by the procedure matching system in Carbonara.
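To give an idea of why MinHash objects suit a matching system, here is a minimal sketch of the technique using only the Python standard library. It is not Carbonara's implementation: the names `minhash`, `similarity` and `num_perm` are hypothetical, and salted BLAKE2b stands in for whatever hash family the real code uses. The fraction of equal positions in two signatures estimates the Jaccard similarity of the two token sets.

```python
import hashlib


def minhash(tokens, num_perm=64):
    """Return a MinHash signature for a collection of string tokens.

    For each of num_perm salted hash functions, keep the smallest hash
    value seen over the set of tokens.
    """
    unique = set(tokens)
    if not unique:
        raise ValueError("cannot build a signature from an empty collection")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # one salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(t.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for t in unique
        ))
    return signature


def similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Illustrative tokens only; the same set in a different order
# gives an identical signature.
sig_a = minhash(["CALL OUT:1", "JMP 2", "JMP OUT:0"])
sig_b = minhash(["JMP OUT:0", "CALL OUT:1", "JMP 2"])
print(similarity(sig_a, sig_b))  # 1.0
```

Because a signature has a fixed length, two procedures can be compared without keeping their full token sets around.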
flowhash is built in the following steps:
- recognition of jump and call instructions
- API recognition
  ex. if 0x4000 is the address of the import table entry for printf, the internal state of the translator associates that address with the string API:printf
- address abstraction
  ex. each address targeted by a jump outside the procedure is associated with a number that preserves the ordering of the addresses

      JMP  0x4010             OUT:1
      CALL 0x4000    --->     OUT:0
      JMP  0x4200             OUT:2

- jumps inside the procedure are used to construct a basic block graph; each relative address is associated with the number of hops in the graph needed to reach the jump target
- flowhash is the MinHash of the list generated in the previous steps, which reports the jumps and calls together with their abstracted addresses, in order of appearance.
  ex. (considering a function at address 0x6000 and the puts GOT entry at 0x1000)

      0x6000: CALL 0x4800             CALL OUT:1
      0x6003: ...
      ...
      0x6010: JLE 0x6033              JMP 2
      ...
      0x6030: JMP 0x4200    --->      JMP OUT:0
      0x6033: ...
      ...
      0x6060: CALL 0x1000             CALL API:puts
      0x6063: JMP 0x6003              JMP 5
      0x6066: RET

vexhash is built in the following steps:
- instructions not supported by the VEX IR (like the x86 hlt) are substituted with placeholders
- VEX IR lifting (IR borrowed from Valgrind)
- register juice (registers are indexed by their order of appearance)
  ex. (I use an assembly-like language because VEX is not very readable)

      MOV EAX, 3              MOV R0, 3             MOV ECX, 3
      POP EBX       --->      POP R1      <---      POP EDI
      ADD EAX, EBX            ADD R0, R1            ADD ECX, EDI
      RET                     RET                   RET

- substitute the addresses with the abstracted addresses used to build flowhash
- substitute the addresses that correspond to the program counter increment of the fetch phase with a placeholder
  ex. PUT(_PC_) = 0x6040 ---> PUT(_PC_) = _NEXT_
- group VEX instructions into bigrams to maintain a partial order relationship
- vexhash is the MinHash representing the set of bigrams.
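The normalization steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Carbonara's code: the function names (`abstract_outside_targets`, `rename_registers`, `bigrams`) are hypothetical, instructions are plain assembly-like strings rather than real VEX IR, and the end address 0x6067 of the example procedure is assumed. The hop-count labelling of inside jumps is left out.

```python
import re


def abstract_outside_targets(targets, start, end, imports):
    """Label branch targets that leave the procedure occupying [start, end).

    Import table entries become 'API:<name>'; every other outside address
    becomes 'OUT:<n>', where n is its rank among the outside addresses.
    """
    labels = {}
    others = set()
    for addr in targets:
        if start <= addr < end:
            continue  # inside jumps are handled separately (hop counts)
        if addr in imports:
            labels[addr] = "API:" + imports[addr]
        else:
            others.add(addr)
    for rank, addr in enumerate(sorted(others)):
        labels[addr] = f"OUT:{rank}"
    return labels


def rename_registers(instructions, registers):
    """Rename registers to R0, R1, ... in order of first appearance."""
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, registers)) + r")\b")
    seen = {}

    def replace(match):
        # Reuse the index already given to this register, or assign the next one.
        return seen.setdefault(match.group(0), f"R{len(seen)}")

    return [pattern.sub(replace, ins) for ins in instructions]


def bigrams(items):
    """Return the set of adjacent pairs, which keeps local ordering."""
    return set(zip(items, items[1:]))


# Branch targets of the flowhash example (function at 0x6000, puts at 0x1000).
labels = abstract_outside_targets(
    [0x4800, 0x6033, 0x4200, 0x1000, 0x6003],
    start=0x6000, end=0x6067,  # end address is an assumption
    imports={0x1000: "puts"},
)
# labels: 0x4200 -> "OUT:0", 0x4800 -> "OUT:1", 0x1000 -> "API:puts"

REGS = ("EAX", "EBX", "ECX", "EDI")  # only the registers used in the example
left = rename_registers(["MOV EAX, 3", "POP EBX", "ADD EAX, EBX", "RET"], REGS)
right = rename_registers(["MOV ECX, 3", "POP EDI", "ADD ECX, EDI", "RET"], REGS)
print(left == right)         # True: both normalize to the same sequence
print(len(bigrams(left)))    # 3
```

Two procedures that differ only in register allocation end up with the same set of bigrams, so their MinHash signatures would be identical.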