@@ -97,11 +97,20 @@ min = 0x7fffffffde00
9797max = 0x7ffffffff000
9898```
9999
100+ A successful synthesis will print out a listing of the gadgets were selected.
101+
100102## Library Usage
101103
102- ` crackers ` can also be used as a library. All of the above settings from the config correspond
104+ ` crackers ` intended mode of use is as a library. All of the above settings from the config correspond
103105to settings that can be set programmatically by API consumers.
104106
107+ When using the API, rather than getting a listing of gadgets as an output, you get a model of the synthesized chain.
108+ This model of the chain includes information about what gadgets were selected as well as a Z3 ` Model ` representing the
109+ memory at all states of execution in the gadget chain. This model can be queried to derive the memory conditions
110+ necessary to execute the chain.
111+
112+ ### Constraints
113+
105114Constraints work a little differently with the API. Instead of specifying registers and register equality,
106115` crackers ` allows consumers to provide a closure of the following types:
107116
@@ -128,110 +137,3 @@ The second type is used for asserting read/write invariants. These functions tak
128137the bitvector corresponding to the read/write address, as well as the state the read/write is being performed on.
129138Any time a chain reads or writes from memory, the procedure will automatically call these functions and assert the returned
130139booleans. This can allow for setting safe/unsafe ranges of memory or even the register space.
131-
132- ## How it Works (Roughly)
133-
134- ### Library generation
135-
136- A library is taken in and parsed. The executable sections are identified. For every executable section,
137- we attempt disassembly at every byte offset. If disassembly succeeds and returns a terminating basic block
138- within N instructions (where N is set in the config), then we call that a gadget and save it.
139-
140- ### Candidate selection
141-
142- ` crackers ` works by taking in an "example" computation and synthesizing a chain that is compatible with the example.
143- So for instance, if you want to call execve on linux, your example computation might look like this:
144-
145- ```
146- 00000000 4889c7 mov rdi, rax
147- 00000003 48c7c03b000000 mov rax, 0x3b
148- 0000000a 48c7c600000000 mov rsi, 0x0
149- 00000011 48c7c200000000 mov rdx, 0x0
150- 00000018 0f05 syscall
151- ```
152-
153- This computation sets ` rax ` , ` rsi ` , and ` rdx ` to set values, and ` rdi ` to some indeterminate value, which we will
154- constrain later.
155-
156- For each instruction in this computation, we identify a set of "gadget candidates". These candidates are selected out of
157- the library we assembled. To be a candidate for an instruction a gadget must pass the following checks:
158- * If the specification instruction contains a jump, the gadget must have terminating control flow that is capable of branching
159- to the same destination.
160- * The gadget must write to every direct address that the specification instruction writes to.
161- * For every indirect access, the gadget must also make an indirect access using the same pointer storage, of at least
162- as many bytes.
163- * Taken in isolation, the gadget must be able to have the same effect as the specification instruction:
164- (e.g. ` mov eax, ebx ` can stand in for ` mov eax, 0 ` , but ` mov eax, 2 ` cannot).
165-
166-
167- ### Decision Loop
168-
169- The overall flow of the procedure is as follows:
170-
171- * Ask the assignment problem for an assignment
172- * If it returns UNSAT, then no possible assignment exists (under the given parameters) and we return
173- * If it returns SAT, then we send that assignment to the theory solver
174- * If the PCODE theory solver returns SAT, we have a valid chain
175- * If the PCODE theory solver returns UNSAT, it also provides a set of conflict clauses identifying
176- which ` decisions ` participated in the UNSAT proof. These clauses are communicated back to the assignment
177- problem to allow it to outlaw that combination of ` decisions ` .
178-
179- We introduce parallelism to this workflow by running the PCODE theory solvers in threads and generating multiple
180- unique assignments for each worker to solve in parallel.
181-
182- A description of the assignment problem and the theory problem follow:
183- ### Assignment Problem Setup
184-
185- Once all the candidates have been found for all instructions, we check and make sure that we have found
186- at least one candidate for each. If any instruction has no candidates, then we immediately return UNSAT. This
187- usually indicates that we just did not sample a gadget that touches the needed memory.
188-
189- For each spec instruction, each candidate is assigned an index. The same gadget can exist as a candidate for multiple indices and
190- each copy is treated as logically separate from each other.
191-
192- Using these indices, we construct a simple boolean SAT problem:
193- * We define a ` decision ` as a tuple ` (i: usize, c: usize) ` , indicating that index ` i ` is using choice ` c ` . A ` decision `
194- uniquely identifies a given gadget being used in a given slot.
195- * Each decision is mapped to a Z3 Bool.
196- * We then construct a boolean SAT problem using these variables with the following constraints:
197- * For all indices ` i ` :
198- * We make exactly 1 choice ` c ` (e.g. for every ` i ` , one AND ONLY ONE ` decision ` with matching ` i ` must be true)
199- * In the case of the ` optimize ` solver, we additionally impose a penalty on every ` decision ` , proportional to the
200- number of instructions in the gadget. This pressures the solver into selecting the shortest gadgets that it can.
201-
202- ### PCODE Theory Problem Setup
203-
204- This procedure runs operates on a single assignment of gadgets. This assignment is evaluated against
205- the specification computation, as well as any provided constraints.
206-
207- First, we form the specification computation into a trace, by asserting state equality between the end state
208- of every instruction and the beginning state of its successor.
209-
210- Then, we do the same for our assignment of gadgets. We tag these assertions as being ` memory ` assertions.
211-
212- We assert all preconditions on the initial state of the gadget chain, and all postconditions on the final state of the gadget chain.
213- We tag these as ` constraint ` assertions.
214-
215- For every instruction ` i ` and its corresponding gadget ` g ` :
216- * Assert that for all ` v ` in ` output(i) ` : ` g[v] ` = ` i[v] ` .
217- * If ` i ` has control flow, assert that the control flow of ` g ` branches to the same destination as ` i `
218- * These are tagged as ` semantic ` assertions.
219-
220- For every gadget ` g1 ` and its successor ` g2 ` :
221- * Assert that the address of ` g2 ` is the jump target of ` g1 ` . These are tagged as ` branch ` assertions. The conflict
222- associated with a ` branch ` assertions only references ` g1 ` instead of the conjunction of ` g1 ` and ` g2 ` because,
223- as a heuristic, when ` g1 ` is unable to branch to ` g2 ` it is almost always because of some conflict in ` g1 ` , and not
224- something about the address of ` g2 `
225-
226- We give all these assertions to ` z3 ` using the ` assert_and_track ` API, which makes Z3 express the unsat core
227- in terms of booleans representing our varying assertions.
228-
229- If z3 comes back with SAT, then the chain assignment is valid.
230-
231- If it comes back UNSAT, then we analyze the UNSAT CORE:
232-
233- * If the UNSAT core is composed only of ` memory ` and ` constraint ` assertions, then, as the formulae are currently tracked,
234- we have no way to make strong conflicts ouf of this core. As a fallback, we simply return a clause outlawing the complete
235- assignment.
236- * Otherwise, we form a conjunction of all participating ` decisions ` for all ` branch ` and ` semantic ` conflicts
237- and return that to the assignment problem.
0 commit comments