Merge branch 'main' of github-toolchainz-ssh:toolCHAINZ/crackers into ablation_a

toolCHAINZ · toolCHAINZ · commit 57b6c132298c · 2025-01-24T14:02:17.000Z
diff --git a/README.md b/README.md
@@ -97,11 +97,20 @@ min = 0x7fffffffde00
 max = 0x7ffffffff000
 ```
 
+A successful synthesis prints a listing of the selected gadgets.
+
 ## Library Usage
 
-`crackers` can also be used as a library. All of the above settings from the config correspond
+`crackers` is primarily intended to be used as a library. All of the above settings from the config correspond
 to settings that can be set programmatically by API consumers.
 
+When using the API, you get a model of the synthesized chain instead of a printed listing of gadgets.
+This model records which gadgets were selected, along with a Z3 `Model` representing the
+memory at every state of execution in the gadget chain. The model can be queried to derive the memory conditions
+necessary to execute the chain.
+
+### Constraints
+
 Constraints work a little differently with the API. Instead of specifying registers and register equality,
 `crackers` allows consumers to provide a closure of the following types:
 
@@ -128,110 +137,3 @@ The second type is used for asserting read/write invariants. These functions tak
 the bitvector corresponding to the read/write address, as well as the state the read/write is being performed on. 
 Any time a chain reads or writes from memory, the procedure will automatically call these functions and assert the returned
 booleans. This can allow for setting safe/unsafe ranges of memory or even the register space.
-
-## How it Works (Roughly)
-
-### Library generation
-
-A library is taken in and parsed. The executable sections are identified. For every executable section,
-we attempt disassembly at every byte offset. If disassembly succeeds and returns a terminating basic block
-within N instructions (where N is set in the config), then we call that a gadget and save it.
-
-### Candidate selection
-
-`crackers` works by taking in an "example" computation and synthesizing a chain that is compatible with the example.
-So for instance, if you want to call execve on linux, your example computation might look like this:
-
-```
-00000000  4889c7             mov     rdi, rax
-00000003  48c7c03b000000     mov     rax, 0x3b
-0000000a  48c7c600000000     mov     rsi, 0x0
-00000011  48c7c200000000     mov     rdx, 0x0
-00000018  0f05               syscall 
-```
-
-This computation sets `rax`, `rsi`, and `rdx` to set values, and `rdi` to some indeterminate value, which we will
-constrain later.
-
-For each instruction in this computation, we identify a set of "gadget candidates". These candidates are selected out of
-the library we assembled. To be a candidate for an instruction a gadget must pass the following checks:
-* If the specification instruction contains a jump, the gadget must have terminating control flow that is capable of branching
-  to the same destination.
-* The gadget must write to every direct address that the specification instruction writes to.
-* For every indirect access, the gadget must also make an indirect access using the same pointer storage, of at least
-  as many bytes.
-* Taken in isolation, the gadget must be able to have the same effect as the specification instruction:
-  (e.g. `mov eax, ebx` can stand in for `mov eax, 0`, but `mov eax, 2` cannot).
-
-
-### Decision Loop
-
-The overall flow of the procedure is as follows:
-
-* Ask the assignment problem for an assignment
-    * If it returns UNSAT, then no possible assignment exists (under the given parameters) and we return
-    * If it returns SAT, then we send that assignment to the theory solver
-      * If the PCODE theory solver returns SAT, we have a valid chain
-      * If the PCODE theory solver returns UNSAT, it also provides a set of conflict clauses identifying
-        which `decisions` participated in the UNSAT proof. These clauses are communicated back to the assignment
-        problem to allow it to outlaw that combination of `decisions`.
-
-We introduce parallelism to this workflow by running the PCODE theory solvers in threads and generating multiple
-unique assignments for each worker to solve in parallel.
-
-A description of the assignment problem and the theory problem follow: 
-### Assignment Problem Setup
-
-Once all the candidates have been found for all instructions, we check and make sure that we have found
-at least one candidate for each. If any instruction has no candidates, then we immediately return UNSAT. This
-usually indicates that we just did not sample a gadget that touches the needed memory.
-
-For each spec instruction, each candidate is assigned an index. The same gadget can exist as a candidate for multiple indices and
-each copy is treated as logically separate from each other.
-
-Using these indices, we construct a simple boolean SAT problem:
-* We define a `decision` as a tuple `(i: usize, c: usize)`, indicating that index `i` is using choice `c`. A `decision`
-  uniquely identifies a given gadget being used in a given slot.
-* Each decision is mapped to a Z3 Bool.
-* We then construct a boolean SAT problem using these variables with the following constraints:
-  * For all indices `i`:
-    * We make exactly 1 choice `c` (e.g. for every `i`, one AND ONLY ONE `decision` with matching `i` must be true)
-* In the case of the `optimize` solver, we additionally impose a penalty on every `decision`, proportional to the 
-  number of instructions in the gadget. This pressures the solver into selecting the shortest gadgets that it can. 
-
-### PCODE Theory Problem Setup
-
-This procedure runs operates on a single assignment of gadgets. This assignment is evaluated against
-the specification computation, as well as any provided constraints.
-
-First, we form the specification computation into a trace, by asserting state equality between the end state
-of every instruction and the beginning state of its successor.
-
-Then, we do the same for our assignment of gadgets. We tag these assertions as being `memory` assertions.
-
-We assert all preconditions on the initial state of the gadget chain, and all postconditions on the final state of the gadget chain.
-We tag these as `constraint` assertions.
-
-For every instruction `i` and its corresponding gadget `g`:
-* Assert that for all `v` in `output(i)`: `g[v]` = `i[v]`.
-* If `i` has control flow, assert that the control flow of `g` branches to the same destination as `i`
-* These are tagged as `semantic` assertions.
-
-For every gadget `g1` and its successor `g2`:
-* Assert that the address of `g2` is the jump target of `g1`. These are tagged as `branch` assertions. The conflict
-  associated with a `branch` assertions only references `g1` instead of the conjunction of `g1` and `g2` because,
-  as a heuristic, when `g1` is unable to branch to `g2` it is almost always because of some conflict in `g1`, and not
-  something about the address of `g2`
-
-We give all these assertions to `z3` using the `assert_and_track` API, which makes Z3 express the unsat core
-in terms of booleans representing our varying assertions.
-
-If z3 comes back with SAT, then the chain assignment is valid.
-
-If it comes back UNSAT, then we analyze the UNSAT CORE:
-
-* If the UNSAT core is composed only of `memory` and `constraint` assertions, then, as the formulae are currently tracked,
-  we have no way to make strong conflicts ouf of this core. As a fallback, we simply return a clause outlawing the complete
-  assignment.
-* Otherwise, we form a conjunction of all participating `decisions` for all `branch` and `semantic` conflicts
-  and return that to the assignment problem.
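The decision loop described in the removed "How it Works" section can be sketched without Z3. In this minimal, std-only sketch, the assignment problem is replaced by brute-force enumeration of every exactly-one-choice-per-slot assignment, and the PCODE theory solver is a caller-supplied closure. `next_assignment`, `decision_loop`, and `TheoryResult` are hypothetical names, not crackers APIs. The sketch only shows the shape of the loop, where each learned conflict clause outlaws every assignment that contains it. It does not reproduce the SAT encoding or the parallel workers crackers uses.

```rust
use std::collections::HashSet;

/// A `decision` as described above: slot index `i` uses candidate choice `c`.
type Decision = (usize, usize);

/// Result of the (hypothetical) theory check for one assignment.
enum TheoryResult {
    Sat,
    /// The decisions that participated in the UNSAT proof.
    Unsat(HashSet<Decision>),
}

/// Brute-force stand-in for the assignment problem: find an assignment that makes
/// exactly one choice per slot and does not contain any learned conflict clause in full.
fn next_assignment(
    candidates_per_slot: &[usize],
    conflicts: &[HashSet<Decision>],
) -> Option<Vec<usize>> {
    // A slot with zero candidates makes the product 0, so the search is UNSAT at once.
    let total: usize = candidates_per_slot.iter().product();
    (0..total).find_map(|mut n| {
        // Decode `n` as a mixed-radix number: digit `i` is the choice for slot `i`.
        let choice: Vec<usize> = candidates_per_slot
            .iter()
            .map(|&k| {
                let c = n % k;
                n /= k;
                c
            })
            .collect();
        let decisions: HashSet<Decision> = choice.iter().copied().enumerate().collect();
        // A conflict clause outlaws any assignment containing all of its decisions.
        let blocked = conflicts.iter().any(|clause| clause.is_subset(&decisions));
        (!blocked).then_some(choice)
    })
}

/// Alternate between the assignment problem and the theory check until a chain is
/// found (`Some`) or every assignment has been outlawed (`None`).
fn decision_loop<F>(candidates_per_slot: &[usize], mut theory: F) -> Option<Vec<usize>>
where
    F: FnMut(&[usize]) -> TheoryResult,
{
    let mut conflicts: Vec<HashSet<Decision>> = Vec::new();
    while let Some(assignment) = next_assignment(candidates_per_slot, &conflicts) {
        match theory(assignment.as_slice()) {
            TheoryResult::Sat => return Some(assignment),
            TheoryResult::Unsat(core) => {
                let full: HashSet<Decision> = assignment.iter().copied().enumerate().collect();
                // A core naming decisions outside this assignment would not block it,
                // so fall back to outlawing the complete assignment.
                conflicts.push(if core.is_subset(&full) { core } else { full });
            }
        }
    }
    None
}
```

Enumeration is exponential in the number of slots. The real procedure hands this search to Z3 instead, and with the `optimize` solver it also adds the gadget-length penalty, which this sketch does not model.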
diff --git a/libc_execve.toml b/libc_execve.toml
@@ -5,7 +5,7 @@ log_level = "INFO"
 [synthesis]
 strategy = "sat"
 max_candidates_per_slot = 200
-parallel = 7
+parallel = 1
 
 [specification]
 path = "bin/execve.o"
@@ -33,22 +33,5 @@ R13 = 0x0
 R14 = 0x4ae018
 R15 = 0x400538
 
-[constraint.postcondition.register]
-RAX = 0x3b
-RSI = 0
-RDX = 0
-
 [constraint.postcondition.pointer]
 RDI = "/bin/sh"
-
-[[constraint.pointer.read]]
-min = 0x7ffffffde000
-max = 0x7ffffffff000
-
-[[constraint.pointer.write]]
-min = 0x7ffffffde000
-max = 0x7ffffffff000
-
-[[constraint.pointer.write]]
-min = 0x9d000
-max = 0x9d060
diff --git a/mprotect.toml b/mprotect.toml
@@ -0,0 +1,50 @@
+[meta]
+log_level = "DEBUG"
+
+[specification]
+max_instructions = 4
+path = "bin/execve.o"
+
+[library]
+max_gadget_length = 10
+path = "bin/libc_wrapper"
+
+[sleigh]
+ghidra_path = "/Applications/ghidra"
+
+[synthesis]
+strategy = "sat"
+max_candidates_per_slot = 500
+parallel = 1
+combine_instructions = false
+
+[constraint.precondition.register]
+RAX = 0
+RCX = 0x440f30
+RDX = 0x7fffffffe608
+RBX = 0x400538
+RSP = 0x7fffffffe3b8
+RBP = 0x403af0
+RSI = 0x7fffffffe5f8
+RDI = 1
+R8 = 0
+R9 = 6
+R10 = 0x36f8
+R11 = 0x206
+R12 = 0x403b90
+R13 = 0x0
+R14 = 0x4ae018
+R15 = 0x400538
+
+[constraint.postcondition.register]
+RDI = 0xdeadbeef
+RSI = 0x40
+RDX = 0x7b
+
+[[constraint.pointer.read]]
+min = 0x7ffffffde000
+max = 0x7ffffffff000
+
+[[constraint.pointer.write]]
+min = 0x7ffffffde000
+max = 0x7ffffffff000
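The `[[constraint.pointer.read]]` and `[[constraint.pointer.write]]` tables above bound the addresses a chain may touch. According to the README, such invariants are asserted every time the chain reads or writes memory. Below is a concrete, non-symbolic sketch of that check. `PointerRange` and `access_allowed` are hypothetical names. Treating `max` as exclusive and combining multiple entries as a union are assumptions, not documented crackers behavior. Inside crackers the result would be a Z3 `Bool` over the address bitvector rather than a Rust `bool`.

```rust
/// One `[[constraint.pointer.read]]` / `[[constraint.pointer.write]]` entry (hypothetical type).
#[derive(Clone, Copy, Debug)]
struct PointerRange {
    min: u64,
    max: u64,
}

/// Returns true if an access of `size_bytes` starting at `addr` lies entirely inside at
/// least one allowed range. ASSUMPTIONS: `max` is exclusive, and entries form a union.
fn access_allowed(ranges: &[PointerRange], addr: u64, size_bytes: u64) -> bool {
    // checked_add rejects accesses that would wrap around the address space.
    let Some(end) = addr.checked_add(size_bytes) else {
        return false;
    };
    ranges.iter().any(|r| addr >= r.min && end <= r.max)
}
```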
diff --git a/src/bin/crackers/main.rs b/src/bin/crackers/main.rs
@@ -58,6 +58,7 @@ fn new(path: PathBuf) -> anyhow::Result<()> {
         specification: SpecificationConfig {
             path: "spec.o".to_string(),
             max_instructions: 1,
+            base_address: None,
         },
         library: Default::default(),
         sleigh: SleighConfig {
diff --git a/src/config/constraint.rs b/src/config/constraint.rs
@@ -1,11 +1,12 @@
 use crate::error::CrackersError;
 use crate::synthesis::builder::{StateConstraintGenerator, TransitionConstraintGenerator};
 use jingle::modeling::{ModeledBlock, ModelingContext, State};
-use jingle::sleigh::{IndirectVarNode, RegisterManager, SpaceManager, VarNode};
+use jingle::sleigh::{RegisterManager, SpaceManager, VarNode};
 use jingle::varnode::{ResolvedIndirectVarNode, ResolvedVarnode};
 use jingle::JingleContext;
 use serde::{Deserialize, Serialize};
 use std::collections::HashMap;
+use std::ops::Add;
 use std::sync::Arc;
 use tracing::{event, Level};
 use z3::ast::{Ast, Bool, BV};
@@ -159,25 +160,28 @@ pub fn gen_register_pointer_constraint<'ctx>(
 {
     move |jingle, state, _addr| {
         let m = m.clone();
-        let val = value
-            .as_bytes()
-            .iter()
-            .map(|b| BV::from_u64(jingle.z3, *b as u64, 8))
-            .reduce(|a, b| a.concat(&b))
-            .unwrap();
+        let mut bools = vec![];
+        let pointer = state.read_varnode(&vn)?;
+        for (i, byte) in value.as_bytes().iter().enumerate() {
+            let expected = BV::from_u64(jingle.z3, *byte as u64, 8);
+            let char_ptr = ResolvedVarnode::Indirect(ResolvedIndirectVarNode {
+                // Address the i-th byte of the string through the pointer stored in `vn`.
+                pointer_location: vn.clone(),
+                pointer: pointer.clone().add(i as u64),
+                access_size_bytes: 1,
+                pointer_space_idx: state.get_code_space_idx(),
+            });
+            let actual = state.read_resolved(&char_ptr)?;
+            bools.push(actual._eq(&expected));
+        }
         let pointer = state.read_varnode(&vn)?;
-        let data = state.read_varnode_indirect(&IndirectVarNode {
-            pointer_space_index: state.get_code_space_idx(),
-            access_size_bytes: value.len(),
-            pointer_location: vn.clone(),
-        })?;
         let resolved = ResolvedVarnode::Indirect(ResolvedIndirectVarNode {
             pointer_location: vn.clone(),
             pointer_space_idx: state.get_code_space_idx(),
             access_size_bytes: value.len(),
             pointer,
         });
-        let mut constraint = data._eq(&val);
+        let mut constraint = Bool::and(jingle.z3, &bools);
         if let Some(c) = m.and_then(|m| m.read) {
             let callback = gen_pointer_range_state_invariant(c);
             let cc = callback(jingle, &resolved, state)?;
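The change to `gen_register_pointer_constraint` replaces a single wide equality with a set of per-byte checks. The old code compared one `value.len()`-byte indirect read against the concatenated string bytes. The new code reads one byte at `pointer + i` for each character and ANDs the equalities together. Presumably this avoids depending on the byte order of the concatenation, though the commit does not state the motivation. The old `reduce(|a, b| a.concat(&b))` puts the first character in the most significant byte, whereas a multi-byte load on a little-endian target such as x86-64 puts the first byte in the least significant position. The concrete sketch below uses a hypothetical `Memory` map in place of the symbolic state, and `pointer_points_to` is likewise a hypothetical name.

```rust
use std::collections::HashMap;

/// Hypothetical concrete stand-in for the symbolic memory state: byte address -> byte value.
type Memory = HashMap<u64, u8>;

/// Mirrors the per-byte approach: compare each byte at `pointer + i` separately and
/// AND the results. Unmapped bytes count as a mismatch in this sketch. As with
/// `value.as_bytes()` in the diff, no trailing NUL terminator is checked.
fn pointer_points_to(mem: &Memory, pointer: u64, value: &str) -> bool {
    value.as_bytes().iter().enumerate().all(|(i, byte)| {
        // wrapping_add matches modular bitvector addition on the pointer.
        mem.get(&pointer.wrapping_add(i as u64)) == Some(byte)
    })
}
```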
diff --git a/src/config/specification.rs b/src/config/specification.rs
@@ -14,10 +14,11 @@ use crate::config::sleigh::SleighConfig;
 pub struct SpecificationConfig {
     pub path: String,
     pub max_instructions: usize,
+    pub base_address: Option<u64>,
 }
 
 impl SpecificationConfig {
-    pub fn load_sleigh<'a>(
+    fn load_sleigh<'a>(
         &self,
         sleigh_config: &'a SleighConfig,
     ) -> Result<LoadedSleighContext<'a>, CrackersConfigError> {
@@ -36,9 +37,14 @@ impl SpecificationConfig {
         let _section = gimli_file
             .section_by_name(".text")
             .ok_or(SpecMissingTextSection)?;
-        let sleigh = self.load_sleigh(sleigh_config)?;
+        let mut sleigh = self.load_sleigh(sleigh_config)?;
+        let mut addr = sym.address();
+        if let Some(o) = self.base_address {
+            sleigh.set_base_address(o);
+            addr = addr.wrapping_add(o);
+        }
         let instrs: Vec<Instruction> = sleigh
-            .read_until_branch(sym.address(), self.max_instructions)
+            .read_until_branch(addr, self.max_instructions)
             .collect();
         Ok(instrs)
     }
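With the new `base_address` option, the specification is rebased before disassembly. The symbol address from the object file is offset by `base_address` using `wrapping_add`, and the same base is passed to `set_base_address`. The minimal sketch below covers only the address arithmetic. `SpecConfig` and `start_address` are hypothetical stand-ins for `SpecificationConfig` and the inline logic in the diff.

```rust
/// Hypothetical stand-in for `SpecificationConfig`, keeping only the new field.
struct SpecConfig {
    base_address: Option<u64>,
}

impl SpecConfig {
    /// Address at which to start reading the specification: the symbol's address in
    /// the object file, shifted by `base_address` when one is set. `wrapping_add`
    /// mirrors the diff and never panics on overflow.
    fn start_address(&self, sym_address: u64) -> u64 {
        match self.base_address {
            Some(base) => sym_address.wrapping_add(base),
            None => sym_address,
        }
    }
}
```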