|
| 1 | +# mutation_contract.md (v0.1) |
| 2 | + |
| 3 | +## purpose |
| 4 | + |
| 5 | +define the **only** allowed way to mutate a pipeline: a deterministic, refusal-first transformation from `sans.ir` → `sans.ir'`, with stable diagnostics and machine-readable diffs. this contract is the agent-facing “edit api.” |
| 6 | + |
| 7 | +## scope |
| 8 | + |
| 9 | +* **input surface:** `sans.ir` only. `plan.ir` is **witness-only** and **never accepted** as mutation input. |
| 10 | +* mutation is **ir-to-ir**, deterministic, side-effect free. |
| 11 | +* no “helpful” inference. ambiguity is refusal. |
| 12 | + |
| 13 | +## terms |
| 14 | + |
| 15 | +* **intent ir (`sans.ir`)**: authoritative plan-of-intent; allowed as input; mutable via contract. |
| 16 | +* **witness ir (`plan.ir`)**: executed plan witness; output-only; used for verification/audit; immutable. |
| 17 | +* **amendment**: structured request describing one or more edits to `sans.ir`. |
| 18 | +* **target**: the specific ir element(s) the amendment acts on. |
| 19 | +* **selector**: the mechanism for identifying targets (ids or symbolic references). |
| 20 | +* **refusal**: deterministic rejection with stable error code + payload. |
| 21 | + |
| 22 | +## required invariants |
| 23 | + |
| 24 | +all successful mutations MUST preserve: |
| 25 | + |
| 26 | +1. **determinism invariants** |
| 27 | + |
| 28 | + * ids computed only from canonical json of semantic content (no timestamps, paths, host info). |
| 29 | + * stable ordering rules preserved (sort/null semantics remain explicit). |
| 30 | +2. **explicitness** |
| 31 | + |
| 32 | + * no implicit “current table” or hidden state introduced by mutation. |
| 33 | + * all rewiring is explicit in ir. |
| 34 | +3. **soundness discipline** |
| 35 | + |
| 36 | + * if mutation introduces `approx`, it must be explicit and gated by policy (kernel can refuse by default). |
| 37 | +4. **refusal over guessing** |
| 38 | + |
| 39 | + * ambiguous targets, ambiguous column refs, ambiguous rewires => refusal. |
| 40 | + |
| 41 | +## mutation pipeline (kernel primitive) |
| 42 | + |
| 43 | +mutation is defined as this pure function: |
| 44 | + |
| 45 | +`apply_amendment(ir_in: sans.ir, amendment: AmendmentRequest) -> MutationResult` |
| 46 | + |
| 47 | +where: |
| 48 | + |
| 49 | +* `MutationResult` contains either: |
| 50 | + |
| 51 | + * `status="ok"` with `ir_out` + diff artifacts, or |
| 52 | + * `status="refused"` with stable error payload(s) |
| 53 | + |
| 54 | +no file io. no network. no randomness. |
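a minimal sketch of the primitive's shape, not the kernel's actual implementation. it assumes a hypothetical ir shape (`{"steps": [{"step_id", "params"}]}`) and a hypothetical amendment shape (`{"ops": [...]}`), and supports a single op kind with a flat param merge. the point is the purity: the input is deep-copied, nothing outside the arguments is read, and a refusal carries no `ir_out`.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class MutationResult:
    status: str                                  # "ok" or "refused"
    ir_out: dict[str, Any] | None = None         # present only when status == "ok"
    diffs: dict[str, Any] = field(default_factory=dict)
    refusals: tuple[dict[str, str], ...] = ()


def refuse(code: str, message: str) -> MutationResult:
    """build a refused result; ir_out is never emitted on refusal."""
    return MutationResult(status="refused", refusals=({"code": code, "message": message},))


def apply_amendment(ir_in: dict[str, Any], amendment: dict[str, Any]) -> MutationResult:
    """pure ir-to-ir transformation (hypothetical ir/amendment shapes)."""
    ir = copy.deepcopy(ir_in)                    # never mutate the caller's ir
    applied = []
    for op in amendment["ops"]:
        if op["kind"] != "set_params":           # this sketch's allowlist has one entry
            return refuse("E_AMEND_CAPABILITY_UNSUPPORTED",
                          f"op kind {op['kind']!r} is not in the allowlist")
        matches = [s for s in ir["steps"] if s["step_id"] == op["step_id"]]
        if len(matches) != 1:
            code = "E_AMEND_TARGET_NOT_FOUND" if not matches else "E_AMEND_TARGET_AMBIGUOUS"
            return refuse(code, f"step_id {op['step_id']!r} matched {len(matches)} steps")
        matches[0]["params"].update(op["params"])  # flat merge, a simplification
        applied.append({"kind": op["kind"], "target": {"step_id": op["step_id"]}, "status": "ok"})
    return MutationResult(status="ok", ir_out=ir, diffs={"ops_applied": applied})
```

because the function only depends on its two arguments, calling it twice with equal inputs yields equal results, which is what makes byte-identical output testable.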
| 55 | + |
| 56 | +## selectors and targeting |
| 57 | + |
| 58 | +mutation targets MUST be resolved deterministically. resolution MUST fail (refuse) if: |
| 59 | + |
| 60 | +* a selector matches 0 targets (unless op explicitly allows “create if missing”) |
| 61 | +* a selector matches >1 target and the op is not explicitly multi-target |
| 62 | +* resolution depends on runtime data |
| 63 | + |
| 64 | +### primary selectors (preferred) |
| 65 | + |
| 66 | +* `step_id` (application identity) |
| 67 | +* `transform_id` (semantic identity) |
| 68 | + |
| 69 | +these are unambiguous and stable under reformatting. |
| 70 | + |
| 71 | +### secondary selectors (allowed as sugar) |
| 72 | + |
| 73 | +* `table_name` |
| 74 | +* `column_name` |
| 75 | + |
| 76 | +secondary selectors MUST resolve to ids during mutation and are subject to ambiguity refusal (e.g., column exists in multiple tables in scope). |
| 77 | + |
| 78 | +### selector resolution order |
| 79 | + |
| 80 | +if multiple selectors are provided for the same op, all MUST agree; otherwise refusal. |
| 81 | + |
| 82 | +example: specifying both `step_id` and `table_name` must point to the same resolved step; mismatch => refusal. |
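the resolution rules above can be sketched as follows. this is an illustration under assumptions: the ir shape, the `Refusal` exception, and the rule that `table_name` resolves to the step that outputs that table are all hypothetical. the sketch takes the conservative reading that every selector must resolve to exactly one step on its own before agreement is checked.

```python
from typing import Any


class Refusal(Exception):
    """deterministic rejection carrying a stable string code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def resolve_step(ir: dict[str, Any], selector: dict[str, str]) -> str:
    """resolve a selector to exactly one step_id, or refuse."""
    candidates: dict[str, set[str]] = {}
    if "step_id" in selector:
        candidates["step_id"] = {
            s["step_id"] for s in ir["steps"] if s["step_id"] == selector["step_id"]
        }
    if "table_name" in selector:
        # assumption: a table name selects the step(s) that output that table
        candidates["table_name"] = {
            s["step_id"] for s in ir["steps"] if selector["table_name"] in s.get("outputs", [])
        }
    if not candidates:
        raise Refusal("E_AMEND_VALIDATION_SCHEMA", "selector has no recognized keys")

    for key in sorted(candidates):               # fixed order keeps the refusal deterministic
        found = candidates[key]
        if not found:
            raise Refusal("E_AMEND_TARGET_NOT_FOUND", f"{key} matched 0 steps")
        if len(found) > 1:
            raise Refusal("E_AMEND_TARGET_AMBIGUOUS", f"{key} matched {len(found)} steps")

    resolved = {next(iter(found)) for found in candidates.values()}
    if len(resolved) > 1:
        raise Refusal("E_AMEND_TARGET_MISMATCH", "selectors resolve to different steps")
    return resolved.pop()
```

nothing here consults runtime data; resolution reads only the ir and the selector.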
| 83 | + |
| 84 | +## mutation operations (v0.1 allowlist) |
| 85 | + |
| 86 | +mutation is an allowlist. anything not listed is unsupported (capability refusal). |
| 87 | + |
| 88 | +### structural ops |
| 89 | + |
| 90 | +* **add_step** |
| 91 | + |
| 92 | + * adds a new step with explicit inputs/outputs and params |
| 93 | + * requires explicit insertion point (before/after a step_id) or explicit index |
| 94 | +* **remove_step** |
| 95 | + |
| 96 | + * removes a step (requires policy gate; default refuse unless `allow_destructive=true`) |
| 97 | +* **replace_step** |
| 98 | + |
| 99 | + * replaces a step’s transform spec; wiring (inputs/outputs) is preserved unless a wiring change is explicitly allowed |
| 100 | +* **rewire_inputs** |
| 101 | + |
| 102 | + * change a step’s `inputs[]` |
| 103 | +* **rewire_outputs** |
| 104 | + |
| 105 | + * change a step’s `outputs[]` (high risk; policy-gated) |
| 106 | +* **rename_table** |
| 107 | + |
| 108 | + * renames a logical table; updates references across steps |
| 109 | + |
| 110 | +### param/expr ops |
| 111 | + |
| 112 | +* **set_params** |
| 113 | + |
| 114 | + * sets specific op params by path (json pointer-like) |
| 115 | +* **replace_expr** |
| 116 | + |
| 117 | + * replaces a whole expression ast at a specified param path |
| 118 | +* **edit_expr** |
| 119 | + |
| 120 | + * small, typed edits to ast nodes (e.g. replace literal, replace column ref) |
| 121 | + * must preserve ast validity; no stringly-typed eval |
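a sketch of a path-addressed `set_params` edit. the contract only says the path is "json pointer-like", so the exact syntax here is an assumption: `/`-separated segments with `~0`/`~1` escapes, in the style of rfc 6901. the sketch also assumes that a path which does not already resolve is an error rather than an implicit create; `set_param` is a hypothetical helper name.

```python
import copy
from typing import Any


def _child(node: Any, token: str) -> Any:
    """step into a dict by key or a list by non-negative index."""
    if isinstance(node, list):
        if not token.isdigit():
            raise ValueError(f"list index expected, got {token!r}")
        return node[int(token)]
    return node[token]


def set_param(params: dict[str, Any], path: str, value: Any) -> dict[str, Any]:
    """return a copy of params with the value at path replaced.

    raises LookupError if the path does not resolve (no implicit creation).
    """
    if not path.startswith("/"):
        raise LookupError(f"path must start with '/': {path!r}")
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]
    out = copy.deepcopy(params)                  # the input is left untouched
    node: Any = out
    try:
        for token in tokens[:-1]:
            node = _child(node, token)
        _child(node, tokens[-1])                 # existence check for the final segment
    except (KeyError, IndexError, ValueError, TypeError) as exc:
        raise LookupError(f"path does not resolve: {path!r}") from exc
    if isinstance(node, list):
        node[int(tokens[-1])] = value
    else:
        node[tokens[-1]] = value
    return out
```

a caller inside the kernel would map the `LookupError` to a stable refusal code instead of letting it propagate.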
| 122 | + |
| 123 | +### assertion ops |
| 124 | + |
| 125 | +* **add_assertion** |
| 126 | +* **remove_assertion** |
| 127 | +* **replace_assertion** |
| 128 | +* **set_assertion_policy** |
| 129 | + |
| 130 | + * e.g., severity / enforcement mode, if the assertion model supports it |
| 131 | + |
| 132 | +## hard prohibitions |
| 133 | + |
| 134 | +mutation MUST refuse if it would: |
| 135 | + |
| 136 | +* introduce dynamic execution (`eval`, macro expansion, runtime codegen) |
| 137 | +* introduce references to unknown tables/columns when schema is known and required |
| 138 | +* weaken determinism guarantees (e.g., unspecified sort tie behavior) |
| 139 | +* change identity computation rules |
| 140 | +* silently coerce types without explicit cast op |
| 141 | +* introduce implicit ordering dependencies without explicit sortedness facts/assertions |
| 142 | + |
| 143 | +## validation stages |
| 144 | + |
| 145 | +mutation processing has three stages: |
| 146 | + |
| 147 | +1. **schema validation (request shape)** |
| 148 | + |
| 149 | + * amendment request parses, discriminated unions resolve, caps enforced. |
| 150 | + * errors here use stable `E_AMEND_VALIDATION_*` codes. |
| 151 | +2. **target resolution** |
| 152 | + |
| 153 | + * resolve selectors to concrete ir entities. |
| 154 | + * ambiguity => refusal. |
| 155 | +3. **post-mutation ir validation** |
| 156 | + |
| 157 | + * validate the mutated `sans.ir'` using the same invariants as compile-time validation: |
| 158 | + |
| 159 | + * tables exist before use |
| 160 | + * columns exist when schema known |
| 161 | + * outputs don’t collide |
| 162 | + * join/compare require keys |
| 163 | + * order-dependent ops require explicit order facts/assertions |
| 164 | + * if validation fails, the mutation is refused and `ir_out` is not emitted. |
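to make stage 3 concrete, here is a sketch of two of the listed checks ("tables exist before use" and "outputs don't collide"). the ir shape is hypothetical: an optional `sources` list of tables available before any step runs, and ordered `steps` with `inputs` and `outputs`. the real validator is whatever compile-time validation already uses; this only illustrates the kind of check involved.

```python
from typing import Any


def validate_ir(ir: dict[str, Any]) -> list[dict[str, str]]:
    """return refusal payloads for the mutated ir; an empty list means valid."""
    problems: list[dict[str, str]] = []
    known = set(ir.get("sources", []))           # tables that exist before step 0
    for step in ir["steps"]:                     # steps are checked in stored order
        for name in step.get("inputs", []):
            if name not in known:
                problems.append({
                    "code": "E_AMEND_IR_INVALID",
                    "message": f"step {step['step_id']} reads table {name!r} before it exists",
                })
        for name in step.get("outputs", []):
            if name in known:
                problems.append({
                    "code": "E_AMEND_IR_INVALID",
                    "message": f"step {step['step_id']} output {name!r} collides with an existing table",
                })
            known.add(name)
    return problems
```

if the returned list is non-empty, the mutation is refused and `ir_out` is withheld, matching the last rule of stage 3.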
| 165 | + |
| 166 | +## outputs (mutation result artifact contract) |
| 167 | + |
| 168 | +on success, mutation returns: |
| 169 | + |
| 170 | +* `ir_out` (`sans.ir'`) |
| 171 | +* `diff.structural.json` (required) |
| 172 | +* `diff.assertions.json` (required; may be empty) |
| 173 | +* `diagnostics.json` (required; warnings allowed) |
| 174 | + |
| 175 | +### diff.structural.json (minimum) |
| 176 | + |
| 177 | +must include: |
| 178 | + |
| 179 | +```json |
| 180 | +{ |
| 181 | + "format": "sans.mutation.diff.structural", |
| 182 | + "version": 1, |
| 183 | + "base_ir_sha256": "<hex>", |
| 184 | + "mutated_ir_sha256": "<hex>", |
| 185 | + "ops_applied": [ |
| 186 | + { "op_id": "...", "kind": "replace_expr", "target": { ... }, "status": "ok" } |
| 187 | + ], |
| 188 | + "affected": { |
| 189 | + "steps": ["<step_id>", "..."], |
| 190 | + "tables": ["<name>", "..."], |
| 191 | + "transforms_added": ["<transform_id>", "..."], |
| 192 | + "transforms_removed": ["<transform_id>", "..."], |
| 193 | + "transforms_changed": [ |
| 194 | + { "before": "<transform_id>", "after": "<transform_id>" } |
| 195 | + ] |
| 196 | + } |
| 197 | +} |
| 198 | +``` |
| 199 | + |
| 200 | +notes: |
| 201 | + |
| 202 | +* `base_ir_sha256` is the hash of the canonical json of the input `sans.ir`; `mutated_ir_sha256` is the hash of the canonical json of `sans.ir'`. |
| 203 | +* `transforms_changed` is semantic; it must reflect semantic id changes, not formatting. |
| 204 | + |
| 205 | +### diff.assertions.json (minimum) |
| 206 | + |
| 207 | +```json |
| 208 | +{ |
| 209 | + "format": "sans.mutation.diff.assertions", |
| 210 | + "version": 1, |
| 211 | + "added": [ ... ], |
| 212 | + "removed": [ ... ], |
| 213 | + "modified": [ |
| 214 | + { "before": { ... }, "after": { ... } } |
| 215 | + ] |
| 216 | +} |
| 217 | +``` |
| 218 | + |
| 219 | +### diagnostics.json (minimum) |
| 220 | + |
| 221 | +```json |
| 222 | +{ |
| 223 | + "format": "sans.mutation.diagnostics", |
| 224 | + "version": 1, |
| 225 | + "status": "ok|refused", |
| 226 | + "refusals": [ |
| 227 | + { "code": "...", "message": "...", "loc": { ... }, "hint": "...", "meta": { ... } } |
| 228 | + ], |
| 229 | + "warnings": [ |
| 230 | + { "code": "...", "message": "...", "loc": { ... }, "meta": { ... } } |
| 231 | + ] |
| 232 | +} |
| 233 | +``` |
| 234 | + |
| 235 | +## refusal codes (contract-grade) |
| 236 | + |
| 237 | +refusals MUST use stable string codes. exit codes are coarse; string codes are the contract. |
| 238 | + |
| 239 | +minimum set (v0.1): |
| 240 | + |
| 241 | +* `E_AMEND_VALIDATION_SCHEMA` (request doesn’t parse / violates schema) |
| 242 | +* `E_AMEND_CAPABILITY_UNSUPPORTED` (op not in allowlist / feature gated) |
| 243 | +* `E_AMEND_TARGET_NOT_FOUND` |
| 244 | +* `E_AMEND_TARGET_AMBIGUOUS` |
| 245 | +* `E_AMEND_TARGET_MISMATCH` (multiple selectors disagree) |
| 246 | +* `E_AMEND_POLICY_DESTRUCTIVE_REFUSED` (remove/rewire outputs blocked) |
| 247 | +* `E_AMEND_IR_INVALID` (post-mutation ir validation failed) |
| 248 | +* `E_AMEND_IR_INVARIANT_BREACH` (determinism/identity/soundness invariant would be violated) |
| 249 | + |
| 250 | +## versioning |
| 251 | + |
| 252 | +* this contract is versioned independently: `sans.mutation.contract = 0.1`. |
| 253 | +* `amendment_request` must declare `contract_version`. |
| 254 | +* any breaking change MUST increment at least the minor version; ideally, versioning follows semver. |
| 255 | + |
| 256 | +## canonicalization requirements |
| 257 | + |
| 258 | +all hashing and identity comparisons MUST use: |
| 259 | + |
| 260 | +* utf-8 |
| 261 | +* json canonical encoding: `sort_keys=true`, separators `(",", ":")` |
| 262 | +* stable list ordering as stored |
| 263 | +* paths and `loc` are non-semantic and MUST NOT affect semantic ids |
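these rules map directly onto python's `json.dumps`, so a hash helper can be sketched as below. assumptions are labeled in the code: the non-semantic key names (`loc`, `path`) and `ensure_ascii=False` are choices of this sketch, since the contract does not name the exact keys or say how non-ascii characters are escaped.

```python
import hashlib
import json
from typing import Any

# assumption: the keys that carry non-semantic data are named "loc" and "path"
NON_SEMANTIC_KEYS = frozenset({"loc", "path"})


def canonical_bytes(obj: Any) -> bytes:
    """canonical json: sorted keys, compact separators, utf-8, lists kept as stored."""
    text = json.dumps(
        obj,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,      # assumption: emit raw utf-8 rather than \u escapes
    )
    return text.encode("utf-8")


def strip_non_semantic(obj: Any) -> Any:
    """recursively drop non-semantic keys; list order is preserved."""
    if isinstance(obj, dict):
        return {k: strip_non_semantic(v) for k, v in obj.items() if k not in NON_SEMANTIC_KEYS}
    if isinstance(obj, list):
        return [strip_non_semantic(v) for v in obj]
    return obj


def semantic_sha256(obj: Any) -> str:
    """hex sha-256 over the canonical bytes of the semantic content only."""
    return hashlib.sha256(canonical_bytes(strip_non_semantic(obj))).hexdigest()
```

two nodes that differ only in key order or `loc` hash identically, while any change to semantic content changes the digest.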
| 264 | + |
| 265 | +## success criteria |
| 266 | + |
| 267 | +a mutation system is “correct” when: |
| 268 | + |
| 269 | +* same `sans.ir` + same amendment_request => byte-identical `sans.ir'` and diffs |
| 270 | +* ambiguity always refuses |
| 271 | +* post-mutation ir is always valid or refused |
| 272 | +* no mutation can smuggle nondeterminism or hidden state |