| 1 | +# Determinism Contract |
| 2 | + |
| 3 | +This document defines the determinism contract for SIGNIA: the explicit rules and guarantees that ensure identical inputs produce identical outputs (byte-for-byte), enabling reliable hashing, proofs, and independent verification. |
| 4 | + |
| 5 | +Determinism is not a convenience feature in SIGNIA. It is a security property. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## 1) Contract statement |
| 10 | + |
| 11 | +Given: |
| 12 | +- the same input artifact bytes (or pinned immutable reference) |
| 13 | +- the same SIGNIA version |
| 14 | +- the same plugin set and plugin versions |
| 15 | +- the same normalization policy and configuration |
| 16 | +- the same canonicalization and hashing specifications |
| 17 | + |
| 18 | +SIGNIA must produce: |
| 19 | +- identical canonical `schema.json` bytes |
| 20 | +- identical canonical `manifest.json` bytes (for hashed fields) |
| 21 | +- identical `proof.json` (root and proof material derived from defined leaves) |
| 22 | +- identical schema hash, manifest hash (if used), and proof root |
| 23 | + |
| 24 | +**Same input → same output (byte-for-byte).** |
| 25 | + |
| 26 | +--- |
| 27 | + |
| 28 | +## 2) Determinism scope |
| 29 | + |
| 30 | +Determinism applies to: |
| 31 | + |
| 32 | +- Canonical bytes used for hashing: |
| 33 | + - schema canonical bytes |
| 34 | + - manifest canonical bytes (as specified) |
| 35 | + - proof leaf encodings |
| 36 | +- Hashing: |
| 37 | + - domain-separated hash definitions |
| 38 | + - leaf hashing and Merkle root derivation |
| 39 | +- Ordering: |
| 40 | + - every collection (maps, sets, lists) in hashed domains |
| 41 | +- Normalization: |
| 42 | + - paths, line endings, encoding rules, timestamps |
| 43 | +- Plugin outputs: |
| 44 | + - IR must be deterministic for the same normalized input |
| 45 | + |
| 46 | +Determinism does not require: |
| 47 | +- identical performance metrics |
| 48 | +- identical logs |
| 49 | +- identical non-hashed metadata (unless explicitly specified) |
| 50 | + |
| 51 | +--- |
| 52 | + |
| 53 | +## 3) Inputs: what must be pinned |
| 54 | + |
| 55 | +### 3.1 Immutable references are required for reproducibility |
| 56 | +Acceptable pinning strategies include: |
| 57 | +- commit SHA (for VCS sources) |
| 58 | +- content checksum (for archives and files) |
| 59 | +- explicit versioned releases with checksums |
| 60 | + |
| 61 | +Floating references, such as the following, are allowed only if they are resolved to pinned inputs at compile time: |
| 62 | +- branch names (e.g., `main`) |
| 63 | +- mutable URLs |
| 64 | +- “latest” tags |
| 65 | + |
| 66 | +If a floating ref is used, the manifest must record the resolved immutable reference. |
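A minimal sketch of recording a resolved reference, assuming the fetched bytes are pinned by a SHA-256 content checksum. The function name `pin_input` and the field names `requested_ref` and `resolved_ref` are hypothetical and not part of the SIGNIA manifest format.

```python
import hashlib


def pin_input(floating_ref: str, content: bytes) -> dict:
    """Build a manifest entry that pins fetched bytes by content checksum.

    Field names are illustrative assumptions, not a SIGNIA schema.
    """
    digest = hashlib.sha256(content).hexdigest()
    return {
        # What the user asked for (e.g. "main"); informational only.
        "requested_ref": floating_ref,
        # What was actually compiled; immutable and content-addressed.
        "resolved_ref": f"sha256:{digest}",
    }
```

A later run can then be reproduced from `resolved_ref` alone, regardless of where the floating reference points by then.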
| 67 | + |
| 68 | +### 3.2 Network access policy |
| 69 | +Default: |
| 70 | +- no network access during compilation |
| 71 | + |
| 72 | +If network access is enabled: |
| 73 | +- every fetched input must be content-addressed or pinned |
| 74 | +- caches must not change results |
| 75 | +- the manifest must record the resolved immutable identifiers |
| 76 | + |
| 77 | +--- |
| 78 | + |
| 79 | +## 4) Normalization contract (input canonicalization) |
| 80 | + |
| 81 | +Normalization removes environment variance before parsing. |
| 82 | + |
| 83 | +### 4.1 Paths |
| 84 | +- All paths must be represented in normalized POSIX form using `/` separators in hashed domains. |
| 85 | +- Absolute host paths must never appear in hashed domains; only paths relative to the logical root may. |
| 86 | +- Input roots must be mapped to a logical root (e.g., `/` or `repo://`). |
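The path rules above could be implemented roughly as follows. This is a sketch under stated assumptions: `normalize_path` is a hypothetical helper, `repo://` is used as the logical root, and backslashes are always treated as separators (which would mangle POSIX file names that contain a literal backslash).

```python
def normalize_path(raw: str, logical_root: str = "repo://") -> str:
    """Map a relative input path to normalized POSIX form under a logical root."""
    unified = raw.replace("\\", "/")
    # Reject POSIX-absolute paths and Windows drive-letter paths.
    if unified.startswith("/") or (len(unified) > 1 and unified[1] == ":"):
        raise ValueError("absolute paths are not allowed in hashed domains")
    parts: list[str] = []
    for segment in unified.split("/"):
        if segment in ("", "."):
            continue  # collapse "//" and "./"
        if segment == "..":
            if not parts:
                raise ValueError("path escapes the input root")
            parts.pop()
        else:
            parts.append(segment)
    return logical_root + "/".join(parts)
```

With these assumptions, `src\lib\a.rs` on Windows and `./src/lib/a.rs` on Linux both normalize to `repo://src/lib/a.rs`.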
| 87 | + |
| 88 | +### 4.2 Newlines and encoding |
| 89 | +- Text inputs must be normalized to LF (`\n`) for hashing domains. |
| 90 | +- UTF-8 is the canonical encoding. |
| 91 | +- If an input is not valid UTF-8 and the plugin expects text, the plugin must: |
| 92 | + - reject with a deterministic error, or |
| 93 | + - define a deterministic byte-to-text mapping strategy (must be documented). |
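A minimal sketch of the "reject with a deterministic error" option combined with LF normalization. The helper name `normalize_text` and the error code `E_INVALID_UTF8` are assumptions made for illustration.

```python
def normalize_text(raw: bytes) -> bytes:
    """Return UTF-8 bytes with all line endings converted to LF."""
    try:
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Stable error code; no host-specific detail leaks into the message.
        raise ValueError("E_INVALID_UTF8") from None
    # Convert CRLF first so a CRLF pair does not become two LFs.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")
```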
| 94 | + |
| 95 | +### 4.3 Timestamps and environment-derived values |
| 96 | +- Wall-clock timestamps must never influence hashed domains. |
| 97 | +- If timestamps exist in inputs (e.g., metadata files), plugins must: |
| 98 | + - ignore them, or |
| 99 | + - normalize them into a deterministic placeholder, or |
| 100 | + - treat them as non-hashed metadata. |
| 101 | + |
| 102 | +### 4.4 Symlinks |
| 103 | +Default recommended policy: |
| 104 | +- deny symlinks |
| 105 | + |
| 106 | +If symlinks are allowed: |
| 107 | +- resolve only within the input root |
| 108 | +- validate canonical path containment |
| 109 | +- define deterministic resolution behavior and record policy version in the manifest |
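If symlinks are allowed, the containment check might look like the sketch below. It assumes resolution via `os.path.realpath` and a hypothetical error code `E_SYMLINK_ESCAPE`; a real policy would also need to define behavior for dangling links and cycles.

```python
import os


def resolve_within_root(root: str, candidate: str) -> str:
    """Resolve symlinks and return the target as a root-relative POSIX path.

    Raises ValueError if the resolved target lies outside the input root.
    """
    real_root = os.path.realpath(root)
    real_target = os.path.realpath(os.path.join(real_root, candidate))
    # commonpath equals real_root only when the target is inside the root.
    if os.path.commonpath([real_root, real_target]) != real_root:
        raise ValueError("E_SYMLINK_ESCAPE")
    return os.path.relpath(real_target, real_root).replace(os.sep, "/")
```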
| 110 | + |
| 111 | +--- |
| 112 | + |
| 113 | +## 5) IR determinism contract |
| 114 | + |
| 115 | +Plugins produce IR. IR is untrusted until validated and canonicalized. |
| 116 | + |
| 117 | +### 5.1 IR must be deterministic |
| 118 | +For the same normalized input and plugin config, plugins must produce identical IR. |
| 119 | + |
| 120 | +Plugins must not: |
| 121 | +- iterate using filesystem order without sorting |
| 122 | +- depend on locale/timezone |
| 123 | +- generate random IDs |
| 124 | +- use nondeterministic concurrency for ordering |
| 125 | +- include host-specific paths or usernames |
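As an illustration of the first rule, a plugin could enumerate files as sketched below rather than trusting the order returned by the filesystem. The helper name `list_files_sorted` is hypothetical.

```python
import os


def list_files_sorted(root: str) -> list[str]:
    """Return root-relative POSIX paths of all files, in a stable order."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes the traversal order stable too
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    # Python compares str by code point, so this order is locale-independent.
    return sorted(found)
```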
| 126 | + |
| 127 | +### 5.2 Stable identities |
| 128 | +Every entity and edge must have a stable identity strategy documented by the plugin. |
| 129 | + |
| 130 | +Examples: |
| 131 | +- entity ID derived from normalized path + kind |
| 132 | +- edge ID derived from (from_id, to_id, relation_type) |
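One possible way to derive such IDs is to hash a length-prefixed encoding of the identifying fields. This is only a sketch: the use of SHA-256, the 4-byte length prefix, and the function names are assumptions, and each plugin would document its own strategy.

```python
import hashlib


def _encode_fields(*fields: str) -> bytes:
    """Length-prefix each field so ("ab", "c") and ("a", "bc") encode differently."""
    out = b""
    for field in fields:
        data = field.encode("utf-8")
        out += len(data).to_bytes(4, "big") + data
    return out


def entity_id(normalized_path: str, kind: str) -> str:
    return hashlib.sha256(_encode_fields("entity", normalized_path, kind)).hexdigest()


def edge_id(from_id: str, to_id: str, relation_type: str) -> str:
    return hashlib.sha256(_encode_fields("edge", from_id, to_id, relation_type)).hexdigest()
```

Because the IDs depend only on normalized content, they stay the same across hosts and runs, unlike counters or random UUIDs.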
| 133 | + |
| 134 | +### 5.3 Bounded outputs |
| 135 | +Plugins must enforce bounds: |
| 136 | +- maximum nodes/edges |
| 137 | +- maximum attribute sizes |
| 138 | +- maximum recursion depth |
| 139 | + |
| 140 | +Bounds must produce deterministic failures. |
| 141 | + |
| 142 | +--- |
| 143 | + |
| 144 | +## 6) Canonical JSON encoding (byte-level contract) |
| 145 | + |
| 146 | +All hashed JSON documents must be serialized in a canonical way. |
| 147 | + |
| 148 | +### 6.1 Key ordering |
| 149 | +- Object keys must be sorted lexicographically by Unicode code point. |
| 150 | +- Do not rely on insertion order. |
| 151 | + |
| 152 | +### 6.2 Whitespace |
| 153 | +- No insignificant whitespace. |
| 154 | +- No trailing spaces. |
| 155 | +- Use `:` and `,` without extra spaces. |
| 156 | + |
| 157 | +### 6.3 Numbers |
| 158 | +- Integers encoded in base-10 without leading zeros (except `0`). |
| 159 | +- Floats, if allowed, must follow a strict canonical format. |
| 160 | + - Recommended: avoid floats in hashed domains; represent as rational or string if needed. |
| 161 | + |
| 162 | +### 6.4 Strings |
| 163 | +- Use JSON standard escaping. |
| 164 | +- Do not apply Unicode normalization at encoding time unless the spec explicitly defines it. |
| 165 | + |
| 166 | +### 6.5 Null/boolean |
| 167 | +- Standard JSON literals: `null`, `true`, `false`. |
| 168 | + |
| 169 | +### 6.6 UTF-8 output |
| 170 | +- Canonical bytes must be UTF-8 encoded. |
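The rules in this section can be approximated with Python's standard `json` module, as in the sketch below. It assumes floats are rejected outright (the recommended option in 6.3) and that Python's default string escaping is acceptable; a real spec would need to pin the escaping rules exactly. The function names are hypothetical.

```python
import json


def _reject_unsupported(value) -> None:
    """Recursively reject floats and non-string object keys."""
    if isinstance(value, float):
        raise TypeError("floats are not allowed in hashed domains")
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError("object keys must be strings")
            _reject_unsupported(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            _reject_unsupported(item)


def canonical_json_bytes(value) -> bytes:
    _reject_unsupported(value)
    text = json.dumps(
        value,
        sort_keys=True,          # str keys compare by Unicode code point
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # emit non-ASCII characters as UTF-8
        allow_nan=False,
    )
    return text.encode("utf-8")
```

Two dictionaries built in different insertion orders then serialize to the same bytes, which is what makes the result safe to hash.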
| 171 | + |
| 172 | +--- |
| 173 | + |
| 174 | +## 7) Hashing contract |
| 175 | + |
| 176 | +### 7.1 Hash function |
| 177 | +The hash function must be documented and stable for a given major version. |
| 178 | + |
| 179 | +Recommended: |
| 180 | +- SHA-256 or BLAKE3 (choose one per spec; do not mix without domain separation) |
| 181 | + |
| 182 | +The hash function is part of the determinism contract. Changing it requires a version bump. |
| 183 | + |
| 184 | +### 7.2 Domain separation |
| 185 | +Every hash must include a domain tag prefix. |
| 186 | + |
| 187 | +Examples (illustrative): |
| 188 | +- `signia:schema:v1` |
| 189 | +- `signia:manifest:v1` |
| 190 | +- `signia:proof:v1` |
| 191 | +- `signia:leaf:entity:v1` |
| 192 | +- `signia:leaf:edge:v1` |
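A minimal sketch of a domain-separated hash, assuming SHA-256 and a 2-byte big-endian length prefix on the tag so that the boundary between tag and payload is unambiguous. The framing and the name `domain_hash` are assumptions; the actual framing must come from the spec.

```python
import hashlib


def domain_hash(domain_tag: str, payload: bytes) -> bytes:
    """Hash payload under a domain tag: SHA-256(len(tag) || tag || payload)."""
    tag = domain_tag.encode("utf-8")
    h = hashlib.sha256()
    h.update(len(tag).to_bytes(2, "big"))
    h.update(tag)
    h.update(payload)
    return h.digest()
```

The same canonical bytes hashed as a schema and as a manifest then yield different digests, so a hash from one domain cannot be replayed in another.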
| 193 | + |
| 194 | +### 7.3 Hash inputs |
| 195 | +Hashes must be computed over canonical bytes. |
| 196 | + |
| 197 | +Rules: |
| 198 | +- never hash in-memory structures without canonical serialization |
| 199 | +- never hash debug outputs |
| 200 | +- never hash non-deterministic representations |
| 201 | + |
| 202 | +--- |
| 203 | + |
| 204 | +## 8) Proof construction contract |
| 205 | + |
| 206 | +### 8.1 Leaf set definition |
| 207 | +Proof leaves must be defined by the spec: |
| 208 | +- what constitutes a leaf |
| 209 | +- how leaves are encoded (canonical bytes) |
| 210 | +- how leaves are ordered |
| 211 | + |
| 212 | +### 8.2 Leaf ordering |
| 213 | +Leaf ordering must be deterministic and stable: |
| 214 | +- sort by (leaf_type, stable_id) or an equivalent stable key |
| 215 | +- define a total ordering (no ties) |
| 216 | + |
| 217 | +### 8.3 Merkle tree construction |
| 218 | +Tree construction must be deterministic: |
| 219 | +- define whether odd leaves are duplicated, promoted, or padded |
| 220 | +- define node hashing domain and concatenation rules |
| 221 | +- define root representation |
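To make these choices concrete, here is a sketch of one possible construction. Every decision in it is an assumption rather than the SIGNIA spec: SHA-256, the tags `signia:leaf:v1` and `signia:node:v1`, sorting leaves by their encoded bytes, promoting an unpaired node unchanged, and rejecting an empty leaf set.

```python
import hashlib

LEAF_TAG = b"signia:leaf:v1\x00"  # hypothetical domain tags
NODE_TAG = b"signia:node:v1\x00"


def leaf_hash(leaf_bytes: bytes) -> bytes:
    return hashlib.sha256(LEAF_TAG + leaf_bytes).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(NODE_TAG + left + right).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("empty leaf set")  # the spec must define this case
    # Sorting by canonical bytes gives an order independent of input order.
    level = [leaf_hash(leaf) for leaf in sorted(leaves)]
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # promote the unpaired node unchanged
        level = nxt
    return level[0]
```

Using distinct tags for leaves and interior nodes keeps a leaf from being reinterpreted as an interior node.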
| 222 | + |
| 223 | +### 8.4 Proof material |
| 224 | +If inclusion proofs are included: |
| 225 | +- define sibling ordering |
| 226 | +- define direction markers (left/right) deterministically |
| 227 | +- encode proofs canonically (JSON canonical encoding or binary spec) |
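Verification of an inclusion proof could be sketched as follows. The proof shape (a list of steps with `side` and hex `hash` fields), the domain tags, and SHA-256 are all assumptions chosen for illustration.

```python
import hashlib

LEAF_TAG = b"signia:leaf:v1\x00"  # hypothetical domain tags
NODE_TAG = b"signia:node:v1\x00"


def verify_inclusion(leaf_bytes: bytes, proof: list[dict], expected_root: bytes) -> bool:
    """Recompute the root from a leaf and its sibling path, bottom to top."""
    current = hashlib.sha256(LEAF_TAG + leaf_bytes).digest()
    for step in proof:
        sibling = bytes.fromhex(step["hash"])
        if step["side"] == "left":
            # Sibling sits to the left of the current node.
            current = hashlib.sha256(NODE_TAG + sibling + current).digest()
        elif step["side"] == "right":
            current = hashlib.sha256(NODE_TAG + current + sibling).digest()
        else:
            raise ValueError("invalid direction marker")
    return current == expected_root
```

The explicit direction marker matters: swapping `left` and `right` changes the concatenation order and therefore the recomputed root.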
| 228 | + |
| 229 | +--- |
| 230 | + |
| 231 | +## 9) Error determinism contract |
| 232 | + |
| 233 | +Failures must be deterministic: |
| 234 | +- same input → same error category and message class |
| 235 | + |
| 236 | +Guidelines: |
| 237 | +- errors should include stable identifiers, not host-dependent paths |
| 238 | +- avoid embedding OS-specific errno strings in stable outputs |
| 239 | +- provide structured error codes for programmatic handling |
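These guidelines might translate into an error type like the one sketched below. The class name `CompileError`, the code `E_INPUT_NOT_FOUND`, and the helper `read_input` are hypothetical.

```python
import os


class CompileError(Exception):
    """Error carrying a stable code and a logical (not host) path."""

    def __init__(self, code: str, logical_path: str):
        super().__init__(f"{code}: {logical_path}")
        self.code = code
        self.logical_path = logical_path


def read_input(root: str, logical_path: str) -> bytes:
    try:
        with open(os.path.join(root, logical_path), "rb") as handle:
            return handle.read()
    except FileNotFoundError:
        # Drop the OS error text and host path; keep only stable fields.
        raise CompileError("E_INPUT_NOT_FOUND", logical_path) from None
```

The message is then the same on every host, while the original OS error can still be written to non-hashed logs if needed.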
| 240 | + |
| 241 | +Non-goal: |
| 242 | +- exact byte-for-byte matching of logs across environments |
| 243 | + |
| 244 | +--- |
| 245 | + |
| 246 | +## 10) Determinism testing requirements |
| 247 | + |
| 248 | +### 10.1 Golden fixtures |
| 249 | +For each plugin and core pipeline: |
| 250 | +- commit at least one realistic fixture |
| 251 | +- commit expected canonical outputs |
| 252 | +- CI must validate byte-for-byte equality |
| 253 | + |
| 254 | +### 10.2 Cross-run checks |
| 255 | +Run compilation twice in CI and compare: |
| 256 | +- schema bytes |
| 257 | +- schema hash |
| 258 | +- proof root |
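A cross-run check can be as small as the sketch below, which runs a compile function several times and compares digests of its output bytes. Here `compile_fn` is a stand-in for whatever entry point produces canonical bytes, and `assert_deterministic` is a hypothetical helper name.

```python
import hashlib
from typing import Callable


def assert_deterministic(
    compile_fn: Callable[[bytes], bytes], input_bytes: bytes, runs: int = 2
) -> str:
    """Run compile_fn repeatedly and fail if any output differs byte-for-byte."""
    digests = {hashlib.sha256(compile_fn(input_bytes)).hexdigest() for _ in range(runs)}
    if len(digests) != 1:
        raise AssertionError(f"non-deterministic output: {len(digests)} distinct digests")
    return digests.pop()
```

Running the two compilations in separate processes, ideally with different working directories and environment variables, would catch more variance than an in-process loop.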
| 259 | + |
| 260 | +### 10.3 Cross-platform checks |
| 261 | +At minimum: |
| 262 | +- Linux and macOS builds validate determinism fixtures |
| 263 | +- Windows is recommended if path handling is supported |
| 264 | + |
| 265 | +### 10.4 Negative tests |
| 266 | +Mutate bundle files and ensure verification fails: |
| 267 | +- schema tampering |
| 268 | +- manifest tampering |
| 269 | +- proof tampering |
| 270 | + |
| 271 | +--- |
| 272 | + |
| 273 | +## 11) Change control and versioning |
| 274 | + |
| 275 | +Changes that affect determinism require: |
| 276 | +- an explicit version bump where relevant: |
| 277 | + - schema version |
| 278 | + - manifest version |
| 279 | + - proof version |
| 280 | + - hash domain version |
| 281 | +- updated specs and JSON schemas |
| 282 | +- updated fixtures |
| 283 | +- documented migration notes |
| 284 | + |
| 285 | +Security-sensitive changes include: |
| 286 | +- canonical JSON encoding rules |
| 287 | +- ordering rules |
| 288 | +- hashing domains |
| 289 | +- proof leaf definitions |
| 290 | +- path normalization policies |
| 291 | + |
| 292 | +--- |
| 293 | + |
| 294 | +## 12) Consumer obligations |
| 295 | + |
| 296 | +Consumers must: |
| 297 | +- verify bundles before trusting them |
| 298 | +- apply policy for publisher allowlists if needed |
| 299 | +- avoid relying on non-hashed metadata for security decisions |
| 300 | +- pin inputs in CI and record immutable refs |
| 301 | + |
| 302 | +--- |
| 303 | + |
| 304 | +## 13) Summary |
| 305 | + |
| 306 | +SIGNIA’s determinism contract ensures: |
| 307 | +- stable canonical bytes |
| 308 | +- stable hashes and proof roots |
| 309 | +- independent verification without trusting operators |
| 310 | + |
| 311 | +Determinism failures are treated as integrity vulnerabilities and must be fixed with priority. |