| 1 | +# Security Model for External Data in onnx-ir |
| 2 | + |
| 3 | +This document describes the threat model, implemented defenses, known |
| 4 | +limitations, and design rationale for how **onnx-ir** handles external tensor |
| 5 | +data (the `ExternalTensor` class). |
| 6 | + |
| 7 | +## Threat model |
| 8 | + |
| 9 | +ONNX models can reference external data files via relative paths stored in the |
| 10 | +`location` field of a `TensorProto`. A malicious model could abuse this to |
| 11 | +read arbitrary files on the host. The main attack vectors are: |
| 12 | + |
| 13 | +| Vector | Example | |
| 14 | +|---|---| |
| 15 | +| **Path traversal** | `../../etc/passwd` | |
| 16 | +| **Absolute path** | `/etc/passwd` | |
| 17 | +| **Symlink escape** | `data.bin → /etc/passwd` | |
| 18 | +| **Hardlink smuggling** | Hard-linking a sensitive file into the model directory | |
| 19 | + |
| 20 | +## Implemented defenses |
| 21 | + |
| 22 | +`ExternalTensor._check_path_containment()` enforces a three-layer check |
| 23 | +whenever a non-empty `base_dir` is set: |
| 24 | + |
| 25 | +| Layer | What it does | |
| 26 | +|---|---| |
| 27 | +| **1. String-based containment** | Normalizes the path (without resolving symlinks) and verifies it stays within `base_dir`. Catches `../` traversal and absolute paths without requiring the file to exist. | |
| 28 | +| **2. Realpath containment** | Resolves all symlinks via `os.path.realpath()` and re-checks containment. Catches symlinks whose target is outside `base_dir`. Symlinks that resolve *within* `base_dir` are allowed. | |
| 29 | +| **3. Hardlink detection** | Rejects files with more than one hard link (`st_nlink > 1`). Prevents an attacker from hard-linking a sensitive file into the model directory to bypass the containment boundary. | |
| 30 | + |
| 31 | +All three checks run at **load time** — when `numpy()`, `tofile()`, or |
| 32 | +`tobytes()` (indirectly, via `_load()`) is called — not at construction time. |
| 33 | +This allows safe deserialization of untrusted protos without triggering I/O. |
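
The three layers described above can be sketched with the standard library alone. This is an illustrative stand-in, not the code inside `ExternalTensor._check_path_containment()`: the function name `check_path_containment`, its signature, and the choice of `ValueError` are assumptions made for the example.

```python
import os


def check_path_containment(base_dir: str, location: str) -> str:
    """Hypothetical stand-in for the three-layer check; returns the validated path."""
    if not base_dir:
        # Mirrors the documented behavior: an empty base_dir skips every check.
        return location

    base = os.path.normpath(os.path.abspath(base_dir))

    # Layer 1: lexical normalization only; symlinks are not resolved here.
    candidate = os.path.normpath(os.path.join(base, location))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"{location!r} escapes {base_dir!r} (lexical check)")

    # Layer 2: resolve symlinks on both sides and re-check containment.
    real_base = os.path.realpath(base)
    real_candidate = os.path.realpath(candidate)
    if os.path.commonpath([real_base, real_candidate]) != real_base:
        raise ValueError(f"{location!r} escapes {base_dir!r} (symlink check)")

    # Layer 3: refuse files that have more than one hard link.
    if os.path.exists(real_candidate) and os.stat(real_candidate).st_nlink > 1:
        raise ValueError(f"{location!r} has multiple hard links")

    return real_candidate
```

Because `os.path.join()` discards the base when the second argument is absolute, an absolute `location` fails the layer 1 comparison just as a `../` traversal does, and neither needs the file to exist.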
| 34 | + |
| 35 | +## `base_dir=""` bypass (by design) |
| 36 | + |
| 37 | +When `base_dir` is empty (the default), **all security checks are skipped**. |
| 38 | +This is intentional for two reasons: |
| 39 | + |
| 40 | +1. **Programmatic construction** — when a developer creates an `ExternalTensor` |
| 41 | + in code, they control the paths directly and do not need containment checks. |
| 42 | +2. **Deserialization safety** — the IR deserializer may create `ExternalTensor` |
| 43 | + objects before a `base_dir` is known. Containment is only meaningful when |
| 44 | + the caller sets `base_dir` to the model's directory. |
| 45 | + |
| 46 | +If you are loading an untrusted model, always set `base_dir` to the directory |
| 47 | +containing the model file. |
| 48 | + |
| 49 | +## Divergence from onnx/onnx |
| 50 | + |
| 51 | +The reference ONNX implementation, onnx/onnx ([onnx/onnx#7717](https://github.com/onnx/onnx/pull/7717)),
| 52 | +implements a four-layer defense:
| 53 | + |
| 54 | +| Layer | onnx/onnx | onnx-ir | Notes | |
| 55 | +|---|---|---|---| |
| 56 | +| 1. Canonical path containment | ✅ | ✅ | Equivalent | |
| 57 | +| 2. Symlink handling | ✅ reject all | ✅ allow within base | Different policy — see rationale below | |
| 58 | +| 3. `O_NOFOLLOW` on open | ✅ | ❌ | Planned — see *Future hardening* | |
| 59 | +| 4. Hardlink count check | ✅ | ✅ | Equivalent (added in this PR) | |
| 60 | + |
| 61 | +### Why the differences? |
| 62 | + |
| 63 | +* **onnx-ir is a library**, not a runtime. It focuses on safe loading of model |
| 64 | + data for inspection and transformation, not sandboxed execution. |
| 65 | +* **Symlink policy** — onnx/onnx rejects all final-component symlinks. |
| 66 | + onnx-ir allows symlinks whose resolved target stays within `base_dir`. This |
| 67 | + is more permissive but still prevents escape from the containment boundary, |
| 68 | + and avoids breaking legitimate workflows that use symlinks within the model |
| 69 | + directory (e.g. shared weight files). |
| 70 | +* **`O_NOFOLLOW`** closes a TOCTOU (time-of-check-to-time-of-use) race between |
| 71 | + the containment check and the `open()` call. This is a valuable defense-in-depth |
| 72 | + measure but requires platform-specific code (`os.O_NOFOLLOW` is not available |
| 73 | + on Windows). It is planned for a future release. |
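
The two symlink policies differ only for links that stay inside the model directory. The sketch below contrasts them; both function names are hypothetical and neither is taken from onnx/onnx or onnx-ir.

```python
import os


def rejects_final_symlink(path: str) -> bool:
    """Strict policy: refuse whenever the final path component is a symlink."""
    return os.path.islink(path)


def resolves_within(base_dir: str, path: str) -> bool:
    """Permissive policy: accept if the fully resolved path stays under base_dir."""
    real_base = os.path.realpath(base_dir)
    real_path = os.path.realpath(path)
    return os.path.commonpath([real_base, real_path]) == real_base
```

For a link such as `layer0.bin` pointing at `shared.bin` in the same directory, the strict policy refuses the file while the permissive one accepts it. A link whose target lies outside `base_dir` is refused by both.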
| 74 | + |
| 75 | +## Known limitations |
| 76 | + |
| 77 | +* **TOCTOU window** — A small race exists between `_check_path_containment()` |
| 78 | + and the subsequent `open()`. An attacker who can modify the filesystem |
| 79 | + concurrently could swap a safe file for a symlink after the check passes. |
| 80 | +  Planned mitigation: open with `O_NOFOLLOW` to reject a final-component symlink.
| 81 | +* **`base_dir=""` bypass** — As described above, an empty `base_dir` disables |
| 82 | + all checks. Callers loading untrusted models must set `base_dir`. |
| 83 | +* **Hardlink detection is best-effort** — The `st_nlink` check only detects |
| 84 | + hard links at the time of the check. It cannot prevent hard links created |
| 85 | + after the check. On some filesystems or operating systems, `st_nlink` may |
| 86 | + not accurately reflect the number of hard links. |
| 87 | +* **Hardlink collateral** — When an attacker creates a hard link to a |
| 88 | + legitimate data file, both the original and the link get `st_nlink=2`. |
| 89 | + This means the *original* file also becomes un-loadable until the extra |
| 90 | + link is removed. This is fail-closed behavior (safe by default), but |
| 91 | + operators should be aware of it when diagnosing unexpected load failures. |
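
The hardlink collateral effect is easy to reproduce. The helper below is a hypothetical demonstration written for this document; it creates a file in a temporary directory, adds a second hard link, and reports the link counts.

```python
import os
import tempfile


def link_counts_after_hardlink() -> tuple[int, int, int]:
    """Return st_nlink before linking, then for the original and for the new link."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "weights.bin")
        alias = os.path.join(root, "alias.bin")
        with open(original, "wb") as f:
            f.write(b"\x00" * 4)
        before = os.stat(original).st_nlink
        os.link(original, alias)  # second directory entry for the same inode
        return before, os.stat(original).st_nlink, os.stat(alias).st_nlink
```

On a typical POSIX filesystem the count goes from 1 to 2 for both names, because they refer to the same inode. When diagnosing an unexpected load failure, comparing `os.stat(path).st_nlink` against 1 is a quick first check.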
| 92 | + |
| 93 | +## Future hardening |
| 94 | + |
| 95 | +* **`O_NOFOLLOW` on file open** — Use `os.open()` with `O_NOFOLLOW` to close |
| 96 | + the TOCTOU window at the kernel level (Linux/macOS). |
| 97 | +* **`_open_validated()` wrapper** — Centralize file-open + security checks so |
| 98 | + future code paths cannot accidentally bypass containment. |
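
A possible shape for the planned open path is sketched below. It is not the planned `_open_validated()` implementation: the name `open_no_follow`, the use of `PermissionError`, and the decision to re-check the link count on the open descriptor are all assumptions made for illustration.

```python
import os


def open_no_follow(path: str):
    """Open path for binary reading, refusing a symlink as the final component."""
    flags = os.O_RDONLY
    # O_NOFOLLOW is missing on Windows; fall back to 0, which adds no protection there.
    flags |= getattr(os, "O_NOFOLLOW", 0)
    # O_BINARY exists only on Windows and is a no-op elsewhere.
    flags |= getattr(os, "O_BINARY", 0)

    fd = os.open(path, flags)
    try:
        # Inspect the descriptor that will actually be read, not the path again.
        if os.fstat(fd).st_nlink > 1:
            raise PermissionError(f"{path!r} has multiple hard links")
    except BaseException:
        os.close(fd)
        raise
    return os.fdopen(fd, "rb")
```

Checking `os.fstat(fd)` instead of calling `os.stat(path)` a second time means the hard-link check applies to the file that was opened, even if the path is swapped afterwards. `O_NOFOLLOW` covers only the final path component, so containment of intermediate directories still depends on the earlier path checks.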