Skip to content

Commit 511ae68

Browse files
Copilottitaiwangmsjustinchuby
authored
Harden ExternalTensor: hardlink detection and security documentation (#388)
- [x] Add hardlink detection in `_check_path_containment()` — reject files with `st_nlink > 1` - [x] Add tests for hardlink detection (via both `numpy()` and `tofile()`) - [x] Create `docs/security.md` documenting the threat model, defenses, limitations, and divergences from onnx/onnx - [x] Add code comment documenting the `base_dir=""` bypass in `_check_path_containment()` - [x] Fix divergence table: Layer 4 is hardlink count check (✅), note symlink policy divergence in Layer 2 - [x] Collapse `os.path.exists()` + `os.stat()` into single `try/except` stat call to avoid TOCTOU - [x] Clarify `tobytes()` protection is indirect via `_load()` - [x] Document hardlink collateral (original file also gets `st_nlink=2`) in Known limitations - [x] Run linting and tests to validate changes - [x] Run parallel validation --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: titaiwangms <18010845+titaiwangms@users.noreply.github.com> Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
1 parent 1c550ac commit 511ae68

File tree

3 files changed

+164
-2
lines changed

3 files changed

+164
-2
lines changed

docs/security.md

Lines changed: 98 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,98 @@
1+
# Security Model for External Data in onnx-ir
2+
3+
This document describes the threat model, implemented defenses, known
4+
limitations, and design rationale for how **onnx-ir** handles external tensor
5+
data (the `ExternalTensor` class).
6+
7+
## Threat model
8+
9+
ONNX models can reference external data files via relative paths stored in the
10+
`location` field of a `TensorProto`. A malicious model could abuse this to
11+
read arbitrary files on the host. The main attack vectors are:
12+
13+
| Vector | Example |
14+
|---|---|
15+
| **Path traversal** | `../../etc/passwd` |
16+
| **Absolute path** | `/etc/passwd` |
17+
| **Symlink escape** | `data.bin → /etc/passwd` |
18+
| **Hardlink smuggling** | Hard-linking a sensitive file into the model directory |
19+
20+
## Implemented defenses
21+
22+
`ExternalTensor._check_path_containment()` enforces a three-layer check
23+
whenever a non-empty `base_dir` is set:
24+
25+
| Layer | What it does |
26+
|---|---|
27+
| **1. String-based containment** | Normalizes the path (without resolving symlinks) and verifies it stays within `base_dir`. Catches `../` traversal and absolute paths without requiring the file to exist. |
28+
| **2. Realpath containment** | Resolves all symlinks via `os.path.realpath()` and re-checks containment. Catches symlinks whose target is outside `base_dir`. Symlinks that resolve *within* `base_dir` are allowed. |
29+
| **3. Hardlink detection** | Rejects files with more than one hard link (`st_nlink > 1`). Prevents an attacker from hard-linking a sensitive file into the model directory to bypass the containment boundary. |
30+
31+
All three checks run at **load time** — when `numpy()`, `tofile()`, or
32+
`tobytes()` (indirectly, via `_load()`) is called — not at construction time.
33+
This allows safe deserialization of untrusted protos without triggering I/O.
34+
35+
## `base_dir=""` bypass (by design)
36+
37+
When `base_dir` is empty (the default), **all security checks are skipped**.
38+
This is intentional for two reasons:
39+
40+
1. **Programmatic construction** — when a developer creates an `ExternalTensor`
41+
in code, they control the paths directly and do not need containment checks.
42+
2. **Deserialization safety** — the IR deserializer may create `ExternalTensor`
43+
objects before a `base_dir` is known. Containment is only meaningful when
44+
the caller sets `base_dir` to the model's directory.
45+
46+
If you are loading an untrusted model, always set `base_dir` to the directory
47+
containing the model file.
48+
49+
## Divergence from onnx/onnx
50+
51+
The reference ONNX runtime ([onnx/onnx#7717](https://github.com/onnx/onnx/pull/7717))
52+
implements a four-layer defense:
53+
54+
| Layer | onnx/onnx | onnx-ir | Notes |
55+
|---|---|---|---|
56+
| 1. Canonical path containment ||| Equivalent |
57+
| 2. Symlink handling | ✅ reject all | ✅ allow within base | Different policy — see rationale below |
58+
| 3. `O_NOFOLLOW` on open ||| Planned — see *Future hardening* |
59+
| 4. Hardlink count check ||| Equivalent (added in this PR) |
60+
61+
### Why the differences?
62+
63+
* **onnx-ir is a library**, not a runtime. It focuses on safe loading of model
64+
data for inspection and transformation, not sandboxed execution.
65+
* **Symlink policy** — onnx/onnx rejects all final-component symlinks.
66+
onnx-ir allows symlinks whose resolved target stays within `base_dir`. This
67+
is more permissive but still prevents escape from the containment boundary,
68+
and avoids breaking legitimate workflows that use symlinks within the model
69+
directory (e.g. shared weight files).
70+
* **`O_NOFOLLOW`** closes a TOCTOU (time-of-check-to-time-of-use) race between
71+
the containment check and the `open()` call. This is a valuable defense-in-depth
72+
measure but requires platform-specific code (`os.O_NOFOLLOW` is not available
73+
on Windows). It is planned for a future release.
74+
75+
## Known limitations
76+
77+
* **TOCTOU window** — A small race exists between `_check_path_containment()`
78+
and the subsequent `open()`. An attacker who can modify the filesystem
79+
concurrently could swap a safe file for a symlink after the check passes.
80+
Mitigation: use `O_NOFOLLOW` (planned).
81+
* **`base_dir=""` bypass** — As described above, an empty `base_dir` disables
82+
all checks. Callers loading untrusted models must set `base_dir`.
83+
* **Hardlink detection is best-effort** — The `st_nlink` check only detects
84+
hard links at the time of the check. It cannot prevent hard links created
85+
after the check. On some filesystems or operating systems, `st_nlink` may
86+
not accurately reflect the number of hard links.
87+
* **Hardlink collateral** — When an attacker creates a hard link to a
88+
legitimate data file, both the original and the link get `st_nlink=2`.
89+
This means the *original* file also becomes un-loadable until the extra
90+
link is removed. This is fail-closed behavior (safe by default), but
91+
operators should be aware of it when diagnosing unexpected load failures.
92+
93+
## Future hardening
94+
95+
* **`O_NOFOLLOW` on file open** — Use `os.open()` with `O_NOFOLLOW` to close
96+
the TOCTOU window at the kernel level (Linux/macOS).
97+
* **`_open_validated()` wrapper** — Centralize file-open + security checks so
98+
future code paths cannot accidentally bypass containment.

src/onnx_ir/_core.py

Lines changed: 26 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -739,19 +739,28 @@ def shape(self) -> Shape:
739739
def _check_path_containment(self) -> None:
740740
"""Check the path for security violations at load time.
741741
742-
Performs two checks when ``base_dir`` is non-empty:
742+
Performs the following checks when ``base_dir`` is non-empty:
743743
744744
1. String-based containment: the normalized path (without resolving symlinks)
745745
must stay within ``base_dir``. This catches path traversal sequences like
746746
``../../etc/passwd`` without requiring the file to exist.
747747
2. Realpath containment: the fully-resolved path (symlinks followed) must also
748748
stay within the fully-resolved ``base_dir``. This catches symlinks that point
749749
outside ``base_dir``.
750+
3. Hardlink detection: if the resolved path exists and has more than one hard
751+
link, the load is rejected. An attacker with write access could hard-link a
752+
sensitive file into the model directory to bypass containment checks.
750753
751754
Symlinks whose target resolves within ``base_dir`` are permitted.
752-
It is a no-op when ``base_dir`` is empty.
755+
756+
Security: all checks are skipped when ``base_dir`` is empty (the default
757+
for programmatic construction). This is by design — see docs/security.md.
753758
"""
754759
if not self._base_dir:
760+
# Security: when base_dir is empty no containment boundary is
761+
# defined, so all path checks are skipped. This is intentional
762+
# for programmatic construction where the caller controls the
763+
# paths directly. See docs/security.md for details.
755764
return
756765
path = self.path
757766
# Check 1: string-based path traversal (no filesystem access required).
@@ -778,6 +787,21 @@ def _check_path_containment(self) -> None:
778787
f"which is outside the base directory '{base_real}'. "
779788
"This may indicate a path traversal attack via a symbolic link."
780789
)
790+
# Check 3: hardlink detection — reject files with multiple hard links.
791+
# An attacker with write access could hard-link a sensitive file into the
792+
# model directory so it passes the containment checks above.
793+
# Uses a single stat call (try/except) to avoid a TOCTOU between
794+
# os.path.exists() and os.stat().
795+
try:
796+
nlink = os.stat(path_real).st_nlink
797+
except OSError:
798+
nlink = 1 # File doesn't exist yet — skip hardlink check
799+
if nlink > 1:
800+
raise ValueError(
801+
f"External data path '{path}' has multiple hard links "
802+
f"(nlink={nlink}). "
803+
"This may indicate a hard link attack."
804+
)
781805

782806
def _load(self):
783807
self._check_validity()

src/onnx_ir/_core_test.py

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -634,6 +634,46 @@ def test_load_raises_on_absolute_location_outside_base_dir(self):
634634
with self.assertRaisesRegex(ValueError, "path traversal"):
635635
tensor.numpy()
636636

637+
def test_load_raises_on_hardlink(self):
638+
# Create a real data file inside base_dir
639+
real_file = os.path.join(self.base_path, "real_data.bin")
640+
with open(real_file, "wb") as f:
641+
f.write(self.data.tobytes())
642+
# Create a hard link to the same file (also inside base_dir)
643+
hardlink_path = os.path.join(self.base_path, "hardlinked.bin")
644+
os.link(real_file, hardlink_path)
645+
tensor = _core.ExternalTensor(
646+
"hardlinked.bin",
647+
offset=0,
648+
length=len(self.data.tobytes()),
649+
dtype=ir.DataType.FLOAT,
650+
base_dir=self.base_path,
651+
name="input",
652+
shape=_core.Shape(list(self.data.shape)),
653+
)
654+
with self.assertRaisesRegex(ValueError, "hard link"):
655+
tensor.numpy()
656+
657+
def test_tofile_raises_on_hardlink(self):
658+
# Create a real data file inside base_dir
659+
real_file = os.path.join(self.base_path, "real_data.bin")
660+
with open(real_file, "wb") as f:
661+
f.write(self.data.tobytes())
662+
# Create a hard link to the same file (also inside base_dir)
663+
hardlink_path = os.path.join(self.base_path, "hardlinked.bin")
664+
os.link(real_file, hardlink_path)
665+
tensor = _core.ExternalTensor(
666+
"hardlinked.bin",
667+
offset=0,
668+
length=len(self.data.tobytes()),
669+
dtype=ir.DataType.FLOAT,
670+
base_dir=self.base_path,
671+
name="input",
672+
shape=_core.Shape(list(self.data.shape)),
673+
)
674+
with self.assertRaisesRegex(ValueError, "hard link"):
675+
tensor.tofile(io.BytesIO())
676+
637677
def test_release_does_not_invalidate_tensor(self):
638678
external_tensor = self.model.graph.initializer[0]
639679
external_info = onnx.external_data_helper.ExternalDataInfo(external_tensor)

0 commit comments

Comments
 (0)