Harden ExternalTensor: hardlink detection and security documentation (#388)

Copilot · titaiwangms · justinchuby · web-flow · commit 511ae6821395 · 2026-04-14T18:35:23.000Z
- [x] Add hardlink detection in `_check_path_containment()` — reject
files with `st_nlink > 1`
- [x] Add tests for hardlink detection (via both `numpy()` and
`tofile()`)
- [x] Create `docs/security.md` documenting the threat model, defenses,
limitations, and divergences from onnx/onnx
- [x] Add code comment documenting the `base_dir=""` bypass in
`_check_path_containment()`
- [x] Fix divergence table: Layer 4 is hardlink count check (✅), note
symlink policy divergence in Layer 2
- [x] Collapse `os.path.exists()` + `os.stat()` into single `try/except`
stat call to avoid TOCTOU
- [x] Clarify `tobytes()` protection is indirect via `_load()`
- [x] Document hardlink collateral (original file also gets
`st_nlink=2`) in Known limitations
- [x] Run linting and tests to validate changes
- [x] Run parallel validation
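
For reviewers, here is a minimal standalone sketch of the hardlink check and the single-`stat` pattern listed above. It assumes a filesystem that supports `os.link` and reports a link count of 1 for a freshly created file. `has_extra_hard_links` and `demo` are hypothetical names used only for illustration; they are not part of onnx-ir, where the real logic lives in `_check_path_containment()`.

```python
import os
import tempfile


def has_extra_hard_links(path: str) -> bool:
    """Return True if `path` exists and has more than one hard link.

    Hypothetical helper for illustration only; not part of onnx-ir.
    """
    try:
        # Single stat call: there is no separate os.path.exists() probe, so
        # no gap opens between an existence check and the stat itself.
        nlink = os.stat(path).st_nlink
    except OSError:
        # stat failed (for example, the file does not exist): nothing to check.
        return False
    return nlink > 1


def demo() -> tuple:
    """Exercise the helper on a fresh file, a hard link, and a missing path."""
    with tempfile.TemporaryDirectory() as base_dir:
        original = os.path.join(base_dir, "real_data.bin")
        with open(original, "wb") as f:
            f.write(b"\x00" * 16)
        before = has_extra_hard_links(original)

        link = os.path.join(base_dir, "hardlinked.bin")
        os.link(original, link)  # both names now share one inode
        after_link = has_extra_hard_links(link)
        after_original = has_extra_hard_links(original)

        missing = has_extra_hard_links(os.path.join(base_dir, "missing.bin"))
    return before, after_link, after_original, missing
```

Once the link exists, the original file is flagged as well, which is the "hardlink collateral" behavior recorded under Known limitations in `docs/security.md`.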

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: titaiwangms <18010845+titaiwangms@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
diff --git a/docs/security.md b/docs/security.md
@@ -0,0 +1,98 @@
+# Security Model for External Data in onnx-ir
+
+This document describes the threat model, implemented defenses, known
+limitations, and design rationale for how **onnx-ir** handles external tensor
+data (the `ExternalTensor` class).
+
+## Threat model
+
+ONNX models can reference external data files via relative paths stored in the
+`location` field of a `TensorProto`. A malicious model could abuse this to
+read arbitrary files on the host. The main attack vectors are:
+
+| Vector | Example |
+|---|---|
+| **Path traversal** | `../../etc/passwd` |
+| **Absolute path** | `/etc/passwd` |
+| **Symlink escape** | `data.bin → /etc/passwd` |
+| **Hardlink smuggling** | Hard-linking a sensitive file into the model directory |
+
+## Implemented defenses
+
+`ExternalTensor._check_path_containment()` enforces a three-layer check
+whenever a non-empty `base_dir` is set:
+
+| Layer | What it does |
+|---|---|
+| **1. String-based containment** | Normalizes the path (without resolving symlinks) and verifies it stays within `base_dir`. Catches `../` traversal and absolute paths without requiring the file to exist. |
+| **2. Realpath containment** | Resolves all symlinks via `os.path.realpath()` and re-checks containment. Catches symlinks whose target is outside `base_dir`. Symlinks that resolve *within* `base_dir` are allowed. |
+| **3. Hardlink detection** | Rejects files with more than one hard link (`st_nlink > 1`). Prevents an attacker from hard-linking a sensitive file into the model directory to bypass the containment boundary. |
+
+All three checks run at **load time** — when `numpy()`, `tofile()`, or
+`tobytes()` (indirectly, via `_load()`) is called — not at construction time.
+This allows safe deserialization of untrusted protos without triggering I/O.
+
+## `base_dir=""` bypass (by design)
+
+When `base_dir` is empty (the default), **all security checks are skipped**.
+This is intentional for two reasons:
+
+1. **Programmatic construction** — when a developer creates an `ExternalTensor`
+   in code, they control the paths directly and do not need containment checks.
+2. **Deserialization safety** — the IR deserializer may create `ExternalTensor`
+   objects before a `base_dir` is known. Containment is only meaningful when
+   the caller sets `base_dir` to the model's directory.
+
+If you are loading an untrusted model, always set `base_dir` to the directory
+containing the model file.
+
+## Divergence from onnx/onnx
+
+The reference ONNX implementation ([onnx/onnx#7717](https://github.com/onnx/onnx/pull/7717))
+implements a four-layer defense:
+
+| Layer | onnx/onnx | onnx-ir | Notes |
+|---|---|---|---|
+| 1. Canonical path containment | ✅ | ✅ | Equivalent |
+| 2. Symlink handling | ✅ reject all | ✅ allow within base | Different policy — see rationale below |
+| 3. `O_NOFOLLOW` on open | ✅ | ❌ | Planned — see *Future hardening* |
+| 4. Hardlink count check | ✅ | ✅ | Equivalent (added in #388) |
+
+### Why the differences?
+
+* **onnx-ir is a library**, not a runtime. It focuses on safe loading of model
+  data for inspection and transformation, not sandboxed execution.
+* **Symlink policy** — onnx/onnx rejects all final-component symlinks.
+  onnx-ir allows symlinks whose resolved target stays within `base_dir`. This
+  is more permissive but still prevents escape from the containment boundary,
+  and avoids breaking legitimate workflows that use symlinks within the model
+  directory (e.g. shared weight files).
+* **`O_NOFOLLOW`** closes a TOCTOU (time-of-check-to-time-of-use) race between
+  the containment check and the `open()` call. This is a valuable defense-in-depth
+  measure but requires platform-specific code (`os.O_NOFOLLOW` is not available
+  on Windows). It is planned for a future release.
+
+## Known limitations
+
+* **TOCTOU window** — A small race exists between `_check_path_containment()`
+  and the subsequent `open()`. An attacker who can modify the filesystem
+  concurrently could swap a safe file for a symlink after the check passes.
+  Mitigation: use `O_NOFOLLOW` (planned).
+* **`base_dir=""` bypass** — As described above, an empty `base_dir` disables
+  all checks. Callers loading untrusted models must set `base_dir`.
+* **Hardlink detection is best-effort** — The `st_nlink` check only detects
+  hard links at the time of the check. It cannot prevent hard links created
+  after the check. On some filesystems or operating systems, `st_nlink` may
+  not accurately reflect the number of hard links.
+* **Hardlink collateral** — When an attacker creates a hard link to a
+  legitimate data file, both the original and the link get `st_nlink=2`.
+  This means the *original* file also becomes un-loadable until the extra
+  link is removed. This is fail-closed behavior (safe by default), but
+  operators should be aware of it when diagnosing unexpected load failures.
+
+## Future hardening
+
+* **`O_NOFOLLOW` on file open** — Use `os.open()` with `O_NOFOLLOW` to close
+  the TOCTOU window at the kernel level (Linux/macOS).
+* **`_open_validated()` wrapper** — Centralize file-open + security checks so
+  future code paths cannot accidentally bypass containment.
diff --git a/src/onnx_ir/_core.py b/src/onnx_ir/_core.py
@@ -739,19 +739,28 @@ def shape(self) -> Shape:
     def _check_path_containment(self) -> None:
         """Check the path for security violations at load time.
 
-        Performs two checks when ``base_dir`` is non-empty:
+        Performs the following checks when ``base_dir`` is non-empty:
 
         1. String-based containment: the normalized path (without resolving symlinks)
            must stay within ``base_dir``. This catches path traversal sequences like
            ``../../etc/passwd`` without requiring the file to exist.
         2. Realpath containment: the fully-resolved path (symlinks followed) must also
            stay within the fully-resolved ``base_dir``. This catches symlinks that point
            outside ``base_dir``.
+        3. Hardlink detection: if the resolved path exists and has more than one hard
+           link, the load is rejected. An attacker with write access could hard-link a
+           sensitive file into the model directory to bypass containment checks.
 
         Symlinks whose target resolves within ``base_dir`` are permitted.
-        It is a no-op when ``base_dir`` is empty.
+
+        Security: all checks are skipped when ``base_dir`` is empty (the default
+        for programmatic construction). This is by design — see docs/security.md.
         """
         if not self._base_dir:
+            # Security: when base_dir is empty no containment boundary is
+            # defined, so all path checks are skipped.  This is intentional
+            # for programmatic construction where the caller controls the
+            # paths directly.  See docs/security.md for details.
             return
         path = self.path
         # Check 1: string-based path traversal (no filesystem access required).
@@ -778,6 +787,21 @@ def _check_path_containment(self) -> None:
                 f"which is outside the base directory '{base_real}'. "
                 "This may indicate a path traversal attack via a symbolic link."
             )
+        # Check 3: hardlink detection — reject files with multiple hard links.
+        # An attacker with write access could hard-link a sensitive file into the
+        # model directory so it passes the containment checks above.
+        # Uses a single stat call (try/except) to avoid a TOCTOU race between
+        # os.path.exists() and os.stat().
+        try:
+            nlink = os.stat(path_real).st_nlink
+        except OSError:
+            nlink = 1  # stat failed (e.g. file doesn't exist yet); skip hardlink check
+        if nlink > 1:
+            raise ValueError(
+                f"External data path '{path}' has multiple hard links "
+                f"(nlink={nlink}). "
+                "This may indicate a hard link attack."
+            )
 
     def _load(self):
         self._check_validity()
diff --git a/src/onnx_ir/_core_test.py b/src/onnx_ir/_core_test.py
@@ -634,6 +634,46 @@ def test_load_raises_on_absolute_location_outside_base_dir(self):
         with self.assertRaisesRegex(ValueError, "path traversal"):
             tensor.numpy()
 
+    def test_load_raises_on_hardlink(self):
+        # Create a real data file inside base_dir
+        real_file = os.path.join(self.base_path, "real_data.bin")
+        with open(real_file, "wb") as f:
+            f.write(self.data.tobytes())
+        # Create a hard link to the same file (also inside base_dir)
+        hardlink_path = os.path.join(self.base_path, "hardlinked.bin")
+        os.link(real_file, hardlink_path)
+        tensor = _core.ExternalTensor(
+            "hardlinked.bin",
+            offset=0,
+            length=len(self.data.tobytes()),
+            dtype=ir.DataType.FLOAT,
+            base_dir=self.base_path,
+            name="input",
+            shape=_core.Shape(list(self.data.shape)),
+        )
+        with self.assertRaisesRegex(ValueError, "hard link"):
+            tensor.numpy()
+
+    def test_tofile_raises_on_hardlink(self):
+        # Create a real data file inside base_dir
+        real_file = os.path.join(self.base_path, "real_data.bin")
+        with open(real_file, "wb") as f:
+            f.write(self.data.tobytes())
+        # Create a hard link to the same file (also inside base_dir)
+        hardlink_path = os.path.join(self.base_path, "hardlinked.bin")
+        os.link(real_file, hardlink_path)
+        tensor = _core.ExternalTensor(
+            "hardlinked.bin",
+            offset=0,
+            length=len(self.data.tobytes()),
+            dtype=ir.DataType.FLOAT,
+            base_dir=self.base_path,
+            name="input",
+            shape=_core.Shape(list(self.data.shape)),
+        )
+        with self.assertRaisesRegex(ValueError, "hard link"):
+            tensor.tofile(io.BytesIO())
+
     def test_release_does_not_invalidate_tensor(self):
         external_tensor = self.model.graph.initializer[0]
         external_info = onnx.external_data_helper.ExternalDataInfo(external_tensor)