Skip to content

fix: harden xgboost subprocess import isolation#701

Merged
mldangelo merged 1 commit intomainfrom
codex/propose-fix-for-xgboost-import-hijack
Mar 13, 2026
Merged

fix: harden xgboost subprocess import isolation#701
mldangelo merged 1 commit intomainfrom
codex/propose-fix-for-xgboost-import-hijack

Conversation

@mldangelo
Copy link
Member

@mldangelo mldangelo commented Mar 13, 2026

Motivation

  • The XGBoost loader used a python -c subprocess started with cwd=os.getcwd() and the parent environment, which allowed attacker-controlled xgboost.py in the working directory or PYTHONPATH to be imported and executed (local import hijack / RCE) when enable_xgb_loading was enabled.
  • The change aims to prevent import hijacking while preserving the optional loader behavior and existing test coverage.

Description

  • Replace in-process import xgboost availability probe with importlib.util.find_spec("xgboost") to avoid importing potentially attacker-controlled modules in-process (modelaudit/scanners/xgboost_scanner.py).
  • Harden the subprocess used to test loading by running python -I (isolated mode), removing PYTHONPATH from the child env, and executing the child in a temporary safe working directory instead of the repository cwd (modelaudit/scanners/xgboost_scanner.py).
  • Update XGBoost scanner unit test to assert the new command layout and isolation behavior (expect -I, path passed via argv, non-repo cwd, and sanitized env) (tests/scanners/test_xgboost_scanner.py).
  • Files modified: modelaudit/scanners/xgboost_scanner.py, tests/scanners/test_xgboost_scanner.py.

Testing

  • Ran format and lint checks: uv run ruff format modelaudit/ tests/ and uv run ruff check --fix modelaudit/ tests/ succeeded with no issues.
  • Static typing: uv run mypy modelaudit/ succeeded with no errors.
  • Targeted unit tests: uv run pytest tests/scanners/test_xgboost_scanner.py -q --maxfail=1 passed (21 passed, 3 skipped).
  • Full non-integration test run: uv run pytest -n auto -m "not slow and not integration" --maxfail=1 completed but encountered one pre-existing, unrelated failure in tests/utils/helpers/test_secure_hasher.py::TestErrorHandling::test_hash_permission_denied (the test did not raise OSError in this environment), so the broader run did not fully pass here but the XGBoost-related tests and checks succeeded.

Codex Task

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced XGBoost model scanning for improved reliability and stability during model loading.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 13, 2026

Walkthrough

The XGBoost scanner is enhanced to perform isolated module loading using importlib.util.find_spec and subprocess execution within temporary directories with cleared PYTHONPATH. Test assertions are updated to verify subprocess invocation patterns, environment isolation, and injected flags.

Changes

Cohort / File(s) Summary
Scanner Implementation
modelaudit/scanners/xgboost_scanner.py
Replaced direct try/except import with importlib.util.find_spec check. Enhanced subprocess execution to run in isolated temporary directory with injected "-I" flag and cleared PYTHONPATH environment variable. Updated subprocess invocation wiring to include safe working directory and adjusted environment parameters.
Test Assertions
tests/scanners/test_xgboost_scanner.py
Updated mock assertions to verify subprocess invocation details: corrected cmd array indices (script now at [3], path at [4]), added "-I" flag verification at [1], and introduced assertions validating isolated execution context (non-current working directory, PYTHONPATH absent from environment).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 With tempfiles and isolation bright,
The XGBoost loader runs just right,
In -I flags we trust and hide,
PYTHONPATH swept from our side,
A safer sandbox, pure delight! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the primary change: hardening XGBoost subprocess import isolation by addressing security vulnerabilities through isolation measures.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch codex/propose-fix-for-xgboost-import-hijack
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/scanners/test_xgboost_scanner.py`:
- Around line 382-390: Add a regression test that ensures a local
attacker-controlled module named "xgboost.py" in the current directory is not
executed during scanning: create a temporary "xgboost.py" in the test CWD that
performs a detectable side effect, run the scanner with loading enabled (reusing
the existing mock_subprocess.run setup), and assert the side effect did not
occur while still verifying the existing call expectations (use
mock_subprocess.run.call_args, cmd/script checks and windows_path, and
Path.cwd() comparisons to locate where the scanner was invoked); remove the temp
file after the test to avoid polluting other tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f4e9ce6e-78c8-4d0d-8dc6-434674579f32

📥 Commits

Reviewing files that changed from the base of the PR and between 698859d and b7f5a2e.

📒 Files selected for processing (2)
  • modelaudit/scanners/xgboost_scanner.py
  • tests/scanners/test_xgboost_scanner.py

Comment on lines +382 to +390
run_args, run_kwargs = mock_subprocess.run.call_args
cmd = run_args[0]
script = cmd[2]
script = cmd[3]
assert cmd[1] == "-I"
assert "sys.argv[1]" in script
assert windows_path not in script
assert cmd[3] == windows_path
assert cmd[4] == windows_path
assert run_kwargs["cwd"] != str(Path.cwd())
assert "PYTHONPATH" not in run_kwargs["env"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add an explicit malicious import-hijack regression test.

These assertions verify invocation wiring, but they don’t prove a local attacker-controlled xgboost.py is not executed during scanning. Please add a test that places a malicious xgboost.py in the current directory, runs scan with loading enabled, and asserts no attacker side effect occurs.

Suggested test addition
+    `@patch`("modelaudit.scanners.xgboost_scanner._check_xgboost_available", return_value=True)
+    def test_xgboost_loading_ignores_local_module_hijack(
+        self,
+        _mock_check_xgb: Mock,
+        monkeypatch: pytest.MonkeyPatch,
+        tmp_path: Path,
+    ) -> None:
+        """Ensure local xgboost.py in cwd is not imported/executed."""
+        hijack_dir = tmp_path / "hijack"
+        hijack_dir.mkdir()
+        (hijack_dir / "xgboost.py").write_text(
+            "from pathlib import Path\n"
+            "Path('hijacked.txt').write_text('pwned')\n"
+            "raise RuntimeError('should not execute')\n",
+            encoding="utf-8",
+        )
+
+        model_path = tmp_path / "model.bst"
+        model_path.write_bytes(b"dummy_xgboost_data")
+        monkeypatch.chdir(hijack_dir)
+
+        scanner = XGBoostScanner({"enable_xgb_loading": True})
+        scanner.scan(str(model_path))
+
+        assert not (hijack_dir / "hijacked.txt").exists()

As per coding guidelines: "Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/scanners/test_xgboost_scanner.py` around lines 382 - 390, Add a
regression test that ensures a local attacker-controlled module named
"xgboost.py" in the current directory is not executed during scanning: create a
temporary "xgboost.py" in the test CWD that performs a detectable side effect,
run the scanner with loading enabled (reusing the existing mock_subprocess.run
setup), and assert the side effect did not occur while still verifying the
existing call expectations (use mock_subprocess.run.call_args, cmd/script checks
and windows_path, and Path.cwd() comparisons to locate where the scanner was
invoked); remove the temp file after the test to avoid polluting other tests.

@mldangelo mldangelo merged commit 2df2d78 into main Mar 13, 2026
26 checks passed
@mldangelo mldangelo deleted the codex/propose-fix-for-xgboost-import-hijack branch March 13, 2026 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant