fix: harden xgboost subprocess import isolation #701
Walkthrough
The XGBoost scanner is enhanced to perform isolated module loading using `importlib.util.find_spec` and subprocess execution within temporary directories with `PYTHONPATH` cleared. Test assertions are updated to verify subprocess invocation patterns, environment isolation, and injected flags.
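As a rough illustration of the availability probe the walkthrough mentions, the sketch below checks for a module with `importlib.util.find_spec` instead of importing it. The helper name `is_module_available` is hypothetical and is not taken from the PR; the scanner's real helper may differ.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Report whether a top-level module can be located, without running its code.

    Hypothetical helper for illustration only. For a top-level name,
    find_spec() consults the import finders but does not execute the module.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec() can raise for malformed names or unusual sys.modules
        # entries; treat those cases as "not available".
        return False


if __name__ == "__main__":
    print(is_module_available("json"))
    print(is_module_available("xgboost"))
```

Note that `find_spec` still searches `sys.path`, so it can locate a shadowing file. What it avoids is executing that file in the parent process.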
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/scanners/test_xgboost_scanner.py`:
- Around line 382-390: Add a regression test that ensures a local
attacker-controlled module named "xgboost.py" in the current directory is not
executed during scanning: create a temporary "xgboost.py" in the test CWD that
performs a detectable side effect, run the scanner with loading enabled (reusing
the existing mock_subprocess.run setup), and assert the side effect did not
occur while still verifying the existing call expectations (use
mock_subprocess.run.call_args, cmd/script checks and windows_path, and
Path.cwd() comparisons to locate where the scanner was invoked); remove the temp
file after the test to avoid polluting other tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f4e9ce6e-78c8-4d0d-8dc6-434674579f32
📒 Files selected for processing (2)
modelaudit/scanners/xgboost_scanner.py
tests/scanners/test_xgboost_scanner.py
  run_args, run_kwargs = mock_subprocess.run.call_args
  cmd = run_args[0]
- script = cmd[2]
+ script = cmd[3]
+ assert cmd[1] == "-I"
  assert "sys.argv[1]" in script
  assert windows_path not in script
- assert cmd[3] == windows_path
+ assert cmd[4] == windows_path
+ assert run_kwargs["cwd"] != str(Path.cwd())
+ assert "PYTHONPATH" not in run_kwargs["env"]
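The assertions above imply a child command shaped like `[python, "-I", "-c", script, path]`, run with a non-repository `cwd` and an environment without `PYTHONPATH`. A minimal sketch of such an invocation follows. The function name `run_isolated`, the placeholder `CHILD_SCRIPT`, and the timeout value are assumptions for illustration, not the scanner's actual code.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child script: it reads the model path from argv, so the path
# is never interpolated into the source text passed to -c.
CHILD_SCRIPT = "import sys; print(sys.argv[1])"


def run_isolated(model_path: str, script: str = CHILD_SCRIPT) -> subprocess.CompletedProcess:
    """Run `script` in an isolated child interpreter with a neutral cwd."""
    # Copy the parent environment but drop PYTHONPATH.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    with tempfile.TemporaryDirectory() as safe_cwd:
        # cmd[1] is "-I", cmd[3] is the script, cmd[4] is the model path.
        cmd = [sys.executable, "-I", "-c", script, model_path]
        return subprocess.run(
            cmd,
            cwd=safe_cwd,  # a fresh temp directory, not the caller's cwd
            env=env,
            capture_output=True,
            text=True,
            timeout=30,  # arbitrary value chosen for this sketch
            check=False,
        )


if __name__ == "__main__":
    result = run_isolated("/tmp/example model.bst")
    print(result.returncode, result.stdout.strip())
```

Because the child runs in a different directory, a relative model path would need to be resolved to an absolute one before being passed in. The sketch leaves that to the caller.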
Add an explicit malicious import-hijack regression test.
These assertions verify invocation wiring, but they don’t prove a local attacker-controlled xgboost.py is not executed during scanning. Please add a test that places a malicious xgboost.py in the current directory, runs scan with loading enabled, and asserts no attacker side effect occurs.
Suggested test addition
+    @patch("modelaudit.scanners.xgboost_scanner._check_xgboost_available", return_value=True)
+    def test_xgboost_loading_ignores_local_module_hijack(
+        self,
+        _mock_check_xgb: Mock,
+        monkeypatch: pytest.MonkeyPatch,
+        tmp_path: Path,
+    ) -> None:
+        """Ensure local xgboost.py in cwd is not imported/executed."""
+        hijack_dir = tmp_path / "hijack"
+        hijack_dir.mkdir()
+        (hijack_dir / "xgboost.py").write_text(
+            "from pathlib import Path\n"
+            "Path('hijacked.txt').write_text('pwned')\n"
+            "raise RuntimeError('should not execute')\n",
+            encoding="utf-8",
+        )
+
+        model_path = tmp_path / "model.bst"
+        model_path.write_bytes(b"dummy_xgboost_data")
+        monkeypatch.chdir(hijack_dir)
+
+        scanner = XGBoostScanner({"enable_xgb_loading": True})
+        scanner.scan(str(model_path))
+
+        assert not (hijack_dir / "hijacked.txt").exists()

As per coding guidelines: "Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes."
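To see why a file in the current directory matters here, the sketch below plants a module in a child interpreter's working directory and checks whether `import` picks it up, with and without `-I`. The module name `planted_mod` and the helper `local_module_is_imported` are hypothetical stand-ins for illustration. The sketch does not touch the scanner or the real `xgboost` package.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def local_module_is_imported(extra_flags: list[str]) -> bool:
    """Return True if a child interpreter imports a module planted in its cwd."""
    with tempfile.TemporaryDirectory() as cwd:
        # Hypothetical stand-in for an attacker-controlled xgboost.py.
        Path(cwd, "planted_mod.py").write_text(
            "print('planted code ran')\n", encoding="utf-8"
        )
        cmd = [sys.executable, *extra_flags, "-c", "import planted_mod"]
        result = subprocess.run(
            cmd, cwd=cwd, capture_output=True, text=True, check=False
        )
        return "planted code ran" in result.stdout


if __name__ == "__main__":
    print("plain -c:", local_module_is_imported([]))
    print("with -I: ", local_module_is_imported(["-I"]))
```

With plain `python -c`, the current directory is normally placed at the front of `sys.path`, so the planted module is imported. In isolated mode (`-I`) it is not, and the import fails instead. This assumes the parent environment does not already set `PYTHONSAFEPATH`.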
Motivation
- The scanner's `python -c` subprocess started with `cwd=os.getcwd()` and the parent environment, which allowed an attacker-controlled `xgboost.py` in the working directory or on `PYTHONPATH` to be imported and executed (local import hijack / RCE) when `enable_xgb_loading` was enabled.
Description
- Replace the `import xgboost` availability probe with `importlib.util.find_spec("xgboost")` to avoid importing potentially attacker-controlled modules in-process (`modelaudit/scanners/xgboost_scanner.py`).
- Harden the subprocess invocation by using `python -I` (isolated mode), removing `PYTHONPATH` from the child env, and executing the child in a temporary safe working directory instead of the repository cwd (`modelaudit/scanners/xgboost_scanner.py`).
- Update the tests to assert the hardened invocation (`-I`, path passed via argv, non-repo `cwd`, and sanitized env) (`tests/scanners/test_xgboost_scanner.py`).
- `modelaudit/scanners/xgboost_scanner.py`, `tests/scanners/test_xgboost_scanner.py`.
Testing
- `uv run ruff format modelaudit/ tests/` and `uv run ruff check --fix modelaudit/ tests/` succeeded with no issues.
- `uv run mypy modelaudit/` succeeded with no errors.
- `uv run pytest tests/scanners/test_xgboost_scanner.py -q --maxfail=1` passed (21 passed, 3 skipped).
- `uv run pytest -n auto -m "not slow and not integration" --maxfail=1` completed but encountered one pre-existing, unrelated failure in `tests/utils/helpers/test_secure_hasher.py::TestErrorHandling::test_hash_permission_denied` (the test did not raise `OSError` in this environment), so the broader run did not fully pass here, but the XGBoost-related tests and checks succeeded.