Skip to content

Introduce whitelisted folders for external data validation#27352

Open
yuslepukhin wants to merge 5 commits intomainfrom
yuslepukhin/whitelist_data_folders
Open

Introduce whitelisted folders for external data validation#27352
yuslepukhin wants to merge 5 commits intomainfrom
yuslepukhin/whitelist_data_folders

Conversation

@yuslepukhin
Copy link
Member

Description

Many customers reported that they prefer to store external data in locations other than model folder PR
Previous security change disabled that possibility. PR #26776.
This PR introduces a new API that sets whitelisted folders option. Data stored under those folders or their subfolders would still be allowed.

Motivation and Context

qdrant/fastembed#603

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for loading TensorProto external data from user-configured “whitelisted” directories (in addition to the model directory), addressing the prior security hardening that restricted external data to the model folder.

Changes:

  • Introduces a new C/C++ SessionOptions API to configure semicolon-separated whitelisted external-data directories.
  • Adds parsing/validation logic for whitelist paths and extends external data path validation to allow matches under whitelisted folders.
  • Updates graph/session code paths and expands unit tests for whitelist behavior.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
onnxruntime/test/ir/graph_test.cc Updates call site for new Graph::ConvertInitializersIntoOrtValues signature.
onnxruntime/test/framework/tensorutils_test.cc Adds extensive tests for whitelist-aware path validation and whitelist parsing.
onnxruntime/core/session/provider_bridge_ort.cc Removes shared-provider host bridge for external data path validation.
onnxruntime/core/session/ort_apis.h Adds C API entrypoint declaration for SessionOptionsSetWhiteListedDataFolders.
onnxruntime/core/session/onnxruntime_c_api.cc Registers the new C API function pointer and updates version asserts (currently problematic).
onnxruntime/core/session/inference_session.cc Parses whitelist from SessionOptions and passes it into initializer conversion.
onnxruntime/core/session/abi_session_options.cc Implements SessionOptionsSetWhiteListedDataFolders (currently rejects nullptr).
onnxruntime/core/providers/shared_library/provider_interfaces.h Removes ProviderHost virtual for validating external data paths.
onnxruntime/core/providers/shared_library/provider_api.h Removes shared-provider wrapper for external data path validation.
onnxruntime/core/graph/graph.cc Extends initializer conversion to validate external paths against whitelist.
onnxruntime/core/framework/tensorprotoutils.h Adds ParseWhiteListedPaths and extends ValidateExternalDataPath signature.
onnxruntime/core/framework/tensorprotoutils.cc Implements whitelist parsing and whitelist-aware external data path validation.
onnxruntime/core/framework/session_options.h Adds SessionOptions::whitelisted_data_folders storage.
include/onnxruntime/core/session/onnxruntime_cxx_inline.h Adds C++ wrapper SessionOptions::SetWhiteListedDataFolders implementation.
include/onnxruntime/core/session/onnxruntime_cxx_api.h Adds C++ wrapper SessionOptions::SetWhiteListedDataFolders declaration.
include/onnxruntime/core/session/onnxruntime_c_api.h Adds public C API declaration/docs for SessionOptionsSetWhiteListedDataFolders (currently mismatched).
include/onnxruntime/core/graph/graph.h Changes public Graph API signature for ConvertInitializersIntoOrtValues.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +629 to +638
auto unrelated_dir = std::filesystem::temp_directory_path() / "unrelated_PathValidationTest";
std::filesystem::create_directories(unrelated_dir);
auto cleanup = [&]() { std::filesystem::remove_all(unrelated_dir); };

std::vector<std::filesystem::path> whitelist = {whitelisted_dir_};
auto escaping_location = std::filesystem::path("..") / "unrelated_PathValidationTest" / "data.bin";
ASSERT_FALSE(utils::ValidateExternalDataPath(base_dir_, escaping_location, whitelist).IsOK());

cleanup();
}
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses an ASSERT_* before calling the cleanup lambda. If the assertion fails, the temporary directory won’t be removed. Prefer a scope guard/RAII cleanup (or do cleanup in TearDown) so the directory is always removed.

Copilot uses AI. Check for mistakes.
Comment on lines +649 to +660
auto another_dir = std::filesystem::temp_directory_path() / "another_PathValidationTest";
std::filesystem::create_directories(another_dir);
auto cleanup = [&]() { std::filesystem::remove_all(another_dir); };

auto relative_to_outside = std::filesystem::path("..") / "outside" / "data.bin";
std::vector<std::filesystem::path> whitelist = {another_dir, outside_dir_};

// Escapes base_dir_ but outside_dir_ (second whitelist entry) should match.
ASSERT_STATUS_OK(utils::ValidateExternalDataPath(base_dir_, relative_to_outside, whitelist));

cleanup();
}
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here: cleanup() is after ASSERT_STATUS_OK, so a failure will leak the temporary directory. Use a scope guard or ensure cleanup runs unconditionally.

Copilot uses AI. Check for mistakes.
@@ -1004,9 +1004,6 @@ struct ProviderHost {

virtual bool Utils__HasExternalDataInMemory(const ONNX_NAMESPACE::TensorProto& ten_proto) = 0;

Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing Utils__ValidateExternalDataPath from ProviderHost eliminates the only ValidateExternalDataPath entrypoint available to shared-provider code (tensorprotoutils.h excludes it under SHARED_PROVIDER). If bridge/shared EPs still need to validate external data locations (per PR #26776), consider keeping this API and updating it to accept the whitelist parameter instead of removing it.

Suggested change
// Validate that any external data referenced by the given TensorProto resides within the
// provided whitelist of allowed locations. Implementations should return a non-OK Status
// if any external data path is outside the whitelist.
virtual Status Utils__ValidateExternalDataPath(
const ONNX_NAMESPACE::TensorProto& tensor_proto,
const std::vector<std::filesystem::path>& allowed_locations) = 0;

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants