Fix NuGet DLL Loading on Linux and macOS #27266
Use AcesShared pool for arm64 macOS
probes runtimes/{RID}/native/ subfolders
copy dylib
Pull request overview
This PR addresses cross-platform native library resolution for the C# ONNX Runtime NuGet packages by removing hardcoded .dll names and introducing custom resolution logic, alongside CI/packaging adjustments to ensure required macOS artifacts (notably the custom op library and dylib layout) are present during NuGet validation.
Changes:
- Updated C# P/Invoke library names to use extension-less names and added a `DllImportResolver` to control native library loading behavior.
- Adjusted macOS NuGet test pipeline and macOS packaging steps to ensure required test artifacts (e.g., `libcustom_op_library.dylib`) are available/located as expected.
- Tweaked Linux packaging Docker image dependencies and related CI pipeline configuration.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| tools/ci_build/github/linux/docker/Dockerfile.package_ubuntu_2404_gpu | Adds additional TensorRT-related packages for the Ubuntu 24.04 GPU packaging image. |
| tools/ci_build/github/linux/copy_strip_binary.sh | Ensures libonnxruntime.dylib is a real file (not a symlink) for NuGet packaging robustness. |
| tools/ci_build/github/azure-pipelines/templates/mac-cpu-packaging-steps.yml | Copies libcustom_op_library.dylib into packaging/testdata locations for macOS artifacts. |
| tools/ci_build/github/azure-pipelines/nuget/templates/test_macos.yml | Updates macOS NuGet test job pool selection and unpacks macOS artifacts into testdata/; initializes ONNX submodule for test data. |
| tools/ci_build/github/azure-pipelines/c-api-noopenmp-test-pipelines.yml | Switches macOS NuGet tests to a specific pool/demands and removes the Node.js macOS stage from this pipeline. |
| csharp/test/Microsoft.ML.OnnxRuntime.Tests.NetCoreApp/InferenceTest.netcore.cs | Skips a set of pretrained-model testcases on macOS. |
| csharp/src/Microsoft.ML.OnnxRuntime/NativeMethods.shared.cs | Changes DllImport names to extension-less and adds a DllImportResolver to map/load native libraries. |
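The `NativeMethods.shared.cs` row describes mapping extension-less P/Invoke names to native filenames. A minimal sketch of that kind of mapping is shown here; it is not the PR's actual code. The class and method names (`NativeNameSketch`, `MapLibraryName`, `CurrentPlatform`) are hypothetical, and applying the `lib` prefix to every library on Linux and macOS is an assumption based on the `libonnxruntime.so` and `libonnxruntime.dylib` names mentioned in this PR.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only; not taken from the PR.
public static class NativeNameSketch
{
    // Maps an extension-less P/Invoke name (e.g. "onnxruntime") to the
    // conventional native filename for the given OS.
    public static string MapLibraryName(string name, OSPlatform os)
    {
        if (os == OSPlatform.Windows) return name + ".dll";
        if (os == OSPlatform.OSX) return "lib" + name + ".dylib";
        // Assumption: anything that is not Windows or macOS uses the Linux convention.
        return "lib" + name + ".so";
    }

    // Detects the current OS so callers can pass it to MapLibraryName.
    public static OSPlatform CurrentPlatform()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) return OSPlatform.Windows;
        if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX)) return OSPlatform.OSX;
        return OSPlatform.Linux;
    }
}
```

Taking the platform as a parameter keeps the mapping a pure function, so it can be checked for all three operating systems from a single machine.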
…ard, fix brace style
Is part of the problem that the props file for the NuGet package does not have logic to copy files for non-Windows platforms to the build output directory? Typically the build process would copy from the relevant
I added a section of
but would the
FWIW we're having to manually add this copy step in the FL Local samples for Linux and we wouldn't need to do that if the props file included logic for Linux.
If we have props for Linux/macOS, we can simplify the runtime probing logic. We can address the props issue in another pull request since it is a separate issue.
## Summary

This PR addresses persistent native library loading issues in the ONNX Runtime NuGet package, specifically on macOS and Linux, by implementing a robust DllImportResolver. It also includes necessary pipeline and packaging adjustments to ensure required macOS artifacts are correctly located and validated during CI.

## Problem

#27263 reports `Unable to load shared library 'onnxruntime.dll' or one of its dependencies`. It was caused by #26415, which hard-coded `onnxruntime.dll` even for Linux and macOS (the correct filenames are `libonnxruntime.so` on Linux and `libonnxruntime.dylib` on macOS).

The NuGet test pipeline has been broken for a while, so we also need to fix the pipeline to test our change. It has the following issues:

* The macOS NuGet package is for arm64, but the vmImage `macOS-15` is x64.
* The macOS NuGet test needs `libcustom_op_library.dylib`, but it is not copied from the artifacts to the test environment.
* The macOS artifact contains `libonnxruntime.dylib` and `libonnxruntime.1.24.1.dylib`, where `libonnxruntime.dylib` is a symlink. This causes a problem because the latter is excluded by the nuspec.
* The macOS NuGet test uses models from the onnx repo. However, the latest onnx has some models with data types such as float8 that are not supported by C#, so those model tests fail.
* The Linux NuGet test uses the Docker image `Dockerfile.package_ubuntu_2404_gpu`, but the docker build failed due to the versions of `libnvinfer-headers-python-plugin-dev` and `libnvinfer-win-builder-resource10`.

## Changes

### 1. Robust C# DLL Resolution

The DllImportResolver has been enhanced to handle various deployment scenarios where standard .NET resolution might fail:

- **Platform-Specific Naming**: Maps extension-less library names (`onnxruntime`, `ortextensions`) to appropriate filenames (`onnxruntime.dll`, `libonnxruntime.so`, `libonnxruntime.dylib`) based on the OS.
- **Multi-Stage Probing**:
  1. **Default Loading**: Attempts `NativeLibrary.TryLoad` with the mapped name.
  2. **NuGet `runtimes` Probing**: If the above fails, it probes the `runtimes/{rid}/native/` subdirectories relative to the assembly location, covering common RIDs (`win-x64`, `linux-arm64`, `osx-arm64`, etc.).
  3. **Base Directory Fallback**: As a final attempt, it looks in `AppContext.BaseDirectory`.
- **Case-Sensitivity Handling**: Ensures lowercase extensions are used on Windows to prevent lookup failures on case-sensitive filesystems.

### 2. macOS CI/Packaging Improvements

- **Templates (test_macos.yml)**:
  - Updated to extract artifacts from TGZ files.
  - Ensures `libcustom_op_library.dylib` is placed in the expected location (`testdata/testdata`) for end-to-end tests.
  - Initializes the ONNX submodule to provide required test data.
- **Node.js**:
  - Restored the Node.js macOS test stage in c-api-noopenmp-test-pipelines.yml, configured to run on the ARM64 pool (`AcesShared`).
  - Updated test_macos.yml template to support custom agent pools (similar to the NuGet template).
- **Pipeline Config**: Adjusted agent pool selection and demands for macOS jobs to ensure stable execution.
- **Binary Robustness**: The `copy_strip_binary.sh` script now ensures `libonnxruntime.dylib` is a real file rather than a symlink, improving NuGet packaging reliability.

### 3. Test Refinements

- **Inference Tests**: Skips a specific set of pretrained-model test cases on macOS that are currently known to be flaky or unsupported in that environment, preventing noise in the CI results.

## Verification

### Pipelines

- [x] Verified in `NuGet_Test_MacOS`.
- [x] Verified in `NuGet_Test_Linux`.
- [x] Verified in Windows test pipelines.

### Net Effect

The C# bindings are now significantly more resilient to different deployment environments. The CI process for macOS is also more robust, correctly handling the artifacts required for comprehensive NuGet validation.
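The multi-stage probing listed under "Robust C# DLL Resolution" could be sketched roughly as follows. This is an illustration under stated assumptions, not the resolver that ships in `NativeMethods.shared.cs`. The names `ResolverSketch`, `ProbePaths`, `MapName`, and `Register` are hypothetical, and the RID list is an assumed expansion of the "etc." in the description. Only `NativeLibrary.SetDllImportResolver`, `NativeLibrary.TryLoad`, and `AppContext.BaseDirectory` are real .NET APIs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical sketch of a three-stage native library resolver.
public static class ResolverSketch
{
    // Assumed RID list; the PR description names only a few of these.
    private static readonly string[] Rids =
        { "win-x64", "win-arm64", "linux-x64", "linux-arm64", "osx-x64", "osx-arm64" };

    // Builds the ordered fallback paths: runtimes/{rid}/native/ first, base directory last.
    public static List<string> ProbePaths(string assemblyDir, string baseDir, string fileName)
    {
        var paths = new List<string>();
        foreach (string rid in Rids)
            paths.Add(Path.Combine(assemblyDir, "runtimes", rid, "native", fileName));
        paths.Add(Path.Combine(baseDir, fileName));
        return paths;
    }

    // Maps an extension-less name to the filename for the current OS.
    private static string MapName(string name)
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) return name + ".dll";
        if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX)) return "lib" + name + ".dylib";
        return "lib" + name + ".so";
    }

    private static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        // Returning IntPtr.Zero hands unrelated libraries back to the default loader.
        if (libraryName != "onnxruntime" && libraryName != "ortextensions")
            return IntPtr.Zero;

        string fileName = MapName(libraryName);

        // Stage 1: default loading with the mapped name.
        if (NativeLibrary.TryLoad(fileName, assembly, searchPath, out IntPtr handle))
            return handle;

        // Stages 2 and 3: explicit paths under runtimes/{rid}/native/, then the base directory.
        string assemblyDir = Path.GetDirectoryName(assembly.Location) ?? AppContext.BaseDirectory;
        foreach (string candidate in ProbePaths(assemblyDir, AppContext.BaseDirectory, fileName))
        {
            if (File.Exists(candidate) && NativeLibrary.TryLoad(candidate, out handle))
                return handle;
        }
        return IntPtr.Zero;
    }

    // A resolver can be set only once per assembly; a second call throws InvalidOperationException.
    public static void Register()
    {
        NativeLibrary.SetDllImportResolver(typeof(ResolverSketch).Assembly, Resolve);
    }
}
```

In this sketch a candidate built for the wrong architecture simply fails `TryLoad` and the loop moves on, which is why probing every RID is harmless here. A real implementation would more likely narrow the list to the current platform.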
This cherry-picks the following commits for the 1.24.2 release:

- #27096
- #27077
- #26677
- #27238
- #27213
- #27256
- #27278
- #27275
- #27276
- #27216
- #27271
- #27299
- #27294
- #27266
- #27176
- #27126
- #27252

---------

Co-authored-by: Xiaofei Han <xiaofeihan@microsoft.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: qti-monumeen <monumeen@qti.qualcomm.com>
Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: guschmue <22941064+guschmue@users.noreply.github.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: angelser <32746004+angelser@users.noreply.github.com>
Co-authored-by: Angela Serrano Brummett <angelser@microsoft.com>
Co-authored-by: Misha Chornyi <99709299+mc-nv@users.noreply.github.com>
Co-authored-by: hariharans29 <9969784+hariharans29@users.noreply.github.com>
Co-authored-by: eserscor <erscor@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Ti-Tai Wang <titaiwang@microsoft.com>
Co-authored-by: bmehta001 <bmehta001@users.noreply.github.com>