Revise assembly version conflict resolution on .NET to eliminate DOTNET_ADDITIONAL_DEPS/DOTNET_SHARED_STORE requirement#4783
Conversation
Extends native assembly version redirection to .NET using custom AssemblyLoadContext for TPA conflicts. Multi-targets project for both .NET Framework and .NET, generates unified redirection maps, and implements new resolution strategy via ALC.Default.Resolving. WIP: Profiler deployment path, includes debug handlers and TODOs Related to open-telemetry#4715 Signed-off-by: apukhov <apukhov@cisco.com>
…ly-conflict-resolution # Conflicts: # src/CommonExcludedAssets.props
I am trying to test my understanding of how this works. In this test case, AutoInstrumentation is going to configure the OpenTelemetry SDK to subscribe to ActivitySources defined using the ActivitySource available in v10 (from the custom ALC). Other libraries, such as AspNetCore are going to create an ActivitySource. I am assuming that we are relying on the assembly redirects created by the profiler so that AspNetCore will use the ActivitySource class from v10 (available in the custom ALC). Is my understanding correct? |
Yes, your understanding is correct and your example of ActivitySource from System.Diagnostics.DiagnosticSource assembly is exactly what I had in mind when I though about this improvement. If my understanding is correct, it is crucial to make sure all instrumentations in the customer application are using the same copy of this assembly, otherwise the chain of spans may fall apart (due to the fact that the Activity shared state is becoming not-so-shared). Previously we were relying on the env variables DOTNET_ADDITIONAL_DEPS/DOTNET_SHARED_STORE that would force .NET Runtime to update TPA list and load our DiagnosticSource v10 to default ALC even if the customer application had a reference to v8. With assembly redirection I hope to achieve the same behavior - the DiagnosticSource v10 will be loaded - but in this case to custom ALC. Native assembly redirection + assembly resolving handler will make sure everyone will be using this loaded assembly |
- remove Windows-only constraints for Assembly Redirection in the build targets, moves steps to cross-platform files - remove Windows-only constraints in Native Profiler - optimize Dependency Assemblies collection in build
…re it will only use Net (Core) versions
Renamed projects, steps, targets, functions, and variables related to assembly redirection to be runtime-agnostic (support both .NET Framework and .NET)
… for DOTNET_ADDITIONAL_DEPS/DOTNET_SHARED_STORE requirement - tests will be updated separately
So in case of conflict the original lib loaded to default ALC will just remain unused and no other lib has access to it? |
|
Thanks for the question, @RassK . I believe the answer is "yes", but let me clarify a few things:
Let me know if you have further questions or if you'd like more details on any of these points! |
…modern map insertion" This reverts commit e8e4b98. Wrong initial assumption. The code can be fixed in a simpler way
…ly-conflict-resolution # Conflicts: # src/OpenTelemetry.AutoInstrumentation.AdditionalDeps/Directory.Build.props # src/OpenTelemetry.AutoInstrumentation.Assemblies.NetFramework/Directory.Packages.props # src/OpenTelemetry.AutoInstrumentation.Native/netfx_assembly_redirection.h
- Bump up .NET dependency versions in Assemblies\Directory.Packages.props - Optimize common dependencies in Assemblies\Directory.Packages.props via TransientDependenciesGenerator: common, all .NET Framework, all .NET Core, individual
There was a problem hiding this comment.
It may be good idea to use AssemblyLoadContext.Default.LoadFromAssemblyPath to be more explicit with where it will be loaded - and it prevent installing floder-based assembly resolver that we don't need.
There was a problem hiding this comment.
Thank you, that's a good catch! Done!
…ls + dotnet path issue on Linux env 1. On Modern .Net-only environments (e.g. Linux builds) if we choose to exclude System.Threading.Channels from output, we should not refernce it at all since it's part of the sdk which raises nuget pruning warning NU1510 (treated as error per our global configuration) Issue doesn't appear on Windows because the multi-target project still reference traditional .NET Framework where this package refernce makes sense If the project would be .NET-only on Windows we would see the same problem Plus with the developement of .net sdk the check might get stricter in future because for modern .NET it is still redundant, so it's a bit of a luck it doesn't give us a problem on Windows build 2. On Linux docker image (alpine.dockerfile) DotNetExe.FullPathOrDefault() returns unexisting path for dotnet: "/usr/local/share/dotnet/dotnet"
…ly-conflict-resolution # Conflicts: # src/OpenTelemetry.AutoInstrumentation.Native/netfx_assembly_redirection.h
- src/OpenTelemetry.AutoInstrumentation/Generated/net8.0/System.Text.RegularExpressions.Generator/System.Text.RegularExpressions.Generator.RegexGenerator/RegexGenerator.g.cs
…d to be before the change - add DisableTransitiveFrameworkReferences to Assemblies.csproj to fix previously hidden issue of skipping copy of Microsoft.Extensions.* libraries during .net project builds (these assemblies were coming from AdditionalDeps.csproj file so we never noticed we never tried to bring them in) - the fix above brings important Microsoft.Extensions.Configuration.Binder to the output (it was previously completely skipped which is mistake) - adjust include/exclude to match the previous output, however a todo left to either discuss potentially missing dependencies or a validity of their exclusion
…hen loading to Default ALC
…non-Windows platforms split the header into two files: - assembly_redirection_netfx.h autogenerated only on Windows - assembly_redirection_net.h autogenarated on any platform assembly_redirection.h will concatenate (via macro) both data into one map This should keep all files the same no matter where the project is built
build/TargetFramework.cs
Outdated
There was a problem hiding this comment.
It may better to have string consts for "net" / "netfx" - if we don't have it already and use it here.
There was a problem hiding this comment.
We should discuss policy for dependencies included in zip on .NET: should it be latest matching major version, oldest non vulnerable matching major version, something else?
There is Microsoft.Extensions.DependencyInjection/8.0.1 - but that discussion probably better to happen outside of this PR.
There was a problem hiding this comment.
We have to do it due to AspNetCoreBootstrapper references Microsoft.NET.Sdk.Web - but here we need have full set that will work even on console app.
There was a problem hiding this comment.
We probably should put that info in comment.
…omments according to discoveries
…ly-conflict-resolution
docs/config.md
Outdated
There was a problem hiding this comment.
We should also fix description, which mention only .Net Framework.
Most probably we should also say that it should be set to true for non-nuget deployment and false for Nuget deployment.
Even better would be to make it optional - when it is not explicitly set, use redirect for non-nuget installation and false for nuget installation (we can detect installation based on folder/files layout).
We still may allow env variable to override it.
If we preserve old env variable on .Net Framework, it may be better to document both, with information of precedence.
There was a problem hiding this comment.
updated the description!
Left these TODOs for discussion:
- think about making this env variable optional with auto-detected value based on deployment (true for standalone, false for NuGet)
- think about documenting the old netfx env variable, describing the order of precedence
docs/design.md
Outdated
There was a problem hiding this comment.
I'm not sure whether "default" is right word for something controlled by env variable. We may need to say something like "default startup script" - or just make it behave correctly by default without env variable.
There was a problem hiding this comment.
I'm actually thinking, we should shorten this section and give a cross reference to assembly-conflict-resolution.md doc.
- revise section given we now have a dedicated doc
assembly-conflict-resoltuin.md
docs/design.md
Outdated
There was a problem hiding this comment.
We may need reword it pointing that build time resolution may also solve issues, but in most case is optional.
There was a problem hiding this comment.
I rewrote the section completely. Please check out the latest version
docs/design.md
Outdated
There was a problem hiding this comment.
It is no true for System.Diagnostics.DiagnosticSource, where we use latest one always, but true for extansion assemblies.
docs/README.md
Outdated
There was a problem hiding this comment.
Should we also mention here src/OpenTelemetry.AutoInstrumentation.Assemblies/Directory.Packages.props for transitive reference?
|
|
||
| OpenTelemetry .NET NuGet packages and their dependencies | ||
| are deployed with the OpenTelemetry .NET Automatic Instrumentation. To avoid | ||
| dependency version conflicts, the recommended way to install the automatic |
There was a problem hiding this comment.
If we really resolve all issues, we can remove words about recommedation of NuGet at some point in future.
There was a problem hiding this comment.
I can rephrase this piece by saying that seeing this error most likely means hitting a limitation of our assembly conflict resolution described in [add reference to assembly-conflict-resolution.md]. And our recommendation would be to 1) switch to NuGet package deployment where assemblies are resolved at build time, 2) use Native profiler based deployment where assembly conflict resolution is more stable, 3) update application dependencies to match instrumentation versions or, last resort 3) try implementing AdditionalDeps workaround
I have a question though. Should we add an env variable to disable isolation for assembly conflict resolution in StartupHook-only deployment? Just like we have an env variable to disable AssemblyRef redirection for Native Profiler based deployment
There was a problem hiding this comment.
I think it is a good idea to have mechanics to disable StartupHook.
docs/troubleshooting.md
Outdated
There was a problem hiding this comment.
In non-NuGet installation, startup script configure assembly redirections.
No-need to reference .Net / .Net framework, as it is common now.
| By default, assembly references for .NET applications are redirected | ||
| during runtime to the versions used by the automatic instrumentation. | ||
| This behavior can be controlled through the [`OTEL_DOTNET_AUTO_NETFX_REDIRECT_ENABLED`](./config.md#additional-settings) | ||
| This behavior can be controlled through the [`OTEL_DOTNET_AUTO_REDIRECT_ENABLED`](./config.md#additional-settings) |
There was a problem hiding this comment.
Worse mention that both:
- disabling it with non-Nuget installation
- enabling it with NuGet installation
will likely make application non-working.
There was a problem hiding this comment.
Will it really break the application? If assembly versions are resolved at a build time, will we ever hit the AssemblyResolver event handler? I mean if we will, then yes we won't be able to locate an assembly in NuGet installation layout of tracer-home, but will the event be triggered? I'll double-check and update it here if needed. But either way it looks flaky, and as @nrcventura already mentioned we may want to align tracer-home layout across installations
- double check if NuGet installation can hit
AssemblyResolverevent handler
There was a problem hiding this comment.
We reference higher version of some assemblies with redirect, than we reference in nuget and is enough for us to work.
We have duality: for nuget package, we should reference minimum version on which we work - and it is up to app developer upgrade dependencies based on their standards. For zip, when we bring many dependencies with us, we should provide latest dependencies versions.
Currently, we create redirect map on the same set of assemblies that we use to build zip archive, so they are newer, than assemblies referenced though nuget transitivity.
It is possible to make our installation even more complex - and have two set of project that collect Assemblies: one for redirect map generation (it should not has it's own PackageVersions) and one that we use for zip (provided in this PR), but I'm not sure if benefit (allow to redirect in all cases for NuGet) is high enough to increase complexity.
…ly-conflict-resolution
| #### Resolution order | ||
|
|
||
| When code triggers an assembly reference, the runtime first checks | ||
| whether the assembly is **already loaded** in the current ALC at the |
There was a problem hiding this comment.
We may want to clarify that current ALC means ALC of assembly which has AssemblyRef we try to resolve.
| types**. This means that data shared through static fields (for example, | ||
| `System.Diagnostics.Activity.Current`) must come from a single assembly | ||
| instance — otherwise different parts of the application will see | ||
| different state. |
There was a problem hiding this comment.
It is also important to note, that same type loaded from 2 instances of same assembly loaded to different ALC are not the same and can't be casted to each other.
|
|
||
| ### .NET Framework | ||
|
|
||
| On .NET Framework there is no `AssemblyLoadContext`. Assembly loading |
There was a problem hiding this comment.
Technically, it is not true - they have the same ALC, but no API to operate with them. We can only observe difference of that by utilizing different Load methods.
|
|
||
| ## Resolution strategies by deployment mode | ||
|
|
||
| ### 1. NuGet package deployment (.NET and .NET Framework) |
There was a problem hiding this comment.
We probably should mention, that direct nuget reference wins and if application downgrades required version it will be compile-time warning and app may fail in runtime (and will, in most cases).
|
|
||
| After rewriting, the runtime proceeds to resolve the rewritten | ||
| reference. Because the version now matches what the instrumentation | ||
| ships, the resolution follows the pipeline described above. |
There was a problem hiding this comment.
Should it be "the resolution follows the pipeline described bellow.`? As we provided only theoretical knowledge above, and all details will be later?
| | Situation | Why it fires | Where we load the assembly | | ||
| | --- | --- | --- | | ||
| | Assembly is **not** in the TPA list (for example, `OpenTelemetry.dll`) | The runtime has no default location for it | **Default ALC** — no conflict risk | | ||
| | Assembly **is** in the TPA list but with a **lower** version | The profiler rewrote the reference to a higher version that the TPA cannot satisfy | **Custom ALC** — loading into the Default ALC would fail because the TPA already provides a lower version | |
There was a problem hiding this comment.
The only risk is that customer also want to resolve the same assembly. We should validate that once we load assembly, we validate that version satisfy requested version - otherwise probably it should be handled by other handler.
| #### Managed assembly resolver (.NET Framework) | ||
|
|
||
| On .NET Framework, the instrumentation subscribes to | ||
| `AppDomain.CurrentDomain.AssemblyResolve` and loads the required |
There was a problem hiding this comment.
Outside of this PR, we should cover in more details pitfalls of .NET Framework:
- multiple app domain and their work with assembly resolver
- override by GAC version
- assemblybinding, that will have higher precedence that our instructions.
| that the instrumentation also depends on, the instrumentation cannot | ||
| prevent or override that load. | ||
|
|
||
| - **Native profiler deployment:** This is less of a problem in practice. |
There was a problem hiding this comment.
It still may be an issue, if some reflection code called after it from loaded assembly - as now we have multiple types from same-named assembly - which in many case would result in problems.
|
|
||
| The startup hook runs after the .NET host has already initialized. Any | ||
| assemblies loaded before the hook — including the startup hook assembly | ||
| itself, its direct dependencies, and any other `DOTNET_STARTUP_HOOKS` |
There was a problem hiding this comment.
It should not be a problem if we are the only startup hook, as we specifically tested that it is safe - but may be issue if we are not the only startup hook.
|
|
||
| - **NuGet package deployment**: Versions are resolved at build time by | ||
| NuGet's dependency resolution. The | ||
| `OTEL_DOTNET_AUTO_REDIRECT_ENABLED` environment variable has no effect |
There was a problem hiding this comment.
We use higher version of assemblies to redirect, than versions we reference though NuGet dependecies. So, if transitive are not manully updated to at least versions that we expcet to have in zip deployment, we would redirect versions to higher, but would not provide them - and it will result in app crash.
|
|
||
| private static readonly IEnumerable<TargetFramework> TestFrameworks = TargetFrameworks | ||
| .Concat(TargetFramework.NET9_0, TargetFramework.NET10_0); | ||
| private static readonly IEnumerable<TargetFramework> TargetFrameworksForPublish = |
There was a problem hiding this comment.
I don't like that we now have TargetFrameworksForNetFxPacking and TargetFrameworksForPublish that are not expressed one in terms of another. Probably we can find create helper ExceptNet() (similar to ExceptNetFramework()) and use it instead of TargetFrameworksForNetFxPacking on TargetFrameworksForPublish .
| By default, assembly references for .NET applications are redirected | ||
| during runtime to the versions used by the automatic instrumentation. | ||
| This behavior can be controlled through the [`OTEL_DOTNET_AUTO_NETFX_REDIRECT_ENABLED`](./config.md#additional-settings) | ||
| This behavior can be controlled through the [`OTEL_DOTNET_AUTO_REDIRECT_ENABLED`](./config.md#additional-settings) |
There was a problem hiding this comment.
We reference higher version of some assemblies with redirect, than we reference in nuget and is enough for us to work.
We have duality: for nuget package, we should reference minimum version on which we work - and it is up to app developer upgrade dependencies based on their standards. For zip, when we bring many dependencies with us, we should provide latest dependencies versions.
Currently, we create redirect map on the same set of assemblies that we use to build zip archive, so they are newer, than assemblies referenced though nuget transitivity.
It is possible to make our installation even more complex - and have two set of project that collect Assemblies: one for redirect map generation (it should not has it's own PackageVersions) and one that we use for zip (provided in this PR), but I'm not sure if benefit (allow to redirect in all cases for NuGet) is high enough to increase complexity.
| applications, the ones under the `netfx` folder of the installation directory, | ||
| to be also installed into the Global Assembly Cache (GAC): | ||
| require the assemblies used to instrument .NET Framework applications, | ||
| the ones under the `netfx` folder of the installation directory, to be also installed |
There was a problem hiding this comment.
I don't think it is required anymore - but should be addressed by outside of this change, probably by me.
| :: Instead the transitive dependencies versions are determined by | ||
| :: the NuGet version resolution algorithm when building the application. | ||
| set OTEL_DOTNET_AUTO_NETFX_REDIRECT_ENABLED=false | ||
| set OTEL_DOTNET_AUTO_REDIRECT_ENABLED=false |
There was a problem hiding this comment.
BTW, what is default setting for it? Unless we do auto-detection, if the default value is true, it may require nuget customers to update their scripts if instrument.cmd is not used.
We also should check setting in .sh script.
| <PackageVersion Include="System.ServiceModel.Primitives" Version="4.7.0" Label="Packages requiring filtering re-evaluation: Transient dependencies of OpenTelemetry.Instrumentation.Wcf: likely should be excluded" /> | ||
| </ItemGroup> | ||
|
|
||
| <ItemGroup Label="Transient dependencies auto-generated by TransientDependenciesGenerator for net8.0" Condition=" '$(TargetFramework)' == 'net8.0' "> |
There was a problem hiding this comment.
How are we planning upkeeping it?
| /// A class that attempts to load the OpenTelemetry.AutoInstrumentation .NET assembly. | ||
| /// </summary> | ||
| internal partial class AssemblyResolver | ||
| internal partial class AssemblyResolver(IOtelLogger logger) |
There was a problem hiding this comment.
Looks like it is not partial anymore.
| /// A class that attempts to load the OpenTelemetry.AutoInstrumentation .NET assembly. | ||
| /// </summary> | ||
| internal partial class AssemblyResolver | ||
| internal partial class AssemblyResolver(IOtelLogger logger) |
There was a problem hiding this comment.
Looks like it is not partial anymore.
| // For .Net (Core) if we run in isolated context (set up by StartupHook's IsolatedAssemblyResolver), | ||
| // skip AssemblyResolver. The isolated AssemblyLoadContext already handles all assembly resolution. | ||
| var currentContext = System.Runtime.Loader.AssemblyLoadContext.CurrentContextualReflectionContext; | ||
| var isNotIsolated = currentContext == null || currentContext == System.Runtime.Loader.AssemblyLoadContext.Default; |
There was a problem hiding this comment.
It may be safer to check name of context to detect isolated context.
If it is not isolated, but CurrentContextualReflectionContext != Default, someone (another startup hook?) already played with contexts. We still may try our resolver approach.
| var runtimeDir = ManagedProfilerRuntimeDirectory; | ||
| LazyInitializer.EnsureInitialized(ref _managedProfilerVersionDirectory, () => ResolveManagedProfilerVersionDirectory(logger)); | ||
|
|
||
| // For .NET (Core) most of the assembblies are different per runtime version, so we |
There was a problem hiding this comment.
assembblies -> assemblies
| } | ||
| #endif | ||
|
|
||
| ToBooleanWithDefault(assemblyRedirectEnvValue, true); |
There was a problem hiding this comment.
If it is true by default, we shoudl point in changelog that it is breaking change for anyone who is not using our .cmd/.sh script - and validate that we set it to false in .sh
Why
OpenTelemetry.AutoInstrumentation for .NET often requires setting
DOTNET_ADDITIONAL_DEPSandDOTNET_SHARED_STOREto resolve assembly version conflicts between the instrumented application and the instrumentation tool (see #4715). A more robust, unified approach is needed across all deployment modes.What
Leverage .Net
AssemblyLoadContextto handle assembly resolution consistently across different deployments and reuse existing logic implemented for .NET FrameworkSee Assembly Conflict Resolution for:
The strategies below are grounded in these runtime mechanics.
Strategies
Implementation Highlights
Essential Changes
Extended and renamed the multi-target project (
OpenTelemetry.AutoInstrumentation.Assemblies.NetFramework.csproj) to support .NET targets and added .NET-specific project references (Loader, AspNetCoreBootstrapper, StartupHook).build/TransientDependenciesGenerator.csto process both .NET Framework and .NET, contributing .NET dependencies tosrc/OpenTelemetry.AutoInstrumentation.Assemblies/Directory.Packages.props.PublishManagedProfilerto publish .NET assemblies intotracer-home/net/{version}folders and moving common libraries to the roottracer-home/net; unifying this with .NET Framework flow.tracer-home/netfolder structure: only common libraries (AspNetCoreBootstrapper, Loader, StartupHook, OpenTelemetry.AutoInstrumentation, and some other) will be placed in the root; .NET-specific libraries (OpenTelemetry, DiagnosticSource, etc.) will be placed in version-specific subfolders.src/OpenTelemetry.AutoInstrumentation.Assemblies/Directory.Packages.props, some of the seem to be pure WCFsrc/CommonExcludedAssets.props(andsrc/Directory.Packages.props) do not contribute to dependency graph generation for other projects like nuget package buildbuild/AssemblyRedirectionSourceGenerator.csnow scans bothtracer-home/netfx/andtracer-home/net/, building two redirection maps:src/OpenTelemetry.AutoInstrumentation.Native/assembly_redirection_net.handsrc/OpenTelemetry.AutoInstrumentation.Native/assembly_redirection_netfx.h(via targetGenerateAssemblyRedirectionSource).Extended
cor_profiler.cppto extract .NET version and apply the correct redirection map (via methodDetectFrameworkVersionTableForRedirectsMap()).Refactored assembly resolution strategy in
src/OpenTelemetry.AutoInstrumentation.Loader/AssemblyResolver.Net.csaccording to the logic described above: our dependencies to default ALC, conflicting dependencies to custom ALC:AppDomain.AssemblyResolvetoAssemblyLoadContext.Default.Resolvingwhich happens earlier in resolution cycle.Add
src/OpenTelemetry.AutoInstrumentation.StartupHook/IsolatedAssemblyResolver.csandsrc/OpenTelemetry.AutoInstrumentation.StartupHook/IsolatedAssemblyLoadContext.cstoStartupHookto support assembly conflict resolution when Native Profiler is not enabledTests
FrameworkDistroTests.ReferencedPackagesNoUnsupportedNet()on non-Windowssrc/OpenTelemetry.AutoInstrumentation.Loader/AssemblyResolver.Net.cswork for all common dependency relations: 1) same versions, 2) ours higher, 3) ours lowersrc/OpenTelemetry.AutoInstrumentation.StartupHook/IsolatedAssemblyResolver.csworks with different entrypoints:void/int,Task/Task<int>,async Task/Task<int>src/OpenTelemetry.AutoInstrumentation.StartupHook/IsolatedAssemblyLoadContext.csworks for all common dependency relations: 1) same versions, 2) ours higher, 3) ours lowerOpenTelemetry.AutoInstrumentation.Loader.Tests.Ctor_LoadsManagedAssembly())Checklist
CHANGELOG.mdis updated.