Skip to content

Improve startup perf by avoiding JIT when invoking well-known signatures #115345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

steveharter
Copy link
Contributor

@steveharter steveharter commented May 6, 2025

This is the main contribution for tracking issue #112994.

For a R2R-enabled console "hello world" app, it reduced the jitted methods during startup by half (24 to 12) saving ~9% or 7ms (locally from 78->71ms).

For a non-R2R app there will be no jit necessary for each well-known signature after the first is jitted since they are cached. They are cacheable since the IL uses calli with no coupling to specific MemberInfo tokens. Note that it is possible to add calli and caching to the emit-based path (non-well-known cases) as was done in the prior PR for this; that would allow similar perf benefits of avoiding jit (+emit) for signatures that have already done that, but also adds the complexity and overhead of a cache that needs to key on each parameter.

Currently, the well-known method list is currently fairly short but will still cover many signatures because:

  • Reference type parameters are normalized to System.Object.
  • All primitive types and enums, along with common value types like DateTime and Guid, are included in property getter and setter signatures.

Design \ implementation notes:

  • To support efficient, clean calls to property setters and getters for the intrinsics, 2 delegates were added to InvokerEmitUtil.cs for a total of 5. This also pushed the code to add more shared methods in that file.
    • The 5 delegates were also changed to add the function pointer arg, which currently is only used for the intrinsic case. By passing in an unused IntPtr.Zero for the other cases, this avoids having additional delegates and conditionals to call a different delegate.
  • For the various files that implement invoke:
    • Added support for the 2 new delegates.
    • Methods are based more on the InvokerStrategy than arg count; this removes some conditionals. Note that when the interpreted path is removed, more more conditionals can be removed from these files.
    • The interpreted path is only used for CoreClr if the feature flag ForceInterpretedInvoke is set. This code should be removed shortly. Previously, the interpreted path is used for the first call to each method, then the method is emit'd on the second call but since we have the intrinsics now for startup\warmup, we should no longer have to do this. This also simplified the code since we no longer need to switch dynamically, and we moved from 3 fields to hold the 3 delegates to 1 field with appropriate casts to one of the 5 delegates.
    • The InvocationFlags was moved from the invoker classes to the MethodInfo, etc. classes. It was found that if reflection is used for inspection, but not invoke, it would sometimes unnecessarily create the invoker class.
    • The case for calling a constructor on an existing instance was supported previously in a separate file that used the interpreted path. Since we want to remove that path, we now track and pass in an additional bool for that case (e.g. "callCtorAsMethod").
      • This also makes the throughput performance multiple times faster in this case; throughput performance in other areas is a wash - some slightly faster, some slightly slower.
    • To allow the new intrinsics to be used for constructors -- which is important for some tested applications that use factory and DI patterns -- the alloc for the intrinsic case is done via CreateUnitializedObject and then the constructor is called as a normal method.
      • The call to CreateUnitializedObject was made fast by avoiding some layers - see the _allocator field.
    • The delegate field on each invoker class is based on 1 of the 5 different delegate types, and when called will then call one of these 3 cases (this approach avoids conditionals and allows validation and marshalling logic to be shared):
      • The new intrinsic.
      • The emit'd dynamic method.
      • A wrapper to call the interpreted \ native C++ path which for CoreClr should be removed shortly.
  • For the new intrinsic support:
    • There are static properties to cache each well-known method with lazy support.
      • There is a special return transform for Enum getters.
    • Currently, due to small number of well-known methods and logic to find the correct intrinsic, the property-based caching approach is simpler and faster than using a hashtable. Once the look-up logic finds the appropriate intrinsic, the delegate to that is set on the appropriate invoker class.

@steveharter steveharter added this to the 10.0.0 milestone May 6, 2025
@steveharter steveharter self-assigned this May 6, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-reflection
See info in area-owners.md if you want to be subscribed.

@@ -15,152 +14,188 @@ internal static unsafe class InstanceCalliHelper
// Zero parameter methods such as property getters:

[Intrinsic]
[MethodImpl(MethodImplOptions.NoInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these need to be marked with NoInlining?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without it, there is an AV during inlining.

So either the "dynamic method" that we created based on the intrinsic doesn't support inlining or there is a jit bug.

The exception is in the call to GetModule() here with call stack:

  Assert failure(PID 60300 [0x0000eb8c], Thread: 40924 [0x9fdc]): Consistency check failed: AV in clr at this callstack:
  ------
  CORECLR! CEEInfo::getArgType + 0x56E (0x00007ffe`20b74cfe)
  CLRJIT! Compiler::impInlineInitVars + 0x322 (0x00007ffe`36f8f0e2)
  CLRJIT! `Compiler::fgInvokeInlineeCompiler'::`2'::<lambda_1>::operator() + 0x24 (0x00007ffe`36edf994)
  CORECLR! `CEEInfo::runWithErrorTrap'::`6'::__Body::Run + 0x86 (0x00007ffe`20b6c2c6)
  CORECLR! CEEInfo::runWithErrorTrap + 0x2D (0x00007ffe`20ba27ed)
  CLRJIT! Compiler::fgInvokeInlineeCompiler + 0x34C (0x00007ffe`36ee342c)
  CLRJIT! Compiler::fgMorphCallInlineHelper + 0x2E3 (0x00007ffe`36ee3ae3)
  CLRJIT! Compiler::fgMorphCallInline + 0x54 (0x00007ffe`36ee3664)
  CLRJIT! Compiler::fgInline + 0x1C3 (0x00007ffe`36ee10d3)
  CLRJIT! Phase::Run + 0x76 (0x00007ffe`3708da86)
  CLRJIT! Compiler::compCompile + 0x57F (0x00007ffe`36e97cef)
  CLRJIT! Compiler::compCompileHelper + 0xA03 (0x00007ffe`36e9bb93)
  CLRJIT! Compiler::compCompile + 0xA18 (0x00007ffe`36e9a978)
  CLRJIT! `jitNativeCode'::`8'::__Body::Run + 0xE2 (0x00007ffe`36e96762)
  CLRJIT! jitNativeCode + 0x16E (0x00007ffe`36ea0f8e)
  CLRJIT! CILJit::compileMethod + 0x169 (0x00007ffe`36ea8759)
  CORECLR! invokeCompileMethodHelper + 0x151 (0x00007ffe`20b9cc01)
  set XUNIT_HIDE_PASSING_OUTPUT_DIAGNOSTICS=1
  CORECLR! invokeCompileMethod + 0x128 (0x00007ffe`20b9c9a8)
  CORECLR! UnsafeJitFunctionWorker + 0x2EF (0x00007ffe`20b6dccf)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks like a bug in the intrinsic implementation. It would be a good idea to understand it (and fix it). It may have other silent manifestations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look and report back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a better look at this and created issue #115429 to track. I can continue to investigate but thought I would create the issue now to see if someone has ideas or wants to pick it up.

If addressed, then we can either remove the NoInlining or replace with AggressiveInlining.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried that this bug is some kind of memory corruption, and it is still going repro - but less frequently - with the NoInlining workaround.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveharter I will try and take a look this week.

Copy link
Member

@AaronRobinsonMSFT AaronRobinsonMSFT May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix is #115639

… to use interpreted mode on first call to each method
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants