Open
Description
Stack allocation of non-escaping ref classes and boxed value classes was enabled in #103361, but only works in limited cases. This issue tracks further enhancements (see also #11192).
Abilities:
- stack allocation of constant-sized arrays of non-GC types
- stack allocation of constant-sized arrays of GC types
- stack allocation of strings
- stack allocation of boxed nullables
- zeroing strategy (zero in prolog vs zero at first use)
- stack allocation of some delegates (and perhaps their closures), e.g.
    public static int Test()
    {
        int a = 100;
        Func<int> f = () => { return a; };
        return f();
    }
  A small tweak to escape analysis gets the delegate on the stack, but the invoke expansion currently happens in lower, so we don't get any physical promotion. We would need to move this earlier.
  See note below.
- evaluate when/if it makes sense to transform stack objects to localloc instead of fixed allocations (at least for non-GC types; for GC types there's currently no way to do proper GC reporting)
- Allocations in loops:
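To make a few of the cases above concrete, here is a minimal sketch of allocation patterns that would be candidates for stack allocation. The class and method names are hypothetical and chosen only for illustration. Whether the JIT actually stack allocates any of these depends on the runtime version and on the limitations tracked in this issue; the sketch only shows the source-level shape, not a guaranteed outcome.

```csharp
using System;

public static class StackAllocCandidates
{
    // Constant-sized array of a non-GC element type that never leaves the method.
    public static int SumLocalArray()
    {
        int[] values = new int[4];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = i + 1;
        }

        int sum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i];
        }
        return sum;
    }

    // Constant-sized array of a GC type (string references). The array itself
    // does not escape, but its elements are object references that the GC
    // must still be able to see.
    public static int TotalLength()
    {
        string[] parts = new string[2];
        parts[0] = "stack";
        parts[1] = "heap";
        return parts[0].Length + parts[1].Length;
    }

    // Boxed nullable: boxing an int? that has a value produces a boxed int.
    // The box is only type-checked here and is never stored anywhere.
    public static bool BoxedNullableIsInt()
    {
        int? maybe = 42;
        object boxed = maybe;
        return boxed is int;
    }
}
```

In each method the allocated object is only read and written locally, which is the property escape analysis has to prove before the allocation can move to the stack.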
Analysis:
- make the analysis more sophisticated to handle increasingly complex examples (e.g. fields of value classes, or fields of unescaped ref classes). See for instance
- special case calls to helpers where arguments don't escape (this will require ensuring that the helpers report GC arguments as interior to the GC)
- or, add new helpers that do not need object references (e.g. the box typecheck added in #103361 "Stack allocate unescaped boxes", or say an array store covariance check helper that does not do a null check, a bounds check, or a store plus write barrier)
- make inliner more aware when it can enhance stack allocation
- improve address-exposed analysis (an exposed stack allocated ref class probably performs worse than a heap allocated one, in most cases)
- see if we can leverage Span capture to give us "cheap" interprocedural analysis
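The analysis items above all come down to deciding whether a reference can outlive the method that allocated it. The following sketch contrasts a few source shapes; the `Point` and `EscapeExamples` names are hypothetical, and the comments describe what a conservative escape analysis would typically have to assume, not what the JIT is confirmed to do for each case.

```csharp
using System;

public sealed class Point
{
    public int X;
    public int Y;
}

public static class EscapeExamples
{
    private static Point s_last;

    // The object is only read locally: a candidate for stack allocation.
    public static int NonEscaping()
    {
        Point p = new Point { X = 3, Y = 4 };
        return p.X * p.X + p.Y * p.Y;
    }

    // Storing the reference in a static field makes it reachable after
    // the method returns, so the object must live on the heap.
    public static int EscapesViaStatic()
    {
        Point p = new Point { X = 5, Y = 6 };
        s_last = p;
        return p.X + p.Y;
    }

    // Passing the reference to a call: without inlining or interprocedural
    // information, the analysis has to assume the callee may let it escape.
    public static int PassedToCall()
    {
        Point p = new Point { X = 1, Y = 2 };
        return Sum(p);
    }

    private static int Sum(Point p)
    {
        return p.X + p.Y;
    }

    // Reads the static field so the escaping store above is observable.
    public static bool LastWasRecorded()
    {
        return s_last != null;
    }
}
```

`PassedToCall` is the kind of case where a more allocation-aware inliner, or a summary of the callee's behavior, could turn a conservative "escapes" answer into "does not escape".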
Implementation:
- stop relying on top-level ALLOCOBJ
- stop relying on ALLOCOBJ assigned to single-def temp in importer
- use custom GC layout for boxes; stop fetching placeholder type from the runtime
- make sure object stack allocation doesn't block fast tail call optimization unnecessarily (currently fast tail call optimization is disabled if there are any exposed locals). #111397 ("JIT: Relax address exposure check for tailcalls") handles most of this. But if we introduce stackalloc, it also inhibits implicit tail calls.
NAOT:
- interprocedural escape analysis. May also be viable in jitted contexts, either as a hint for the inliner or as some sort of property we can guarantee on profiler-driven re-jit.
Advanced:
- partial escape analysis. Allocate objects that are unlikely to escape on the stack. Either compute the "escape frontier" of an object and copy it to the stack when that frontier is crossed, or else add capabilities to write barriers to note when a stack allocated object reference is going to be stored on the heap, and "promote" the object at that point (using GC info to rewrite the stack references to heap references). Likely requires PGO, to ensure we're not wrong too often.
- for objects that don't escape but that we don't want to stack allocate, treat them as thread private: we can use more aggressive value numbering for instance, since we don't have to assume the field values can change asynchronously.
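As an illustration of the partial escape analysis item above, here is a sketch of a method where the allocation escapes only on an uncommon path. The `PartialEscape` class, the `s_rare` list and the choice of a negative input as the "rare" condition are all assumptions made for the example; no such transformation is claimed to exist in the JIT today.

```csharp
using System;
using System.Collections.Generic;

public static class PartialEscape
{
    private static readonly List<int[]> s_rare = new List<int[]>();

    public static int Process(int x)
    {
        int[] pair = new int[2] { x, x * 2 };

        if (x < 0)
        {
            // Assumed-rare path: the array is stored to the heap, so this
            // branch is the "escape frontier" for the allocation.
            s_rare.Add(pair);
        }

        // Common path: the array is only read locally.
        return pair[0] + pair[1];
    }

    public static int RareCount()
    {
        return s_rare.Count;
    }
}
```

A full escape analysis has to treat `pair` as escaping because of the store in the branch. A partial analysis could keep it on the stack and copy it to the heap only when the branch is taken, which is why profile data about how often that branch runs matters.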
Diagnostics:
- if VM/GC/WriteBarriers catch an escaped object-on-stack, they should provide a helpful assert instead of a generic GC-hole-style assert. We can either reserve a debug bit in the sync block or rely on the object's address being within the code heap.
FYI @dotnet/jit-contrib