Dyno: cache function signature instantiations, reduce cache impact of some queries #27082
This PR aims to further improve Dyno's resolution performance.
I started by looking at the number of queries we call, hoping to reduce the number of invocations. In the past, `idToAst` has been one of Dyno's hottest queries, and this remains the case. However, unlike previous times, I found no opportunities to elide calls to this function. Instead, I took the following steps to reduce the amount of work done by Dyno:

- `scopeForId` and its recursive self-invocations make up the bulk of calls to `idToAst`. I found no ways to reduce uses of `idToAst` in `scopeForId`, but I did find a way to reduce the number of `scopeForId` invocations. It turns out that `GatherMentionedModules` runs `scopeForId` for every identifier. Many identifiers share scopes (they don't create their own), so this causes redundant cache entries. Instead, this PR adjusts `GatherMentionedModules` to match other visitors (e.g., `Resolver`) and push scopes when a scope-creating AST node is entered. On its own, this actually increased the number of calls to `scopeForId`, because not all scopes contain identifiers. I thus further tweaked this to invoke `scopeForId` lazily, only when it's needed. This didn't have a noticeable impact on runtime performance, but it did substantially reduce the number of queries executed.
- The `returnType()` query always invokes `returnTypeWithoutIterable` (aka `yieldType()`). As a result, there is always an equal number of cache entries for these two. Moreover, several places in the resolver use both queries, which means double the lookups. I fused the two queries into a single, tuple-returning query. This halves the number of storage entries required for computing the return type and, where applicable, also halves the number of query cache lookups. I didn't measure any performance impact in release mode, but it did reduce the number of queries executed.

When I was debugging the above, I noticed issues with output as part of
`--dyno-enable-tracing` caused by newlines in `param` strings. I adjusted the DETAIL logging of these strings to escape newlines so that tracing output is unaffected.

After this, I turned back to profiler output. I noticed that
`saveDependencyInParent` contributes a large amount of overhead. I also noticed some oddities in debug mode profiles: creating the start and end iterators for the recursion error set was taking a significant amount of time. There ought not to be recursion errors at all! I guarded the recursion error insertion (which creates these iterators) behind a size check, and made other similar changes. This reduced the runtime overhead of `saveDependencyInParent` by 0.5 seconds in the debug build, but the change is within noise in release mode.

I also noticed that
`CHPL_ASSERT` seems to execute its body in release mode. After checking with @arezaii, @dlongnecke-cray, and @mppf, it seems like there's no reason to do so in release. This PR removes that.

Finally, following @benharsh's suggestion to investigate re-traversals, I discovered that the generated formals for the
`_range` constructor were being re-traversed thousands of times. This was because calls to `instantiateSignature` were not cached, which meant that each invocation of a generic constructor triggered re-resolution. I turned `instantiateSignature` into a query, winning roughly 10% in terms of performance on my benchmark (still the sample program from https://github.com/Cray/chapel-private/issues/7139): ~3.5 seconds -> ~3.15 seconds. This narrows the gap between Dyno and production on this benchmark to ~20%.

Encouragingly, I'm seeing Dyno reach comparable performance while resolving other benchmarks. Comparing invocations of
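`--dyno-resolve-only` and `--stop-after-pass=resolve`, I saw the following results:

[Table: results for the `parIters.chpl` primer, the `atomics.chpl` primer, and the `forallLoops.chp` primer, with one column header referencing `main`. Values not preserved.]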
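
As a rough illustration of why turning `instantiateSignature` into a query helps, the sketch below memoizes instantiations keyed by the generic function and the actual types it is instantiated with. It does not use Dyno's query framework: `InstantiationCache`, `Signature`, and the string-based type names are hypothetical stand-ins chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an instantiated signature (not Dyno's real type).
struct Signature {
  std::string name;
  std::vector<std::string> formalTypes;
};

// Cache key: the generic function plus the actual types used to instantiate it.
using InstantiationKey = std::pair<std::string, std::vector<std::string>>;

// Hypothetical memoizing wrapper around an expensive instantiation step.
class InstantiationCache {
 public:
  // Returns the cached instantiation, computing it only on the first request
  // for a given (function, actuals) combination.
  const Signature& instantiate(const std::string& fn,
                               const std::vector<std::string>& actuals) {
    InstantiationKey key{fn, actuals};
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      ++computeCount_;  // stands in for the costly re-resolution of formals
      it = cache_.emplace(std::move(key), Signature{fn, actuals}).first;
    }
    return it->second;  // std::map keeps element references stable
  }

  // Number of times the expensive path actually ran.
  std::size_t computeCount() const { return computeCount_; }

 private:
  std::map<InstantiationKey, Signature> cache_;
  std::size_t computeCount_ = 0;
};
```

With a cache like this, repeated requests for the same instantiation return the same stored result instead of redoing the work, which is the effect described above for repeated calls to a generic constructor.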