[PROTON] Significantly reduce graph launching overhead with static context#9405
[PROTON] Significantly reduce graph launching overhead with static context#9405
Conversation
…dling - Updated CuptiProfiler to replace context capturing with static entry IDs for graph nodes. - Removed unnecessary loops for capturing contexts and managing node states. - Enhanced the handling of API launch callbacks to utilize static entry IDs directly. - Modified metric record emission to use DataEntry objects instead of entry IDs. - Adjusted PendingGraphPool to accommodate changes in data structure for metric entries. - Refined InstrumentationProfiler to streamline metric addition by using upsertFlexibleMetrics.
…aceData components
…rdered_map for improved performance
…ck for linked entries in Data, TraceData, and TreeData classes
…elated logic in CuptiProfiler
…ning of staticTree, improving performance and readability
…ccept rvalue references for onLinked callback, improving performance by avoiding unnecessary copies.
…aunchState, improving clarity and structure of launch metadata handling.
… metrics, improving performance during metric linking.
…asses for improved readability and consistency.
…y and consistency
…instead of DataToEntryMap for improved performance and consistency
… improve metric tracking consistency
…mpling, and CuptiProfiler for improved readability and consistency
…asses to simplify linking logic and improve consistency across profilers
…r improved readability and consistency
…p for improved performance and consistency
… and improve data handling consistency
… to use new methods for improved clarity and consistency
…ved clarity and consistency
…consistency and thread safety
…for improved clarity and consistency
… into MetricSet for improved organization and clarity
…ming dataToEntryMap to dataToEntries for improved clarity and consistency
Moves metric upsert logic out of the inline templated handler into concrete DataEntry methods (implemented in the .cpp) to reduce header complexity and centralize behavior. Clarifies linking semantics for virtual-phase entries: an entry created for a virtual phase now uses the dedicated virtual-phase constant and treats the entry id as the target entry id. upsertTargetMetric is changed to accept an explicit target id to avoid ambiguity when associating linked metrics. Updates callers to construct virtual-linked entries and pass explicit target ids so kernel and flexible metrics are stored against the intended target entries.
Reformats several source and header files to improve readability and enforce a more consistent style. Long argument lists and initializers are wrapped across lines, an extraneous blank line is removed, and some range-based loop/initializer spacing is tightened. These are purely stylistic changes that do not alter behavior; they make the code easier to read and reduce churn in future diffs.
Ensure correct phase attribution when creating child entries: use a virtual phase for kernel-generated child ops while preserving the original phase for graph node entries. This prevents incorrect overwriting of entry phases and keeps profiling data properly attributed across kernel activities and API-launched graph nodes.
Clarify semantics around metrics associated with virtual-phase entries by replacing "target" terminology with "linked". Update the API and internal names to reflect that entry IDs refer to linked entries for node/virtual relationships, and adjust callers accordingly (e.g., profiler paths that attach kernel metrics to linked entries). This improves readability and reduces confusion about the relationship between virtual entries and their associated metrics. No functional behavior is changed; the rename is purely semantic.
…or improved readability
…nd improve documentation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b72a092480
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| auto &dataEntry = pendingGraph.dataEntries[i]; | ||
| dataEntry.upsertLinkedFlexibleMetric(metricName, metricValueVariant, | ||
| dataEntry.id); |
There was a problem hiding this comment.
Propagate graph metric records to every active data sink
When graph metric records are decoded, this now updates only pendingGraph.dataEntries[i] for each node index, but metricNodeEntries is built as all node/data pairs (one per active Data sink per metric node). In runs with multiple active sessions/data sinks, only the first sink(s) receive metric-kernel values while the rest are silently dropped, which corrupts per-session profiling output for graph metrics.
Useful? React with 👍 / 👎.
| auto *nodeState = graphNodeIdToState.find(nodeId); | ||
| if (!nodeState) { | ||
| throw std::runtime_error( | ||
| "[PROTON] Missing graph node state for metric node."); | ||
| } | ||
| nodeState->forEachEntry([&](Data *data, const DataEntry &entry) { | ||
| metricNodeEntryIds[data].push_back(entry.id); | ||
| nodeState->forEachEntry([&](const DataEntry &entry) { |
There was a problem hiding this comment.
Guard missing graph node state before dereferencing
The null-check for graphNodeIdToState.find(nodeId) was removed, so this path now dereferences nodeState unconditionally. If a graph launch has metric nodes but no matching launch entries were materialized for the current data set (for example, graph state exists but active data sinks changed), find returns nullptr and this becomes a crash instead of a handled error.
Useful? React with 👍 / 👎.
…zation and efficiency
…enqueueing of pending metrics
…ton into keren/proton-shadow-tree
No description provided.