Skip to content

[PROTON] Significantly reduce graph launching overhead with static context#9405

Merged
Jokeren merged 56 commits intomainfrom
keren/proton-shadow-tree
Feb 13, 2026
Merged

[PROTON] Significantly reduce graph launching overhead with static context#9405
Jokeren merged 56 commits intomainfrom
keren/proton-shadow-tree

Conversation

@Jokeren
Copy link
Copy Markdown
Contributor

@Jokeren Jokeren commented Feb 9, 2026

No description provided.

…dling

- Updated CuptiProfiler to replace context capturing with static entry IDs for graph nodes.
- Removed unnecessary loops for capturing contexts and managing node states.
- Enhanced the handling of API launch callbacks to utilize static entry IDs directly.
- Modified metric record emission to use DataEntry objects instead of entry IDs.
- Adjusted PendingGraphPool to accommodate changes in data structure for metric entries.
- Refined InstrumentationProfiler to streamline metric addition by using upsertFlexibleMetrics.
…ck for linked entries in Data, TraceData, and TreeData classes
…ning of staticTree, improving performance and readability
…ccept rvalue references for onLinked callback, improving performance by avoiding unnecessary copies.
…aunchState, improving clarity and structure of launch metadata handling.
… metrics, improving performance during metric linking.
…asses for improved readability and consistency.
…instead of DataToEntryMap for improved performance and consistency
…mpling, and CuptiProfiler for improved readability and consistency
…asses to simplify linking logic and improve consistency across profilers
… to use new methods for improved clarity and consistency
… into MetricSet for improved organization and clarity
…ming dataToEntryMap to dataToEntries for improved clarity and consistency
Moves metric upsert logic out of the inline templated handler into concrete DataEntry methods (implemented in the .cpp) to reduce header complexity and centralize behavior.

Clarifies linking semantics for virtual-phase entries: an entry created for a virtual phase now uses the dedicated virtual-phase constant and treats the entry id as the target entry id. upsertTargetMetric is changed to accept an explicit target id to avoid ambiguity when associating linked metrics.

Updates callers to construct virtual-linked entries and pass explicit target ids so kernel and flexible metrics are stored against the intended target entries.
Reformats several source and header files to improve readability and enforce a more consistent style. Long argument lists and initializers are wrapped across lines, an extraneous blank line is removed, and some range-based loop/initializer spacing is tightened.

These are purely stylistic changes that do not alter behavior; they make the code easier to read and reduce churn in future diffs.
Ensure correct phase attribution when creating child entries: use a virtual phase for kernel-generated child ops while preserving the original phase for graph node entries. This prevents incorrect overwriting of entry phases and keeps profiling data properly attributed across kernel activities and API-launched graph nodes.
Clarify semantics around metrics associated with virtual-phase entries by replacing "target" terminology with "linked". Update the API and internal names to reflect that entry IDs refer to linked entries for node/virtual relationships, and adjust callers accordingly (e.g., profiler paths that attach kernel metrics to linked entries).

This improves readability and reduces confusion about the relationship between virtual entries and their associated metrics. No functional behavior is changed; the rename is purely semantic.
@Jokeren Jokeren changed the title [DRAFT][PROTON] Significantly reduce graph launching overhead with static context [PROTON] Significantly reduce graph launching overhead with static context Feb 11, 2026
@Jokeren Jokeren marked this pull request as ready for review February 11, 2026 21:22
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b72a092480

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +91 to +93
auto &dataEntry = pendingGraph.dataEntries[i];
dataEntry.upsertLinkedFlexibleMetric(metricName, metricValueVariant,
dataEntry.id);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Propagate graph metric records to every active data sink

When graph metric records are decoded, this now updates only pendingGraph.dataEntries[i] for each node index, but metricNodeEntries is built as all node/data pairs (one per active Data sink per metric node). In runs with multiple active sessions/data sinks, only the first sink(s) receive metric-kernel values while the rest are silently dropped, which corrupts per-session profiling output for graph metrics.

Useful? React with 👍 / 👎.

Comment on lines +639 to +640
auto *nodeState = graphNodeIdToState.find(nodeId);
if (!nodeState) {
throw std::runtime_error(
"[PROTON] Missing graph node state for metric node.");
}
nodeState->forEachEntry([&](Data *data, const DataEntry &entry) {
metricNodeEntryIds[data].push_back(entry.id);
nodeState->forEachEntry([&](const DataEntry &entry) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard missing graph node state before dereferencing

The null-check for graphNodeIdToState.find(nodeId) was removed, so this path now dereferences nodeState unconditionally. If a graph launch has metric nodes but no matching launch entries were materialized for the current data set (for example, graph state exists but active data sinks changed), find returns nullptr and this becomes a crash instead of a handled error.

Useful? React with 👍 / 👎.

@spatel-oai spatel-oai assigned spatel-oai and unassigned spatel-oai Feb 13, 2026
@spatel-oai spatel-oai self-requested a review February 13, 2026 17:01
@Jokeren Jokeren merged commit 5eef845 into main Feb 13, 2026
17 of 18 checks passed
@Jokeren Jokeren deleted the keren/proton-shadow-tree branch February 13, 2026 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants