Commit 9c360e0

GavinZhu-GMI authored and achartier committed
[None][fix] write per-rank torch profile traces
PyExecutor reads TLLM_TORCH_PROFILE_TRACE directly, and every rank calls torch_profiler.export_chrome_trace() on the same path. When TP/PP/DP > 1, the concurrent writes interleave and the resulting file fails to parse in Chrome tracing / Perfetto (bad control character / unterminated string at the byte where one rank's output overran another's).

Append the rank to the env-provided path before its first use so each rank writes to its own file. This matches SGLang's scheduler_profiler_mixin filename convention: the user supplies a base path, and the runtime adds the per-rank suffix automatically.

Example: TLLM_TORCH_PROFILE_TRACE=/tmp/trace.json now produces /tmp/trace-rank-0.json, /tmp/trace-rank-1.json, etc.

Signed-off-by: Gavin.Zhu <gavin.z@gmicloud.ai>
1 parent 17ac84c commit 9c360e0

1 file changed

Lines changed: 8 additions & 0 deletions

tensorrt_llm/_torch/pyexecutor/py_executor.py

@@ -936,6 +936,14 @@ def _profiler(self):
         prev_device_step_time = None

         torch_trace_path = os.environ.get(PROFILE_TRACE_ENV_VAR_NAME, None)
+        if torch_trace_path is not None:
+            # Append the rank so each rank writes to its own file. Without
+            # this, TP/PP/DP > 1 runs have every rank calling
+            # torch_profiler.export_chrome_trace() on the same path
+            # concurrently, producing interleaved output that fails to
+            # parse in Chrome tracing / Perfetto.
+            trace_base, trace_ext = os.path.splitext(torch_trace_path)
+            torch_trace_path = f"{trace_base}-rank-{self.global_rank}{trace_ext}"
         profile_start_stop = os.environ.get(PROFILE_START_STOP_ENV_VAR_NAME,
                                             None)
         enable_torch_trace = bool(torch_trace_path and profile_start_stop)
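The rename logic in the patch can be sketched as a standalone helper (the function name `per_rank_trace_path` is illustrative, not part of the patch): os.path.splitext splits the user-supplied base path into stem and extension, and the rank is spliced in between so concurrent ranks never share a file.

```python
import os


def per_rank_trace_path(base_path: str, rank: int) -> str:
    # Split e.g. "/tmp/trace.json" into ("/tmp/trace", ".json") and
    # rebuild with a per-rank suffix, mirroring the commit's approach.
    trace_base, trace_ext = os.path.splitext(base_path)
    return f"{trace_base}-rank-{rank}{trace_ext}"


# /tmp/trace.json -> /tmp/trace-rank-0.json, /tmp/trace-rank-1.json, ...
print(per_rank_trace_path("/tmp/trace.json", 0))  # /tmp/trace-rank-0.json
```

Note that a path with no extension (e.g. /tmp/trace) still works: splitext returns an empty extension and the suffix is simply appended.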

0 commit comments